simple_json 1.0
simple_json is a simple, fast, complete, correct and extensible JSON encoder/decoder for Python 2.3+. It's pure Python code with no dependencies.
simple_json exists because json-py sucks. simple_json is a drop-in replacement for json-py, but it also exposes more sanely named APIs, and can be extended by subclassing.
Here are the issues I found in json-py after evaluating the source:
- LGPL (does this license have a clear interpretation for Python modules?)
- Doesn't have a proper egg (or source) distribution on Cheese Shop.
- Wonky API. read and write are very bad names for something that doesn't act file-like!
- No streaming encoder support.
- The decoder is extremely inefficient as it invokes at least one method call per character of input.
- The encoder supports exactly these types: dict, list, tuple, str, unicode, int, long, float plus the singletons True, False, and None. It can't be made to support anything else, not even subclasses of those types. The implementation is in a single function and has no extensibility hook.
- The encoder has no clue about unicode. Depending on the input, it may return a str or unicode. It has no option to escape the output.
- The decoder similarly has no clue about unicode. If it ain't ASCII or escaped, then BOOM!
- It uses custom exception subclasses that descend directly from Exception, so they will not be caught by traditional except ValueError clauses.
- The source code mixes tabs and spaces. That's uh.. reassuring :)
simple_json is designed to address all of those issues:
- MIT license
- It's on Cheese Shop, so setuptools users can depend on it with a simple install_requires
- The official API follows the familiar convention of marshal and pickle
- Encoding can be streamed (via dump or iterator)
- The decoder is fast, because it uses regular expressions rather than processing each character with Python code
- The encoder can be subclassed and extended to support serialization of any type, and it supports subclasses of dict, list, str, etc. by default
- The encoder outputs ASCII by default, with unicode characters escaped with \uXXXX. Optionally, it can also output a unicode string with ensure_ascii=False.
- The decoder understands encoded strings (and unicode). It defaults to UTF-8, but can use anything ASCII-based. If the input is of an encoding that is not ASCII-based (such as UCS-2), it can be decoded to unicode first.
- Exceptions during encoding or decoding are simply ValueError (though a future version could provide more informative messages)
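To make the list above concrete, here is a rough sketch of what those features look like in use. It assumes the standard-library json module (which descends from this project) as a stand-in, so the exact names in simple_json 1.0 may differ. Point, PointEncoder and Tags are hypothetical names made up for the example.

```python
import io
import json  # stdlib descendant of this project; stands in for simple_json here


class Point:
    """Hypothetical application type, not part of any JSON library."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointEncoder(json.JSONEncoder):
    """Extend the encoder by subclassing and overriding default()."""

    def default(self, obj):
        if isinstance(obj, Point):
            return {"x": obj.x, "y": obj.y}
        # Anything else falls through to the base class, which rejects it.
        return super().default(obj)


class Tags(list):
    """A list subclass; it is encoded as a plain JSON array."""


# marshal/pickle-style names: dumps/loads for strings, dump/load for files.
text = json.dumps({"tags": Tags(["a", "b"]), "ok": True})
round_trip = json.loads(text)

# Output is ASCII by default; non-ASCII characters become \uXXXX escapes.
escaped = json.dumps("caf\u00e9")
raw = json.dumps("caf\u00e9", ensure_ascii=False)

# Streaming: dump() writes to a file-like object, iterencode() yields chunks.
buf = io.StringIO()
json.dump([1, 2, 3], buf)
chunks = list(PointEncoder().iterencode({"p": Point(1, 2)}))

# Input in a non-ASCII-based encoding is decoded to unicode first.
ucs2_bytes = '{"k": "v"}'.encode("utf-16-le")
from_ucs2 = json.loads(ucs2_bytes.decode("utf-16-le"))

# Malformed input can be caught with a plain except ValueError clause.
try:
    json.loads("{not json")
    failed = False
except ValueError:
    failed = True
```

One caveat with the stand-in: in the standard-library module, passing an unsupported type to the encoder raises TypeError rather than ValueError, so only the decoding half of the ValueError claim is shown here.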
Hmm, interesting. This summer I wrote a JSONReader C module (basically it will parse into native Python structures through a Python extension) with very high efficiency. I offered it to be included into json-py, and the author said yes, but I haven’t heard from him since.
The module is still unreleased because of this, so maybe it can find its way into Simple-JSON? It’s only a decoder, not an encoder. Mail me if you’re interested (I’ll be on vacation the coming week though).
Comment by Koen — 2005-12-26 @ 12:53 pm
It would be nice to have as an optional extension, like ElementTree vs cElementTree. Do you have benchmarks?
Comment by bob — 2005-12-26 @ 6:14 pm
This is a much better implementation, good job. But to be fair, json-py is a straight Python implementation of the Javascript reference implementation.
Comment by nick — 2005-12-29 @ 9:19 am
Okay, I have just checked my JSONReader against SimpleJSON on a 6MB and a 4.5MB file.
On an Athlon XP 2000+ I get the following times:
6 MB file: 1.2s for JSONReader, 31.5s for SimpleJSON.
4.5 MB file: 1.1s for JSONReader, 24.1s for SimpleJSON.
I have to make some notes:
- JSONReader uses the standard C API to open a file through a filename. If I read the file into memory completely in Python and then use JSONReader’s from-string reading API (i.e. not using the standard C file API), the times are 1.3s and 1.1s respectively. Supporting a StringIO API is non-trivial because of the lex/yacc basis; I haven’t looked at this yet. Perhaps it is possible to make it use a .read() interface, but I think it will be hard.
- Only works on ASCII strings. Characters 128-255 are kept as normal characters and not interpreted at all. So they should come out the way they come in (I do not rely on this behavior, I only use \uXXXX).
Comment by Koen — 2006-01-03 @ 7:32 am