from __future__ import *

2005-12-26

simple_json 1.0

Filed under: python, simplejson — bob @ 12:29 am

simple_json is a simple, fast, complete, correct and extensible JSON encoder/decoder for Python 2.3+. It's pure Python code with no dependencies.

simple_json exists because json-py sucks. simple_json is a drop-in replacement for json-py, but it also exposes more sanely named APIs, and can be extended by subclassing.
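The post doesn't spell out simple_json 1.0's class or method names. As a hedged sketch of the same ideas, here is Python's standard json module, which was derived from simplejson and added in Python 2.6. It uses the marshal/pickle-style names dumps and loads, and it is extended by subclassing JSONEncoder and overriding default(), which the encoder calls only for objects it doesn't already know how to serialize. The SetEncoder and Counts classes, and the choice to encode sets as sorted lists, are hypothetical illustrations.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder subclass that also knows how to serialize sets."""

    def default(self, obj):
        # default() is only consulted for types the base encoder can't handle.
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)  # encode sets as sorted JSON arrays
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


class Counts(dict):
    """A plain dict subclass; the base encoder handles it without any help."""


data = {"tags": {"python", "json"}, "counts": Counts(a=1)}
text = json.dumps(data, cls=SetEncoder, sort_keys=True)
print(text)  # {"counts": {"a": 1}, "tags": ["json", "python"]}

# loads() is the inverse of dumps(); sets come back as plain lists.
assert json.loads(text) == {"counts": {"a": 1}, "tags": ["json", "python"]}
```

Putting the hook in an overridable method keeps the core encoder untouched: a subclass only has to handle the new types and hand everything else back to the base class.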

Here are the issues I found in json-py after evaluating the source:

  • LGPL (does this license have a clear interpretation for Python modules?)
  • Doesn't have a proper egg (or source) distribution on Cheese Shop.
  • Wonky API. read and write are very bad names for functions that don't act file-like!
  • No streaming encoder support.
  • The decoder is extremely inefficient as it invokes at least one method call per character of input.
  • The encoder supports exactly these types: dict, list, tuple, str, unicode, int, long, float plus the singletons True, False, and None. It can't be made to support anything else, not even subclasses of those types. The implementation is in a single function and has no extensibility hook.
  • The encoder has no clue about unicode. Depending on the input, it may return a str or unicode. It has no option to escape non-ASCII characters in the output.
  • The decoder similarly has no clue about unicode. If it ain't ASCII or escaped, then BOOM!
  • It uses custom exception classes that descend directly from Exception, so they will not be caught by traditional except ValueError clauses.
  • The source code mixes tabs and spaces. That's uh.. reassuring :)
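The per-character cost described in the list above is easiest to see side by side. This is an illustrative sketch under assumptions, not code from json-py or simple_json: both hypothetical functions (scan_per_char and scan_with_regex) scan a run of ordinary characters inside a JSON string. The first calls a Python method for every character; the second consumes the whole run with one regular-expression match, which CPython's re module executes in C.

```python
import re

# Characters that end a "plain" run inside a JSON string: a quote, a
# backslash, or an ASCII control character (JSON forbids raw control chars).
STOP = '"\\' + ''.join(chr(c) for c in range(0x20))

# One compiled pattern that matches the entire plain run at once.
PLAIN_RUN = re.compile(r'[^"\\\x00-\x1f]*')


class CharReader:
    """Hypothetical per-character reader: one Python method call per char."""

    def __init__(self, text, pos):
        self.text = text
        self.pos = pos

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ''

    def next(self):
        ch = self.text[self.pos]
        self.pos += 1
        return ch


def scan_per_char(text, pos):
    """Collect ordinary characters one method call at a time."""
    reader = CharReader(text, pos)
    chunk = []
    while reader.peek() and reader.peek() not in STOP:
        chunk.append(reader.next())
    return ''.join(chunk), reader.pos


def scan_with_regex(text, pos):
    """Collect the same characters with a single match() call."""
    m = PLAIN_RUN.match(text, pos)
    return m.group(), m.end()


sample = '"hello world\\n"'  # a JSON string containing an escape sequence
print(scan_per_char(sample, 1))    # ('hello world', 12)
print(scan_with_regex(sample, 1))  # ('hello world', 12)
```

Both stop at the backslash, where a real decoder would handle the escape. The regex version still runs Python code per token, just not per character, which is where a speedup of this kind would come from; the actual gain depends on the input.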

simple_json is designed to address all of those issues:

  • MIT license
  • It's on Cheese Shop, so setuptools users can depend on it with a simple install_requires
  • The official API follows the familiar convention of marshal and pickle
  • Encoding can be streamed (via dump or iterator)
  • The decoder is fast, because it uses regular expressions rather than processing each character with Python code
  • The encoder can be subclassed and extended to support serialization of any type, and it supports subclasses of dict, list, str, etc. by default
  • The encoder outputs ASCII by default, with non-ASCII characters escaped as \uXXXX. Optionally, it can also output a unicode string with ensure_ascii=False.
  • The decoder understands encoded strings (and unicode). It defaults to UTF-8, but can use any ASCII-based encoding. Input in an encoding that is not ASCII-based (such as UCS-2) can be decoded to unicode first.
  • Exceptions during encoding or decoding are simply ValueError (though a future version could provide more informative messages)
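Several of the points above can be tried out against Python's standard json module, which grew out of simplejson. This is a hedged Python 3 sketch, not simple_json 1.0 itself: JSONEncoder.iterencode() yields the output in chunks, json.dump() writes those chunks to a file-like object, ensure_ascii toggles \uXXXX escaping, input in a non-ASCII-based encoding (UTF-16 here, standing in for UCS-2) is decoded to text first, and malformed input raises json.JSONDecodeError, a ValueError subclass since Python 3.5.

```python
import io
import json

value = {"name": "caf\u00e9", "items": list(range(3))}

# Streaming: iterencode() yields the encoded document piece by piece,
# so a large structure never has to be built as one string up front.
encoder = json.JSONEncoder()
chunks = list(encoder.iterencode(value))
print(len(chunks) > 1)  # True

# json.dump() writes the same chunks to any object with a write() method.
buf = io.StringIO()
json.dump(value, buf)
assert buf.getvalue() == "".join(chunks)

# ASCII output by default: the e-acute becomes the escape \u00e9.
assert json.dumps("caf\u00e9") == '"caf\\u00e9"'
# With ensure_ascii=False the character is kept as real text ("café").
assert json.dumps("caf\u00e9", ensure_ascii=False) == '"caf\u00e9"'

# Input in a non-ASCII-based encoding is decoded to text before parsing.
raw = '{"name": "caf\u00e9"}'.encode("utf-16")
assert json.loads(raw.decode("utf-16")) == {"name": "caf\u00e9"}

# Malformed input raises an error that an ordinary ValueError clause catches.
try:
    json.loads('{"unterminated": ')
except ValueError as exc:
    print(type(exc).__name__)  # JSONDecodeError
```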

4 Comments

  1. Hmm, interesting. This summer I wrote JSONReader, a highly efficient C extension module that parses JSON directly into native Python structures. I offered it for inclusion in json-py, and the author said yes, but I haven’t heard from him since.
    The module is still unreleased because of this, so maybe it can find its way into Simple-JSON? It’s only a decoder, not an encoder. Mail me if you’re interested (I’ll be on vacation this coming week, though).

    Comment by Koen — 2005-12-26 @ 12:53 pm

  2. It would be nice to have as an optional extension, like ElementTree vs. cElementTree. Do you have benchmarks?

    Comment by bob — 2005-12-26 @ 6:14 pm

  3. This is a much better implementation, good job. But to be fair, json-py is a straight Python implementation of the Javascript reference implementation.

    Comment by nick — 2005-12-29 @ 9:19 am

  4. Okay, I have just checked my JSONReader against SimpleJSON on a 6MB and a 4.5MB file.
    On an Athlon XP 2000+ I get the following times:
    6 MB file: 1.2s for JSONReader, 31.5s for SimpleJSON.
    4.5 MB file: 1.1s for JSONReader, 24.1s for SimpleJSON.

    I have to make some notes:
    - JSONReader uses the standard C API to open a file by filename. If I read the whole file into memory in Python and then use JSONReader’s from-string API (i.e. not using the standard C file API), the times are 1.3s and 1.1s respectively. Supporting a StringIO API is non-trivial because JSONReader is built on lex/yacc, and I haven’t looked at it yet. It may be possible to make it use a .read() interface, but I think it will be hard.
    - It only works on ASCII strings. Characters 128-255 are kept as-is and not interpreted at all, so they should come out the way they went in (I do not rely on this behavior; I only use \uXXXX).

    Comment by Koen — 2006-01-03 @ 7:32 am
