from __future__ import *

2005-12-31

simplejson 1.1

Filed under: python, simplejson — bob @ 3:42 pm

simplejson is a simple, fast, complete, correct and extensible JSON encoder/decoder for Python 2.3+. It is pure Python code with no dependencies. JSON.org now recommends it as the JSON module for Python (replacing json-py).

simplejson was previously named simple_json, but was renamed to comply with PEP 8 module naming guidelines.

simplejson 1.1 has a much bigger test suite, a full set of documentation and a couple of new features:

  • The encoder and decoder have been extended to understand NaN, Infinity, and -Infinity (but this can be turned off via allow_nan=False for strict JSON compliance)
  • The decoder's scanner has been fixed so that it no longer accepts invalid JSON documents
  • The decoder now reports line and column information as well as character numbers for easier debugging
  • The encoder now has a circular reference checker, which can be optionally disabled with check_circular=False
  • dump, dumps, load, loads now accept an optional cls kwarg to use an alternate JSONEncoder or JSONDecoder class for convenience.
  • The read/write compatibility shim for json-py now has deprecation warnings
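
Here's a rough sketch of what those options look like in use. It runs against the json module in today's standard library as a stand-in, on the assumption that it accepts the same keyword arguments (that module descends from simplejson, but I haven't checked it against 1.1 itself). SetEncoder is a made-up class for illustration only.

```python
import json  # stand-in for simplejson (assumption: same keyword arguments)

# NaN and the infinities are accepted by default.
lenient = json.dumps([float("nan"), float("inf"), float("-inf")])
print(lenient)  # [NaN, Infinity, -Infinity]

# allow_nan=False refuses them, for strict JSON compliance.
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True

# The circular reference checker catches a list that contains itself.
loop = []
loop.append(loop)
try:
    json.dumps(loop)
    circular_rejected = False
except ValueError:
    circular_rejected = True

# Decoding errors report line and column as well as the character offset.
# (The stand-in raises json.JSONDecodeError, a ValueError subclass.)
try:
    json.loads('{"a": 1,\n "b": }')
    error_line = None
except ValueError as exc:
    error_line = exc.lineno  # exc.colno and exc.pos are available too
    print(exc)


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes sets as sorted lists."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        return super().default(obj)


# The cls keyword selects the alternate encoder class.
with_cls = json.dumps({"tags": {"b", "a"}}, cls=SetEncoder)
print(with_cls)  # {"tags": ["a", "b"]}
```

Passing check_circular=False skips the self-reference check, so only do that when you know the data is a tree.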

2005-12-26

simple_json 1.0

Filed under: python, simplejson — bob @ 12:29 am

simple_json is a simple, fast, complete, correct and extensible JSON encoder/decoder for Python 2.3+. It's pure Python code with no dependencies.

simple_json exists because json-py sucks. simple_json is a drop-in replacement for json-py, but it also exposes more sanely named APIs, and can be extended by subclassing.

Here are the issues I found in json-py after evaluating the source:

  • LGPL (does this license have a clear interpretation for Python modules?)
  • Doesn't have a proper egg (or source) distribution on Cheese Shop.
  • Wonky API. read and write are very bad names to call something that doesn't act file-like!
  • No streaming encoder support.
  • The decoder is extremely inefficient as it invokes at least one method call per character of input.
  • The encoder supports exactly these types: dict, list, tuple, str, unicode, int, long, float plus the singletons True, False, and None. It can't be made to support anything else, not even subclasses of those types. The implementation is in a single function and has no extensibility hook.
  • The encoder has no clue about unicode. Depending on the input, it may return a str or unicode. It has no option to escape the output.
  • The decoder similarly has no clue about unicode. If it ain't ASCII or escaped, then BOOM!
  • It uses custom exception subclasses that descend directly from Exception, so they will not be caught by a traditional except ValueError clause.
  • The source code mixes tabs and spaces. That's uh.. reassuring :)

simple_json is designed to address all of those issues:

  • MIT license
  • It's on Cheese Shop, so setuptools users can depend on it with a simple install_requires
  • The official API follows the familiar convention of marshal and pickle
  • Encoding can be streamed (via dump or iterator)
  • The decoder is fast, because it uses regular expressions rather than processing each character with Python code
  • The encoder can be subclassed and extended to support serialization of any type, and it supports subclasses of dict, list, str, etc. by default
  • The encoder outputs ASCII by default, with unicode characters escaped with \uXXXX. Optionally, it can also output a unicode string with ensure_ascii=False.
  • The decoder understands encoded strings (and unicode). It defaults to UTF-8, but can use anything ASCII-based. If the input is of an encoding that is not ASCII-based (such as UCS-2), it can be decoded to unicode first.
  • Exceptions during encoding or decoding are simply ValueError (though a future version could provide more informative messages)
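
To make a few of those points concrete, here's a minimal sketch. It uses the standard library's json module as a stand-in and assumes simple_json exposes the same names (dump, dumps, loads, JSONEncoder with a default hook, ensure_ascii); ComplexEncoder and Bag are hypothetical classes invented for the example.

```python
import io
import json  # stand-in for simple_json (assumption: same API)


class ComplexEncoder(json.JSONEncoder):
    """Hypothetical subclass: encode complex numbers as [real, imag]."""

    def default(self, obj):
        if isinstance(obj, complex):
            return [obj.real, obj.imag]
        return super().default(obj)


class Bag(dict):
    """A dict subclass; it is encoded like a plain dict with no extra code."""


encoder = ComplexEncoder()
extended = encoder.encode({"z": 1 + 2j})
print(extended)  # {"z": [1.0, 2.0]}
print(json.dumps(Bag(a=1)))  # {"a": 1}

# Streaming, two ways: iterate over chunks, or dump to a file-like object.
streamed = "".join(encoder.iterencode([1, 2, 3]))
buffer = io.StringIO()
json.dump({"a": [1, 2]}, buffer)

# ASCII with \uXXXX escapes by default; ensure_ascii=False keeps the character.
escaped = json.dumps("caf\u00e9")
unescaped = json.dumps("caf\u00e9", ensure_ascii=False)

# Input in an encoding that is not ASCII-based is decoded to text first.
utf16_bytes = '"caf\u00e9"'.encode("utf-16")
decoded = json.loads(utf16_bytes.decode("utf-16"))

# Malformed input surfaces as a ValueError.
try:
    json.loads("{not json}")
    caught = False
except ValueError:
    caught = True
```

The naming mirrors marshal and pickle: dump and load work on file-like objects, dumps and loads on strings.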

2005-12-05

Remote JSON - JSONP

Filed under: AJAX, MochiKit, javascript — bob @ 8:21 pm

The browser security model dictates that XMLHttpRequest, frames, etc. must have the same domain in order to communicate. That's not a terrible idea, for security reasons, but it sure does make distributed (service oriented, mash-up, whatever it's called this week) web development suck.

Traditionally, there are three solutions to this problem:

Local proxy:
Needs infrastructure (can't run a serverless client) and you get double-taxed on bandwidth and latency (remote - proxy - client).
Flash:
Remote host needs to deploy a crossdomain.xml file, Flash is relatively proprietary and opaque to use, and it requires learning a one-off, moving-target programming language.
Script tag:
Difficult to know when the content is available, no standard methodology, can be considered a "security risk".

I'm proposing a new, technology-agnostic standard methodology for the script tag method of cross-domain data fetching: JSON with Padding, or simply JSONP.

The way JSONP works is simple, but requires a little bit of server-side cooperation. Basically, the idea is that you let the client decide on a small chunk of arbitrary text to prepend to the JSON document, and you wrap it in parentheses to create a valid JavaScript document (and possibly a valid function call).

The client decides on the arbitrary prepended text by using a query argument named jsonp with the text to prepend. Simple! With an empty jsonp argument, the result document is simply JSON wrapped in parentheses.
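
The server side of that rule fits in a few lines. This is only a sketch under my own assumptions: jsonp_document is a hypothetical helper name, and the query string is assumed to arrive as raw, percent-encoded text.

```python
import json
from urllib.parse import parse_qs


def jsonp_document(query_string, data):
    """Build the response body for a request with the given query string."""
    # parse_qs percent-decodes values; a missing or empty jsonp yields "".
    padding = parse_qs(query_string).get("jsonp", [""])[0]
    # Prepend the client's text and wrap the JSON in parentheses.
    return padding + "(" + json.dumps(data) + ")"


print(jsonp_document("", {"ok": True}))                  # ({"ok": true})
print(jsonp_document("jsonp=handleData", {"ok": True}))  # handleData({"ok": true})
print(jsonp_document("jsonp=callbacks%5B7%5D", [1, 2]))  # callbacks[7]([1, 2])
```

The first call shows the empty-padding case (plain JSON in parentheses); the other two turn the same document into a function call.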

Let's take the del.icio.us JSON API as an example. This API has a "script tag" variant that looks like this:

http://del.icio.us/feeds/json/bob/mochikit+interpreter:

if(typeof(Delicious) == 'undefined') Delicious = {};
Delicious.posts = [{
    "u": "http://mochikit.com/examples/interpreter/index.html",
    "d": "Interpreter - JavaScript Interactive Interpreter",
    "t": [
        "mochikit","webdev","tool","tools",
        "javascript","interactive","interpreter","repl"
    ]
}]

In terms of JSONP, a document semantically identical to this would be available at the following URL:

http://del.icio.us/feeds/json/bob/mochikit+interpreter?jsonp=if(typeof(Delicious)%3D%3D%27undefined%27)Delicious%3D%7B%7D%3BDelicious.posts%3D
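
That URL is just the padding text percent-encoded into the jsonp argument. As a small sketch (the variable names are mine, and leaving the parentheses unescaped is a choice made here to match the URL above):

```python
from urllib.parse import quote, unquote

base = "http://del.icio.us/feeds/json/bob/mochikit+interpreter"
padding = "if(typeof(Delicious)=='undefined')Delicious={};Delicious.posts="

# safe="()" leaves the parentheses literal; everything else unsafe is escaped.
url = base + "?jsonp=" + quote(padding, safe="()")
print(url)

# Decoding the argument recovers the exact text the server would prepend.
recovered = unquote(url.split("?jsonp=", 1)[1])
print(recovered == padding)  # True
```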

That's not very interesting on its own, but let's say I wanted to be notified when the document is available. I could come up with a little system for tracking them:

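var delicious_callbacks = {};
function getDelicious(callback, url) {
    var uid = (new Date()).getTime();
    // the remote document calls this wrapper with the decoded JSON data
    delicious_callbacks[uid] = function (data) {
        delete delicious_callbacks[uid];
        callback(data);
    };
    url += "?jsonp=" + encodeURIComponent("delicious_callbacks[" + uid + "]");
    // add the script tag to the document, cross fingers
    var script = document.createElement("script");
    script.src = url;
    document.getElementsByTagName("head")[0].appendChild(script);
}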

getDelicious(doSomething, "http://del.icio.us/feeds/json/bob/mochikit+interpreter");

The URL fetched in this hypothetical experiment, and the document it returns, would look something like this:

http://del.icio.us/feeds/json/bob/mochikit+interpreter?jsonp=delicious_callbacks%5B12345%5D

delicious_callbacks[12345]([{
    "u": "http://mochikit.com/examples/interpreter/index.html",
    "d": "Interpreter - JavaScript Interactive Interpreter",
    "t": [
        "mochikit","webdev","tool","tools",
        "javascript","interactive","interpreter","repl"
    ]
}])

See, because we're wrapping with parentheses, a JSONP request can translate into a function call or a plain old JSON literal. All the server needs to do differently is prepend a little bit of text to the beginning and wrap the JSON in parentheses!

Now, of course, you'd have libraries like MochiKit, Dojo, etc. abstracting JSONP so that you don't have to write the ugly DOM script tag insertion yourself.

Of course, this just solves the standardization problem. Your page is still toast if the remote host decides to inject malicious code instead of JSON data. However, if implemented, it'd save a lot of developers some time and allow for generic abstractions, tutorials, and documentation to be built.
