simplejson 1.7
simplejson is a simple, fast, complete, correct and extensible JSON (RFC 4627) encoder/decoder for Python 2.3+. It is pure Python code with no dependencies.
simplejson 1.7 is a minor update that improves encoding performance with an optional C extension to speed up str/unicode encoding (by 10-150x or so), which yields an overall speed boost of 2x+ (JSON is string-heavy). Additionally, 1.7 adds support for encoding unicode code points outside the BMP as UTF-16 surrogate pairs (as specified by the Strings section of RFC 4627).
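As a rough illustration of the surrogate-pair encoding (a sketch, not simplejson 1.7's actual code), the snippet below computes the pair by hand and compares it with the standard library json module in modern Python 3, which descends from simplejson. The helper names `to_surrogate_pair` and `escape_non_bmp` are hypothetical.

```python
import json


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 | (offset >> 10)       # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)      # bottom 10 bits
    return high, low


def escape_non_bmp(char):
    """Render one non-BMP character as two JSON unicode escapes."""
    high, low = to_surrogate_pair(ord(char))
    return "\\u%04x\\u%04x" % (high, low)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, the example used in RFC 4627
escaped = escape_non_bmp(clef)
print(escaped)
print(json.dumps(clef))  # the same escapes, wrapped in double quotes
```

Decoding the escaped form with `json.loads` should give back the original single character.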
Please provide a Windows binary too; there is almost no way to get the egg built or the C extensions compiled on Windows at the moment (the Visual Studio 2003 issue).
Comment by d2m — 2007-03-18 @ 3:26 pm
I am willing to allow someone else be a Windows maintainer, but I am definitely not going to support the extension on that platform myself.
I have uploaded a pure Python egg for Python 2.4 (built with "python setup.py --without-speedups bdist_egg").
Comment by bob — 2007-03-18 @ 7:26 pm
Can we talk about an option to de-serialization to return strings in ASCII (Python str()) instead of only unicode? I’ve got projects using old libraries (pymssql, for one) that don’t work with unicode, and the need to always convert is getting to be a PITA.
Comment by ram — 2007-03-19 @ 11:08 am
No. Strings are text and unicode is the only suitable type for text in Python. The only correct solution is to fix your I/O boundaries. Here’s an (untested) proxy for a DB-API cursor that doesn’t support unicode:
def ensure_str_dict(d, encoding):
    rval = {}
    for k, v in d.iteritems():
        rval[ensure_str(k, encoding)] = ensure_str(v, encoding)
    return rval

def ensure_str(s, encoding):
    # Encode unicode to a byte string; leave everything else untouched.
    if isinstance(s, unicode):
        return s.encode(encoding)
    return s

class CursorProxy(object):
    def __init__(self, cur, encoding='utf-8'):
        self._cur = cur
        self._encoding = encoding

    def __getattr__(self, attr):
        # Delegate everything not overridden here to the real cursor.
        return getattr(self._cur, attr)

    def execute(self, sql, args=[]):
        if isinstance(args, dict):
            args = ensure_str_dict(args, self._encoding)
        else:
            args = [ensure_str(arg, self._encoding) for arg in args]
        return self._cur.execute(sql, args)
Comment by bob — 2007-03-19 @ 7:58 pm
Hey bob -
The problem with your example is that it doesn’t support embedded arrays of strings - i.e.
{"xyz": ["abc", "def"]}
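One way to cover nested containers is to walk the decoded structure recursively. The following is a minimal sketch in modern Python 3, where `str` is text and `bytes` is the encoded form (the counterparts of `unicode` and `str` in the Python 2 code above). The function name `encode_strings` is hypothetical, and the standard library json module stands in for simplejson.

```python
import json


def encode_strings(value, encoding="utf-8"):
    """Recursively encode every text string in a decoded JSON structure.

    Returns a new structure; dict keys and nested list items are covered.
    """
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, dict):
        return {
            encode_strings(k, encoding): encode_strings(v, encoding)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [encode_strings(item, encoding) for item in value]
    return value  # numbers, booleans and None pass through unchanged


decoded = json.loads('{"xyz": ["abc", "def"]}')
encoded = encode_strings(decoded)
print(encoded)
```

The cost of this approach is a second full pass over the structure after decoding.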
But I did some experimentation with the existing hooks that you’ve provided in simplejson and it’s ALMOST possible to do in-place encoding of strings without monkeypatching simplejson. It would actually be possible to write a JSONDecoder class that did the string conversion for those of us that need/want it.
The only problem is in decoder.py: you’re using a global JSONScanner where the scanner could instead be passed around in the ‘context’ - you’re already providing a _scanner member of JSONDecoder.
Here’s the code I’d like to use:
from copy import copy
import simplejson

def MWString(match, context):
    """
    Wrapper around JSONString that optionally re-encodes the string
    """
    # unpack the (value, end) pair so that `end` is defined for the returns below
    result, end = simplejson.JSONString(match, context)
    result_encoding = context.result_encoding
    if result_encoding is None:
        return result, end
    return result.encode(result_encoding), end

# now locally monkey-patch the list of type handlers
MW_ANYTHING = copy(simplejson.ANYTHING)
MW_ANYTHING.remove(simplejson.JSONString)
MW_ANYTHING.append(MWString)

class MWDecoder(simplejson.JSONDecoder):
    # this almost does it…
    _scanner = simplejson.Scanner(MW_ANYTHING)

    def __init__(self, encoding=None, object_hook=None, result_encoding=None, *args, **kwds):
        super(MWDecoder, self).__init__(encoding, object_hook, *args, **kwds)
        self.result_encoding = result_encoding
The only problem here is that self._scanner is lost because JSONObject and JSONArray reference simplejson.JSONScanner directly rather than getting it from the context. I’m happy to provide a patch to simplejson, but it’s basically just:
- iterscan = JSONScanner.iterscan
+ iterscan = getattr(context, '_scanner', JSONScanner).iterscan
in those two places.
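The lookup pattern behind that one-line change can be sketched in isolation. Everything below is a hypothetical stand-in (none of these classes exist in simplejson); it only shows how `getattr` with a default prefers a scanner carried by the context and falls back to the shared one otherwise.

```python
class DefaultScanner:
    """Hypothetical stand-in for the module-level JSONScanner."""
    name = "default"


class CustomScanner:
    """Hypothetical stand-in for a scanner built from custom handlers."""
    name = "custom"


DEFAULT_SCANNER = DefaultScanner()


class PlainContext:
    """A context (decoder) that does not define its own scanner."""


class CustomContext:
    """A context that carries its own scanner, as MWDecoder would."""
    _scanner = CustomScanner()


def pick_scanner(context):
    # Prefer the context's scanner; fall back to the shared default
    # when the context has no `_scanner` attribute.
    return getattr(context, "_scanner", DEFAULT_SCANNER)


print(pick_scanner(PlainContext()).name)
print(pick_scanner(CustomContext()).name)
```

Because the fallback is the existing global, contexts that do not define `_scanner` behave exactly as before.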
Comment by Alec — 2007-04-09 @ 9:17 am
Embedded arrays don’t happen in SQL. If they do, you’re probably using PostgreSQL, and if you’re using PostgreSQL then you should be able to do unicode stuff just fine.
I’m not particularly interested in taking any patches related to string re-encoding. It’s not a bug that simplejson has consistent handling of strings.
Comment by bob — 2007-04-09 @ 9:35 am
I’m not dealing with SQL at all. I’m dealing with a lot of code that deals with a lot of UTF-8, and encoding and decoding JSON all over the place. It’s a lot more efficient for me to pass around a handle to the root of a simplejson-produced object than to go back and re-encode my entire structure.
I totally understand that you don’t want to embed custom string encoding into simplejson - these are details that should be handled by the consumer. But, I’m confused as to why you’ve accepted certain hooks for JSON-RPC and SQL but are not interested in more general hooks for string encoding.
Here’s my suggested patch for decoder.py. This does NOT change existing functionality and only tweaks the hooks to be consistent across all the object types, allowing me to write my own JSONDecoder class. I’ve tried to be as consistent as possible with the existing patterns. I’d like to rename “_scanner” to “scanner” but I’ll leave that up to you.
--- dist/simplejson-1.7.1/simplejson/decoder.py 2007-03-17 23:05:43.000000000 -0700
+++ pyroot/simplejson-1.7.1-py2.4-macosx-10.4-i386.egg/simplejson/decoder.py 2007-04-09 10:47:30.000000000 -0700
@@ -125,11 +125,13 @@
return pairs, end + 1
if nextchar != '"':
raise ValueError(errmsg("Expecting property name", s, end))
- end += 1
encoding = getattr(context, 'encoding', None)
- iterscan = JSONScanner.iterscan
+ iterscan = getattr(context, '_scanner', JSONScanner).iterscan
while True:
- key, end = scanstring(s, end, encoding)
+ try:
+ key, end = iterscan(s, idx=end, context=context).next() # scanstring(s, end, encoding)
+ except StopIteration:
+ raise ValueError(errmsg("Expecting key", s, end))
end = _w(s, end).end()
if s[end:end + 1] != ':':
raise ValueError(errmsg("Expecting : delimiter", s, end))
@@ -165,7 +167,7 @@
nextchar = s[end:end + 1]
if nextchar == ']':
return values, end + 1
- iterscan = JSONScanner.iterscan
+ iterscan = getattr(context, '_scanner', JSONScanner).iterscan
while True:
try:
value, end = iterscan(s, idx=end, context=context).next()
Comment by Alec — 2007-04-09 @ 9:56 am
I have not accepted any hooks for SQL. However, the code in the above comment you were referring to is a generic proxy for SQL that encodes strings and has no hook at all into simplejson.
simplejson has an object hook on decode, which is not specifically there for JSON-RPC. It’s there for completeness: you can encode custom objects, you should be able to get them back out as they came in with the appropriate code (generic object serialization).
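A minimal sketch of that round trip, written against the standard library json module in modern Python 3 (which descends from simplejson) rather than simplejson 1.7 itself. The `Point` class, the `__point__` tag and both helper functions are hypothetical, and the `default=` keyword is the json module's; whether 1.7 accepts it in the same form is an assumption.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


def encode_point(obj):
    """Encoder side: turn a Point into a tagged dict."""
    if isinstance(obj, Point):
        return {"__point__": True, "x": obj.x, "y": obj.y}
    raise TypeError("cannot serialize %r" % (obj,))


def decode_point(d):
    """Decoder side (object_hook): turn a tagged dict back into a Point."""
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d  # untagged dicts are returned unchanged


text = json.dumps({"origin": Point(0, 0)}, default=encode_point)
restored = json.loads(text, object_hook=decode_point)
print(type(restored["origin"]).__name__)
```

The hook has to return the dict itself for anything it does not recognise; otherwise that dict is replaced by whatever the hook returns.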
I might consider accepting that patch if you add sufficient tests and documentation for it, and you send it by email or something. Blog comments are no good for preserving whitespace.
Comment by bob — 2007-04-09 @ 10:23 am
Sorry about the patch-in-blog-comments. Of course I’ll send this on by e-mail, along with tests and docs.
Comment by Alec — 2007-04-09 @ 10:24 am
Here is a solution to compile extension modules on Windows without Visual Studio 2003:
http://python.cx.hu/python-cjson/#win32
If anybody knows how to install simplejson with speedups (simplejson-1.7.1-py2.5.egg does not contain _speedups.c itself), please drop me a mail. This would help make simplejson faster in my throughput comparison.
Viktor
Comment by Viktor Ferenczi — 2007-04-24 @ 2:01 pm
I’ve been trying to figure out what changed between 1.3 and 1.7 with respect to object_hook and loads. In 1.3 the object_hook would get called once with my entire complex object, and I could then return the appropriate Python objects. In 1.7, all the dicts are passed in separately, so object_hook is called n times, where n is the number of {} found.
Take for instance the following string:
'{"foobar":"baz","comp":{"key":10},"array":[10,20,30]}'
1.3 would pass the above string into the object_hook as is. With 1.7 the following happens:
first {u'key': 10} is passed in, then {u'foobar': u'baz', u'array': [10, 20, 30], u'comp': None} is passed in.
Was this change intentional? Or was it a side-effect of switching from sre to re?
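For reference, the per-dict, innermost-first call order can be reproduced with the standard library json module in modern Python 3, which descends from simplejson. This is only a sketch of that module's behaviour, not of 1.3 or 1.7. It also suggests one plausible reading of the None above: a hook that returns nothing has each dict replaced by None. The hook names are hypothetical.

```python
import json

text = '{"foobar":"baz","comp":{"key":10},"array":[10,20,30]}'

seen = []


def silent_hook(d):
    """Record a copy of each dict, but return nothing (that is, None)."""
    seen.append(dict(d))


lost = json.loads(text, object_hook=silent_hook)
for d in seen:
    print(d)


def passthrough_hook(d):
    """Return the dict so that it stays in the decoded result."""
    return d


kept = json.loads(text, object_hook=passthrough_hook)
print(kept)
```

With `silent_hook`, the outer dict is recorded with "comp" already replaced by None, and the overall result is None as well.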
Comment by Jesus Rodriguez — 2007-10-16 @ 6:23 am
Is there a way to receive a JSON number either as type Decimal or as a string (floats have their own problems)?
kind regards
Henk-Jan
Comment by henk-jan — 2007-10-20 @ 11:14 am
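On henk-jan's question above: in the standard library json module of modern Python 3 (which descends from simplejson), `loads` accepts `parse_float` and `parse_int` callables that receive the number's original text. The sketch below relies on that module; whether simplejson 1.7 offers the same hooks is not established in this thread.

```python
import json
from decimal import Decimal

text = '{"price": 1.10, "count": 3}'

# Each float literal is handed to Decimal as a string, so no binary
# floating-point rounding is involved.
as_decimal = json.loads(text, parse_float=Decimal)
print(as_decimal)

# Passing str keeps the literal text of every number.
as_string = json.loads(text, parse_float=str, parse_int=str)
print(as_string)
```

Integers are left as `int` unless `parse_int` is given too.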
* Tried cygwin gcc; compiled fine, but "python setup.py install" complained about VS2003.
* Followed instructions posted at
http://python.cx.hu/python-cjson/#win32
compiled fine with mingw32 gcc, but "python setup.py install" complained about VS2003.
* Installed VS2005; "python setup.py install" complained about VS2003.
* Tried "python setup.py install -c mingw32" but it returned unknown cmd.
Changed the "standard=True" line in setup.py to "standard=False", and simplejson installed fine (without speed-ups, anyway).
Comment by noname — 2007-11-18 @ 2:05 am