Обсуждение: JSON type caster
Hi, I am adding a SQL to Python type caster for the new native JSON type in Postgres. This seems to work, but I wonder if it's really complete / done right and should maybe already built into Psycopg2 since JSON is now a native PG type. Any comments? Thanks, Tobias === import psycopg2 import simplejson import json import sys conn = psycopg2.connect(host = "127.0.0.1", port = 5434, database = "test", user = "test", password = "xxx") conn.autocommit = True print conn.encoding cur = conn.cursor() cur.execute("SELECT null::json, null::json[]") (json_oid, jsonarray_oid) = (cur.description[0][1], cur.description[1][1]) print json_oid, jsonarray_oid # 114 199 => are those fixed for PG or instance specific? def cast_json(value, cur): if value is None: return None try: #o = json.loads(value) o = simplejson.loads(value, use_decimal = True) return o except: raise InterfaceError("bad JSON representation: %r" % value) JSON = psycopg2.extensions.new_type((json_oid,), "JSON", cast_json) psycopg2.extensions.register_type(JSON) psycopg2.extensions.register_type(psycopg2.extensions.new_array_type((jsonarray_oid,), "JSON[]", JSON)) cur.execute("""SELECT now(), 23::decimal, '[23.1, {"a": "jhsd", "b": 23, "c": [1,2,3]}]'::json, (SELECT array_agg(f1) FROM tab1), (SELECT array_to_json(array_agg(f1)) FROM tab1)""") r = cur.fetchone() for i in xrange(0, len(r)): print i, type(r[i]), r[i] === DROP TABLE IF EXISTS tab1; CREATE TABLE tab1 (f1 JSON); INSERT INTO tab1 (f1) VALUES ('"sdfs"'::json); INSERT INTO tab1 (f1) VALUES ('[1,2,3]'::json); SELECT * FROM tab1; SELECT array_agg(f1) FROM tab1;
On Tue, Sep 18, 2012 at 8:51 AM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote: > Hi, > > I am adding a SQL to Python type caster for the new native JSON type in > Postgres. Sounds like a good idea. > This seems to work, but I wonder if it's really complete / done right and > should maybe already built into > Psycopg2 since JSON is now a native PG type. yes, I think we should add something for the next psycopg version. > Any comments? > Thanks, > Tobias > cur.execute("SELECT null::json, null::json[]") > (json_oid, jsonarray_oid) = (cur.description[0][1], cur.description[1][1]) > > print json_oid, jsonarray_oid # 114 199 => are those fixed for PG or > instance specific? Those are fixed since PG 9.2, but not in previous versions where json could be installed as an extension. So it makes sense to register the typecaster with these values but also to provide a function register_json() with interface similar to register_hstore() that would either take a connection or cursor and query them to know the oids, or just take the oids in input for setup where querying is not possible. > def cast_json(value, cur): > if value is None: > return None > try: > #o = json.loads(value) > o = simplejson.loads(value, use_decimal = True) I think this should be dependent on the Python version and use either the builtin module or simplejson where not available. The loads function should be probably exposed to the customer for better customization of the result. So, all in all, I think psycopg should expose a psycopg2.extras.register_json() function taking the loads function in input, callling it with pre-configured parameters in order to create a default typecaster in psycopg2.extensions (using the PG 9.2 oids, using the most appropriate loads function for the python version etc) but leaving the user the possibility to override the result by calling it again (use a different module, use different parameters for loads etc.) > return o > except: > raise InterfaceError("bad JSON representation: %r" % value) Probably should be DataError. Not sure it is the case to dump the entire value into the exception: it may be huge. What should we do as an extension? There is no such a thing as a Python json object. I think we should provide a thin wrapper similar to psycopg2.Binary, that would be used: from psycopg2.extensions import Json data = {'my': ["stuff"]} cur.execute("insert ...", [Json(data)]) If somebody wants to provide the code *and* the tests *and* the docs let us know. I've opened a ticket about the feature (http://psycopg.lighthouseapp.com/projects/62710-psycopg/tickets/124). Best way to contribute would be a git branch. Thank you very much, -- Daniele
Hi Daniele, I will work on the ticket .. need to read into docs/tests habits with Psycopg2 first. > What should we do as an extension? There is no such a thing as a > Python json object. I think we should provide a thin wrapper similar > to psycopg2.Binary, that would be used: > > from psycopg2.extensions import Json > data = {'my': ["stuff"]} > cur.execute("insert ...", [Json(data)]) > That sounds reasonable and looks like it fits well with Psycopg design. Personally, I am mainly calling stored procedures, i.e. select proname, prorettype, proargtypes from pg_proc where proname = 'create_employee'; create_employee,114,114 => a SP which takes 1 JSON arg, and has return type JSON. The return type casting is taken care of by the (new) SQL=>Py JSON caster. The argument type isn't handled automatically. Given the stuff above (Json() thin wrapper), I still need to determine the input arg type from pg_proc ... there isn't any "magic" I can turn on with Psycopg, right? Thanks, Tobias
On Tue, Sep 18, 2012 at 4:57 PM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote: > Personally, I am mainly calling stored procedures, i.e. > > select proname, prorettype, proargtypes from pg_proc where proname = > 'create_employee'; > > create_employee,114,114 > > => a SP which takes 1 JSON arg, and has return type JSON. > > The return type casting is taken care of by the (new) SQL=>Py JSON caster. > > The argument type isn't handled automatically. > > Given the stuff above (Json() thin wrapper), I still need to determine > the input arg type from pg_proc ... there isn't any "magic" I can turn > on with Psycopg, right? Right, this would be the shortcoming. Unfortunately there is no Python class representing a json, and the mapping object -> adapter is by python class only. I assume your create_employee function cannot take *any* json: it may be expecting e.g. a dict and not a list, and maybe some specific attributes while ignoring others. For this case it's probably easy to create a Python object that knows how to adapt to json syntax. But in the generic case a Json can be a python string, or a list, or a dict, or None, each of which have a more reasonable Postgres representation. If anybody comes with a better solution than the Json wrapper I'd be happy to know. BTW, because of the zope problem discussed today, I'd be happy to have a release soon, so if nobody steps ahead in the next few hours, I'll try and implement the json support tonight and have it for psycopg 2.4.6. -- Daniele
> Right, this would be the shortcoming. Unfortunately there is no Python > class representing a json, and the mapping object -> adapter is by > python class only. I assume your create_employee function cannot take > *any* json: it may be expecting e.g. a dict and not a list, and maybe > some specific attributes while ignoring others. For this case it's > probably easy to create a Python object that knows how to adapt to > json syntax. But in the generic case a Json can be a python string, or > a list, or a dict, or None, each of which have a more reasonable > Postgres representation. I have no better suggestion for the general case and I agree that for the general case the thin wrapper totally makes sense. The only idea: maybe have a "convenience option" that when turned on makes Python dicts adapt to PG JSON automatically. If there is no other "default mapping" of Python dicts to any other PG type, that would make a nice fallback that probably allows to handle a lot of situations automatically - such as stored procs that expect their JSON args to be dicts anyway. > > If anybody comes with a better solution than the Json wrapper I'd be > happy to know. No, not for the general case. > > BTW, because of the zope problem discussed today, I'd be happy to have > a release soon, so if nobody steps ahead in the next few hours, I'll > try and implement the json support tonight and have it for psycopg > 2.4.6. Ok. Great! (Not that I want to "freeride" .. but I guess it's a snap for you and me would need to first "learn" Psycopg tests/docs .. Thanks! Tobias > > -- Daniele >
>> #o = json.loads(value) >> o = simplejson.loads(value, use_decimal = True) > > I think this should be dependent on the Python version and use either > the builtin module or simplejson where not available. The loads > function should be probably exposed to the customer for better > customization of the result. So, all in all, I think psycopg should Yes, absolutely right. To allow serialization of decimal, datetime and even custom classes .. >> return o >> except: >> raise InterfaceError("bad JSON representation: %r" % value) > > Probably should be DataError. Not sure it is the case to dump the > entire value into the exception: it may be huge. I agree. Default should be no dumping. I guess it's a very unlikey case anyway, since when PG is handling it as JSON, it _is_ probably a JSON ..
On Tue, Sep 18, 2012 at 9:12 PM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote: > If there is no other "default mapping" of Python dicts to any other > PG type, that would make a nice fallback that probably allows to handle > a lot of situations automatically - such as stored procs that expect > their JSON args to be dicts anyway. I think it can be easily done: currently there's no default adaptation for dicts: there is the hstore adapter but it must be registered manually. If we have the Json wrapper, I suspect adapting dicts would be just a matter of: register_adapter(dict, Json) it would work with lists and other objects as well, but overriding the default adapter (e.g. lists to arrays etc.). Because adapters are currently global this may result in unwanted side effects. It is definitely an use case to be mentioned in the docs. Cheers, -- Daniele
On Tue, Sep 18, 2012 at 5:44 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > BTW, because of the zope problem discussed today, I'd be happy to have > a release soon, so if nobody steps ahead in the next few hours, I'll > try and implement the json support tonight and have it for psycopg > 2.4.6. I've implemented the Json adapter and the register_json function: they are available in the "json" branch in my repository: <https://github.com/dvarrazzo/psycopg/tree/json>. For PG 9.2 calling register_json() is not required. The docs are not online as still not complete; what is available can be read generating the docs locally (make docs) or peeking at the docstrings in _json.py <https://github.com/dvarrazzo/psycopg/blob/json/lib/_json.py> Comments and tests are appreciated. -- Daniele
Great!! > I've implemented the Json adapter and the register_json function: they > are available in the "json" branch in my repository: > <https://github.com/dvarrazzo/psycopg/tree/json>. For PG 9.2 calling > register_json() is not required. > > The docs are not online as still not complete; what is available can > be read generating the docs locally (make docs) or peeking at the > docstrings in _json.py > <https://github.com/dvarrazzo/psycopg/blob/json/lib/_json.py> > > Comments and tests are appreciated. I have tested the branch on FreeBSD 9 STABLE amd64 and PG 9.2. Works flawlessly .. everything as expected. The psycopg2.extensions.register_adapter(dict, psycopg2.extras.Json) feature is nifty! A minor inconvenience: when on PG 9.2 OIDs are well know, but when I need to register a custom JSON typecaster, I nevertheless need to supply the OIDs _or_ provide a connection (which I may want to avoid when I want the behavior globally): loads = lambda x: json.loads(x, parse_float = Decimal) psycopg2.extras.register_json(None, globally = True, loads = loads, oid = 114, array_oid = 199) #psycopg2.extras.register_json(None, globally = True, loads = loads) # won't work I am fine with that, but the example in the docs would probably profit from mentioning this code snippet .. "how to install custom JSON typecaster on PG92 globally". Another thing that's probably inconvenient: psycopg2.extras.Json forwards kwargs for customization, but there is no trivial way of using a different "json" implementation altogether like i.e. simplejson (which has features not in Py json even for 2.7 like use_decimal .. which works correctly without any rounding) ujson (which claims to be the fastes Py json anyway) How about: class Json(object): def __init__(self, adapted, **kwargs): self.adapted = adapted self.kwargs = kwargs def dumps(self, o, **kwargs): return json.dumps(o, **kwargs) def __conform__(self, proto): if proto is ISQLQuote: return self def getquoted(self): s = self.dumps(self.adapted, **self.kwargs) return QuotedString(s).getquoted() class CustomJson(Json): def dumps(self, o, **kwargs): return simplejson.dumps(o, **kwargs) == So there is no need to reimplement getquoted (which may do Psycopg implementation details).. Cheers, Tobias > > -- Daniele >
On Wed, Sep 19, 2012 at 1:36 PM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote: > A minor inconvenience: when on PG 9.2 OIDs are well know, but when > I need to register a custom JSON typecaster, I nevertheless need > to supply the OIDs _or_ provide a connection (which I may want > to avoid when I want the behavior globally): > > loads = lambda x: json.loads(x, parse_float = Decimal) > psycopg2.extras.register_json(None, globally = True, loads = loads, oid = > 114, array_oid = 199) > #psycopg2.extras.register_json(None, globally = True, loads = loads) # won't > work > > I am fine with that, but the example in the docs would probably profit > from mentioning this code snippet .. "how to install custom JSON typecaster > on PG92 globally". Oh, yes, I see the use case. You are right, it's too tricky for its usefulness. The easiest (for us) solution would be to provide symbolic constants in psycopg2.extensions. It results in a still too verbose: psycopg2.extras.register_json(None, globally=True, loads=loads, oid=ext.JSON_OID, array_oid=ext.JSONARRAY_OID) btw there's too use for the conn_or_curs=None: I think I'll set it as default so that it can be dropped if only kwargs are provided. The above would become: psycopg2.extras.register_json(globally=True, loads=loads, oid=ext.JSON_OID, array_oid=ext.JSONARRAY_OID) Uhm... but what about a new function? To be called psycopg2.extras.register_default_json(globally=True, loads=loads) or conn = psycopg2.connect(...) psycopg2.extras.register_default_json(conn, loads=loads) To be implemented like: def register_default_json(conn_or_curs=None, globally=False, loads=None): """Register json adapter for PG92 and following""" return register_json(conn_or_curs=conn_or_curs, globally=globally, loads=loads, oid=114, array_oid=199) > Another thing that's probably inconvenient: psycopg2.extras.Json > forwards kwargs for customization, but there is no trivial way > of using a different "json" implementation altogether What I didn't like reviewing the two sides is the asymmetry: adapter taking **kwargs and typecaster taking loads function. I'm thinking about passing a dumps function instead of the **kwargs. It could be: class Json(object): def __init__(self, adapted, dumps=None): self.adapted = adapted self._dumps = dumps is None and json.dumps or dumps def dumps(self): return self._dumps(self.adapted) def getquoted(self): s = self.dumps() return QuotedString(s).getquoted() This way customization can be performed either functional-style via a closure: def CustomJson(adapted): return Json(adapted, dumps=lambda x: simplejson.dumps(x, parse_float=Decimal)) or oo-style via subclassing: class CustomJson(Json): def dumps(self) return simplejson.dumps(self.adapted, parse_float=Decimal) Thank you for the review. Comments? -- Daniele
> Uhm... but what about a new function? To be called > > psycopg2.extras.register_default_json(globally=True, loads=loads) > > or > conn = psycopg2.connect(...) > psycopg2.extras.register_default_json(conn, loads=loads) > +1 for that: it provides a simple to use interface for the use case "PG >=9.2 and global custom json" >> Another thing that's probably inconvenient: psycopg2.extras.Json >> forwards kwargs for customization, but there is no trivial way >> of using a different "json" implementation altogether > > What I didn't like reviewing the two sides is the asymmetry: adapter > taking **kwargs and typecaster taking loads function. I'm thinking > about passing a dumps function instead of the **kwargs. It could be: > > class Json(object): > def __init__(self, adapted, dumps=None): > self.adapted = adapted > self._dumps = dumps is None and json.dumps or dumps > > def dumps(self): > return self._dumps(self.adapted) > > def getquoted(self): > s = self.dumps() > return QuotedString(s).getquoted() > > This way customization can be performed either functional-style via a closure: > > def CustomJson(adapted): > return Json(adapted, > dumps=lambda x: simplejson.dumps(x, parse_float=Decimal)) > > or oo-style via subclassing: > > class CustomJson(Json): > def dumps(self) > return simplejson.dumps(self.adapted, parse_float=Decimal) > +1 for that also ..adding dump argument and dropping kwargs forwarding, because customizing via kwargs on a per Json-instance basis is probably the less likely case compared to "I want to use one custom loads/dumps everywhere". Probably the default dumps used within Json (when no specific dumps arg was given) could even come via register_default_json i.e. when doing psycopg2.extras.register_default_json( globally = True, loads = simplejson.loads, dumps = simplejson.dumps) which would make the symmetry accessible from 1 function. The provided dumps would need to be stored at module level somewhere to be accessible within Json self._dumps = dumps is None and psycopg2.extras._dumps or dumps Again, the argument would be that a likely use case is PG>=9.2 and wanting to install global json loads/dumps used everywhere. The simpler the API for that, the better. IMHO. Wait, a problem with above: in the "per-connection variant" psycopg2.extras.register_default_json( conn, loads = simplejson.loads, dumps = simplejson.dumps) where to store dumps so that Json instances can access it? Mmh. There is still some asymmetry: loads is configured globally or per connection, dumps is configured "per argument instance" (Json instance) .. > Thank you for the review. Comments? > > -- Daniele >
On Wed, Sep 19, 2012 at 2:42 PM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote: >> Uhm... but what about a new function? To be called >> >> psycopg2.extras.register_default_json(globally=True, loads=loads) >> >> or >> conn = psycopg2.connect(...) >> psycopg2.extras.register_default_json(conn, loads=loads) >> > > +1 for that: > > it provides a simple to use interface for the use case "PG >=9.2 and global > custom json" Done (this function plus the conn_or_curs argument optional if not required, instead of requiring an explicit None) >> class Json(object): >> def __init__(self, adapted, dumps=None): >> self.adapted = adapted >> self._dumps = dumps is None and json.dumps or dumps > +1 for that also ..adding dump argument and dropping kwargs forwarding, > because customizing via kwargs on a per Json-instance basis is probably the > less likely case compared to "I want to use one custom loads/dumps > everywhere". Done this too. > Mmh. There is still some asymmetry: loads is configured > globally or per connection, dumps is configured "per argument instance" > (Json instance) .. Yes, this is a little bit harder to kill. Json is peculiar on not having a specific class to deal with, so the use of the adapter is more explicit than with other data type. Modulo this issue, adapters are only global whereas typecasters can be per connection or per cursor too. I think it could be possible and desirable to register adapters on more ristrected scopes. -- Daniele
I think there is a little issue left: def typecast_json(s, cur): return loads(s) fails if s == None. I'd suggest s.th. like def typecast_json(s, cur): if s is None: return None return loads(s)
On Wed, Sep 19, 2012 at 7:07 PM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote: > I think there is a little issue left: > > def typecast_json(s, cur): > return loads(s) > > fails if s == None. Right: fixed, thank you. -- Daniele