Обсуждение: JSON type caster

Поиск
Список
Период
Сортировка

JSON type caster

От
Tobias Oberstein
Дата:
Hi,

I am adding a SQL to Python type caster for the new native JSON type in
Postgres.

This seems to work, but I wonder if it's really complete / done right
and should maybe already built into
Psycopg2 since JSON is now a native PG type.

Any comments?
Thanks,
Tobias

===
import psycopg2
import simplejson
import json
import sys

conn = psycopg2.connect(host = "127.0.0.1", port = 5434, database =
"test", user = "test", password = "xxx")
conn.autocommit = True

print conn.encoding
cur = conn.cursor()

cur.execute("SELECT null::json, null::json[]")
(json_oid, jsonarray_oid) = (cur.description[0][1], cur.description[1][1])

print json_oid, jsonarray_oid # 114 199 => are those fixed for PG or
instance specific?

def cast_json(value, cur):
    if value is None:
       return None
    try:
       #o = json.loads(value)
       o = simplejson.loads(value, use_decimal = True)
       return o
    except:
       raise InterfaceError("bad JSON representation: %r" % value)

JSON = psycopg2.extensions.new_type((json_oid,), "JSON", cast_json)
psycopg2.extensions.register_type(JSON)
psycopg2.extensions.register_type(psycopg2.extensions.new_array_type((jsonarray_oid,),
"JSON[]", JSON))

cur.execute("""SELECT now(), 23::decimal, '[23.1, {"a": "jhsd", "b": 23,
"c": [1,2,3]}]'::json, (SELECT array_agg(f1) FROM tab1), (SELECT
array_to_json(array_agg(f1)) FROM tab1)""")
r = cur.fetchone()
for i in xrange(0, len(r)):
    print i, type(r[i]), r[i]

===

DROP TABLE IF EXISTS tab1;

CREATE TABLE tab1 (f1 JSON);

INSERT INTO tab1 (f1) VALUES ('"sdfs"'::json);
INSERT INTO tab1 (f1) VALUES ('[1,2,3]'::json);

SELECT * FROM tab1;

SELECT array_agg(f1) FROM tab1;





Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Tue, Sep 18, 2012 at 8:51 AM, Tobias Oberstein
<tobias.oberstein@gmail.com> wrote:
> Hi,
>
> I am adding a SQL to Python type caster for the new native JSON type in
> Postgres.

Sounds like a good idea.


> This seems to work, but I wonder if it's really complete / done right and
> should maybe already built into
> Psycopg2 since JSON is now a native PG type.

yes, I think we should add something for the next psycopg version.


> Any comments?
> Thanks,
> Tobias

> cur.execute("SELECT null::json, null::json[]")
> (json_oid, jsonarray_oid) = (cur.description[0][1], cur.description[1][1])
>
> print json_oid, jsonarray_oid # 114 199 => are those fixed for PG or
> instance specific?

Those are fixed since PG 9.2, but not in previous versions where json
could be installed as an extension. So it makes sense to register the
typecaster with these values but also to provide a function
register_json() with interface similar to register_hstore() that would
either take a connection or cursor and query them to know the oids, or
just take the oids in input for setup where querying is not possible.

> def cast_json(value, cur):
>    if value is None:
>       return None
>    try:
>       #o = json.loads(value)
>       o = simplejson.loads(value, use_decimal = True)

I think this should be dependent on the Python version and use either
the builtin module or simplejson where not available. The loads
function should be probably exposed to the customer for better
customization of the result. So, all in all, I think psycopg should
expose a psycopg2.extras.register_json() function taking the loads
function in input, callling it with pre-configured parameters in order
to create a default typecaster in psycopg2.extensions (using the PG
9.2 oids, using the most appropriate loads function for the python
version etc) but leaving the user the possibility to override the
result by calling it again (use a different module, use different
parameters for loads etc.)

>       return o
>    except:
>       raise InterfaceError("bad JSON representation: %r" % value)

Probably should be DataError. Not sure it is the case to dump the
entire value into the exception: it may be huge.

What should we do as an extension? There is no such a thing as a
Python json object. I think we should provide a thin wrapper similar
to psycopg2.Binary, that would be used:

    from psycopg2.extensions import Json
    data = {'my': ["stuff"]}
    cur.execute("insert ...", [Json(data)])

If somebody wants to provide the code *and* the tests *and* the docs
let us know. I've opened a ticket about the feature
(http://psycopg.lighthouseapp.com/projects/62710-psycopg/tickets/124).
Best way to contribute would be a git branch.

Thank you very much,

-- Daniele


Re: JSON type caster

От
Tobias Oberstein
Дата:
Hi Daniele,

I will work on the ticket .. need to read into docs/tests habits with
Psycopg2 first.

 > What should we do as an extension? There is no such a thing as a
> Python json object. I think we should provide a thin wrapper similar
> to psycopg2.Binary, that would be used:
>
>      from psycopg2.extensions import Json
>      data = {'my': ["stuff"]}
>      cur.execute("insert ...", [Json(data)])
>

That sounds reasonable and looks like it fits well with Psycopg design.

Personally, I am mainly calling stored procedures, i.e.

select proname, prorettype, proargtypes from pg_proc where proname =
'create_employee';

create_employee,114,114

=> a SP which takes 1 JSON arg, and has return type JSON.

The return type casting is taken care of by the (new) SQL=>Py JSON caster.

The argument type isn't handled automatically.

Given the stuff above (Json() thin wrapper), I still need to determine
the input arg type from pg_proc ... there isn't any "magic" I can turn
on with Psycopg, right?

Thanks,
Tobias


Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Tue, Sep 18, 2012 at 4:57 PM, Tobias Oberstein
<tobias.oberstein@gmail.com> wrote:

> Personally, I am mainly calling stored procedures, i.e.
>
> select proname, prorettype, proargtypes from pg_proc where proname =
> 'create_employee';
>
> create_employee,114,114
>
> => a SP which takes 1 JSON arg, and has return type JSON.
>
> The return type casting is taken care of by the (new) SQL=>Py JSON caster.
>
> The argument type isn't handled automatically.
>
> Given the stuff above (Json() thin wrapper), I still need to determine
> the input arg type from pg_proc ... there isn't any "magic" I can turn
> on with Psycopg, right?

Right, this would be the shortcoming. Unfortunately there is no Python
class representing a json, and the mapping object -> adapter is by
python class only. I assume your create_employee function cannot take
*any* json: it may be expecting e.g. a dict and not a list, and maybe
some specific attributes while ignoring others. For this case it's
probably easy to create a Python object that knows how to adapt to
json syntax. But in the generic case a Json can be a python string, or
a list, or a dict, or None, each of which have a more reasonable
Postgres representation.

If anybody comes with a better solution than the Json wrapper I'd be
happy to know.

BTW, because of the zope problem discussed today, I'd be happy to have
a release soon, so if nobody steps ahead in the next few hours, I'll
try and implement the json support tonight and have it for psycopg
2.4.6.

-- Daniele


Re: JSON type caster

От
Tobias Oberstein
Дата:
> Right, this would be the shortcoming. Unfortunately there is no Python
> class representing a json, and the mapping object -> adapter is by
> python class only. I assume your create_employee function cannot take
> *any* json: it may be expecting e.g. a dict and not a list, and maybe
> some specific attributes while ignoring others. For this case it's
> probably easy to create a Python object that knows how to adapt to
> json syntax. But in the generic case a Json can be a python string, or
> a list, or a dict, or None, each of which have a more reasonable
> Postgres representation.

I have no better suggestion for the general case and I agree that for
the general case the thin wrapper totally makes sense.

The only idea: maybe have a "convenience option" that when turned on
makes Python dicts adapt to PG JSON automatically.

If there is no other "default mapping" of Python dicts to any other
PG type, that would make a nice fallback that probably allows to handle
a lot of situations automatically - such as stored procs that expect
their JSON args to be dicts anyway.

>
> If anybody comes with a better solution than the Json wrapper I'd be
> happy to know.

No, not for the general case.

>
> BTW, because of the zope problem discussed today, I'd be happy to have
> a release soon, so if nobody steps ahead in the next few hours, I'll
> try and implement the json support tonight and have it for psycopg
> 2.4.6.

Ok. Great! (Not that I want to "freeride" .. but I guess it's a snap for
you and me would need to first "learn" Psycopg tests/docs ..

Thanks!
Tobias

>
> -- Daniele
>



Re: JSON type caster

От
Tobias Oberstein
Дата:
>>        #o = json.loads(value)
>>        o = simplejson.loads(value, use_decimal = True)
>
> I think this should be dependent on the Python version and use either
> the builtin module or simplejson where not available. The loads
> function should be probably exposed to the customer for better
> customization of the result. So, all in all, I think psycopg should

Yes, absolutely right. To allow serialization of decimal, datetime and
even custom classes ..

>>        return o
>>     except:
>>        raise InterfaceError("bad JSON representation: %r" % value)
>
> Probably should be DataError. Not sure it is the case to dump the
> entire value into the exception: it may be huge.

I agree. Default should be no dumping. I guess it's a very unlikey
case anyway, since when PG is handling it as JSON, it _is_ probably
a JSON ..



Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Tue, Sep 18, 2012 at 9:12 PM, Tobias Oberstein
<tobias.oberstein@gmail.com> wrote:

> If there is no other "default mapping" of Python dicts to any other
> PG type, that would make a nice fallback that probably allows to handle
> a lot of situations automatically - such as stored procs that expect
> their JSON args to be dicts anyway.

I think it can be easily done: currently there's no default adaptation
for dicts: there is the hstore adapter but it must be registered
manually.

If we have the Json wrapper, I suspect adapting dicts would be just a matter of:

    register_adapter(dict, Json)

it would work with lists and other objects as well, but overriding the
default adapter (e.g. lists to arrays etc.). Because adapters are
currently global this may result in unwanted side effects.

It is definitely an use case to be mentioned in the docs.

Cheers,

-- Daniele


Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Tue, Sep 18, 2012 at 5:44 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:

> BTW, because of the zope problem discussed today, I'd be happy to have
> a release soon, so if nobody steps ahead in the next few hours, I'll
> try and implement the json support tonight and have it for psycopg
> 2.4.6.

I've implemented the Json adapter and the register_json function: they
are available in the "json" branch in my repository:
<https://github.com/dvarrazzo/psycopg/tree/json>. For PG 9.2 calling
register_json() is not required.

The docs are not online as still not complete; what is available can
be read generating the docs locally (make docs) or peeking at the
docstrings in _json.py
<https://github.com/dvarrazzo/psycopg/blob/json/lib/_json.py>

Comments and tests are appreciated.

-- Daniele


Re: JSON type caster

От
Tobias Oberstein
Дата:
Great!!

> I've implemented the Json adapter and the register_json function: they
> are available in the "json" branch in my repository:
> <https://github.com/dvarrazzo/psycopg/tree/json>. For PG 9.2 calling
> register_json() is not required.
>
> The docs are not online as still not complete; what is available can
> be read generating the docs locally (make docs) or peeking at the
> docstrings in _json.py
> <https://github.com/dvarrazzo/psycopg/blob/json/lib/_json.py>
>
> Comments and tests are appreciated.

I have tested the branch on FreeBSD 9 STABLE amd64 and PG 9.2. Works
flawlessly .. everything as expected.

The

psycopg2.extensions.register_adapter(dict, psycopg2.extras.Json)

feature is nifty!

A minor inconvenience: when on PG 9.2 OIDs are well know, but when
I need to register a custom JSON typecaster, I nevertheless need
to supply the OIDs _or_ provide a connection (which I may want
to avoid when I want the behavior globally):

loads = lambda x: json.loads(x, parse_float = Decimal)
psycopg2.extras.register_json(None, globally = True, loads = loads, oid
= 114, array_oid = 199)
#psycopg2.extras.register_json(None, globally = True, loads = loads) #
won't work

I am fine with that, but the example in the docs would probably profit
from mentioning this code snippet .. "how to install custom JSON
typecaster on PG92 globally".

Another thing that's probably inconvenient: psycopg2.extras.Json
forwards kwargs for customization, but there is no trivial way
of using a different "json" implementation altogether like i.e.

simplejson (which has features not in Py json even for 2.7 like
use_decimal .. which works correctly without any rounding)

ujson (which claims to be the fastes Py json anyway)


How about:

class Json(object):

     def __init__(self, adapted, **kwargs):
         self.adapted = adapted
         self.kwargs = kwargs

     def dumps(self, o, **kwargs):
         return json.dumps(o, **kwargs)

     def __conform__(self, proto):
         if proto is ISQLQuote:
             return self

     def getquoted(self):
         s = self.dumps(self.adapted, **self.kwargs)
         return QuotedString(s).getquoted()

class CustomJson(Json):
     def dumps(self, o, **kwargs):
         return simplejson.dumps(o, **kwargs)

==

So there is no need to reimplement getquoted (which may do Psycopg
implementation details)..

Cheers,
Tobias


>
> -- Daniele
>



Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Wed, Sep 19, 2012 at 1:36 PM, Tobias Oberstein
<tobias.oberstein@gmail.com> wrote:

> A minor inconvenience: when on PG 9.2 OIDs are well know, but when
> I need to register a custom JSON typecaster, I nevertheless need
> to supply the OIDs _or_ provide a connection (which I may want
> to avoid when I want the behavior globally):
>
> loads = lambda x: json.loads(x, parse_float = Decimal)
> psycopg2.extras.register_json(None, globally = True, loads = loads, oid =
> 114, array_oid = 199)
> #psycopg2.extras.register_json(None, globally = True, loads = loads) # won't
> work
>
> I am fine with that, but the example in the docs would probably profit
> from mentioning this code snippet .. "how to install custom JSON typecaster
> on PG92 globally".

Oh, yes, I see the use case. You are right, it's too tricky for its
usefulness. The easiest (for us) solution would be to provide symbolic
constants in psycopg2.extensions. It results in a still too verbose:

    psycopg2.extras.register_json(None, globally=True, loads=loads,
oid=ext.JSON_OID, array_oid=ext.JSONARRAY_OID)

btw there's too use for the conn_or_curs=None: I think I'll set it as
default so that it can be dropped if only kwargs are provided. The
above would become:

    psycopg2.extras.register_json(globally=True, loads=loads,
oid=ext.JSON_OID, array_oid=ext.JSONARRAY_OID)

Uhm... but what about a new function? To be called

    psycopg2.extras.register_default_json(globally=True, loads=loads)

or
    conn = psycopg2.connect(...)
    psycopg2.extras.register_default_json(conn, loads=loads)

To be implemented like:

    def register_default_json(conn_or_curs=None, globally=False, loads=None):
        """Register json adapter for PG92 and following"""
        return register_json(conn_or_curs=conn_or_curs,
globally=globally, loads=loads,
            oid=114, array_oid=199)


> Another thing that's probably inconvenient: psycopg2.extras.Json
> forwards kwargs for customization, but there is no trivial way
> of using a different "json" implementation altogether

What I didn't like reviewing the two sides is the asymmetry: adapter
taking **kwargs and typecaster taking loads function. I'm thinking
about passing a dumps function instead of the **kwargs. It could be:

    class Json(object):
        def __init__(self, adapted, dumps=None):
            self.adapted = adapted
            self._dumps = dumps is None and json.dumps or dumps

        def dumps(self):
            return self._dumps(self.adapted)

        def getquoted(self):
            s = self.dumps()
            return QuotedString(s).getquoted()

This way customization can be performed either functional-style via a closure:

    def CustomJson(adapted):
        return Json(adapted,
            dumps=lambda x: simplejson.dumps(x, parse_float=Decimal))

or oo-style via subclassing:

    class CustomJson(Json):
        def dumps(self)
            return simplejson.dumps(self.adapted, parse_float=Decimal)

Thank you for the review. Comments?

-- Daniele


Re: JSON type caster

От
Tobias Oberstein
Дата:
> Uhm... but what about a new function? To be called
>
>      psycopg2.extras.register_default_json(globally=True, loads=loads)
>
> or
>      conn = psycopg2.connect(...)
>      psycopg2.extras.register_default_json(conn, loads=loads)
>

+1 for that:

it provides a simple to use interface for the use case "PG >=9.2 and
global custom json"


>> Another thing that's probably inconvenient: psycopg2.extras.Json
>> forwards kwargs for customization, but there is no trivial way
>> of using a different "json" implementation altogether
>
> What I didn't like reviewing the two sides is the asymmetry: adapter
> taking **kwargs and typecaster taking loads function. I'm thinking
> about passing a dumps function instead of the **kwargs. It could be:
>
>      class Json(object):
>          def __init__(self, adapted, dumps=None):
>              self.adapted = adapted
>              self._dumps = dumps is None and json.dumps or dumps
>
>          def dumps(self):
>              return self._dumps(self.adapted)
>
>          def getquoted(self):
>              s = self.dumps()
>              return QuotedString(s).getquoted()
>
> This way customization can be performed either functional-style via a closure:
>
>      def CustomJson(adapted):
>          return Json(adapted,
>              dumps=lambda x: simplejson.dumps(x, parse_float=Decimal))
>
> or oo-style via subclassing:
>
>      class CustomJson(Json):
>          def dumps(self)
>              return simplejson.dumps(self.adapted, parse_float=Decimal)
>

+1 for that also ..adding dump argument and dropping kwargs forwarding,
because customizing via kwargs on a per Json-instance basis is probably
the less likely case compared to "I want to use one custom loads/dumps
everywhere".

Probably the default dumps used within Json (when no specific dumps arg
was given) could even come via register_default_json i.e. when doing

psycopg2.extras.register_default_json(
globally = True,
loads = simplejson.loads,
dumps = simplejson.dumps)

which would make the symmetry accessible from 1 function.

The provided dumps would need to be stored at module level somewhere to
be accessible within Json

self._dumps = dumps is None and psycopg2.extras._dumps or dumps

Again, the argument would be that a likely use case is PG>=9.2 and
wanting to install global json loads/dumps used everywhere.

The simpler the API for that, the better. IMHO.

Wait, a problem with above: in the "per-connection variant"

psycopg2.extras.register_default_json(
conn,
loads = simplejson.loads,
dumps = simplejson.dumps)

where to store dumps so that Json instances can access it?

Mmh. There is still some asymmetry: loads is configured
globally or per connection, dumps is configured "per argument instance"
(Json instance) ..



> Thank you for the review. Comments?
>
> -- Daniele
>



Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Wed, Sep 19, 2012 at 2:42 PM, Tobias Oberstein
<tobias.oberstein@gmail.com> wrote:
>> Uhm... but what about a new function? To be called
>>
>>      psycopg2.extras.register_default_json(globally=True, loads=loads)
>>
>> or
>>      conn = psycopg2.connect(...)
>>      psycopg2.extras.register_default_json(conn, loads=loads)
>>
>
> +1 for that:
>
> it provides a simple to use interface for the use case "PG >=9.2 and global
> custom json"

Done (this function plus the conn_or_curs argument optional if not
required, instead of requiring an explicit None)



>>      class Json(object):
>>          def __init__(self, adapted, dumps=None):
>>              self.adapted = adapted
>>              self._dumps = dumps is None and json.dumps or dumps

> +1 for that also ..adding dump argument and dropping kwargs forwarding,
> because customizing via kwargs on a per Json-instance basis is probably the
> less likely case compared to "I want to use one custom loads/dumps
> everywhere".

Done this too.


> Mmh. There is still some asymmetry: loads is configured
> globally or per connection, dumps is configured "per argument instance"
> (Json instance) ..

Yes, this is a little bit harder to kill. Json is peculiar on not
having a specific class to deal with, so the use of the adapter is
more explicit than with other data type. Modulo this issue, adapters
are only global whereas typecasters can be per connection or per
cursor too. I think it could be possible and desirable to register
adapters on more ristrected scopes.

-- Daniele


Re: JSON type caster

От
Tobias Oberstein
Дата:
I think there is a little issue left:

     def typecast_json(s, cur):
         return loads(s)

fails if s == None.

I'd suggest s.th. like

     def typecast_json(s, cur):
         if s is None:
             return None
         return loads(s)



Re: JSON type caster

От
Daniele Varrazzo
Дата:
On Wed, Sep 19, 2012 at 7:07 PM, Tobias Oberstein
<tobias.oberstein@gmail.com> wrote:
> I think there is a little issue left:
>
>     def typecast_json(s, cur):
>         return loads(s)
>
> fails if s == None.

Right: fixed, thank you.

-- Daniele