Обсуждение: Should the JSON datatype be a specialization of text?

Поиск
Список
Период
Сортировка

Should the JSON datatype be a specialization of text?

От
Joseph Adams
Дата:
Currently, the JSON datatype (repository:
http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary ) is
implemented somewhat like a specialization of TEXT, like XML is.  I'm
beginning to question if this is the right way to go.  This doesn't
concern whether the JSON datatype should retain the given content
verbatim (e.g. '  "string"  '::JSON doesn't strip spaces) or whether
it should be internally stored using varlena (the same way TEXT is
stored).  What I'm talking about revolves around two fundamental
approaches to the API design:
A. JSON as a specialization of TEXT.  json('"string"')::TEXT =
'"string"'.  To unwrap it, you need a special function:
from_json('"string"')::TEXT .B. JSON as a naked type.  json('"string"')::TEXT = 'string'.  To
unwrap it, simply cast to the desired type.

Early in the discussion of the JSON datatype proposal, we leaned in
favor of approach A (see
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00263.php ).
However, based on input I've received (mainly questions about why
from_json and to_json exist), I'm beginning to think that while
approach A makes more sense from an implementor's perspective,
approach B makes a heck of a lot more sense to users.

Although my code currently implements approach A, I am in favor of approach B.

Arguments I can think of in favor of approach A (keeping JSON as a
specialization of TEXT):
* No surprises when casting between JSON and TEXT.  If approach B is
used, '"string"'::json would be '"string"', but '"string"'::json::text
would be 'string'.* 'null'::json and NULL are distinct.  'null'::json is just a string
containing 'null' and won't ever become NULL unless you explicitly
pass it through from_json.  Also, if I'm not mistaken, input functions
can't yield null when given non-null input (see the above link).* For users who just want to store some JSON-encoded
textin a
 
database for a while, approach A probably makes more sense.* Is consistent with the XML datatype.

Arguments in favor of approach B (making JSON a naked data type):
* Makes data more accessible.  Just cast to the type you need, just
like any other data type.  No need to remember to_json and from_json
(though these function names might be used for functions to convert
JSON-formatted TEXT to/from the JSON datatype).* Is consistent with other programming languages.  When you
json_decode something in PHP, you get an object or an array.  When you
paste a JSON literal into JavaScript code, you end up with a native
type, not some object you have to convert down to a native type.
Notice how in these programming languages, you typically carry
verbatim JSON texts around as strings, not a special string type that
performs validation.* JSON was meant to be a format representing types in a programming
language.  It has arrays, objects, strings, true, false and null
because JavaScript and many other popular scripting languages have
those.* Users tend to care more about the underlying data in JSON values
than the notion of JSON-formatted text (though users care about that
too).  If a user really wants to treat JSON like text, why not just
use TEXT along with CHECK (json_validate(content)) ?  Granted, it's
not as fun :-)

One workaround to the null problem of approach B might be to throw an
error when 'null' is passed to the input function (as in, don't allow
the JSON type to even hold 'null' (though null can be nested within an
array/object)), and have a function for converting text to JSON that
returns NULL if 'null' is given.  Note that I am strongly against only
allowing the JSON type to hold objects and arrays, in particular
because it would break being able to extract non-compound values from
JSON trees (e.g. json_get('[0,1,2,3]', '[2]') ).  By the way, how hard
would it be to get 'null'::JSON to yield NULL?

Keep in mind there's a chance someone will standardize JSON/SQL in the
future, so more may be at stake here than just PostgreSQL's codebase
and users.

Although text versus naked is a fundamental design aspect of the JSON
datatype, it shouldn't be a blocker for me moving forward with the
project.  Most of the code in place and in the works shouldn't be
affected by a transition from approach A to B.


Re: Should the JSON datatype be a specialization of text?

От
Robert Haas
Дата:
On Thu, Jun 17, 2010 at 2:29 AM, Joseph Adams
<joeyadams3.14159@gmail.com> wrote:
>        * No surprises when casting between JSON and TEXT.  If approach B is
> used, '"string"'::json would be '"string"', but '"string"'::json::text
> would be 'string'.

As far as I'm concerned, that's a non-starter.  It should be legal to
cast text to json, but what it should do is validate that the string
is already legal JSON, not quote it as a string.  I think you'll find
that things get pretty horribly messy pretty fast if you do it any
other way.  What happens if the user has {1,2,3} in a text column
someplace and wants to tread this as a JSON object?

>        * 'null'::json and NULL are distinct.  'null'::json is just a string
> containing 'null' and won't ever become NULL unless you explicitly
> pass it through from_json.  Also, if I'm not mistaken, input functions
> can't yield null when given non-null input (see the above link).

I believe that keeping a JSON NULL separate from an SQL NULL is
absolutely essential.

By the way, how about posting your code and adding it to the
CommitFest page for others to review?  Early feedback is usually good,
where these things are concerned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Should the JSON datatype be a specialization of text?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 17, 2010 at 2:29 AM, Joseph Adams
> <joeyadams3.14159@gmail.com> wrote:
>> � � � �* No surprises when casting between JSON and TEXT. �If approach B is
>> used, '"string"'::json would be '"string"', but '"string"'::json::text
>> would be 'string'.

> As far as I'm concerned, that's a non-starter.  It should be legal to
> cast text to json, but what it should do is validate that the string
> is already legal JSON, not quote it as a string.

I'm not really convinced about that.  It seems clear to me that there
are two behaviors that we'd like:

1. Take a string that is legal JSON, and make it into a JSON object.

2. Take an arbitrary string (or a number, a bool, etc) and make it a
literal value within a JSON object.

We can make one of these behaviors be invoked by a cast, and the other
by an explicit function call --- the question is which is which.  I'm
inclined to think that associating #2 with casts might be better,
because clearly casting numerics or bools to JSON ought to act like #2.
If we do it as you suggest then casting text to JSON behaves differently
from casting anything else to JSON.
        regards, tom lane


Re: Should the JSON datatype be a specialization of text?

От
Robert Haas
Дата:
On Thu, Jun 17, 2010 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jun 17, 2010 at 2:29 AM, Joseph Adams
>> <joeyadams3.14159@gmail.com> wrote:
>>>        * No surprises when casting between JSON and TEXT.  If approach B is
>>> used, '"string"'::json would be '"string"', but '"string"'::json::text
>>> would be 'string'.
>
>> As far as I'm concerned, that's a non-starter.  It should be legal to
>> cast text to json, but what it should do is validate that the string
>> is already legal JSON, not quote it as a string.
>
> I'm not really convinced about that.  It seems clear to me that there
> are two behaviors that we'd like:
>
> 1. Take a string that is legal JSON, and make it into a JSON object.
>
> 2. Take an arbitrary string (or a number, a bool, etc) and make it a
> literal value within a JSON object.
>
> We can make one of these behaviors be invoked by a cast, and the other
> by an explicit function call --- the question is which is which.

Up to this point I agree.

> I'm
> inclined to think that associating #2 with casts might be better,
> because clearly casting numerics or bools to JSON ought to act like #2.
> If we do it as you suggest then casting text to JSON behaves differently
> from casting anything else to JSON.

I think this is going to turn into a thicket of semantic ambiguity.
There are also two things you might want on output - (1) take a JSON
object and export it as a string; (2) take a JSON object and extract
from it some natively typed thing.  So what happens, for example, when
someone writes:

json 'true'

Do they get a JSON boolean or a JSON text?  i.e. true or 'true'?

Joseph's proposal also involved foo::text::json::text <> foo::text,
which seems pretty ugly to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Should the JSON datatype be a specialization of text?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 17, 2010 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm
>> inclined to think that associating #2 with casts might be better,
>> because clearly casting numerics or bools to JSON ought to act like #2.
>> If we do it as you suggest then casting text to JSON behaves differently
>> from casting anything else to JSON.

> I think this is going to turn into a thicket of semantic ambiguity.

True.  Maybe it would be better to *not have* casts as such between JSON
and non-text data types, but make you write something likejson_literal(numeric)
to get a JSON literal representing a value.  Then json_literal(text)
would do an unsurprising thing (analogous to quote_literal), and we
could use the casts between text and json for the behavior where the
text is interpreted as a valid JSON object.

> Joseph's proposal also involved foo::text::json::text <> foo::text,
> which seems pretty ugly to me.

Agreed, that's not too nice.
        regards, tom lane


Re: Should the JSON datatype be a specialization of text?

От
Robert Haas
Дата:
On Thu, Jun 17, 2010 at 12:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jun 17, 2010 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm
>>> inclined to think that associating #2 with casts might be better,
>>> because clearly casting numerics or bools to JSON ought to act like #2.
>>> If we do it as you suggest then casting text to JSON behaves differently
>>> from casting anything else to JSON.
>
>> I think this is going to turn into a thicket of semantic ambiguity.
>
> True.  Maybe it would be better to *not have* casts as such between JSON
> and non-text data types, but make you write something like
>        json_literal(numeric)
> to get a JSON literal representing a value.  Then json_literal(text)
> would do an unsurprising thing (analogous to quote_literal), and we
> could use the casts between text and json for the behavior where the
> text is interpreted as a valid JSON object.

Yep, I agree.  Except you need a way to generate not only JSON objects
that are quoted strings, but also hashes, arrays, booleans, numbers,
and nulls...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company