Обсуждение: Should the JSON datatype be a specialization of text?
Currently, the JSON datatype (repository: http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary ) is implemented somewhat like a specialization of TEXT, like XML is. I'm beginning to question if this is the right way to go. This doesn't concern whether the JSON datatype should retain the given content verbatim (e.g. ' "string" '::JSON doesn't strip spaces) or whether it should be internally stored using varlena (the same way TEXT is stored). What I'm talking about revolves around two fundamental approaches to the API design: A. JSON as a specialization of TEXT. json('"string"')::TEXT = '"string"'. To unwrap it, you need a special function: from_json('"string"')::TEXT .B. JSON as a naked type. json('"string"')::TEXT = 'string'. To unwrap it, simply cast to the desired type. Early in the discussion of the JSON datatype proposal, we leaned in favor of approach A (see http://archives.postgresql.org/pgsql-hackers/2010-04/msg00263.php ). However, based on input I've received (mainly questions about why from_json and to_json exist), I'm beginning to think that while approach A makes more sense from an implementor's perspective, approach B makes a heck of a lot more sense to users. Although my code currently implements approach A, I am in favor of approach B. Arguments I can think of in favor of approach A (keeping JSON as a specialization of TEXT): * No surprises when casting between JSON and TEXT. If approach B is used, '"string"'::json would be '"string"', but '"string"'::json::text would be 'string'.* 'null'::json and NULL are distinct. 'null'::json is just a string containing 'null' and won't ever become NULL unless you explicitly pass it through from_json. Also, if I'm not mistaken, input functions can't yield null when given non-null input (see the above link).* For users who just want to store some JSON-encoded textin a database for a while, approach A probably makes more sense.* Is consistent with the XML datatype. Arguments in favor of approach B (making JSON a naked data type): * Makes data more accessible. Just cast to the type you need, just like any other data type. No need to remember to_json and from_json (though these function names might be used for functions to convert JSON-formatted TEXT to/from the JSON datatype).* Is consistent with other programming languages. When you json_decode something in PHP, you get an object or an array. When you paste a JSON literal into JavaScript code, you end up with a native type, not some object you have to convert down to a native type. Notice how in these programming languages, you typically carry verbatim JSON texts around as strings, not a special string type that performs validation.* JSON was meant to be a format representing types in a programming language. It has arrays, objects, strings, true, false and null because JavaScript and many other popular scripting languages have those.* Users tend to care more about the underlying data in JSON values than the notion of JSON-formatted text (though users care about that too). If a user really wants to treat JSON like text, why not just use TEXT along with CHECK (json_validate(content)) ? Granted, it's not as fun :-) One workaround to the null problem of approach B might be to throw an error when 'null' is passed to the input function (as in, don't allow the JSON type to even hold 'null' (though null can be nested within an array/object)), and have a function for converting text to JSON that returns NULL if 'null' is given. Note that I am strongly against only allowing the JSON type to hold objects and arrays, in particular because it would break being able to extract non-compound values from JSON trees (e.g. json_get('[0,1,2,3]', '[2]') ). By the way, how hard would it be to get 'null'::JSON to yield NULL? Keep in mind there's a chance someone will standardize JSON/SQL in the future, so more may be at stake here than just PostgreSQL's codebase and users. Although text versus naked is a fundamental design aspect of the JSON datatype, it shouldn't be a blocker for me moving forward with the project. Most of the code in place and in the works shouldn't be affected by a transition from approach A to B.
On Thu, Jun 17, 2010 at 2:29 AM, Joseph Adams <joeyadams3.14159@gmail.com> wrote: > * No surprises when casting between JSON and TEXT. If approach B is > used, '"string"'::json would be '"string"', but '"string"'::json::text > would be 'string'. As far as I'm concerned, that's a non-starter. It should be legal to cast text to json, but what it should do is validate that the string is already legal JSON, not quote it as a string. I think you'll find that things get pretty horribly messy pretty fast if you do it any other way. What happens if the user has {1,2,3} in a text column someplace and wants to tread this as a JSON object? > * 'null'::json and NULL are distinct. 'null'::json is just a string > containing 'null' and won't ever become NULL unless you explicitly > pass it through from_json. Also, if I'm not mistaken, input functions > can't yield null when given non-null input (see the above link). I believe that keeping a JSON NULL separate from an SQL NULL is absolutely essential. By the way, how about posting your code and adding it to the CommitFest page for others to review? Early feedback is usually good, where these things are concerned. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 17, 2010 at 2:29 AM, Joseph Adams > <joeyadams3.14159@gmail.com> wrote: >> � � � �* No surprises when casting between JSON and TEXT. �If approach B is >> used, '"string"'::json would be '"string"', but '"string"'::json::text >> would be 'string'. > As far as I'm concerned, that's a non-starter. It should be legal to > cast text to json, but what it should do is validate that the string > is already legal JSON, not quote it as a string. I'm not really convinced about that. It seems clear to me that there are two behaviors that we'd like: 1. Take a string that is legal JSON, and make it into a JSON object. 2. Take an arbitrary string (or a number, a bool, etc) and make it a literal value within a JSON object. We can make one of these behaviors be invoked by a cast, and the other by an explicit function call --- the question is which is which. I'm inclined to think that associating #2 with casts might be better, because clearly casting numerics or bools to JSON ought to act like #2. If we do it as you suggest then casting text to JSON behaves differently from casting anything else to JSON. regards, tom lane
On Thu, Jun 17, 2010 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 17, 2010 at 2:29 AM, Joseph Adams >> <joeyadams3.14159@gmail.com> wrote: >>> * No surprises when casting between JSON and TEXT. If approach B is >>> used, '"string"'::json would be '"string"', but '"string"'::json::text >>> would be 'string'. > >> As far as I'm concerned, that's a non-starter. It should be legal to >> cast text to json, but what it should do is validate that the string >> is already legal JSON, not quote it as a string. > > I'm not really convinced about that. It seems clear to me that there > are two behaviors that we'd like: > > 1. Take a string that is legal JSON, and make it into a JSON object. > > 2. Take an arbitrary string (or a number, a bool, etc) and make it a > literal value within a JSON object. > > We can make one of these behaviors be invoked by a cast, and the other > by an explicit function call --- the question is which is which. Up to this point I agree. > I'm > inclined to think that associating #2 with casts might be better, > because clearly casting numerics or bools to JSON ought to act like #2. > If we do it as you suggest then casting text to JSON behaves differently > from casting anything else to JSON. I think this is going to turn into a thicket of semantic ambiguity. There are also two things you might want on output - (1) take a JSON object and export it as a string; (2) take a JSON object and extract from it some natively typed thing. So what happens, for example, when someone writes: json 'true' Do they get a JSON boolean or a JSON text? i.e. true or 'true'? Joseph's proposal also involved foo::text::json::text <> foo::text, which seems pretty ugly to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 17, 2010 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm >> inclined to think that associating #2 with casts might be better, >> because clearly casting numerics or bools to JSON ought to act like #2. >> If we do it as you suggest then casting text to JSON behaves differently >> from casting anything else to JSON. > I think this is going to turn into a thicket of semantic ambiguity. True. Maybe it would be better to *not have* casts as such between JSON and non-text data types, but make you write something likejson_literal(numeric) to get a JSON literal representing a value. Then json_literal(text) would do an unsurprising thing (analogous to quote_literal), and we could use the casts between text and json for the behavior where the text is interpreted as a valid JSON object. > Joseph's proposal also involved foo::text::json::text <> foo::text, > which seems pretty ugly to me. Agreed, that's not too nice. regards, tom lane
On Thu, Jun 17, 2010 at 12:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 17, 2010 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm >>> inclined to think that associating #2 with casts might be better, >>> because clearly casting numerics or bools to JSON ought to act like #2. >>> If we do it as you suggest then casting text to JSON behaves differently >>> from casting anything else to JSON. > >> I think this is going to turn into a thicket of semantic ambiguity. > > True. Maybe it would be better to *not have* casts as such between JSON > and non-text data types, but make you write something like > json_literal(numeric) > to get a JSON literal representing a value. Then json_literal(text) > would do an unsurprising thing (analogous to quote_literal), and we > could use the casts between text and json for the behavior where the > text is interpreted as a valid JSON object. Yep, I agree. Except you need a way to generate not only JSON objects that are quoted strings, but also hashes, arrays, booleans, numbers, and nulls... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company