Re: Proposal: Add JSON support

Поиск
Список
Период
Сортировка
От Joseph Adams
Тема Re: Proposal: Add JSON support
Дата
Msg-id e7e5fefd1003282024q573e5f7am39300cc11917a63c@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposal: Add JSON support  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Proposal: Add JSON support  (Robert Haas <robertmhaas@gmail.com>)
Re: Proposal: Add JSON support  (Dimitri Fontaine <dfontaine@hi-media.com>)
Re: Proposal: Add JSON support  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
> <joeyadams3.14159@gmail.com> wrote:
>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>
>> It should be built-in, rather than relying on an external library
>> (like XML does).
>
> Why?  I'm not saying you aren't right, but you need to make an
> argument rather than an assertion.  This is a community, so no one is
> entitled to decide anything unilaterally, and people want to be
> convinced - including me.

I apologize; I was just starting the conversation with some of my
ideas to receive feedback.  I didn't want people to have to wade
through too many "I think"s .  I'll be sure to use <opinion> tags in
the future :-)

My reasoning for "It should be built-in" is:* It would be nice to have a built-in serialization format that's
available by default.* It might be a little faster because it doesn't have to link to an
external library.* The code to interface between JSON logic and PostgreSQL will
probably be much larger than the actual JSON encoding/decoding itself.* The externally-maintained and packaged libjson
implementationsI 
saw brought in lots of dependencies (e.g. glib).* "Everyone else" (e.g. PHP) uses a statically-linked JSON
implementation.

Is the code in question "*really*" small?  Well, not really, but it's
not enormous either.  By the way, I found a bug in PHP's JSON_parser
(json_decode("true "); /* with a space */ returns null instead of
true).  I'll have to get around to reporting that.

Now, assuming JSON support is built-in to PostgreSQL and is enabled by
default, it is my opinion that encoding issues should not be dealt
with in the JSON code itself, but that the JSON code itself should
assume UTF-8.  I think conversions should be done to/from UTF-8 before
passing it through the JSON code because this would likely be the
smallest way to implement it (not necessarily the fastest, though).

Mike Rylander pointed out something wonderful, and that is that JSON
code can be stored in plain old ASCII using \u... .  If a target
encoding supports all of Unicode, the JSON serializer could be told
not to generate \u escapes.  Otherwise, the \u escapes would be
necessary.

Thus, here's an example of how (in my opinion) character sets and such
should be handled in the JSON code:

Suppose the client's encoding is UTF-16, and the server's encoding is
Latin-1.  When JSON is stored to the database:1. The client is responsible and sends a valid UTF-16 JSON string.2.
PostgreSQLchecks to make sure it is valid UTF-16, then converts 
it to UTF-8.3. The JSON code parses it (to ensure it's valid).4. The JSON code unparses it (to get a representation
without
needless whitespace).  It is given a flag indicating it should only
output ASCII text.5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:1. ASCII is retrieved from the server2. If user needs to extract one or more
fields,the JSON is parsed, 
and the fields are extracted.3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

Note that I am being biased toward optimizing code size rather than speed.

Here's a question about semantics: should converting JSON to text
guarantee that Unicode will be \u escaped, or should it render actual
Unicode whenever possible (when the client uses a Unicode-complete
charset) ?

As for reinventing the wheel, I'm in the process of writing yet
another JSON implementation simply because I didn't find the other
ones I looked at palatable.  I am aiming for simple code, not fast
code.  I am using malloc for structures and realloc for strings/arrays
rather than resorting to clever buffering tricks.  Of course, I'll
switch it over to palloc/repalloc before migrating it to PostgreSQL.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: GSoC Query
Следующее
От: Hitoshi Harada
Дата:
Сообщение: Re: Proposal: Add JSON support