Обсуждение: Re: JSON Patch for PostgreSQL - BSON Support?

Поиск
Список
Период
Сортировка

Re: JSON Patch for PostgreSQL - BSON Support?

От
Charles Pritchard
Дата:
I'd originally sent this to Joseph Adams, as he has been working on 
adding a JSON datatype.
I've suggested supporting BSON, as there are many client implementations 
available,
and the format is more efficient than xml and json trees for some use cases.

http://bsonspec.org/

On 8/15/10 11:27 AM, Joseph Adams wrote:
> On Sat, Aug 14, 2010 at 6:35 PM, Charles Pritchard<chuck@jumis.com>  wrote:
>    
>> Hello,
>>
>> I saw that you've submitted a patch to add JSON support to PostgreSQL.
>> Thought I'd mention that BSON is gradually picking up in popularity,
>> and does have a good amount of cross-client support.
>>
>> That said, I wouldn't be surprised if a patch were looked at with suspicion,
>> but it does make sense as a contrib module.
>>
>> BSON is a binary json format used by Mongo DB.
>>
>> I don't use it, at all, but it seems like a good fit, and the right time.
>>
>> Returning BSON would be more efficient than returning JSON,
>> for some use cases.
>>
>> -Charles
>>
>>      
> Would you mind posting this to the pgsql-hackers@postgresql.org
> mailing list?  I think a BSON module would be a good idea, as it would
> provide a more efficient alternative to the more text-oriented JSON
> data type.
>
>
> Joey Adams
>    



Re: JSON Patch for PostgreSQL - BSON Support?

От
Tom Lane
Дата:
Charles Pritchard <chuck@jumis.com> writes:
> I'd originally sent this to Joseph Adams, as he has been working on 
> adding a JSON datatype.
> I've suggested supporting BSON, as there are many client implementations 
> available,

I knew there would be a lot of critters crawling out as soon as we
turned over this rock.  Which other data-formats-of-the-week shall
we immortalize as core PG types?
        regards, tom lane


Re: JSON Patch for PostgreSQL - BSON Support?

От
Andrew Dunstan
Дата:

On 08/15/2010 11:03 PM, Tom Lane wrote:
> Charles Pritchard<chuck@jumis.com>  writes:
>> I'd originally sent this to Joseph Adams, as he has been working on
>> adding a JSON datatype.
>> I've suggested supporting BSON, as there are many client implementations
>> available,
> I knew there would be a lot of critters crawling out as soon as we
> turned over this rock.  Which other data-formats-of-the-week shall
> we immortalize as core PG types?
>
>             

If BSON is simply in effect an efficient encoding of JSON, then it's not 
clear to me that we would want another type at all. Rather, we might 
want to consider storing the data in this supposedly more efficient 
format, and maybe also some conversion routines.

I agree that we don't want in core a huge array of general serialization 
formats. The one thing that JSON has going for it for general use, in my 
view, is that, unlike hstore, the structure is not flat. That makes it 
potentially useful for various purposes, especially complex structured 
function arguments, in places where using hstore can be rather limiting, 
and xml overly verbose.

cheers

andrew


Re: JSON Patch for PostgreSQL - BSON Support?

От
Robert Haas
Дата:
On Sun, Aug 15, 2010 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Charles Pritchard <chuck@jumis.com> writes:
>> I'd originally sent this to Joseph Adams, as he has been working on
>> adding a JSON datatype.
>> I've suggested supporting BSON, as there are many client implementations
>> available,
>
> I knew there would be a lot of critters crawling out as soon as we
> turned over this rock.  Which other data-formats-of-the-week shall
> we immortalize as core PG types?

PER-encoded ASN.1, for when you really need something human-readable?  :-)

I think JSON is a reasonable choice for a core datatype; we don't
really have anything else with the same functionality.  But I'm not
really in favor of adding any more, especially things like YAML and
BSON that are essentially variants of JSON.  Which is not to say I
woudn't like to have those available, making the unproven assumption
that someone wants to write the code, but I don't really see why they
should be in core.  My theory is that XML and JSON are the big two,
and so far I haven't seen much evidence to the contrary.

JSON also has the advantage, as compared with XML and in general, of
being relatively simple and single-purpose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: JSON Patch for PostgreSQL - BSON Support?

От
Greg Smith
Дата:
Robert Haas wrote: <blockquote cite="mid:AANLkTimZ5FJ8Q8GY7dq1BioN1R92GRs-p6Tj5yFpzXt1@mail.gmail.com"
type="cite"><prewrap="">On Sun, Aug 15, 2010 at 11:03 PM, Tom Lane <a class="moz-txt-link-rfc2396E"
href="mailto:tgl@sss.pgh.pa.us"><tgl@sss.pgh.pa.us></a>wrote: </pre><blockquote type="cite"><pre wrap="">I knew
therewould be a lot of critters crawling out as soon as we
 
turned over this rock.  Which other data-formats-of-the-week shall
we immortalize as core PG types?   </pre></blockquote><pre wrap="">
PER-encoded ASN.1, for when you really need something human-readable?  :-) </pre></blockquote><br /> I like to
prioritizewith "hits on Google" as a proxy for popularity:<br /><br /> XML:  341M<br /> CSV:  132M<br /> JSON:  96M<br
/>YAML:  6M<br /> ASN.1:  1.1M<br /> BSON:  130K<br /> "Protocol Buffers":  86K<br /> OGDL:  45K<br /><br /> I think
there'sa strong case to add JSON, as it may very well be the most popular data->text serialization format out there
notyet supported.  It's certainly beyond a format of the week at this point.<br /><br /> "XML is like violence: if it
doesn'tsolve your problem, you aren't using enough of it." - Chris Maden <br /><br /><pre class="moz-signature"
cols="72">--
 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
<a class="moz-txt-link-abbreviated" href="mailto:greg@2ndQuadrant.com">greg@2ndQuadrant.com</a>   <a
class="moz-txt-link-abbreviated"href="http://www.2ndQuadrant.us">www.2ndQuadrant.us</a>
 
</pre>

Re: JSON Patch for PostgreSQL - BSON Support?

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> If BSON is simply in effect an efficient encoding of JSON, then it's not 
> clear to me that we would want another type at all. Rather, we might 
> want to consider storing the data in this supposedly more efficient 
> format, and maybe also some conversion routines.

Hmm, that's an interesting plan ...
        regards, tom lane


Re: JSON Patch for PostgreSQL - BSON Support?

От
Robert Haas
Дата:
On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> If BSON is simply in effect an efficient encoding of JSON, then it's not
>> clear to me that we would want another type at all. Rather, we might
>> want to consider storing the data in this supposedly more efficient
>> format, and maybe also some conversion routines.
>
> Hmm, that's an interesting plan ...

It is interesting, but I'm not sure that it will actually work out
well in practice.  If what we want to do is compress JSON, TOAST will
do that for us without any additional code, and probably a lot more
efficiently.  Of course, until someone tests it, we're just
speculating wildly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: JSON Patch for PostgreSQL - BSON Support?

От
Christopher Browne
Дата:
On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> If BSON is simply in effect an efficient encoding of JSON, then it's not
>>> clear to me that we would want another type at all. Rather, we might
>>> want to consider storing the data in this supposedly more efficient
>>> format, and maybe also some conversion routines.
>>
>> Hmm, that's an interesting plan ...
>
> It is interesting, but I'm not sure that it will actually work out
> well in practice.  If what we want to do is compress JSON, TOAST will
> do that for us without any additional code, and probably a lot more
> efficiently.  Of course, until someone tests it, we're just
> speculating wildly.

Yep, that was exactly what struck me.  TOAST is quite likely to be a
good answer for this.

The reason to want some other binary format would be if there were
other benefits to be had.

An "XML encoding" format could be interesting if it allowed having
GIST-ish indexes to search for tags particularly efficiently.  I say
"XML encoding" because I've not got any reason to think that a
JSON/BSON-only format would necessarily be preferable.

But "interesting" isn't the same thing as "the right answer."  For
now, TOAST seems perfectly reasonable.

If there's some "wire format for XML" that would allow more efficient
data transfer, that would be an improvement.  BSON sounds like it's
something like that, but only if it's better than "flavour of the
week."
--
http://linuxfinances.info/info/linuxdistributions.html


Re: JSON Patch for PostgreSQL - BSON Support?

От
"Joshua D. Drake"
Дата:
On Mon, 2010-08-16 at 11:40 -0400, Christopher Browne wrote:
> On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Andrew Dunstan <andrew@dunslane.net> writes:
> >>> If BSON is simply in effect an efficient encoding of JSON, then it's not
> >>> clear to me that we would want another type at all. Rather, we might
> >>> want to consider storing the data in this supposedly more efficient
> >>> format, and maybe also some conversion routines.
> >>
> >> Hmm, that's an interesting plan ...
> >
> > It is interesting, but I'm not sure that it will actually work out
> > well in practice.  If what we want to do is compress JSON, TOAST will
> > do that for us without any additional code, and probably a lot more
> > efficiently.  Of course, until someone tests it, we're just
> > speculating wildly.
>
> Yep, that was exactly what struck me.  TOAST is quite likely to be a
> good answer for this.

Except: How much JSON data will actually be TOASTed?

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: JSON Patch for PostgreSQL - BSON Support?

От
Charles Pritchard
Дата:
On 8/16/10 8:40 AM, Christopher Browne wrote:
> On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas<robertmhaas@gmail.com>  wrote:
>    
>> On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>      
>>> Andrew Dunstan<andrew@dunslane.net>  writes:
>>>        
>>>> If BSON is simply in effect an efficient encoding of JSON, then it's not
>>>> clear to me that we would want another type at all. Rather, we might
>>>> want to consider storing the data in this supposedly more efficient
>>>> format, and maybe also some conversion routines.
>>>>          
>>> Hmm, that's an interesting plan ...
>>>        
>> It is interesting, but I'm not sure that it will actually work out
>> well in practice.  If what we want to do is compress JSON, TOAST will
>> do that for us without any additional code, and probably a lot more
>> efficiently.  Of course, until someone tests it, we're just
>> speculating wildly.
>>      
> Yep, that was exactly what struck me.  TOAST is quite likely to be a
> good answer for this.
>
> The reason to want some other binary format would be if there were
> other benefits to be had.
>
> An "XML encoding" format could be interesting if it allowed having
> GIST-ish indexes to search for tags particularly efficiently.  I say
> "XML encoding" because I've not got any reason to think that a
> JSON/BSON-only format would necessarily be preferable.
>
> But "interesting" isn't the same thing as "the right answer."  For
> now, TOAST seems perfectly reasonable.
>
> If there's some "wire format for XML" that would allow more efficient
> data transfer, that would be an improvement.  BSON sounds like it's
> something like that, but only if it's better than "flavour of the
> week."
>    

XML encoding has certainly been investigated within the W3C public docs:
http://www.w3.org/2003/08/binary-interchange-workshop/Report.html  
(discussion)
http://www.w3.org/TR/xbc-characterization/ (summary)

Leading to the current draft of EXI:
http://www.w3.org/XML/EXI/

The spec is a rather large undertaking. It makes sense to add to the XML 
ToDo wiki page.
EXI will certainly be better than TOAST for larger XML docs.

...

BSON does not compress text content -- TOAST would still have its 
advantages.
It mainly shortens the representation of JSON data structures.

Again, I think the primary benefit of BSON would be data traversal.
The benefit is the same for a client receiving BSON, as the server.

Data lengths are specified, allowing quick optimizations for things like 
key_exists
and equivalencies. Client's supporting BSON could benefit from a quick 
pass-through.
And I'd imagine a very slight benefit toward indexing, were GIN / hstore 
processes used.

Still, as has been noted on this thread.. We don't have numbers to work 
with.
With json as a core data type; and "bson" as a possible function working 
with the json
type, there's not much of a risk going in either direction (text + 
TOAST, bson + TOAST).








Re: JSON Patch for PostgreSQL - BSON Support?

От
Joseph Adams
Дата:
On Sun, Aug 15, 2010 at 11:47 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> If BSON is simply in effect an efficient encoding of JSON, then it's not
> clear to me that we would want another type at all. Rather, we might want to
> consider storing the data in this supposedly more efficient format, and
> maybe also some conversion routines.

An issue is that the current JSON data type implementation preserves
the original text (meaning if you say '[1,2,"\u0020"   ]'::JSON, it
will yield '[1,2,"\u0020"   ]' rather than '[1,2," "]' .  I haven't
researched BSON much at all, but I seriously doubt that part of its
spec includes handling external JSON encoding details like whitespace
and superfluous escapes.

Even though I spent a long time implementing it, the original text
preservation feature should be dropped, in my opinion.  Users tend to
care more about the data inside of a JSON value rather than how it's
encoded, and replacement of values can have funky consequences on
indentation when working with indented JSON text (similar to how
pasting in a text editor puts pasted content at the wrong indent
level).

By dropping original text preservation, in the future (or now), JSON
could be encoded in BSON (or a more efficient format) in the database
rather than in JSON-encoded text.

Also, an idea would be to make json_send and json_recv (binary JSON
send/receive) use BSON rather than JSON-encoded text, as
sending/receiving JSON-encoded text is exactly what text send/receive
do.


Joey Adams


Re: JSON Patch for PostgreSQL - BSON Support?

От
Heikki Linnakangas
Дата:
On 16/08/10 20:17, Joseph Adams wrote:
> Also, an idea would be to make json_send and json_recv (binary JSON
> send/receive) use BSON rather than JSON-encoded text, as
> sending/receiving JSON-encoded text is exactly what text send/receive
> do.

The usual reason to use the binary format is performance, so it doesn't 
make much sense to use BSON for that if the internal storage format is 
something else. It would most likely be slower, not faster, than sending 
the string as is.

Of course, if you switch to using BSON as the internal storage format, 
then it's natural to use that for the binary I/O format too.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: JSON Patch for PostgreSQL - BSON Support?

От
Charles Pritchard
Дата:
On 8/15/10 8:47 PM, Andrew Dunstan wrote:
> On 08/15/2010 11:03 PM, Tom Lane wrote:
>> Charles Pritchard<chuck@jumis.com>  writes:
>>> I'd originally sent this to Joseph Adams, as he has been working on
>>> adding a JSON datatype.
>>> I've suggested supporting BSON, as there are many client 
>>> implementations
>>> available,
>> I knew there would be a lot of critters crawling out as soon as we
>> turned over this rock.  Which other data-formats-of-the-week shall
>> we immortalize as core PG types?
>>
>>
>
> If BSON is simply in effect an efficient encoding of JSON, then it's 
> not clear to me that we would want another type at all. Rather, we 
> might want to consider storing the data in this supposedly more 
> efficient format, and maybe also some conversion routines.
>
> I agree that we don't want in core a huge array of general 
> serialization formats. The one thing that JSON has going for it for 
> general use, in my view, is that, unlike hstore, the structure is not 
> flat. That makes it potentially useful for various purposes, 
> especially complex structured function arguments, in places where 
> using hstore can be rather limiting, and xml overly verbose.
While I certainly haven't done homework on this -- I agree with Andrew.

Storing internally as BSON (if it holds up to its premise) would mean 
more efficient traversal
of internal objects in the future, if we were to have JSON-related 
functions/selectors.

The core type would still be json, and would return as text, a json string,
but internally it would be stored as BSON, and a function would be 
available,
json_to_bson(typedjsoncol::json), returning a binary string.



Re: JSON Patch for PostgreSQL - BSON Support?

От
"Kevin Grittner"
Дата:
Charles Pritchard <chuck@jumis.com> wrote:
> Storing internally as BSON (if it holds up to its premise) would
> mean more efficient traversal of internal objects in the future,
> if we were to have JSON-related functions/selectors.
How about the fact that not all JSON objects can be represented in
BSON (if the JSON object has a very long string), and not all BSON
objects can be represented in JSON (if the BSON object has an
array).  Or do we invent our own flavors of one or both to cover the
mismatch?
-Kevin


Re: JSON Patch for PostgreSQL - BSON Support?

От
"Joshua D. Drake"
Дата:
On Mon, 2010-08-16 at 11:40 -0400, Christopher Browne wrote:
> On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Andrew Dunstan <andrew@dunslane.net> writes:
> >>> If BSON is simply in effect an efficient encoding of JSON, then it's not
> >>> clear to me that we would want another type at all. Rather, we might
> >>> want to consider storing the data in this supposedly more efficient
> >>> format, and maybe also some conversion routines.
> >>
> >> Hmm, that's an interesting plan ...
> >
> > It is interesting, but I'm not sure that it will actually work out
> > well in practice.  If what we want to do is compress JSON, TOAST will
> > do that for us without any additional code, and probably a lot more
> > efficiently.  Of course, until someone tests it, we're just
> > speculating wildly.
> 
> Yep, that was exactly what struck me.  TOAST is quite likely to be a
> good answer for this.

Except: How much JSON data will actually be TOASTed?

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt



Re: JSON Patch for PostgreSQL - BSON Support?

От
Tom Lane
Дата:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Charles Pritchard <chuck@jumis.com> wrote:
>> Storing internally as BSON (if it holds up to its premise) would
>> mean more efficient traversal of internal objects in the future,
>> if we were to have JSON-related functions/selectors.
> How about the fact that not all JSON objects can be represented in
> BSON (if the JSON object has a very long string), and not all BSON
> objects can be represented in JSON (if the BSON object has an
> array).

Well, if it's not just a binary encoding of JSON, I think we can forget
about it ... certainly it won't work in the form I was visualizing.
        regards, tom lane


Re: JSON Patch for PostgreSQL - BSON Support?

От
Andres Freund
Дата:
1;2403;0cOn Mon, Aug 16, 2010 at 05:02:47PM -0500, Kevin Grittner wrote:
> Charles Pritchard <chuck@jumis.com> wrote:
>
> > Storing internally as BSON (if it holds up to its premise) would
> > mean more efficient traversal of internal objects in the future,
> > if we were to have JSON-related functions/selectors.
> How about the fact that not all JSON objects can be represented in
> BSON (if the JSON object has a very long string)
Any such long string wont be representable in pg anyway. Or am I
missing something here?

Besides that I have to say that I find it pretty strange to design a
supposedly generic file-format with a 32bit signed integer length...

> , and not all BSON objects can be represented in JSON (if the BSON object has an
> array).  Or do we invent our own flavors of one or both to cover the
> mismatch?
The BSON representation could be purely internal...

Andres


Re: JSON Patch for PostgreSQL - BSON Support?

От
Joseph Adams
Дата:
On Mon, Aug 16, 2010 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Well, if it's not just a binary encoding of JSON, I think we can forget
> about it ... certainly it won't work in the form I was visualizing.
>
>                        regards, tom lane

I just read the spec, and BSON has a lot of bells and whistles
attached (such as labeling binary data with a subtype like UUID or
MD5).  With a couple exceptions (see below), any JSON can be converted
to BSON (but the way BSON does arrays is silly: item indices are
stored as strings), but not all BSONs can be converted to JSON without
losing some type details.

Others already mentioned that you can't convert 2 billion byte long
JSON strings to BSON.  Another issue is that BSON cannot encode all
JSON numbers without precision loss.  JSON can hold any number
matching

'-'? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([Ee] [+-]? [0-9]+)?

but BSON pidgenholes numeric values to either double, int32, int64, or
a 12-byte MongoDB Object ID.  Thus, for people who expect JSON to be
able to hold arbitrary-precision numbers (which the JSON data type in
my patch can), using BSON for transfer or storage will violate that
expectation.

Now that I know more about BSON, my opinion is that it shouldn't be
used as the transfer or storage format of the JSON data type.  Maybe
if someone wants to do the work, BSON could be implemented as a
contrib module, and functions could be provided in that module to
convert to/from JSON with documented caveats.


Joey Adams


Re: JSON Patch for PostgreSQL - BSON Support?

От
Josh Berkus
Дата:
> but BSON pidgenholes numeric values to either double, int32, int64, or
> a 12-byte MongoDB Object ID.  Thus, for people who expect JSON to be
> able to hold arbitrary-precision numbers (which the JSON data type in
> my patch can), using BSON for transfer or storage will violate that
> expectation.

Good lord.  I'd suggest that maybe we wait for BSON v. 2.0 instead.

Is BSON even any kind of a standard?

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: JSON Patch for PostgreSQL - BSON Support?

От
Christopher Browne
Дата:
On Mon, Aug 16, 2010 at 8:38 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> but BSON pidgenholes numeric values to either double, int32, int64, or
>> a 12-byte MongoDB Object ID.  Thus, for people who expect JSON to be
>> able to hold arbitrary-precision numbers (which the JSON data type in
>> my patch can), using BSON for transfer or storage will violate that
>> expectation.
>
> Good lord.  I'd suggest that maybe we wait for BSON v. 2.0 instead.
>
> Is BSON even any kind of a standard?

You know that if it were a standard, it would be WORSE!

:-)

This falls into big time "no way!"

If there was a standardized binary encoding for XML that was
effectively a tree (e.g. - more or less like a set of Lisp objects
with CAR/CDR linkages), that would be somewhat interesting, as it
could be both comparatively compact, and perhaps offer rapid
navigation through the tree.  BSON doesn't sound like that!
--
http://linuxfinances.info/info/linuxdistributions.html


Re: JSON Patch for PostgreSQL - BSON Support?

От
Tom Lane
Дата:
Joseph Adams <joeyadams3.14159@gmail.com> writes:
> Others already mentioned that you can't convert 2 billion byte long
> JSON strings to BSON.  Another issue is that BSON cannot encode all
> JSON numbers without precision loss.

As somebody already mentioned, the former isn't likely to be an issue
for us anytime in the foreseeable future, because we can't push around
datum values more than 1GB large anyhow.  The latter seems like a pretty
nasty problem though.

I'm good with just dropping this idea for the moment.  The Google hit
statistics that were cited earlier show that there's not enough interest
in BSON to justify a separate datatype, which is what it would
apparently need to be.
        regards, tom lane