Discussion: Re: JSON Patch for PostgreSQL - BSON Support?
I'd originally sent this to Joseph Adams, as he has been working on adding a JSON datatype. I've suggested supporting BSON, as there are many client implementations available, and the format is more efficient than xml and json trees for some use cases. http://bsonspec.org/ On 8/15/10 11:27 AM, Joseph Adams wrote: > On Sat, Aug 14, 2010 at 6:35 PM, Charles Pritchard<chuck@jumis.com> wrote: > >> Hello, >> >> I saw that you've submitted a patch to add JSON support to PostgreSQL. >> Thought I'd mention that BSON is gradually picking up in popularity, >> and does have a good amount of cross-client support. >> >> That said, I wouldn't be surprised if a patch were looked at with suspicion, >> but it does make sense as a contrib module. >> >> BSON is a binary json format used by Mongo DB. >> >> I don't use it, at all, but it seems like a good fit, and the right time. >> >> Returning BSON would be more efficient than returning JSON, >> for some use cases. >> >> -Charles >> >> > Would you mind posting this to the pgsql-hackers@postgresql.org > mailing list? I think a BSON module would be a good idea, as it would > provide a more efficient alternative to the more text-oriented JSON > data type. > > > Joey Adams >
Charles Pritchard <chuck@jumis.com> writes: > I'd originally sent this to Joseph Adams, as he has been working on > adding a JSON datatype. > I've suggested supporting BSON, as there are many client implementations > available, I knew there would be a lot of critters crawling out as soon as we turned over this rock. Which other data-formats-of-the-week shall we immortalize as core PG types? regards, tom lane
On 08/15/2010 11:03 PM, Tom Lane wrote: > Charles Pritchard<chuck@jumis.com> writes: >> I'd originally sent this to Joseph Adams, as he has been working on >> adding a JSON datatype. >> I've suggested supporting BSON, as there are many client implementations >> available, > I knew there would be a lot of critters crawling out as soon as we > turned over this rock. Which other data-formats-of-the-week shall > we immortalize as core PG types? > > If BSON is simply in effect an efficient encoding of JSON, then it's not clear to me that we would want another type at all. Rather, we might want to consider storing the data in this supposedly more efficient format, and maybe also some conversion routines. I agree that we don't want in core a huge array of general serialization formats. The one thing that JSON has going for it for general use, in my view, is that, unlike hstore, the structure is not flat. That makes it potentially useful for various purposes, especially complex structured function arguments, in places where using hstore can be rather limiting, and xml overly verbose. cheers andrew
On Sun, Aug 15, 2010 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Charles Pritchard <chuck@jumis.com> writes: >> I'd originally sent this to Joseph Adams, as he has been working on >> adding a JSON datatype. >> I've suggested supporting BSON, as there are many client implementations >> available, > > I knew there would be a lot of critters crawling out as soon as we > turned over this rock. Which other data-formats-of-the-week shall > we immortalize as core PG types? PER-encoded ASN.1, for when you really need something human-readable? :-) I think JSON is a reasonable choice for a core datatype; we don't really have anything else with the same functionality. But I'm not really in favor of adding any more, especially things like YAML and BSON that are essentially variants of JSON. Which is not to say I wouldn't like to have those available, making the unproven assumption that someone wants to write the code, but I don't really see why they should be in core. My theory is that XML and JSON are the big two, and so far I haven't seen much evidence to the contrary. JSON also has the advantage, as compared with XML and in general, of being relatively simple and single-purpose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Sun, Aug 15, 2010 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I knew there would be a lot of critters crawling out as soon as we >> turned over this rock. Which other data-formats-of-the-week shall >> we immortalize as core PG types? > PER-encoded ASN.1, for when you really need something human-readable? :-) I like to prioritize with "hits on Google" as a proxy for popularity: XML: 341M, CSV: 132M, JSON: 96M, YAML: 6M, ASN.1: 1.1M, BSON: 130K, "Protocol Buffers": 86K, OGDL: 45K. I think there's a strong case to add JSON, as it may very well be the most popular data->text serialization format out there not yet supported. It's certainly beyond a format of the week at this point. "XML is like violence: if it doesn't solve your problem, you aren't using enough of it." - Chris Maden -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Andrew Dunstan <andrew@dunslane.net> writes: > If BSON is simply in effect an efficient encoding of JSON, then it's not > clear to me that we would want another type at all. Rather, we might > want to consider storing the data in this supposedly more efficient > format, and maybe also some conversion routines. Hmm, that's an interesting plan ... regards, tom lane
On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> If BSON is simply in effect an efficient encoding of JSON, then it's not >> clear to me that we would want another type at all. Rather, we might >> want to consider storing the data in this supposedly more efficient >> format, and maybe also some conversion routines. > > Hmm, that's an interesting plan ... It is interesting, but I'm not sure that it will actually work out well in practice. If what we want to do is compress JSON, TOAST will do that for us without any additional code, and probably a lot more efficiently. Of course, until someone tests it, we're just speculating wildly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Andrew Dunstan <andrew@dunslane.net> writes: >>> If BSON is simply in effect an efficient encoding of JSON, then it's not >>> clear to me that we would want another type at all. Rather, we might >>> want to consider storing the data in this supposedly more efficient >>> format, and maybe also some conversion routines. >> >> Hmm, that's an interesting plan ... > > It is interesting, but I'm not sure that it will actually work out > well in practice. If what we want to do is compress JSON, TOAST will > do that for us without any additional code, and probably a lot more > efficiently. Of course, until someone tests it, we're just > speculating wildly. Yep, that was exactly what struck me. TOAST is quite likely to be a good answer for this. The reason to want some other binary format would be if there were other benefits to be had. An "XML encoding" format could be interesting if it allowed having GIST-ish indexes to search for tags particularly efficiently. I say "XML encoding" because I've not got any reason to think that a JSON/BSON-only format would necessarily be preferable. But "interesting" isn't the same thing as "the right answer." For now, TOAST seems perfectly reasonable. If there's some "wire format for XML" that would allow more efficient data transfer, that would be an improvement. BSON sounds like it's something like that, but only if it's better than "flavour of the week." -- http://linuxfinances.info/info/linuxdistributions.html
On Mon, 2010-08-16 at 11:40 -0400, Christopher Browne wrote: > On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Andrew Dunstan <andrew@dunslane.net> writes: > >>> If BSON is simply in effect an efficient encoding of JSON, then it's not > >>> clear to me that we would want another type at all. Rather, we might > >>> want to consider storing the data in this supposedly more efficient > >>> format, and maybe also some conversion routines. > >> > >> Hmm, that's an interesting plan ... > > > > It is interesting, but I'm not sure that it will actually work out > > well in practice. If what we want to do is compress JSON, TOAST will > > do that for us without any additional code, and probably a lot more > > efficiently. Of course, until someone tests it, we're just > > speculating wildly. > > Yep, that was exactly what struck me. TOAST is quite likely to be a > good answer for this. Except: How much JSON data will actually be TOASTed? Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On 8/16/10 8:40 AM, Christopher Browne wrote: > On Mon, Aug 16, 2010 at 11:21 AM, Robert Haas<robertmhaas@gmail.com> wrote: > >> On Mon, Aug 16, 2010 at 11:05 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> >>> Andrew Dunstan<andrew@dunslane.net> writes: >>> >>>> If BSON is simply in effect an efficient encoding of JSON, then it's not >>>> clear to me that we would want another type at all. Rather, we might >>>> want to consider storing the data in this supposedly more efficient >>>> format, and maybe also some conversion routines. >>>> >>> Hmm, that's an interesting plan ... >>> >> It is interesting, but I'm not sure that it will actually work out >> well in practice. If what we want to do is compress JSON, TOAST will >> do that for us without any additional code, and probably a lot more >> efficiently. Of course, until someone tests it, we're just >> speculating wildly. >> > Yep, that was exactly what struck me. TOAST is quite likely to be a > good answer for this. > > The reason to want some other binary format would be if there were > other benefits to be had. > > An "XML encoding" format could be interesting if it allowed having > GIST-ish indexes to search for tags particularly efficiently. I say > "XML encoding" because I've not got any reason to think that a > JSON/BSON-only format would necessarily be preferable. > > But "interesting" isn't the same thing as "the right answer." For > now, TOAST seems perfectly reasonable. > > If there's some "wire format for XML" that would allow more efficient > data transfer, that would be an improvement. BSON sounds like it's > something like that, but only if it's better than "flavour of the > week." > XML encoding has certainly been investigated within the W3C public docs: http://www.w3.org/2003/08/binary-interchange-workshop/Report.html (discussion) http://www.w3.org/TR/xbc-characterization/ (summary) Leading to the current draft of EXI: http://www.w3.org/XML/EXI/ The spec is a rather large undertaking. 
It makes sense to add to the XML ToDo wiki page. EXI will certainly be better than TOAST for larger XML docs. ... BSON does not compress text content -- TOAST would still have its advantages. It mainly shortens the representation of JSON data structures. Again, I think the primary benefit of BSON would be data traversal. The benefit is the same for a client receiving BSON as for the server. Data lengths are specified, allowing quick optimizations for things like key_exists and equivalence checks. Clients supporting BSON could benefit from a quick pass-through. And I'd imagine a very slight benefit toward indexing, were GIN / hstore processes used. Still, as has been noted on this thread, we don't have numbers to work with. With json as a core data type, and "bson" as a possible function working with the json type, there's not much of a risk going in either direction (text + TOAST, bson + TOAST).
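The traversal point above can be sketched concretely. This hand-rolled toy encoder covers only two BSON element types (UTF-8 string, 0x02, and int32, 0x10) per the layout described at bsonspec.org; it is an illustration of how length prefixes let a reader skip values without parsing them, not a usable codec:

```python
import struct

def bson_encode(doc):
    """Toy BSON encoder: str and int values only (a sketch, not the full spec)."""
    body = b""
    for key, val in doc.items():
        if isinstance(val, str):
            s = val.encode("utf-8") + b"\x00"
            body += b"\x02" + key.encode("utf-8") + b"\x00" + struct.pack("<i", len(s)) + s
        elif isinstance(val, int):
            body += b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", val)
    # document length prefix counts itself (4 bytes) and the trailing NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def bson_find(data, wanted):
    """Walk top-level elements, hopping over values via their length prefixes."""
    pos = 4                                    # skip the document length prefix
    while data[pos] != 0:                      # 0x00 terminates the document
        typ = data[pos]; pos += 1
        end = data.index(b"\x00", pos)
        key = data[pos:end].decode("utf-8"); pos = end + 1
        if typ == 0x02:                        # string: length-prefixed, skippable
            strlen = struct.unpack_from("<i", data, pos)[0]
            val = data[pos + 4:pos + 4 + strlen - 1].decode("utf-8")
            pos += 4 + strlen
        elif typ == 0x10:                      # int32: fixed width
            val = struct.unpack_from("<i", data, pos)[0]
            pos += 4
        if key == wanted:
            return val
    return None

data = bson_encode({"a": "hello", "b": 42})
print(bson_find(data, "b"))   # -> 42
```

A key_exists-style check never has to decode the string payloads it skips, which is the efficiency Charles is describing.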
On Sun, Aug 15, 2010 at 11:47 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > If BSON is simply in effect an efficient encoding of JSON, then it's not > clear to me that we would want another type at all. Rather, we might want to > consider storing the data in this supposedly more efficient format, and > maybe also some conversion routines. An issue is that the current JSON data type implementation preserves the original text (meaning if you say '[1,2,"\u0020" ]'::JSON, it will yield '[1,2,"\u0020" ]' rather than '[1,2," "]'). I haven't researched BSON much at all, but I seriously doubt that part of its spec includes handling external JSON encoding details like whitespace and superfluous escapes. Even though I spent a long time implementing it, the original text preservation feature should be dropped, in my opinion. Users tend to care more about the data inside of a JSON value than about how it's encoded, and replacement of values can have funky consequences on indentation when working with indented JSON text (similar to how pasting in a text editor puts pasted content at the wrong indent level). By dropping original text preservation, in the future (or now), JSON could be encoded in BSON (or a more efficient format) in the database rather than as JSON-encoded text. Also, an idea would be to make json_send and json_recv (binary JSON send/receive) use BSON rather than JSON-encoded text, as sending/receiving JSON-encoded text is exactly what text send/receive do. Joey Adams
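What dropping text preservation means can be shown with any normalizing round-trip. A minimal sketch using Python's json module (standing in for a decode/re-encode cycle, not for the patch's actual code):

```python
import json

# The example value from the message above: extra whitespace plus a
# superfluous \u0020 escape for a plain space character.
original = '[1,2,"\\u0020" ]'
normalized = json.dumps(json.loads(original))
print(normalized)   # '[1, 2, " "]' -- the escape and original spacing are gone
```

The data is identical either way; only the surface encoding changes, which is exactly why a binary storage format cannot preserve it.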
On 16/08/10 20:17, Joseph Adams wrote: > Also, an idea would be to make json_send and json_recv (binary JSON > send/receive) use BSON rather than JSON-encoded text, as > sending/receiving JSON-encoded text is exactly what text send/receive > do. The usual reason to use the binary format is performance, so it doesn't make much sense to use BSON for that if the internal storage format is something else. It would most likely be slower, not faster, than sending the string as is. Of course, if you switch to using BSON as the internal storage format, then it's natural to use that for the binary I/O format too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 8/15/10 8:47 PM, Andrew Dunstan wrote: > On 08/15/2010 11:03 PM, Tom Lane wrote: >> Charles Pritchard<chuck@jumis.com> writes: >>> I'd originally sent this to Joseph Adams, as he has been working on >>> adding a JSON datatype. >>> I've suggested supporting BSON, as there are many client >>> implementations >>> available, >> I knew there would be a lot of critters crawling out as soon as we >> turned over this rock. Which other data-formats-of-the-week shall >> we immortalize as core PG types? >> >> > > If BSON is simply in effect an efficient encoding of JSON, then it's > not clear to me that we would want another type at all. Rather, we > might want to consider storing the data in this supposedly more > efficient format, and maybe also some conversion routines. > > I agree that we don't want in core a huge array of general > serialization formats. The one thing that JSON has going for it for > general use, in my view, is that, unlike hstore, the structure is not > flat. That makes it potentially useful for various purposes, > especially complex structured function arguments, in places where > using hstore can be rather limiting, and xml overly verbose. While I certainly haven't done homework on this -- I agree with Andrew. Storing internally as BSON (if it holds up to its premise) would mean more efficient traversal of internal objects in the future, if we were to have JSON-related functions/selectors. The core type would still be json, and would return as text, a json string, but internally it would be stored as BSON, and a function would be available, json_to_bson(typedjsoncol::json), returning a binary string.
Charles Pritchard <chuck@jumis.com> wrote: > Storing internally as BSON (if it holds up to its premise) would > mean more efficient traversal of internal objects in the future, > if we were to have JSON-related functions/selectors. How about the fact that not all JSON objects can be represented in BSON (if the JSON object has a very long string), and not all BSON objects can be represented in JSON (if the BSON object has an array). Or do we invent our own flavors of one or both to cover the mismatch? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Charles Pritchard <chuck@jumis.com> wrote: >> Storing internally as BSON (if it holds up to its premise) would >> mean more efficient traversal of internal objects in the future, >> if we were to have JSON-related functions/selectors. > How about the fact that not all JSON objects can be represented in > BSON (if the JSON object has a very long string), and not all BSON > objects can be represented in JSON (if the BSON object has an > array). Well, if it's not just a binary encoding of JSON, I think we can forget about it ... certainly it won't work in the form I was visualizing. regards, tom lane
On Mon, Aug 16, 2010 at 05:02:47PM -0500, Kevin Grittner wrote: > Charles Pritchard <chuck@jumis.com> wrote: > > > Storing internally as BSON (if it holds up to its premise) would > > mean more efficient traversal of internal objects in the future, > > if we were to have JSON-related functions/selectors. > How about the fact that not all JSON objects can be represented in > BSON (if the JSON object has a very long string) Any such long string won't be representable in pg anyway. Or am I missing something here? Besides that, I have to say that I find it pretty strange to design a supposedly generic file format with a 32bit signed integer length... > , and not all BSON objects can be represented in JSON (if the BSON object has an > array). Or do we invent our own flavors of one or both to cover the > mismatch? The BSON representation could be purely internal... Andres
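The signed 32-bit length limit Andres mentions is easy to pin down (struct here stands in for BSON's little-endian int32 length prefix):

```python
import struct

INT32_MAX = 2**31 - 1          # BSON document and string lengths are signed int32
print(INT32_MAX)               # 2147483647 bytes, i.e. just under 2 GB
try:
    struct.pack("<i", 2**31)   # one past the limit cannot be encoded at all
except struct.error:
    print("length does not fit in a signed 32-bit prefix")
```

Since PostgreSQL caps individual datum values at 1 GB anyway, the 2 GB prefix limit is not the binding constraint here, which is Andres's point.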
On Mon, Aug 16, 2010 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Well, if it's not just a binary encoding of JSON, I think we can forget > about it ... certainly it won't work in the form I was visualizing. > > regards, tom lane I just read the spec, and BSON has a lot of bells and whistles attached (such as labeling binary data with a subtype like UUID or MD5). With a couple of exceptions (see below), any JSON can be converted to BSON (but the way BSON does arrays is silly: item indices are stored as strings), but not all BSONs can be converted to JSON without losing some type details. Others already mentioned that you can't convert 2 billion byte long JSON strings to BSON. Another issue is that BSON cannot encode all JSON numbers without precision loss. JSON can hold any number matching '-'? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([Ee] [+-]? [0-9]+)? but BSON pigeonholes numeric values into either double, int32, int64, or a 12-byte MongoDB Object ID. Thus, for people who expect JSON to be able to hold arbitrary-precision numbers (which the JSON data type in my patch can), using BSON for transfer or storage will violate that expectation. Now that I know more about BSON, my opinion is that it shouldn't be used as the transfer or storage format of the JSON data type. Maybe if someone wants to do the work, BSON could be implemented as a contrib module, and functions could be provided in that module to convert to/from JSON with documented caveats. Joey Adams
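The precision-loss argument can be demonstrated directly. A sketch compiling the number grammar from the message above and comparing arbitrary-precision Decimal against BSON's double representation (Decimal and float stand in for the two storage choices; the 31-digit constant is just an example value):

```python
import re
from decimal import Decimal

# The JSON number grammar quoted above, as an anchored regex
JSON_NUMBER = re.compile(r'-?(0|[1-9][0-9]*)(\.[0-9]+)?([Ee][+-]?[0-9]+)?\Z')

text = "3.141592653589793238462643383279"
assert JSON_NUMBER.match(text)       # a perfectly valid JSON number ...
exact = Decimal(text)                # ... which Decimal keeps exactly,
as_double = float(text)              # but a 64-bit double cannot
print(Decimal(as_double) == exact)   # False: the trailing digits were rounded away
```

Any value with more significant digits than a double's ~15-17 would round-trip through BSON changed, which is the expectation violation Joey describes.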
> but BSON pigeonholes numeric values into either double, int32, int64, or > a 12-byte MongoDB Object ID. Thus, for people who expect JSON to be > able to hold arbitrary-precision numbers (which the JSON data type in > my patch can), using BSON for transfer or storage will violate that > expectation. Good lord. I'd suggest that maybe we wait for BSON v. 2.0 instead. Is BSON even any kind of a standard? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Mon, Aug 16, 2010 at 8:38 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> but BSON pidgenholes numeric values to either double, int32, int64, or >> a 12-byte MongoDB Object ID. Thus, for people who expect JSON to be >> able to hold arbitrary-precision numbers (which the JSON data type in >> my patch can), using BSON for transfer or storage will violate that >> expectation. > > Good lord. I'd suggest that maybe we wait for BSON v. 2.0 instead. > > Is BSON even any kind of a standard? You know that if it were a standard, it would be WORSE! :-) This falls into big time "no way!" If there was a standardized binary encoding for XML that was effectively a tree (e.g. - more or less like a set of Lisp objects with CAR/CDR linkages), that would be somewhat interesting, as it could be both comparatively compact, and perhaps offer rapid navigation through the tree. BSON doesn't sound like that! -- http://linuxfinances.info/info/linuxdistributions.html
Joseph Adams <joeyadams3.14159@gmail.com> writes: > Others already mentioned that you can't convert 2 billion byte long > JSON strings to BSON. Another issue is that BSON cannot encode all > JSON numbers without precision loss. As somebody already mentioned, the former isn't likely to be an issue for us anytime in the foreseeable future, because we can't push around datum values more than 1GB large anyhow. The latter seems like a pretty nasty problem though. I'm good with just dropping this idea for the moment. The Google hit statistics that were cited earlier show that there's not enough interest in BSON to justify a separate datatype, which is what it would apparently need to be. regards, tom lane