Обсуждение: Basic JSON support
Here is a patch for basic JSON support. It adds only those features: * Add "json" data type, that is binary-compatible with text. * Syntax checking on text to JSON conversion. * json_pretty() -- print JSON tree with indentation. We have "JSON datatype (WIP) 01" item: https://commitfest.postgresql.org/action/patch_view?id=351 but it seems too complex for me to apply all of the feature at once, especially JSON-Path support. So, I'd like to submit only basic and minimal JSON datatype support at first. The most different point of my patch is that JSON parser is reimplemented with flex and bison to make maintenance easier. Note that the JSON parser accept top-level scalar values intensionally, because of requirement when we add functions to extract values from JSON array/object. For example, CREATE FUNCTION json_to_array(json) RETURNS json[] might return naked scalar values in the result array. -- Itagaki Takahiro
Вложения
On Sep 14, 2010, at 7:32 PM, Itagaki Takahiro wrote: > Here is a patch for basic JSON support. It adds only those features: > * Add "json" data type, that is binary-compatible with text. > * Syntax checking on text to JSON conversion. > * json_pretty() -- print JSON tree with indentation. > > We have "JSON datatype (WIP) 01" item: > https://commitfest.postgresql.org/action/patch_view?id=351 > but it seems too complex for me to apply all of the feature > at once, especially JSON-Path support. So, I'd like to submit > only basic and minimal JSON datatype support at first. So should this be added to the commitfest, or replace this one? Best, David
On Thu, Sep 16, 2010 at 1:45 AM, David E. Wheeler <david@kineticode.com> wrote: >> We have "JSON datatype (WIP) 01" item: >> https://commitfest.postgresql.org/action/patch_view?id=351 >> but it seems too complex for me to apply all of the feature >> at once, especially JSON-Path support. So, I'd like to submit >> only basic and minimal JSON datatype support at first. > > So should this be added to the commitfest, or replace this one? I added my patch, but the previous one will be returned with feedback if the author doesn't have a new version adjusted to existing comments. -- Itagaki Takahiro
On Wed, Sep 15, 2010 at 7:23 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Thu, Sep 16, 2010 at 1:45 AM, David E. Wheeler <david@kineticode.com> wrote: >>> We have "JSON datatype (WIP) 01" item: >>> https://commitfest.postgresql.org/action/patch_view?id=351 >>> but it seems too complex for me to apply all of the feature >>> at once, especially JSON-Path support. So, I'd like to submit >>> only basic and minimal JSON datatype support at first. >> >> So should this be added to the commitfest, or replace this one? > > I added my patch, but the previous one will be returned with feedback > if the author doesn't have a new version adjusted to existing comments. Did you extract this from his work, or is this completely independent?I'm a bit disinclined to say we should just toss overboardall the work that's already been done. Why did you write a new patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Sep 16, 2010 at 9:44 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Did you extract this from his work, or is this completely independent? > I'm a bit disinclined to say we should just toss overboard all the > work that's already been done. Why did you write a new patch? I read his patch and am inspired by the work, but the code is almost rewritten. As my previous comment, http://archives.postgresql.org/pgsql-hackers/2010-08/msg01773.php I think it's not a good idea to develop JSON datatype based on the complex node links. The point of my patch is to simplify it. I'd like to encourage use of the simplified structure to implement his other works, including JSONPath. -- Itagaki Takahiro
On Wed, Sep 15, 2010 at 9:53 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Thu, Sep 16, 2010 at 9:44 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Did you extract this from his work, or is this completely independent? >> I'm a bit disinclined to say we should just toss overboard all the >> work that's already been done. Why did you write a new patch? > > I read his patch and am inspired by the work, but the code is almost > rewritten. As my previous comment, > http://archives.postgresql.org/pgsql-hackers/2010-08/msg01773.php > I think it's not a good idea to develop JSON datatype based on the > complex node links. The point of my patch is to simplify it. > > I'd like to encourage use of the simplified structure to implement > his other works, including JSONPath. I think that if we make a habit of rewriting the contributions of first-time contributors in toto, we will have fewer second-time contributors. I think it would have been a good idea to discuss this on the list before you went and did it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
> I think that if we make a habit of rewriting the contributions of > first-time contributors in toto, we will have fewer second-time > contributors. I think it would have been a good idea to discuss this > on the list before you went and did it. To be fair to Itagaki-san, he DID ask about the status of the SOC project, and didn't get much of an answer. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Tue, Sep 14, 2010 at 10:32 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > Here is a patch for basic JSON support. It adds only those features: > * Add "json" data type, that is binary-compatible with text. > * Syntax checking on text to JSON conversion. > * json_pretty() -- print JSON tree with indentation. > > We have "JSON datatype (WIP) 01" item: > https://commitfest.postgresql.org/action/patch_view?id=351 > but it seems too complex for me to apply all of the feature > at once, especially JSON-Path support. So, I'd like to submit > only basic and minimal JSON datatype support at first. > > The most different point of my patch is that JSON parser is > reimplemented with flex and bison to make maintenance easier. > > Note that the JSON parser accept top-level scalar values > intensionally, because of requirement when we add functions > to extract values from JSON array/object. For example, > CREATE FUNCTION json_to_array(json) RETURNS json[] > might return naked scalar values in the result array. > > -- > Itagaki Takahiro > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > I have written a patch that amends the basic_json-20100915.patch . It does the following: * Fixes bugs in the lexer (namely, ' -0 '::JSON was not accepted, and ' """ '::JSON was accepted). * Adds a json_validate() function. * Adds test strings from original JSON patch along with some new ones. * Tweaks the error message for invalid JSON input. * Fixes the Makefile so json.c won't fail to build because json_parser.h and json_scanner.h haven't been built yet. In the lexer, the following regular expression for parsing a number was incorrect: 0|-?([1-9][0-9]*(\.[0-9]+)?|0\.[0-9]+)([Ee][+-]?[0-9]+)? This regex doesn't accept '-0', but that's a valid JSON number according to the standard. Also, the exclusive state <str> didn't have an <<EOF>> rule, causing the lexer to treat all characters from an unclosed quote to EOF as nothing at all. A less important issue with the lexer is that it doesn't allow the control character \x7F to appear in strings unescaped. The JSON RFC doesn't mention \x7F as a control character, and many implementations of JSON and JavaScript don't treat it as one either. I went ahead and added json_validate() now because it's useful for testing (my test strings use it). I made the error message say "ERROR: invalid input syntax for JSON" rather than the redundant "ERROR: syntax error in json: syntax error". However, the specific parsing error (we haven't defined any yet) is simply ignored by json_validate and company. As for the Makefile, I think the reason you weren't getting build problems but I was is because I normally build with make -j2 (run 2 processes at once), which tends to test makefile rule correctness. Here's one thing I'm worried about: the bison/flex code in your patch looks rather similar to the code in http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the GPL. In particular, the incorrect number regex I discussed above can also be found in jsonval verbatim. However, because there are a lot of differences in both the bison and flex code now, I'm not sure they're close enough to be "copied", but I am not a lawyer. It might be a good idea to contact Ben Spencer and ask him for permission to license our modified version of the code under PostgreSQL's more relaxed license, just to be on the safe side. Finally, could you post a patch with your latest work (with or without my contribution) so we can tinker with it more? Thanks! Joey Adams
Вложения
On Mon, Sep 20, 2010 at 1:38 PM, Joseph Adams <joeyadams3.14159@gmail.com> wrote: > I have written a patch that amends the basic_json-20100915.patch . Thanks. I merged your patch and added json_to_array(), as a demonstration of json_stringify(). As the current code, json_stringify(json) just returns the input text as-is, but json_stringify(json, NULL) trims all of unnecessary whitespaces. We could do it in json_in() and json_parse() and always store values in compressed representation instead. We leave room for discussion. I also merge json_test_strings.sql into the main test file. I slimed some tests -- many test cases seem to be duplicated for me. > I went ahead and added json_validate() now because it's useful for > testing (my test strings use it). Good idea, but how about calling it json_is_well_formed()? We have similar name of functions for xml type. I renamed it in the patch. > Here's one thing I'm worried about: the bison/flex code in your patch > looks rather similar to the code in > http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the > GPL. In particular, the incorrect number regex I discussed above can > also be found in jsonval verbatim. However, because there are a lot > of differences in both the bison and flex code now, I'm not sure > they're close enough to be "copied", but I am not a lawyer. It might > be a good idea to contact Ben Spencer and ask him for permission to > license our modified version of the code under PostgreSQL's more > relaxed license, just to be on the safe side. Sorry for my insincere manner. Surely I read his code. Do you know his contact address? I cannot find it... -- Itagaki Takahiro
Вложения
On Tue, Sep 21, 2010 at 8:38 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > Sorry for my insincere manner. Surely I read his code. > Do you know his contact address? I cannot find it... It alarms me quite a bit that someone who is a committer on this project would accidentally copy code from another project with a different license into PostgreSQL. How does that happen? And how much got copied, besides the regular expression? I would be inclined to flush this patch altogether rather than take ANY risk of GPL contamination. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, Sep 21, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > It alarms me quite a bit that someone who is a committer on this > project would accidentally copy code from another project with a > different license into PostgreSQL. How does that happen? And how > much got copied, besides the regular expression? I would be inclined > to flush this patch altogether rather than take ANY risk of GPL > contamination. Only regular expressions in the scanner. So I've thought it's OK, but I should have been more careful. Sorry. -- Itagaki Takahiro
On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams <joeyadams3.14159@gmail.com> wrote: > Here's one thing I'm worried about: the bison/flex code in your patch > looks rather similar to the code in > http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the > GPL. In particular, the incorrect number regex I discussed above can > also be found in jsonval verbatim. However, because there are a lot > of differences in both the bison and flex code now, I'm not sure > they're close enough to be "copied", but I am not a lawyer. It might > be a good idea to contact Ben Spencer and ask him for permission to > license our modified version of the code under PostgreSQL's more > relaxed license, just to be on the safe side. With the help and motivation of David Fetter, I sent an e-mail to Ben Spencer (google "jsonval", check out the git repo, and look in the git log to find his e-mail address) a few minutes ago, asking if he would license jsonval under a license compatible with PostgreSQL, and am currently awaiting a response. If he doesn't respond, or outright refuses (which I, for one, doubt will happen), my fallback plan is to rewrite the JSON validation code by drawing from my original code (meaning it won't be in bison/flex) and post a patch for it. Unfortunately, it seems to me that there aren't very many ways of expressing a JSON parser in bison/flex, and thus the idea of JSON parsing with bison/flex is effectively locked down by the GPL unless we can get a more permissive license for jsonval. But, I am not a lawyer.
On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams <joeyadams3.14159@gmail.com> wrote: > On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams > <joeyadams3.14159@gmail.com> wrote: >> Here's one thing I'm worried about: the bison/flex code in your patch >> looks rather similar to the code in >> http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the >> GPL. In particular, the incorrect number regex I discussed above can >> also be found in jsonval verbatim. However, because there are a lot >> of differences in both the bison and flex code now, I'm not sure >> they're close enough to be "copied", but I am not a lawyer. It might >> be a good idea to contact Ben Spencer and ask him for permission to >> license our modified version of the code under PostgreSQL's more >> relaxed license, just to be on the safe side. > > With the help and motivation of David Fetter, I sent an e-mail to Ben > Spencer (google "jsonval", check out the git repo, and look in the git > log to find his e-mail address) a few minutes ago, asking if he would > license jsonval under a license compatible with PostgreSQL, and am > currently awaiting a response. > > If he doesn't respond, or outright refuses (which I, for one, doubt > will happen), my fallback plan is to rewrite the JSON validation code > by drawing from my original code (meaning it won't be in bison/flex) > and post a patch for it. Unfortunately, it seems to me that there > aren't very many ways of expressing a JSON parser in bison/flex, and > thus the idea of JSON parsing with bison/flex is effectively locked > down by the GPL unless we can get a more permissive license for > jsonval. But, I am not a lawyer. If someone who hasn't looked at the GPL code sits down and codes something up based on the json.org home page, it's hard to imagine how anyone could be grumpy about that. But we do need to be at some points to make sure that anything we do is truly clean-room, because we simply can't have GPL code creeping into our code base. Thanks for continuing to pursue this! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams <joeyadams3.14159@gmail.com> wrote: >> If he doesn't respond, or outright refuses (which I, for one, doubt >> will happen), my fallback plan is to rewrite the JSON validation code >> by drawing from my original code (meaning it won't be in bison/flex) >> and post a patch for it. �Unfortunately, it seems to me that there >> aren't very many ways of expressing a JSON parser in bison/flex, and >> thus the idea of JSON parsing with bison/flex is effectively locked >> down by the GPL unless we can get a more permissive license for >> jsonval. �But, I am not a lawyer. > If someone who hasn't looked at the GPL code sits down and codes > something up based on the json.org home page, it's hard to imagine how > anyone could be grumpy about that. Yeah. Joseph seems to be confusing copyrights with patents. The idea of "parse JSON with bison/flex" is not patentable by any stretch of the imagination. But having said that, I wonder whether bison/flex are really the best tool for the job in the first place. From what I understand of JSON (which admittedly ain't much) a bison parser seems like overkill: it'd probably be both bloated and slow compared to a simple handwritten recursive-descent parser. regards, tom lane
All, > But having said that, I wonder whether bison/flex are really the best > tool for the job in the first place. From what I understand of JSON > (which admittedly ain't much) a bison parser seems like overkill: > it'd probably be both bloated and slow compared to a simple handwritten > recursive-descent parser. This appears not to be necessary. The author of JSONval has indicated that, should we choose to include it in PostgreSQL 9.1, he is open to re-licensing. So on a completely *technical* basis, do we want to use JSONval? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 10/04/2010 08:00 PM, Josh Berkus wrote: > All, > >> But having said that, I wonder whether bison/flex are really the best >> tool for the job in the first place. From what I understand of JSON >> (which admittedly ain't much) a bison parser seems like overkill: >> it'd probably be both bloated and slow compared to a simple handwritten >> recursive-descent parser. > This appears not to be necessary. The author of JSONval has indicated > that, should we choose to include it in PostgreSQL 9.1, he is open to > re-licensing. > > So on a completely *technical* basis, do we want to use JSONval? I agree with Tom that a hand-cut RD parser is much more likely to be the way to go. We should not use bison/flex for parsing data values. cheers andrew
On Mon, Oct 4, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah. Joseph seems to be confusing copyrights with patents. The idea > of "parse JSON with bison/flex" is not patentable by any stretch of the > imagination. What I meant is, anyone who sets out to write a JSON parser with bison/flex is probably going to end up with something very similar to jsonval even without looking at it. > But having said that, I wonder whether bison/flex are really the best > tool for the job in the first place. From what I understand of JSON > (which admittedly ain't much) a bison parser seems like overkill: > it'd probably be both bloated and slow compared to a simple handwritten > recursive-descent parser. I agree, and also because JSON isn't a moving target. However, in my opinion, the differences in quality between a bison parser and a handwritten parser are to small to be a big deal right now. A bison parser is probably slower and bigger than a simple hand-written one, but I haven't benchmarked it, and it's probably not worth benchmarking at this point. In correctness, I suspect a bison parser and a hand-written one to be about the same. Both pass the same army of testcases I built up while writing my original JSON patch. Granted, there could be weird semantics of bison/flex that make the parser wrong in subtle ways (for instance, the . regex char doesn't match \n and EOF), but there can also be weird errors in hand-written code. Pick your poison. In safety, a handwritten recursive descent parser could stack overflow and crash the server unless the proper guards are in place (e.g if someone tries to encode '[[[[[[...' as JSON). bison guards against this (chopping the maximum depth of JSON trees to 9997 levels or so), but I don't know if the real stack is smaller than bison thinks it is, or if bison uses its own buffer. A handwritten parser could also accept trees of any depth using a manually managed stack, but it wouldn't do much good because code that works with JSON trees (e.g. json_stringify) is currently recursive and has the same problem. In any case, I think limiting the depth of JSON, to something generous like 1000, is the right way to go. The RFC allows us to do that. In my opinion, the path of least resistance is the best path for now. If we use the bison parser now, then we can replace it with a hand-written one that's functionally equivalent later. Joey Adams
2010/10/5 Tom Lane <tgl@sss.pgh.pa.us>: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams <joeyadams3.14159@gmail.com> wrote: >>> If he doesn't respond, or outright refuses (which I, for one, doubt >>> will happen), my fallback plan is to rewrite the JSON validation code >>> by drawing from my original code (meaning it won't be in bison/flex) >>> and post a patch for it. Unfortunately, it seems to me that there >>> aren't very many ways of expressing a JSON parser in bison/flex, and >>> thus the idea of JSON parsing with bison/flex is effectively locked >>> down by the GPL unless we can get a more permissive license for >>> jsonval. But, I am not a lawyer. > >> If someone who hasn't looked at the GPL code sits down and codes >> something up based on the json.org home page, it's hard to imagine how >> anyone could be grumpy about that. > > Yeah. Joseph seems to be confusing copyrights with patents. The idea > of "parse JSON with bison/flex" is not patentable by any stretch of the > imagination. > > But having said that, I wonder whether bison/flex are really the best > tool for the job in the first place. From what I understand of JSON > (which admittedly ain't much) a bison parser seems like overkill: > it'd probably be both bloated and slow compared to a simple handwritten > recursive-descent parser. PostgreSQL use a bison for same simple things - cube, PostGis, bootstrap .. so I don't see some special overhead. I am thinking so lex can be shared from core parser. bison parser for JSON means a less rows for maintaining a repeating some a basic pattern in pg source code. Regards Pavel Stehule ./contrib/seg/segparse.y ./contrib/cube/cubeparse.y ./src/backend/bootstrap/bootparse.y ./src/backend/parser/gram.y ./src/pl/plpgsql/src/gram.y ./src/interfaces/ecpg/preproc/preproc.y > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >