Discussion: Calling json_* functions with JSONB data
Hi there,
I noticed it is very easy to accidentally call the json_* form of the JSON manipulation functions with jsonb data as input. This is pretty sub-optimal, since it involves rendering the jsonb to text and reparsing it before the json_* form of the function runs.
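[Editor's note: a minimal sketch of the round-trip being described; the table and column names are hypothetical.]

```sql
-- Hypothetical table with a jsonb column.
CREATE TABLE events (doc jsonb);

-- Accidental json_* call: the jsonb value must be rendered to text and
-- reparsed by the json input machinery before json_array_length runs.
SELECT json_array_length(doc::json) FROM events;

-- The native jsonb form avoids the serialize/reparse round-trip.
SELECT jsonb_array_length(doc) FROM events;
```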
Fortunately, this seems quite easy to resolve by taking advantage of our ability to add json_*(jsonb) form of the functions.
--
I talked this over with Andrew who had no objections and suggested I float it on the list before writing a patch. Looks pretty straightforward, just a few new data rows in pg_proc.h.
Anyone have any concerns or suggestions?
-p
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
Please provide an example of what you are talking about.
SELECT json_array_length('[1,2]'::jsonb)
ERROR: function json_array_length(jsonb) does not exist
-- The function name is "jsonb_array_length"; and there is no implicit cast between the two.
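[Editor's note: for comparison, the forms that do resolve with the current function set.]

```sql
-- Matching the function to the type works:
SELECT jsonb_array_length('[1,2]'::jsonb);   -- returns 2
SELECT json_array_length('[1,2]'::json);     -- returns 2

-- An explicit cast also works, at the cost of the conversion:
SELECT json_array_length('[1,2]'::jsonb::json);
```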
David J.
On Mon, May 23, 2016 at 11:14 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
He is saying that he accidentally calls json_array_length() instead of jsonb_array_length() and that it is an annoying usability problem. It happens to me too, and I agree it would be better if you could just call json_array_length() regardless of whether the type is json or jsonb. If there is some significant functionality difference from the user's perspective then having separate "json_" and "jsonb_" functions makes sense, but in your example there is not.
Peter van Hardenberg <pvh@pvh.ca> writes:
> I talked this over with Andrew who had no objections and suggested I float
> it on the list before writing a patch. Looks pretty straightforward, just a
> few new data rows in pg_proc.h.

I think you might find that you need to add new C function entry points
to keep opr_sanity.sql from complaining.  That's still pretty easy though.

regards, tom lane
On 2016-05-23 18:55, Peter van Hardenberg wrote:
> I talked this over with Andrew who had no objections and suggested I float
> it on the list before writing a patch. Looks pretty straightforward, just a
> few new data rows in pg_proc.h.
>
> Anyone have any concerns or suggestions?

What about cases like json_whatever($1) which previously worked but will
now be ambiguous? (Or will they somehow not be ambiguous?)

.m
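[Editor's note: a sketch of the scenario Marko raises, assuming overloaded json/jsonb variants existed; json_whatever is his placeholder name, and the exact error text is a guess.]

```sql
-- Today this resolves: the unknown-type literal has only one candidate.
SELECT json_array_length('[1,2,3]');

-- With both json_array_length(json) and json_array_length(jsonb) defined,
-- the parser would presumably fail with something like:
--   ERROR:  function json_array_length(unknown) is not unique
-- unless some type-preference rule breaks the tie.
```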
Marko Tiikkaja <marko@joh.to> writes:
> On 2016-05-23 18:55, Peter van Hardenberg wrote:
>> Anyone have any concerns or suggestions?

> What about cases like json_whatever($1) which previously worked but
> will now be ambiguous? (Or will they somehow not be ambiguous?)

Good point, that would have to be looked into.

regards, tom lane
Great question, Marko. If you can point me towards an example I'll take a look, but I'll proceed with the current understanding and suggestions and see what people have to say.
On Mon, May 23, 2016 at 10:47 AM, Marko Tiikkaja <marko@joh.to> wrote:
On 2016-05-23 18:55, Peter van Hardenberg wrote:
I talked this over with Andrew who had no objections and suggested I float
it on the list before writing a patch. Looks pretty straightforward, just a
few new data rows in pg_proc.h.
Anyone have any concerns or suggestions?
What about cases like json_whatever($1) which previously worked but will now be ambiguous? (Or will they somehow not be ambiguous?)
.m
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
Peter van Hardenberg <pvh@pvh.ca> writes:
> Great question, Marko. If you can point me towards an example I'll take a
> look, but I'll proceed with the current understanding and suggestions and
> see what people have to say.

I believe Marko's just complaining about the case for unknown-type
arguments, for example:

regression=# select json_array_length('[1,2,3]');
 json_array_length
-------------------
                 3
(1 row)

The parser has no trouble resolving this because there is only one
json_array_length(); but if there were two, it would fail to make a
determination of which one you meant.

AFAICS the only way to fix that would be to introduce some preference
between the two types.  For example, we could move both 'json' and 'jsonb'
into their own typcategory ('J' is unused...) and then mark 'jsonb' as
the preferred type in that category.  This would require a fair amount of
experimentation to determine if it upsets any cases that work conveniently
today; but right offhand I don't see any fatal problems with such an idea.

regards, tom lane
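[Editor's note: a sketch of what the typcategory idea might look like at the catalog level. Illustrative only; a real patch would edit the pg_type catalog entries in the source tree, not run DDL against a live server.]

```sql
-- Move both types into a new 'J' category, and prefer jsonb within it,
-- so an unknown-type argument resolves to the jsonb variant.
UPDATE pg_type SET typcategory = 'J' WHERE typname IN ('json', 'jsonb');
UPDATE pg_type SET typispreferred = true WHERE typname = 'jsonb';
```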
I'll look into it, thanks for the explanation.
On Mon, May 23, 2016 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter van Hardenberg <pvh@pvh.ca> writes:
> Great question, Marko. If you can point me towards an example I'll take a
> look, but I'll proceed with the current understanding and suggestions and
> see what people have to say.
I believe Marko's just complaining about the case for unknown-type
arguments, for example:
regression=# select json_array_length('[1,2,3]');
json_array_length
-------------------
3
(1 row)
The parser has no trouble resolving this because there is only one
json_array_length(); but if there were two, it would fail to make a
determination of which one you meant.
AFAICS the only way to fix that would be to introduce some preference
between the two types. For example, we could move both 'json' and 'jsonb'
into their own typcategory ('J' is unused...) and then mark 'jsonb' as
the preferred type in that category. This would require a fair amount of
experimentation to determine if it upsets any cases that work conveniently
today; but right offhand I don't see any fatal problems with such an idea.
regards, tom lane
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
I guess the relevant point in the documentation is the parenthetical sentence:
(The processing functions consider the last value as the operative one.)
This normalizes the behavior of json and jsonb as they pass through one of these functions. However, only the duplicate-key case is noted, which means the whitespace (immaterial) and key-order (potentially material) behaviors still differ; key order coming through the function unscathed is not something users should be relying upon. Specifically, I'm thinking of the behavior that "json_each(...)" would exhibit.
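[Editor's note: the difference being pointed at can be seen directly; duplicate keys survive json processing but not jsonb.]

```sql
-- json preserves the input text, so both pairs come back:
SELECT * FROM json_each('{"a": 1, "a": 2}');
--  key | value
-- -----+-------
--  a   | 1
--  a   | 2

-- jsonb keeps only the last value for a duplicated key:
SELECT * FROM jsonb_each('{"a": 1, "a": 2}');
--  key | value
-- -----+-------
--  a   | 2
```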
David J.
On 5/23/16 11:55 AM, Peter van Hardenberg wrote:
> Fortunately, this seems quite easy to resolve by taking advantage of our
> ability to add json_*(jsonb) form of the functions.

Another issue no one has mentioned is functions that return JSON/JSONB.
IMO those should NOT be overloaded, because that will make it very easy to
accidentally change from one type to the other without meaning to.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)  mobile: 512-569-9461
Actually, by definition they cannot be overloaded. A function's signature is derived from its input types only.
"""
The name of the new function must not match any existing function with the same input argument types in the same schema. However, functions of different argument types can share a name (this is called overloading).
"""
Admittedly the absence of "output" is not emphasized, but overloading in most languages (at least in my admittedly small sample) is subject to the same restriction.
David J.
On 5/23/16 4:45 PM, David G. Johnston wrote:
> On Mon, May 23, 2016 at 5:38 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Another issue no one has mentioned is functions that return JSON/JSONB.
>> IMO those should NOT be overloaded, because that will make it very easy
>> to accidentally change from one type to the other without meaning to.
>
> Actually, by definition they cannot be overloaded.  A function's
> signature is derived from its input types only.

My point is that while it would be possible to create something like
json_strip_nulls(jsonb) to go with json_strip_nulls(json), that would be
a bad idea, because it makes it easy to accidentally turn your jsonb into
plain json.
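[Editor's note: Jim's concern, made concrete. The overloaded json_strip_nulls(jsonb) shown in the comment is hypothetical; jsonb_strip_nulls already exists for the type-preserving path.]

```sql
-- Today, using the matching function preserves the type:
SELECT pg_typeof(jsonb_strip_nulls('{"a": null, "b": 1}'::jsonb));  -- jsonb
SELECT pg_typeof(json_strip_nulls('{"a": null, "b": 1}'::json));    -- json

-- With a hypothetical json_strip_nulls(jsonb) returning json,
-- a jsonb input would silently come back as json.
```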