Discussion: JSON[B] arrays are second-class citizens

JSON[B] arrays are second-class citizens

From:
David Fetter
Date:
Folks,

While querying some JSONB blobs at work in preparation for a massive
rework of the data infrastructure, I ran into things that really
puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR:  function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value
───────
 "a"
 "b"
 "c"
(3 rows)

Similar things happen with the other functions matching

jsonb?_array_elements(_text)?

These functions correctly identify JSON[B] things which are not, at
their top level, arrays, and error out appropriately.

What this hints to me is that json_array_elements() and friends have
access to things that at least in theory UNNEST could have access to.

Is making those things accessible to UNNEST, etc., a reasonable
direction to go?

Another option I came up with is to make functions that match

jsonb?_array_elements(_text)?(_with_ordinality), but that seems
somewhat tedious and error-prone on the maintenance side.

What say?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: JSON[B] arrays are second-class citizens

From:
"David G. Johnston"
Date:
On Tue, May 31, 2016 at 4:34 PM, David Fetter <david@fetter.org> wrote:
Folks,

While querying some JSONB blobs at work in preparation for a massive
rework of the data infrastructure, I ran into things that really
puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR:  function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value
───────
 "a"
 "b"
 "c"
(3 rows)


​I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that we endeavor to emulate.

Having an overloaded form: <unnest(jsonb) : setof jsonb> is unappealing.  While likely not that common, the introduction of an ambiguity raises the bar considerably.

That said we do seem to be lacking any easy way to take a json array and attempt to convert it directly into a PostgreSQL array.  Just a conversion is not always going to succeed though the capability seems worthwhile if as yet unasked for.  The each->convert->array_agg pattern works but is likely inefficient for homogeneous json array cases.
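
For concreteness, a minimal sketch of that elements -> cast -> array_agg
pattern, with a literal array and a numeric target type chosen purely for
illustration:

SELECT array_agg(elem::numeric)
FROM jsonb_array_elements_text('[1, 2, 3]'::jsonb) AS t(elem);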

David J.

Re: JSON[B] arrays are second-class citizens

From:
Corey Huinker
Date:
On Tue, May 31, 2016 at 5:06 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, May 31, 2016 at 4:34 PM, David Fetter <david@fetter.org> wrote:
Folks,

While querying some JSONB blobs at work in preparation for a massive
rework of the data infrastructure, I ran into things that really
puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR:  function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value
───────
 "a"
 "b"
 "c"
(3 rows)


​I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that we endeavor to emulate.

Having an overloaded form: <unnest(jsonb) : setof jsonb> is unappealing.  While likely not that common, the introduction of an ambiguity raises the bar considerably.

That said we do seem to be lacking any easy way to take a json array and attempt to convert it directly into a PostgreSQL array.  Just a conversion is not always going to succeed though the capability seems worthwhile if as yet unasked for.  The each->convert->array_agg pattern works but is likely inefficient for homogeneous json array cases.

David J.

If there is no list of people asking for that function, let me be the first.

In the meantime, I've resigned myself to carting this around from db to db...

create function jsonb_array_to_text_array(jsonb_arr jsonb) returns text[]
language sql as $$
select  array_agg(r) from jsonb_array_elements_text(jsonb_arr) r;
$$;
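
For example, once the function above is in place:

SELECT jsonb_array_to_text_array('["a","b","c"]'::jsonb);
-- {a,b,c}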

Re: JSON[B] arrays are second-class citizens

From:
David Fetter
Date:
On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter <david@fetter.org> wrote:
> 
> > Folks,
> >
> > While querying some JSONB blobs at work in preparation for a massive
> > rework of the data infrastructure, I ran into things that really
> > puzzled me, to wit:
> >
> > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > ERROR:  function unnest(jsonb) does not exist
> >
> > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> >  value
> > ───────
> >  "a"
> >  "b"
> >  "c"
> > (3 rows)
> ​I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
> we endeavor to emulate.

You cut out the part where I introduced the part that's not
equivalent, so "there is more than one way to do it" isn't on point
here.  More on that specific issue below.

UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
than the json_array_elements-like functions do.  Is it really more
efficient to build and maintain those capabilities separately in the
JSON[B] set-returning functions, or to force our end users to use
atrocious hacks like putting
generate_series(1,jsonb_array_length(foo)) in the target list than
just to make UNNEST and ROWS FROM do the right thing?
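
To spell out the kind of target-list hack I mean, assuming foo is a jsonb
array column in some table t (names purely illustrative):

SELECT generate_series(1, jsonb_array_length(foo)) AS ord,
       jsonb_array_elements(foo)                   AS elem
FROM t;
-- only lines up because both set-returning calls produce the same number of rows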

> Having an overloaded form: <unnest(jsonb) : setof jsonb> is unappealing.

On what grounds?

> While likely not that common, the introduction of an ambiguity
> raises the bar considerably.

What ambiguity?
   SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
   ERROR:  cannot extract elements from an object

The json_array_elements family manages to do the right thing.  Why
would it be harder to make sure UNNEST and ROWS FROM() do so?

> That said we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array.  Just a conversion
> is not always going to succeed though the capability seems worthwhile if as
> yet unasked for.  The each->convert->array_agg pattern works but is likely
> inefficient for homogeneous json array cases.

To your earlier point about "there is more than one way to do it," we
have made no attempt to make some sort of language composed of
orthonormal vectors, and the SQL standard actually requires that we
not do so.  For a trivial example, there's

SELECT * FROM foo
and
TABLE foo

which are equivalent and both spelled out in the standard.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: JSON[B] arrays are second-class citizens

From:
"David G. Johnston"
Date:
On Tue, May 31, 2016 at 5:46 PM, David Fetter <david@fetter.org> wrote:
On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter <david@fetter.org> wrote:
>
> > Folks,
> >
> > While querying some JSONB blobs at work in preparation for a massive
> > rework of the data infrastructure, I ran into things that really
> > puzzled me, to wit:
> >
> > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > ERROR:  function unnest(jsonb) does not exist
> >
> > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> >  value
> > ───────
> >  "a"
> >  "b"
> >  "c"
> > (3 rows)
> ​I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
> we endeavor to emulate.

You cut out the part where I introduced the part that's not
equivalent, so "there is more than one way to do it" isn't on point
here.  More on that specific issue below.

UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
than the json_array_elements-like functions do.  Is it really more
efficient to build and maintain those capabilities separately in the
JSON[B] set-returning functions, or to force our end users to use
atrocious hacks like putting
generate_series(1,jsonb_array_length(foo)) in the target list than
just to make UNNEST and ROWS FROM do the right thing?


Both of these work in 9.5...

SELECT * FROM ROWS FROM (jsonb_array_elements('["a","b","c"]'::jsonb)) WITH ORDINALITY;

 value | ordinality
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3
 
> Having an overloaded form: <unnest(jsonb) : setof jsonb> is unappealing.

On what grounds?

​Aesthetic.  YMAV (your mileage apparently varies).


> While likely not that common, the introduction of an ambiguity
> raises the bar considerably.

What ambiguity?

    SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
    ERROR:  cannot extract elements from an object


I stand corrected.  I was thinking you could somehow craft unnest('<literal value here>') but there is no way to auto-convert to "anyarray"...

The json_array_elements family manages to do the right thing.  Why
would it be harder to make sure UNNEST and ROWS FROM() do so?

I have apparently failed to understand your point.  All I saw was that you wanted "unnest(jsonb)" to work in an identical fashion to "​jsonb_array_elements(jsonb)".  If there is some aspect beyond this being an aliasing situation then you have failed to communicate it such that I comprehended that fact.


> That said we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array.  Just a conversion
> is not always going to succeed though the capability seems worthwhile if as
> yet unasked for.  The each->convert->array_agg pattern works but is likely
> inefficient for homogeneous json array cases.

To your earlier point about "there is more than one way to do it," we
have made no attempt to make some sort of language composed of
orthonormal vectors, and the SQL standard actually requires that we
not do so.  For a trivial example, there's

SELECT * FROM foo
and
TABLE foo

which are equivalent and both spelled out in the standard.


​And would we have done so had we not been compelled to by the standard?

Maybe it should be enshrined somewhere more obvious but Alvaro, and indirectly Tom, recently made the same claim[1] against TIMTOWTDI so I feel justified making such a claim on behalf of the project - as well as my general personal feeling.  Adding user-friendly UI to the system, to correct deficiencies, does have merit, but that is not what I personally see here.

Re: JSON[B] arrays are second-class citizens

From:
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
>> While likely not that common, the introduction of an ambiguity
>> raises the bar considerably.

> What ambiguity?

My first thought about it was that

select unnest('{1,2,3}');

would start failing.  But it turns out it already does fail:

ERROR:  function unnest(unknown) is not unique

You get that as a result of the recent introduction of unnest(tsvector),
which we debated a few weeks ago and seem to have decided to leave as-is.
But it failed before 9.6 too, with

ERROR:  could not determine polymorphic type because input has type "unknown"

So at least in this particular case, adding unnest(jsonb) wouldn't be a
problem from the standpoint of not being able to resolve calls that we
could resolve before.

Nonetheless, there *is* an ambiguity here, which is specific to json(b):
what type of array are you expecting to get?  The reason we have both
json[b]_array_elements() and json[b]_array_elements_text() is that there
are plausible use-cases for returning either json or plain text.  It's not
hard to imagine that somebody will want json[b]_array_elements_numeric()
before long, too.  If you want to have an unnest(jsonb) then you will need
to make an arbitrary decision about which type it will return, and that
doesn't seem like an especially great idea to me.
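
To make that concrete, the existing pair of functions already pins down two
different element types:

SELECT pg_typeof(value) FROM jsonb_array_elements('["a"]'::jsonb);       -- jsonb
SELECT pg_typeof(value) FROM jsonb_array_elements_text('["a"]'::jsonb);  -- text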


> UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
> than the json_array_elements-like functions do.

AFAICT, this is nonsense.  We did not tie WITH ORDINALITY to UNNEST;
it works for any set-returning function.

regression=# select * from unnest(array[1,2,3]) with ordinality;
 unnest | ordinality 
--------+------------
      1 |          1
      2 |          2
      3 |          3
(3 rows)

regression=# select * from jsonb_array_elements('["a","b","c"]'::jsonb) with ordinality;
 value | ordinality 
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3
(3 rows)

        regards, tom lane



Re: JSON[B] arrays are second-class citizens

From:
Peter van Hardenberg
Date:
The idea of converting a JSONB array to a PG array is appealing and would potentially be more general-purpose than adding a new unnest. I'm not sure how feasible either suggestion is.

I will say that I think the current state of affairs is gratuitously verbose and expects users to memorize a substantial number of long function names to perform simple tasks.

-p


On Tue, May 31, 2016 at 2:06 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, May 31, 2016 at 4:34 PM, David Fetter <david@fetter.org> wrote:
Folks,

While querying some JSONB blobs at work in preparation for a massive
rework of the data infrastructure, I ran into things that really
puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR:  function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value
───────
 "a"
 "b"
 "c"
(3 rows)


​I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that we endeavor to emulate.

Having an overloaded form: <unnest(jsonb) : setof jsonb> is unappealing.  While likely not that common, the introduction of an ambiguity raises the bar considerably.

That said we do seem to be lacking any easy way to take a json array and attempt to convert it directly into a PostgreSQL array.  Just a conversion is not always going to succeed though the capability seems worthwhile if as yet unasked for.  The each->convert->array_agg pattern works but is likely inefficient for homogeneous json array cases.

David J.



--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: JSON[B] arrays are second-class citizens

From:
David Fetter
Date:
On Tue, May 31, 2016 at 06:20:26PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> >> While likely not that common, the introduction of an ambiguity
> >> raises the bar considerably.
> 
> > What ambiguity?
> 
> My first thought about it was that
> 
> select unnest('{1,2,3}');
> 
> would start failing.  But it turns out it already does fail:
> 
> ERROR:  function unnest(unknown) is not unique
> 
> You get that as a result of the recent introduction of unnest(tsvector),
> which we debated a few weeks ago and seem to have decided to leave as-is.
> But it failed before 9.6 too, with
> 
> ERROR:  could not determine polymorphic type because input has type "unknown"
> 
> So at least in this particular case, adding unnest(jsonb) wouldn't be a
> problem from the standpoint of not being able to resolve calls that we
> could resolve before.
> 
> Nonetheless, there *is* an ambiguity here, which is specific to json(b):
> what type of array are you expecting to get?  The reason we have both
> json[b]_array_elements() and json[b]_array_elements_text() is that there
> are plausible use-cases for returning either json or plain text.  It's not
> hard to imagine that somebody will want json[b]_array_elements_numeric()
> before long, too.  If you want to have an unnest(jsonb) then you will need
> to make an arbitrary decision about which type it will return, and that
> doesn't seem like an especially great idea to me.

How about making casts work?  UNNEST(jsonb)::NUMERIC or similar,
whatever won't make the parser barf.

> > UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
> > than the json_array_elements-like functions do.
> 
> AFAICT, this is nonsense.  We did not tie WITH ORDINALITY to UNNEST;
> it works for any set-returning function.

Oops.  My mistake.  Sorry about the noise.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: JSON[B] arrays are second-class citizens

From:
Jim Nasby
Date:
On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
> The idea of converting a JSONB array to a PG array is appealing and
> would potentially be more general-purpose than adding a new unnest. I'm
> not sure how feasible either suggestion is.

The one part I think is missing right now is unnest allows you to 
'stitch' or 'zip' multiple arrays together into a single recordset via 
unnest(array1, array2, ...). Presumably that could be added to the json 
equivalents.
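
For reference, a sketch of that multi-argument form with throwaway literal
arrays (it is only allowed in the FROM clause, and the shorter array is
padded out with NULLs):

SELECT * FROM unnest(ARRAY[1,2,3], ARRAY['a','b']) AS t(n, letter);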

> I will say that I think the current state of affairs is gratuitously
> verbose and expects users to memorize a substantial number of long
> function names to perform simple tasks.

+100. It's *much* easier to deal with JSON in other languages because 
they have native support for the concept of a dictionary, so changing an 
element is as simple as json['foo'][3] = 'new'. Trying to do that in 
Postgres is horrible partly because of the need to remember some odd 
operator, but more so because it's ultimately still an operator. What we 
need is a form of *addressing*. If you could directly access items in a 
JSON doc with [] notation then a lot of the current functions could go 
away, *especially* if the [] notation allowed things like a slice and a 
list of values (ie: json['foo', 'bar', 'baz'] = '[42,{"my": "nice 
object"},"with a random string"]'. Or = row(42, ...)).
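
For comparison, a sketch of what that single assignment has to look like
with the tools we have today, using jsonb_set (table and column names are
made up):

UPDATE docs SET body = jsonb_set(body, '{foo,3}', '"new"');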
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: JSON[B] arrays are second-class citizens

From:
"David G. Johnston"
Date:
On Wed, Jun 1, 2016 at 11:55 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
The idea of converting a JSONB array to a PG array is appealing and
would potentially be more general-purpose than adding a new unnest. I'm
not sure how feasible either suggestion is.

The one part I think is missing right now is unnest allows you to 'stitch' or 'zip' multiple arrays together into a single recordset via unnest(array1, array2, ...). Presumably that could be added to the json equivalents.

You can just use "ROWS FROM" to get the same result.

"""
The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST (Section 9.18) had been called on each parameter separately and combined using the ROWS FROM construct.
​"""

​Yes, the unnest form can be used within the target-list but that argument is not going to get you very far as that use of SRF is greatly frowned upon now that we have LATERAL.
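
A sketch of the ROWS FROM version of that 'zip', with literal values purely
for illustration:

SELECT *
FROM ROWS FROM (jsonb_array_elements('["a","b","c"]'::jsonb),
                unnest(ARRAY[1,2,3])) AS t(elem, n);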

David J.

Re: JSON[B] arrays are second-class citizens

From:
"David G. Johnston"
Date:
On Tue, May 31, 2016 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Fetter <david@fetter.org> writes:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
>> While likely not that common, the introduction of an ambiguity
>> raises the bar considerably.

> What ambiguity?

You get that as a result of the recent introduction of unnest(tsvector),
which we debated a few weeks ago and seem to have decided to leave as-is.
But it failed before 9.6 too, with

So at least in this particular case, adding unnest(jsonb) wouldn't be a
problem from the standpoint of not being able to resolve calls that we
could resolve before.

Nonetheless, there *is* an ambiguity here, which is specific to json(b):
what type of array are you expecting to get?  The reason we have both
json[b]_array_elements() and json[b]_array_elements_text() is that there
are plausible use-cases for returning either json or plain text.  It's not
hard to imagine that somebody will want json[b]_array_elements_numeric()
before long, too.  If you want to have an unnest(jsonb) then you will need
to make an arbitrary decision about which type it will return, and that
doesn't seem like an especially great idea to me.

​I'm on the fence given the presence of the tsvector overload and the lack of any syntactic concerns.​

I would either have it keep the same form as our main unnest function: <unnest(anyarray) : setof anyelement>

and/or have two functions

unnest(json, anyelement) : anyelement
unnest(jsonb, anyelement) : anyelement

The first one cannot fail at runtime (due to type conversion) while the latter two can.
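
A purely hypothetical call under the second form, borrowing the
null-of-the-desired-type idiom from json_populate_record (nothing below
exists today):

-- hypothetical; unnest(jsonb, anyelement) is only a proposal here
SELECT * FROM unnest('[1,2,3]'::jsonb, NULL::numeric);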

If you can't tell I do like our introduction of what are basically Java generics into our idiomatic json implementation.

​I'd call this implementing a better option for polymorphism than creating a new function every time someone wants typed output.​

David J.

Re: JSON[B] arrays are second-class citizens

From:
David Fetter
Date:
On Tue, May 31, 2016 at 06:15:32PM -0400, David G. Johnston wrote:
> I stand corrected.  I was thinking you could somehow craft unnest('<literal
> value here>') but there is no way to auto-convert to "anyarray"...
> 
> > The json_array_elements family manages to do the right thing.  Why
> > would it be harder to make sure UNNEST and ROWS FROM() do so?
> >
> 
> I have apparently failed to understand your point.  All I saw was that you
> wanted "unnest(jsonb)" to work in an identical fashion to
> "​jsonb_array_elements(jsonb)".  If there is some aspect beyond this being
> an aliasing situation then you have failed to communicate it such that I
> comprehended that fact.
> 

Upon further investigation, I think UNNEST should Just Work™ which is
to say that it should unnest arrays into their top-level constituent
elements if the standard doesn't specify some other behavior.

Separately, I suppose, I think there needs to be an easy way to cast
the output of UNNEST.  Lacking knowledge of the intricacies of
parsing, etc., I'd propose CAST(UNNEST(...) AS ...), or better yet,
UNNEST(...):: at least in the case without WITH ORDINALITY.

Further out in the future, at least so it seems to me, it would be
nice to have a feature where one could cast a column to an expanded
row type, e.g.:
   SELECT my_jsonb::(i INT, t TEXT, p POINT), foo, bar
   FROM ...

and get a result set with 5 columns in it.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: JSON[B] arrays are second-class citizens

From:
Pavel Stehule
Date:


2016-06-01 17:55 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
The idea of converting a JSONB array to a PG array is appealing and
would potentially be more general-purpose than adding a new unnest. I'm
not sure how feasible either suggestion is.

The one part I think is missing right now is unnest allows you to 'stitch' or 'zip' multiple arrays together into a single recordset via unnest(array1, array2, ...). Presumably that could be added to the json equivalents.

I will say that I think the current state of affairs is gratuitously
verbose and expects users to memorize a substantial number of long
function names to perform simple tasks.

+100. It's *much* easier to deal with JSON in other languages because they have native support for the concept of a dictionary, so changing an element is as simple as json['foo'][3] = 'new'. Trying to do that in Postgres is horrible partly because of the need to remember some odd operator, but more so because it's ultimately still an operator. What we need is a form of *addressing*. If you could directly access items in a JSON doc with [] notation then a lot of the current functions could go away, *especially* if the [] notation allowed things like a slice and a list of values (ie: json['foo', 'bar', 'baz'] = '[42,{"my": "nice object"},"with a random string"]'. Or = row(42, ...)).

These features I would like to see in Postgres too.

Regards

Pavel
 
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

