Обсуждение: row_to_json(), NULL values, and AS

Поиск
Список
Период
Сортировка

row_to_json(), NULL values, and AS

От
Neil Conway
Дата:
Hi,

The following behavior does not seem self-consistent to me:

postgres=# select json_agg(row_to_json(x)) from (select *, null from
generate_series(1, 3)) x;
                                                       json_agg

-----------------------------------------------------------------------------------------------------------------------
 [{"generate_series":1,"?column?":null},
{"generate_series":2,"?column?":null},
{"generate_series":3,"?column?":null}]
(1 row)

postgres=# select json_agg(row_to_json(x)) from (select *,
row_to_json(null) as jjj from generate_series(1, 3)) x;
                                                json_agg
--------------------------------------------------------------------------------------------------------
 [{"generate_series":1,"jjj":null}, {"generate_series":2,"jjj":null},
{"generate_series":3,"jjj":null}]
(1 row)

postgres=# select json_agg(row_to_json(x)) from (select *,
row_to_json(null) from generate_series(1, 3)) x;
      json_agg
--------------------
 [null, null, null]
(1 row)

In particular, it is unclear to me why removing the targetlist alias
in the subquery in the third example should change the result set of
the parent query.

Thanks,
Neil


Re: row_to_json(), NULL values, and AS

От
Tom Lane
Дата:
[ Hi Neil, long time no see ]

Neil Conway <neil.conway@gmail.com> writes:
> The following behavior does not seem self-consistent to me:

Likewise.

> In particular, it is unclear to me why removing the targetlist alias
> in the subquery in the third example should change the result set of
> the parent query.

Looking at "explain verbose" output, it seems like it's not row_to_json's
fault; rather, we seem to be mishandling expansion of the whole-row Var:

regression=# explain verbose select json_agg(row_to_json(x)) from (select *,
row_to_json(null) as jjj from generate_series(1, 3)) x;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=15.00..15.01 rows=1 width=32)
   Output: json_agg(row_to_json(ROW(generate_series.generate_series, NULL::json)))
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=4)
         Output: generate_series.generate_series
         Function Call: generate_series(1, 3)
(5 rows)

That's fine, but:

regression=# explain verbose select json_agg(row_to_json(x)) from (select *,
row_to_json(null) from generate_series(1, 3)) x;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=12.50..12.51 rows=1 width=32)
   Output: json_agg(NULL::json)
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series.generate_series
         Function Call: generate_series(1, 3)
(5 rows)

That looks like it might be a bug in what we do with whole-row Vars
during subquery flattening.  But if you put an "offset 0" into the
subquery to prevent flattening, you get different but just as weird
misbehavior:

regression=# explain verbose select json_agg(row_to_json(x)) from (select *,
row_to_json(null) as jjj from generate_series(1, 3) offset 0) x;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Aggregate  (cost=25.00..25.02 rows=1 width=32)
   Output: json_agg(row_to_json(x.*))
   ->  Subquery Scan on x  (cost=0.00..20.00 rows=1000 width=28)
         Output: x.*
         ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=36)
               Output: generate_series.generate_series, NULL::json
               Function Call: generate_series(1, 3)
(7 rows)

regression=# explain verbose select json_agg(row_to_json(x)) from (select *,
row_to_json(null) from generate_series(1, 3) offset 0) x;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=22.50..22.52 rows=1 width=32)
   Output: json_agg((NULL::json))
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=36)
         Output: NULL::integer, NULL::json
         Function Call: generate_series(1, 3)
(5 rows)

I'm not sure if this is just another artifact of the same problem.
The extra parens in the json_agg() argument are suspicious to put
it mildly, but I've not dug into it to see what the plan tree
really looks like.

            regards, tom lane


Re: row_to_json(), NULL values, and AS

От
Tom Lane
Дата:
I wrote:
> Neil Conway <neil.conway@gmail.com> writes:
>> The following behavior does not seem self-consistent to me:

> Likewise.

Oh!  What is actually happening is

(1) With no explicit alias, the column name of the sub-select's second
output column is chosen to be "row_to_json".

(2) That makes the outer query's notation row_to_json(x) ambiguous: it
could be a function-syntax reference to the column x.row_to_json.

(3) As it happens, the column interpretation is chosen when it's
ambiguous (cf ParseComplexProjection).

I'm a bit hesitant to muck with this behavior, given that it's stood
for ~20 years.  However, if we did want to touch it, maybe the right
thing would be to give up the absolutist position that f(x) and x.f
are exactly interchangeable.  We could say instead that we prefer the
function interpretation if function syntax is used, and the column
interpretation if column syntax is used.  I don't know how likely
that is to break existing apps ... perhaps not very, but I wouldn't
risk back-patching it in any case.

            regards, tom lane


Re: row_to_json(), NULL values, and AS

От
Neil Conway
Дата:
[ Hi Tom! Hope you're doing well. ]

On Fri, Jun 15, 2018 at 7:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oh!  What is actually happening is
>
> (1) With no explicit alias, the column name of the sub-select's second
> output column is chosen to be "row_to_json".
>
> (2) That makes the outer query's notation row_to_json(x) ambiguous: it
> could be a function-syntax reference to the column x.row_to_json.
>
> (3) As it happens, the column interpretation is chosen when it's
> ambiguous (cf ParseComplexProjection).
>
> I'm a bit hesitant to muck with this behavior, given that it's stood
> for ~20 years.  However, if we did want to touch it, maybe the right
> thing would be to give up the absolutist position that f(x) and x.f
> are exactly interchangeable.  We could say instead that we prefer the
> function interpretation if function syntax is used, and the column
> interpretation if column syntax is used.

Interesting! Your proposed change seems quite reasonable to me.

Thanks for digging into it.

Neil


Re: row_to_json(), NULL values, and AS

От
Tom Lane
Дата:
Neil Conway <neil.conway@gmail.com> writes:
> On Fri, Jun 15, 2018 at 7:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (3) As it happens, the column interpretation is chosen when it's
>> ambiguous (cf ParseComplexProjection).
>> 
>> I'm a bit hesitant to muck with this behavior, given that it's stood
>> for ~20 years.  However, if we did want to touch it, maybe the right
>> thing would be to give up the absolutist position that f(x) and x.f
>> are exactly interchangeable.  We could say instead that we prefer the
>> function interpretation if function syntax is used, and the column
>> interpretation if column syntax is used.

> Interesting! Your proposed change seems quite reasonable to me.

Here's a proposed patch for that.  (It needs to be applied over the
fixes in <14497.1529089235@sss.pgh.pa.us>, which are unrelated but
touch some of the same code.)  I didn't add any test cases yet,
but probably it's desirable to have one.

I wrote the doc change on the assumption that we'd sneak this into
v11, but if anyone doesn't want to do that, I'll add it to the next
commitfest instead.

            regards, tom lane

diff --git a/doc/src/sgml/rowtypes.sgml b/doc/src/sgml/rowtypes.sgml
index 3f24293..2f924b1 100644
*** a/doc/src/sgml/rowtypes.sgml
--- b/doc/src/sgml/rowtypes.sgml
*************** SELECT c.somefunc FROM inventory_item c;
*** 441,449 ****
      Because of this behavior, it's unwise to give a function that takes a
      single composite-type argument the same name as any of the fields of
      that composite type.  If there is ambiguity, the field-name
!     interpretation will be preferred, so that such a function could not be
!     called without tricks.  One way to force the function interpretation is
!     to schema-qualify the function name, that is, write

<literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>compositevalue</replaceable>)</literal>.
     </para>
    </tip>
--- 441,452 ----
      Because of this behavior, it's unwise to give a function that takes a
      single composite-type argument the same name as any of the fields of
      that composite type.  If there is ambiguity, the field-name
!     interpretation will be chosen if field-name syntax is used, while the
!     function will be chosen if function-call syntax is used.  However,
!     <productname>PostgreSQL</productname> versions before 11 always chose the
!     field-name interpretation, unless the syntax of the call required it to
!     be a function call.  One way to force the function interpretation in
!     older versions is to schema-qualify the function name, that is, write

<literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>compositevalue</replaceable>)</literal>.
     </para>
    </tip>
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 21ddd5b..6891cbc 100644
*** a/src/backend/parser/parse_func.c
--- b/src/backend/parser/parse_func.c
*************** static Node *ParseComplexProjection(Pars
*** 49,63 ****
   *    For historical reasons, Postgres tries to treat the notations tab.col
   *    and col(tab) as equivalent: if a single-argument function call has an
   *    argument of complex type and the (unqualified) function name matches
!  *    any attribute of the type, we take it as a column projection.  Conversely
!  *    a function of a single complex-type argument can be written like a
!  *    column reference, allowing functions to act like computed columns.
   *
   *    Hence, both cases come through here.  If fn is null, we're dealing with
!  *    column syntax not function syntax, but in principle that should not
!  *    affect the lookup behavior, only which error messages we deliver.
!  *    The FuncCall struct is needed however to carry various decoration that
!  *    applies to aggregate and window functions.
   *
   *    Also, when fn is null, we return NULL on failure rather than
   *    reporting a no-such-function error.
--- 49,65 ----
   *    For historical reasons, Postgres tries to treat the notations tab.col
   *    and col(tab) as equivalent: if a single-argument function call has an
   *    argument of complex type and the (unqualified) function name matches
!  *    any attribute of the type, we can interpret it as a column projection.
!  *    Conversely a function of a single complex-type argument can be written
!  *    like a column reference, allowing functions to act like computed columns.
!  *
!  *    If both interpretations are possible, we prefer the one matching the
!  *    syntactic form, but otherwise the form does not matter.
   *
   *    Hence, both cases come through here.  If fn is null, we're dealing with
!  *    column syntax not function syntax.  In the function-syntax case,
!  *    the FuncCall struct is needed to carry various decoration that applies
!  *    to aggregate and window functions.
   *
   *    Also, when fn is null, we return NULL on failure rather than
   *    reporting a no-such-function error.
*************** ParseFuncOrColumn(ParseState *pstate, Li
*** 84,89 ****
--- 86,92 ----
      bool        agg_distinct = (fn ? fn->agg_distinct : false);
      bool        func_variadic = (fn ? fn->func_variadic : false);
      WindowDef  *over = (fn ? fn->over : NULL);
+     bool        could_be_projection;
      Oid            rettype;
      Oid            funcid;
      ListCell   *l;
*************** ParseFuncOrColumn(ParseState *pstate, Li
*** 202,237 ****
      }

      /*
!      * Check for column projection: if function has one argument, and that
!      * argument is of complex type, and function name is not qualified, then
!      * the "function call" could be a projection.  We also check that there
!      * wasn't any aggregate or variadic decoration, nor an argument name.
       */
!     if (nargs == 1 && !proc_call &&
!         agg_order == NIL && agg_filter == NULL && !agg_star &&
!         !agg_distinct && over == NULL && !func_variadic && argnames == NIL &&
!         list_length(funcname) == 1)
!     {
!         Oid            argtype = actual_arg_types[0];

!         if (argtype == RECORDOID || ISCOMPLEX(argtype))
!         {
!             retval = ParseComplexProjection(pstate,
!                                             strVal(linitial(funcname)),
!                                             first_arg,
!                                             location);
!             if (retval)
!                 return retval;

!             /*
!              * If ParseComplexProjection doesn't recognize it as a projection,
!              * just press on.
!              */
!         }
      }

      /*
-      * Okay, it's not a column projection, so it must really be a function.
       * func_get_detail looks up the function in the catalogs, does
       * disambiguation for polymorphic functions, handles inheritance, and
       * returns the funcid and type and set or singleton status of the
--- 205,243 ----
      }

      /*
!      * Decide whether it's legitimate to consider the construct to be a column
!      * projection.  For that, there has to be a single argument of complex
!      * type, the function name must not be qualified, and there cannot be any
!      * syntactic decoration that'd require it to be a function (such as
!      * aggregate or variadic decoration, or named arguments).
       */
!     could_be_projection = (nargs == 1 && !proc_call &&
!                            agg_order == NIL && agg_filter == NULL &&
!                            !agg_star && !agg_distinct && over == NULL &&
!                            !func_variadic && argnames == NIL &&
!                            list_length(funcname) == 1 &&
!                            (actual_arg_types[0] == RECORDOID ||
!                             ISCOMPLEX(actual_arg_types[0])));

!     /*
!      * If it's column syntax, check for column projection case first.
!      */
!     if (could_be_projection && is_column)
!     {
!         retval = ParseComplexProjection(pstate,
!                                         strVal(linitial(funcname)),
!                                         first_arg,
!                                         location);
!         if (retval)
!             return retval;

!         /*
!          * If ParseComplexProjection doesn't recognize it as a projection,
!          * just press on.
!          */
      }

      /*
       * func_get_detail looks up the function in the catalogs, does
       * disambiguation for polymorphic functions, handles inheritance, and
       * returns the funcid and type and set or singleton status of the
*************** ParseFuncOrColumn(ParseState *pstate, Li
*** 527,532 ****
--- 533,552 ----
      else
      {
          /*
+          * Not found as a function.  Check for column projection case if
+          * allowed and we didn't do so already.
+          */
+         if (could_be_projection && !is_column)
+         {
+             retval = ParseComplexProjection(pstate,
+                                             strVal(linitial(funcname)),
+                                             first_arg,
+                                             location);
+             if (retval)
+                 return retval;
+         }
+
+         /*
           * Oops.  Time to die.
           *
           * If we are dealing with the attribute notation rel.function, let the

Re: row_to_json(), NULL values, and AS

От
Robert Haas
Дата:
On Fri, Jun 15, 2018 at 10:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm a bit hesitant to muck with this behavior, given that it's stood
> for ~20 years.  However, if we did want to touch it, maybe the right
> thing would be to give up the absolutist position that f(x) and x.f
> are exactly interchangeable.  We could say instead that we prefer the
> function interpretation if function syntax is used, and the column
> interpretation if column syntax is used.  I don't know how likely
> that is to break existing apps ... perhaps not very, but I wouldn't
> risk back-patching it in any case.

For the record, I fear that this change (committed as
b97a3465d73bfc2a9f5bcf5def1983dbaa0a26f8) is going to break more
things than have been foreseen in this thread.  I do not have a
specific theory about how that's going to happen, just vague unease.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: row_to_json(), NULL values, and AS

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 15, 2018 at 10:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm a bit hesitant to muck with this behavior, given that it's stood
>> for ~20 years.  However, if we did want to touch it, maybe the right
>> thing would be to give up the absolutist position that f(x) and x.f
>> are exactly interchangeable.  We could say instead that we prefer the
>> function interpretation if function syntax is used, and the column
>> interpretation if column syntax is used.  I don't know how likely
>> that is to break existing apps ... perhaps not very, but I wouldn't
>> risk back-patching it in any case.

> For the record, I fear that this change (committed as
> b97a3465d73bfc2a9f5bcf5def1983dbaa0a26f8) is going to break more
> things than have been foreseen in this thread.  I do not have a
> specific theory about how that's going to happen, just vague unease.

Yeah, that's why I wouldn't back-patch it.  But certainly the behavior
Neil complained of is pretty bad as well, and it's only going to get
worse as we invent more functions taking record.  I think the new
behavior is a clear improvement.  If it breaks any apps that were
relying on interpreting f(x) as a column, they can fix it by writing
x.f instead.  We inflict worse upgrade pain than that on a regular
basis.

            regards, tom lane