Обсуждение: ruleutils vs. empty targetlists

Поиск
Список
Период
Сортировка

ruleutils vs. empty targetlists

От
Tom Lane
Дата:
Thinking some more about bug #8648, it occurred to me that ruleutils.c
isn't exactly prepared for the case either:

regression=# create table nocols();
CREATE TABLE
regression=# create view vv1 as select exists (select * from nocols);
CREATE VIEW
regression=# \d+ vv1
                  View "public.vv1"
 Column |  Type   | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
 exists | boolean |           | plain   |
View definition:
 SELECT (EXISTS ( SELECT
           FROM nocols)) AS "exists";

which of course is illegal syntax.  I thought at first that this could be
fixed trivially by emitting "*" if get_target_list found no columns.
However, that doesn't quite work; consider

create view vv2 as select exists (select nocols.* from nocols, somecols);

If we regurgitate this as "exists (select * from nocols, somecols)", it
wouldn't mean the same thing.

But on the third hand, at least in the context of an EXISTS() subquery,
it doesn't really matter because the tlist is irrelevant, at least as long
as it only contains Vars.  So you could argue that emitting "*" is plenty
good enough even in the above example.  I'm not certain that that applies
everywhere that a tlist could appear, though.

I experimented with code that would attempt to regurgitate "nocols.*"
in the above example; see attached draft patch.  I don't like this patch
much: it's ugly, it'd be a pain to backport (because it's digging into
data structures that have changed repeatedly), and I'm not sure how much
I trust it anyway.  So I'm leaning towards just doing

+         if (colno == 0)
+             appendStringInfoString(buf, " *");

at least till such time as somebody shows a case where this isn't good
enough.

Note that I wouldn't be thinking of this as something to back-patch
at all, except that if someone did have a view that looked like this,
they'd find that pg_dump output wouldn't restore.  You could construct
scenarios where that could seem like a denial of service, particularly
if the DBA wasn't smart enough to figure out what was wrong with his
dump.  (And who wants to deal with such a thing at 4AM anyway ...)

Comments?

            regards, tom lane

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 348f620..d479276 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** get_target_list(List *targetList, depars
*** 4610,4615 ****
--- 4610,4652 ----
          appendStringInfoString(buf, targetbuf.data);
      }

+     /*
+      * An empty targetlist (or RETURNING list) is syntactically impossible,
+      * but we might nonetheless find no elements in the list, as a result of
+      * constructs such as "SELECT * FROM zero_column_table".  You might think
+      * we could just print a star and be done, but that fails in cases like
+      * "SELECT z.* FROM zero_column_table z, normal_table".  So, dig through
+      * the namespace lists looking for a zero-column RTE and use its alias;
+      * this is valid even if the original syntax was just "*".
+      */
+     if (colno == 0)
+     {
+         foreach(l, context->namespaces)
+         {
+             deparse_namespace *dpns = (deparse_namespace *) lfirst(l);
+             ListCell   *ln,
+                        *lc;
+
+             forboth(ln, dpns->rtable_names, lc, dpns->rtable_columns)
+             {
+                 char       *refname = (char *) lfirst(ln);
+                 deparse_columns *colinfo = (deparse_columns *) lfirst(lc);
+
+                 if (refname && colinfo->num_cols == 0)
+                 {
+                     appendStringInfo(buf, " %s.*", quote_identifier(refname));
+                     colno++;
+                     break;
+                 }
+             }
+             if (colno)
+                 break;
+         }
+         /* If we didn't find a suitable RTE, just print a star. */
+         if (colno == 0)
+             appendStringInfoString(buf, " *");
+     }
+
      /* clean up */
      pfree(targetbuf.data);
  }

Re: ruleutils vs. empty targetlists

От
Dean Rasheed
Дата:
On 3 December 2013 23:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thinking some more about bug #8648, it occurred to me that ruleutils.c
> isn't exactly prepared for the case either:
>
> regression=# create table nocols();
> CREATE TABLE
> regression=# create view vv1 as select exists (select * from nocols);
> CREATE VIEW
> regression=# \d+ vv1
>                   View "public.vv1"
>  Column |  Type   | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
>  exists | boolean |           | plain   |
> View definition:
>  SELECT (EXISTS ( SELECT
>            FROM nocols)) AS "exists";
>
> which of course is illegal syntax.  I thought at first that this could be
> fixed trivially by emitting "*" if get_target_list found no columns.
> However, that doesn't quite work; consider
>
> create view vv2 as select exists (select nocols.* from nocols, somecols);
>
> If we regurgitate this as "exists (select * from nocols, somecols)", it
> wouldn't mean the same thing.
>
> But on the third hand, at least in the context of an EXISTS() subquery,
> it doesn't really matter because the tlist is irrelevant, at least as long
> as it only contains Vars.  So you could argue that emitting "*" is plenty
> good enough even in the above example.  I'm not certain that that applies
> everywhere that a tlist could appear, though.
>
> I experimented with code that would attempt to regurgitate "nocols.*"
> in the above example; see attached draft patch.  I don't like this patch
> much: it's ugly, it'd be a pain to backport (because it's digging into
> data structures that have changed repeatedly), and I'm not sure how much
> I trust it anyway.  So I'm leaning towards just doing
>
> +               if (colno == 0)
> +                       appendStringInfoString(buf, " *");
>
> at least till such time as somebody shows a case where this isn't good
> enough.
>

Well here's a contrived example with grouping:

create table nocols();
create table somecols(a int, b int);
create view v as select exists(select nocols.* from nocols, somecols
group by somecols.a);

Simply turning that tlist into "*" makes the query invalid because
somecols.b doesn't appear in the group by clause. So in this case, it
needs to try to output a tlist that contains no columns.

Regards,
Dean


> Note that I wouldn't be thinking of this as something to back-patch
> at all, except that if someone did have a view that looked like this,
> they'd find that pg_dump output wouldn't restore.  You could construct
> scenarios where that could seem like a denial of service, particularly
> if the DBA wasn't smart enough to figure out what was wrong with his
> dump.  (And who wants to deal with such a thing at 4AM anyway ...)
>
> Comments?
>
>                         regards, tom lane
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



Re: ruleutils vs. empty targetlists

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 3 December 2013 23:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thinking some more about bug #8648, it occurred to me that ruleutils.c
>> isn't exactly prepared for the case either:
>> ...  So I'm leaning towards just doing
>> 
>> +               if (colno == 0)
>> +                       appendStringInfoString(buf, " *");
>> 
>> at least till such time as somebody shows a case where this isn't good
>> enough.

> Well here's a contrived example with grouping:

> create table nocols();
> create table somecols(a int, b int);
> create view v as select exists(select nocols.* from nocols, somecols
> group by somecols.a);

Hm, that's cute :-(.  And I realized last night that my patch's approach
of seizing on some random zero-column RTE isn't too bulletproof against
ALTER TABLE ADD COLUMN operations, either, ie, the referenced table might
have some columns by now.  Which is more or less the entire reason we
expand "*" at parse time in the first place.

What I'm thinking about this today is that really the *right* solution
is to allow syntactically-empty SELECT lists; once we've bought into the
notion of zero-column tables, the notion that you can't have an empty
select list is just fundamentally at odds with that.  And since you can
already have semantically-empty SELECT lists, this should in theory not
create much risk of new bugs.  If we did that, the existing ruleutils
code is just fine, as are any existing dump files containing this sort
of query.

That change might still be thought too aggressive for a back-patch,
though.  Comments?
        regards, tom lane



Re: ruleutils vs. empty targetlists

От
Alvaro Herrera
Дата:
Tom Lane escribió:

> What I'm thinking about this today is that really the *right* solution
> is to allow syntactically-empty SELECT lists; once we've bought into the
> notion of zero-column tables, the notion that you can't have an empty
> select list is just fundamentally at odds with that.  And since you can
> already have semantically-empty SELECT lists, this should in theory not
> create much risk of new bugs.  If we did that, the existing ruleutils
> code is just fine, as are any existing dump files containing this sort
> of query.

Wow, as strange-sounding as that is, you're probably correct.

This might probably be seen as a deviation from the standard, but then
so are zero-column tables.  Of course, syntactically-empty select lists
would also work with (standard-conforming) tables containing columns,
but it's hard to see that that would be a problem in practice.

> That change might still be thought too aggressive for a back-patch,
> though.  Comments?

Well, no correct query will start failing due to this change; the only
visible change would be queries that previously throw errors would start
working.  It's hard to see that as a backward-incompatibility.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: ruleutils vs. empty targetlists

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane escribi�:
>> What I'm thinking about this today is that really the *right* solution
>> is to allow syntactically-empty SELECT lists; once we've bought into the
>> notion of zero-column tables, the notion that you can't have an empty
>> select list is just fundamentally at odds with that.  And since you can
>> already have semantically-empty SELECT lists, this should in theory not
>> create much risk of new bugs.  If we did that, the existing ruleutils
>> code is just fine, as are any existing dump files containing this sort
>> of query.

> Wow, as strange-sounding as that is, you're probably correct.

I experimented with this a bit, and found that things seem to pretty much
work, although unsurprisingly there are a couple of regression tests that
expected to get a syntax error and no longer do.

The only thing I've come across that arguably doesn't work is SELECT
DISTINCT:

regression=# select distinct from pg_database;
--
(8 rows)

The reason this says "8 rows" is that the produced plan is just a seqscan
of pg_database returning no columns.  The parser produced a Query with an
empty distinctClause, so the planner thinks no unique-ification is
required, so it doesn't produce any sort/uniq or hashagg node.  This seems
a bit bogus to me; mathematically, it seems like what this ought to mean
is that all rows are equivalent and so you get just one empty row out.
However, I don't see any practical use-case for that behavior, so I'm
disinclined to spend time on inventing a solution --- and any solution
would probably require a change in Query representation, making it not
back-patchable anyway.  Instead I suggest we add an error check in parse
analysis that throws an error for DISTINCT with no columns.  If anyone is
ever motivated to make this case do something sane, they can remove that
check and fix up whatever needs fixing up.  The attached first-draft patch
doesn't yet have such a prohibition, though.

Another point is that there is a second use of target_list in the grammar,
which is "RETURNING target_list".  I did not change that one into
"RETURNING opt_target_list", because if I had, it would have an issue
similar to DISTINCT's: RETURNING with no arguments would be represented
the same as no RETURNING at all, leading to probably not the behavior
the user expected.  We've already got that problem if you say "RETURNING
zero_column_table.*", making me wonder if a parse-analysis-time error
for that case wouldn't be a good thing.

Comments?

            regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8fced44..f9d4577 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** static Node *makeRecursiveViewSelect(cha
*** 334,340 ****
                  name_list from_clause from_list opt_array_bounds
                  qualified_name_list any_name any_name_list
                  any_operator expr_list attrs
!                 target_list insert_column_list set_target_list
                  set_clause_list set_clause multiple_set_clause
                  ctext_expr_list ctext_row def_list indirection opt_indirection
                  reloption_list group_clause TriggerFuncArgs select_limit
--- 334,340 ----
                  name_list from_clause from_list opt_array_bounds
                  qualified_name_list any_name any_name_list
                  any_operator expr_list attrs
!                 target_list opt_target_list insert_column_list set_target_list
                  set_clause_list set_clause multiple_set_clause
                  ctext_expr_list ctext_row def_list indirection opt_indirection
                  reloption_list group_clause TriggerFuncArgs select_limit
*************** select_clause:
*** 9259,9265 ****
   * However, this is not checked by the grammar; parse analysis must check it.
   */
  simple_select:
!             SELECT opt_distinct target_list
              into_clause from_clause where_clause
              group_clause having_clause window_clause
                  {
--- 9259,9265 ----
   * However, this is not checked by the grammar; parse analysis must check it.
   */
  simple_select:
!             SELECT opt_distinct opt_target_list
              into_clause from_clause where_clause
              group_clause having_clause window_clause
                  {
*************** ctext_row: '(' ctext_expr_list ')'                    {
*** 12215,12220 ****
--- 12215,12224 ----
   *
   *****************************************************************************/

+ opt_target_list: target_list                        { $$ = $1; }
+             | /* EMPTY */                            { $$ = NIL; }
+         ;
+
  target_list:
              target_el                                { $$ = list_make1($1); }
              | target_list ',' target_el                { $$ = lappend($1, $3); }
diff --git a/src/test/regress/expected/errors.out b/src/test/regress/expected/errors.out
index 4061512..b63cb54 100644
*** a/src/test/regress/expected/errors.out
--- b/src/test/regress/expected/errors.out
*************** select 1;
*** 15,35 ****
  --
  --
  -- SELECT
! -- missing relation name
  select;
! ERROR:  syntax error at or near ";"
! LINE 1: select;
!               ^
  -- no such relation
  select * from nonesuch;
  ERROR:  relation "nonesuch" does not exist
  LINE 1: select * from nonesuch;
                        ^
- -- missing target list
- select from pg_database;
- ERROR:  syntax error at or near "from"
- LINE 1: select from pg_database;
-                ^
  -- bad name in target list
  select nonesuch from pg_database;
  ERROR:  column "nonesuch" does not exist
--- 15,30 ----
  --
  --
  -- SELECT
! -- this used to be a syntax error, but now we allow an empty target list
  select;
! --
! (1 row)
!
  -- no such relation
  select * from nonesuch;
  ERROR:  relation "nonesuch" does not exist
  LINE 1: select * from nonesuch;
                        ^
  -- bad name in target list
  select nonesuch from pg_database;
  ERROR:  column "nonesuch" does not exist
*************** select * from pg_database where pg_datab
*** 45,56 ****
  ERROR:  column "nonesuch" does not exist
  LINE 1: ...ect * from pg_database where pg_database.datname = nonesuch;
                                                                ^
! -- bad select distinct on syntax, distinct attribute missing
! select distinct on (foobar) from pg_database;
! ERROR:  syntax error at or near "from"
! LINE 1: select distinct on (foobar) from pg_database;
!                                     ^
! -- bad select distinct on syntax, distinct attribute not in target list
  select distinct on (foobar) * from pg_database;
  ERROR:  column "foobar" does not exist
  LINE 1: select distinct on (foobar) * from pg_database;
--- 40,46 ----
  ERROR:  column "nonesuch" does not exist
  LINE 1: ...ect * from pg_database where pg_database.datname = nonesuch;
                                                                ^
! -- bad attribute name in select distinct on
  select distinct on (foobar) * from pg_database;
  ERROR:  column "foobar" does not exist
  LINE 1: select distinct on (foobar) * from pg_database;
diff --git a/src/test/regress/sql/errors.sql b/src/test/regress/sql/errors.sql
index 2ee707c..80d1b62 100644
*** a/src/test/regress/sql/errors.sql
--- b/src/test/regress/sql/errors.sql
*************** select 1;
*** 16,29 ****
  --
  -- SELECT

! -- missing relation name
  select;

  -- no such relation
  select * from nonesuch;

- -- missing target list
- select from pg_database;
  -- bad name in target list
  select nonesuch from pg_database;
  -- bad attribute name on lhs of operator
--- 16,27 ----
  --
  -- SELECT

! -- this used to be a syntax error, but now we allow an empty target list
  select;

  -- no such relation
  select * from nonesuch;

  -- bad name in target list
  select nonesuch from pg_database;
  -- bad attribute name on lhs of operator
*************** select * from pg_database where nonesuch
*** 32,43 ****
  -- bad attribute name on rhs of operator
  select * from pg_database where pg_database.datname = nonesuch;

!
! -- bad select distinct on syntax, distinct attribute missing
! select distinct on (foobar) from pg_database;
!
!
! -- bad select distinct on syntax, distinct attribute not in target list
  select distinct on (foobar) * from pg_database;


--- 30,36 ----
  -- bad attribute name on rhs of operator
  select * from pg_database where pg_database.datname = nonesuch;

! -- bad attribute name in select distinct on
  select distinct on (foobar) * from pg_database;



Re: ruleutils vs. empty targetlists

От
Dean Rasheed
Дата:
On 13 December 2013 01:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The only thing I've come across that arguably doesn't work is SELECT
> DISTINCT:
>
> regression=# select distinct from pg_database;
> --
> (8 rows)
>
> The reason this says "8 rows" is that the produced plan is just a seqscan
> of pg_database returning no columns.  The parser produced a Query with an
> empty distinctClause, so the planner thinks no unique-ification is
> required, so it doesn't produce any sort/uniq or hashagg node.  This seems
> a bit bogus to me; mathematically, it seems like what this ought to mean
> is that all rows are equivalent and so you get just one empty row out.
> However, I don't see any practical use-case for that behavior, so I'm
> disinclined to spend time on inventing a solution --- and any solution
> would probably require a change in Query representation, making it not
> back-patchable anyway.  Instead I suggest we add an error check in parse
> analysis that throws an error for DISTINCT with no columns.  If anyone is
> ever motivated to make this case do something sane, they can remove that
> check and fix up whatever needs fixing up.  The attached first-draft patch
> doesn't yet have such a prohibition, though.
>
> Another point is that there is a second use of target_list in the grammar,
> which is "RETURNING target_list".  I did not change that one into
> "RETURNING opt_target_list", because if I had, it would have an issue
> similar to DISTINCT's: RETURNING with no arguments would be represented
> the same as no RETURNING at all, leading to probably not the behavior
> the user expected.  We've already got that problem if you say "RETURNING
> zero_column_table.*", making me wonder if a parse-analysis-time error
> for that case wouldn't be a good thing.
>
> Comments?
>

I can't think of any practical uses for this kind of query, so I don't
think it's worth worrying too much about its results until/unless
someone comes up with a real use-case.

However, given that we currently support queries like "select distinct
* from nocols" (albeit with rather odd results), I don't think we
should start throwing new errors for them. Perhaps the actual risk of
a backwards-compatibility break is small, but so too is any benefit
from adding such new errors.

So +1 for the patch as-is, with no new errors.

Regards,
Dean



Re: ruleutils vs. empty targetlists

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I can't think of any practical uses for this kind of query, so I don't
> think it's worth worrying too much about its results until/unless
> someone comes up with a real use-case.

> However, given that we currently support queries like "select distinct
> * from nocols" (albeit with rather odd results), I don't think we
> should start throwing new errors for them. Perhaps the actual risk of
> a backwards-compatibility break is small, but so too is any benefit
> from adding such new errors.

> So +1 for the patch as-is, with no new errors.

How about as-is in the back branches, and throw the new errors only
in HEAD?
        regards, tom lane



Re: ruleutils vs. empty targetlists

От
Dean Rasheed
Дата:
On 13 December 2013 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I can't think of any practical uses for this kind of query, so I don't
>> think it's worth worrying too much about its results until/unless
>> someone comes up with a real use-case.
>
>> However, given that we currently support queries like "select distinct
>> * from nocols" (albeit with rather odd results), I don't think we
>> should start throwing new errors for them. Perhaps the actual risk of
>> a backwards-compatibility break is small, but so too is any benefit
>> from adding such new errors.
>
>> So +1 for the patch as-is, with no new errors.
>
> How about as-is in the back branches, and throw the new errors only
> in HEAD?
>

Seems reasonable.

Regards,
Dean



Re: ruleutils vs. empty targetlists

От
Tom Lane
Дата:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 13 December 2013 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How about as-is in the back branches, and throw the new errors only
>> in HEAD?

> Seems reasonable.

After further experimentation I've concluded that maybe we'd better
not back-patch this change.  The reason for my change of heart is
that in 8.4, the patch made plpgsql stop complaining that "return ;"
was missing an expression.  While we could possibly fix that, or
just decide not to patch 8.4, it occurs to me that there might be
applications out there that are expecting "SELECT ;" to fail, too.
So the risk of undesirable behavior changes seems a little larger
than I'd previously believed, and I'm feeling that fixing this
corner case in the back branches is not worth that risk.
        regards, tom lane