Обсуждение: ruleutils vs. empty targetlists
Thinking some more about bug #8648, it occurred to me that ruleutils.c isn't exactly prepared for the case either: regression=# create table nocols(); CREATE TABLE regression=# create view vv1 as select exists (select * from nocols); CREATE VIEW regression=# \d+ vv1 View "public.vv1" Column | Type | Modifiers | Storage | Description --------+---------+-----------+---------+------------- exists | boolean | | plain | View definition: SELECT (EXISTS ( SELECT FROM nocols)) AS "exists"; which of course is illegal syntax. I thought at first that this could be fixed trivially by emitting "*" if get_target_list found no columns. However, that doesn't quite work; consider create view vv2 as select exists (select nocols.* from nocols, somecols); If we regurgitate this as "exists (select * from nocols, somecols)", it wouldn't mean the same thing. But on the third hand, at least in the context of an EXISTS() subquery, it doesn't really matter because the tlist is irrelevant, at least as long as it only contains Vars. So you could argue that emitting "*" is plenty good enough even in the above example. I'm not certain that that applies everywhere that a tlist could appear, though. I experimented with code that would attempt to regurgitate "nocols.*" in the above example; see attached draft patch. I don't like this patch much: it's ugly, it'd be a pain to backport (because it's digging into data structures that have changed repeatedly), and I'm not sure how much I trust it anyway. So I'm leaning towards just doing + if (colno == 0) + appendStringInfoString(buf, " *"); at least till such time as somebody shows a case where this isn't good enough. Note that I wouldn't be thinking of this as something to back-patch at all, except that if someone did have a view that looked like this, they'd find that pg_dump output wouldn't restore. You could construct scenarios where that could seem like a denial of service, particularly if the DBA wasn't smart enough to figure out what was wrong with his dump. (And who wants to deal with such a thing at 4AM anyway ...) Comments? regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 348f620..d479276 100644 *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *************** get_target_list(List *targetList, depars *** 4610,4615 **** --- 4610,4652 ---- appendStringInfoString(buf, targetbuf.data); } + /* + * An empty targetlist (or RETURNING list) is syntactically impossible, + * but we might nonetheless find no elements in the list, as a result of + * constructs such as "SELECT * FROM zero_column_table". You might think + * we could just print a star and be done, but that fails in cases like + * "SELECT z.* FROM zero_column_table z, normal_table". So, dig through + * the namespace lists looking for a zero-column RTE and use its alias; + * this is valid even if the original syntax was just "*". + */ + if (colno == 0) + { + foreach(l, context->namespaces) + { + deparse_namespace *dpns = (deparse_namespace *) lfirst(l); + ListCell *ln, + *lc; + + forboth(ln, dpns->rtable_names, lc, dpns->rtable_columns) + { + char *refname = (char *) lfirst(ln); + deparse_columns *colinfo = (deparse_columns *) lfirst(lc); + + if (refname && colinfo->num_cols == 0) + { + appendStringInfo(buf, " %s.*", quote_identifier(refname)); + colno++; + break; + } + } + if (colno) + break; + } + /* If we didn't find a suitable RTE, just print a star. */ + if (colno == 0) + appendStringInfoString(buf, " *"); + } + /* clean up */ pfree(targetbuf.data); }
On 3 December 2013 23:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thinking some more about bug #8648, it occurred to me that ruleutils.c > isn't exactly prepared for the case either: > > regression=# create table nocols(); > CREATE TABLE > regression=# create view vv1 as select exists (select * from nocols); > CREATE VIEW > regression=# \d+ vv1 > View "public.vv1" > Column | Type | Modifiers | Storage | Description > --------+---------+-----------+---------+------------- > exists | boolean | | plain | > View definition: > SELECT (EXISTS ( SELECT > FROM nocols)) AS "exists"; > > which of course is illegal syntax. I thought at first that this could be > fixed trivially by emitting "*" if get_target_list found no columns. > However, that doesn't quite work; consider > > create view vv2 as select exists (select nocols.* from nocols, somecols); > > If we regurgitate this as "exists (select * from nocols, somecols)", it > wouldn't mean the same thing. > > But on the third hand, at least in the context of an EXISTS() subquery, > it doesn't really matter because the tlist is irrelevant, at least as long > as it only contains Vars. So you could argue that emitting "*" is plenty > good enough even in the above example. I'm not certain that that applies > everywhere that a tlist could appear, though. > > I experimented with code that would attempt to regurgitate "nocols.*" > in the above example; see attached draft patch. I don't like this patch > much: it's ugly, it'd be a pain to backport (because it's digging into > data structures that have changed repeatedly), and I'm not sure how much > I trust it anyway. So I'm leaning towards just doing > > + if (colno == 0) > + appendStringInfoString(buf, " *"); > > at least till such time as somebody shows a case where this isn't good > enough. > Well here's a contrived example with grouping: create table nocols(); create table somecols(a int, b int); create view v as select exists(select nocols.* from nocols, somecols group by somecols.a); Simply turning that tlist into "*" makes the query invalid because somecols.b doesn't appear in the group by clause. So in this case, it needs to try to output a tlist that contains no columns. Regards, Dean > Note that I wouldn't be thinking of this as something to back-patch > at all, except that if someone did have a view that looked like this, > they'd find that pg_dump output wouldn't restore. You could construct > scenarios where that could seem like a denial of service, particularly > if the DBA wasn't smart enough to figure out what was wrong with his > dump. (And who wants to deal with such a thing at 4AM anyway ...) > > Comments? > > regards, tom lane > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 3 December 2013 23:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thinking some more about bug #8648, it occurred to me that ruleutils.c >> isn't exactly prepared for the case either: >> ... So I'm leaning towards just doing >> >> + if (colno == 0) >> + appendStringInfoString(buf, " *"); >> >> at least till such time as somebody shows a case where this isn't good >> enough. > Well here's a contrived example with grouping: > create table nocols(); > create table somecols(a int, b int); > create view v as select exists(select nocols.* from nocols, somecols > group by somecols.a); Hm, that's cute :-(. And I realized last night that my patch's approach of seizing on some random zero-column RTE isn't too bulletproof against ALTER TABLE ADD COLUMN operations, either, ie, the referenced table might have some columns by now. Which is more or less the entire reason we expand "*" at parse time in the first place. What I'm thinking about this today is that really the *right* solution is to allow syntactically-empty SELECT lists; once we've bought into the notion of zero-column tables, the notion that you can't have an empty select list is just fundamentally at odds with that. And since you can already have semantically-empty SELECT lists, this should in theory not create much risk of new bugs. If we did that, the existing ruleutils code is just fine, as are any existing dump files containing this sort of query. That change might still be thought too aggressive for a back-patch, though. Comments? regards, tom lane
Tom Lane escribió: > What I'm thinking about this today is that really the *right* solution > is to allow syntactically-empty SELECT lists; once we've bought into the > notion of zero-column tables, the notion that you can't have an empty > select list is just fundamentally at odds with that. And since you can > already have semantically-empty SELECT lists, this should in theory not > create much risk of new bugs. If we did that, the existing ruleutils > code is just fine, as are any existing dump files containing this sort > of query. Wow, as strange-sounding as that is, you're probably correct. This might probably be seen as a deviation from the standard, but then so are zero-column tables. Of course, syntactically-empty select lists would also work with (standard-conforming) tables containing columns, but it's hard to see that that would be a problem in practice. > That change might still be thought too aggressive for a back-patch, > though. Comments? Well, no correct query will start failing due to this change; the only visible change would be queries that previously throw errors would start working. It's hard to see that as a backward-incompatibility. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane escribi�: >> What I'm thinking about this today is that really the *right* solution >> is to allow syntactically-empty SELECT lists; once we've bought into the >> notion of zero-column tables, the notion that you can't have an empty >> select list is just fundamentally at odds with that. And since you can >> already have semantically-empty SELECT lists, this should in theory not >> create much risk of new bugs. If we did that, the existing ruleutils >> code is just fine, as are any existing dump files containing this sort >> of query. > Wow, as strange-sounding as that is, you're probably correct. I experimented with this a bit, and found that things seem to pretty much work, although unsurprisingly there are a couple of regression tests that expected to get a syntax error and no longer do. The only thing I've come across that arguably doesn't work is SELECT DISTINCT: regression=# select distinct from pg_database; -- (8 rows) The reason this says "8 rows" is that the produced plan is just a seqscan of pg_database returning no columns. The parser produced a Query with an empty distinctClause, so the planner thinks no unique-ification is required, so it doesn't produce any sort/uniq or hashagg node. This seems a bit bogus to me; mathematically, it seems like what this ought to mean is that all rows are equivalent and so you get just one empty row out. However, I don't see any practical use-case for that behavior, so I'm disinclined to spend time on inventing a solution --- and any solution would probably require a change in Query representation, making it not back-patchable anyway. Instead I suggest we add an error check in parse analysis that throws an error for DISTINCT with no columns. If anyone is ever motivated to make this case do something sane, they can remove that check and fix up whatever needs fixing up. The attached first-draft patch doesn't yet have such a prohibition, though. Another point is that there is a second use of target_list in the grammar, which is "RETURNING target_list". I did not change that one into "RETURNING opt_target_list", because if I had, it would have an issue similar to DISTINCT's: RETURNING with no arguments would be represented the same as no RETURNING at all, leading to probably not the behavior the user expected. We've already got that problem if you say "RETURNING zero_column_table.*", making me wonder if a parse-analysis-time error for that case wouldn't be a good thing. Comments? regards, tom lane diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 8fced44..f9d4577 100644 *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *************** static Node *makeRecursiveViewSelect(cha *** 334,340 **** name_list from_clause from_list opt_array_bounds qualified_name_list any_name any_name_list any_operator expr_list attrs ! target_list insert_column_list set_target_list set_clause_list set_clause multiple_set_clause ctext_expr_list ctext_row def_list indirection opt_indirection reloption_list group_clause TriggerFuncArgs select_limit --- 334,340 ---- name_list from_clause from_list opt_array_bounds qualified_name_list any_name any_name_list any_operator expr_list attrs ! target_list opt_target_list insert_column_list set_target_list set_clause_list set_clause multiple_set_clause ctext_expr_list ctext_row def_list indirection opt_indirection reloption_list group_clause TriggerFuncArgs select_limit *************** select_clause: *** 9259,9265 **** * However, this is not checked by the grammar; parse analysis must check it. */ simple_select: ! SELECT opt_distinct target_list into_clause from_clause where_clause group_clause having_clause window_clause { --- 9259,9265 ---- * However, this is not checked by the grammar; parse analysis must check it. */ simple_select: ! SELECT opt_distinct opt_target_list into_clause from_clause where_clause group_clause having_clause window_clause { *************** ctext_row: '(' ctext_expr_list ')' { *** 12215,12220 **** --- 12215,12224 ---- * *****************************************************************************/ + opt_target_list: target_list { $$ = $1; } + | /* EMPTY */ { $$ = NIL; } + ; + target_list: target_el { $$ = list_make1($1); } | target_list ',' target_el { $$ = lappend($1, $3); } diff --git a/src/test/regress/expected/errors.out b/src/test/regress/expected/errors.out index 4061512..b63cb54 100644 *** a/src/test/regress/expected/errors.out --- b/src/test/regress/expected/errors.out *************** select 1; *** 15,35 **** -- -- -- SELECT ! -- missing relation name select; ! ERROR: syntax error at or near ";" ! LINE 1: select; ! ^ -- no such relation select * from nonesuch; ERROR: relation "nonesuch" does not exist LINE 1: select * from nonesuch; ^ - -- missing target list - select from pg_database; - ERROR: syntax error at or near "from" - LINE 1: select from pg_database; - ^ -- bad name in target list select nonesuch from pg_database; ERROR: column "nonesuch" does not exist --- 15,30 ---- -- -- -- SELECT ! -- this used to be a syntax error, but now we allow an empty target list select; ! -- ! (1 row) ! -- no such relation select * from nonesuch; ERROR: relation "nonesuch" does not exist LINE 1: select * from nonesuch; ^ -- bad name in target list select nonesuch from pg_database; ERROR: column "nonesuch" does not exist *************** select * from pg_database where pg_datab *** 45,56 **** ERROR: column "nonesuch" does not exist LINE 1: ...ect * from pg_database where pg_database.datname = nonesuch; ^ ! -- bad select distinct on syntax, distinct attribute missing ! select distinct on (foobar) from pg_database; ! ERROR: syntax error at or near "from" ! LINE 1: select distinct on (foobar) from pg_database; ! ^ ! -- bad select distinct on syntax, distinct attribute not in target list select distinct on (foobar) * from pg_database; ERROR: column "foobar" does not exist LINE 1: select distinct on (foobar) * from pg_database; --- 40,46 ---- ERROR: column "nonesuch" does not exist LINE 1: ...ect * from pg_database where pg_database.datname = nonesuch; ^ ! -- bad attribute name in select distinct on select distinct on (foobar) * from pg_database; ERROR: column "foobar" does not exist LINE 1: select distinct on (foobar) * from pg_database; diff --git a/src/test/regress/sql/errors.sql b/src/test/regress/sql/errors.sql index 2ee707c..80d1b62 100644 *** a/src/test/regress/sql/errors.sql --- b/src/test/regress/sql/errors.sql *************** select 1; *** 16,29 **** -- -- SELECT ! -- missing relation name select; -- no such relation select * from nonesuch; - -- missing target list - select from pg_database; -- bad name in target list select nonesuch from pg_database; -- bad attribute name on lhs of operator --- 16,27 ---- -- -- SELECT ! -- this used to be a syntax error, but now we allow an empty target list select; -- no such relation select * from nonesuch; -- bad name in target list select nonesuch from pg_database; -- bad attribute name on lhs of operator *************** select * from pg_database where nonesuch *** 32,43 **** -- bad attribute name on rhs of operator select * from pg_database where pg_database.datname = nonesuch; ! ! -- bad select distinct on syntax, distinct attribute missing ! select distinct on (foobar) from pg_database; ! ! ! -- bad select distinct on syntax, distinct attribute not in target list select distinct on (foobar) * from pg_database; --- 30,36 ---- -- bad attribute name on rhs of operator select * from pg_database where pg_database.datname = nonesuch; ! -- bad attribute name in select distinct on select distinct on (foobar) * from pg_database;
On 13 December 2013 01:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The only thing I've come across that arguably doesn't work is SELECT > DISTINCT: > > regression=# select distinct from pg_database; > -- > (8 rows) > > The reason this says "8 rows" is that the produced plan is just a seqscan > of pg_database returning no columns. The parser produced a Query with an > empty distinctClause, so the planner thinks no unique-ification is > required, so it doesn't produce any sort/uniq or hashagg node. This seems > a bit bogus to me; mathematically, it seems like what this ought to mean > is that all rows are equivalent and so you get just one empty row out. > However, I don't see any practical use-case for that behavior, so I'm > disinclined to spend time on inventing a solution --- and any solution > would probably require a change in Query representation, making it not > back-patchable anyway. Instead I suggest we add an error check in parse > analysis that throws an error for DISTINCT with no columns. If anyone is > ever motivated to make this case do something sane, they can remove that > check and fix up whatever needs fixing up. The attached first-draft patch > doesn't yet have such a prohibition, though. > > Another point is that there is a second use of target_list in the grammar, > which is "RETURNING target_list". I did not change that one into > "RETURNING opt_target_list", because if I had, it would have an issue > similar to DISTINCT's: RETURNING with no arguments would be represented > the same as no RETURNING at all, leading to probably not the behavior > the user expected. We've already got that problem if you say "RETURNING > zero_column_table.*", making me wonder if a parse-analysis-time error > for that case wouldn't be a good thing. > > Comments? > I can't think of any practical uses for this kind of query, so I don't think it's worth worrying too much about its results until/unless someone comes up with a real use-case. However, given that we currently support queries like "select distinct * from nocols" (albeit with rather odd results), I don't think we should start throwing new errors for them. Perhaps the actual risk of a backwards-compatibility break is small, but so too is any benefit from adding such new errors. So +1 for the patch as-is, with no new errors. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > I can't think of any practical uses for this kind of query, so I don't > think it's worth worrying too much about its results until/unless > someone comes up with a real use-case. > However, given that we currently support queries like "select distinct > * from nocols" (albeit with rather odd results), I don't think we > should start throwing new errors for them. Perhaps the actual risk of > a backwards-compatibility break is small, but so too is any benefit > from adding such new errors. > So +1 for the patch as-is, with no new errors. How about as-is in the back branches, and throw the new errors only in HEAD? regards, tom lane
On 13 December 2013 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> I can't think of any practical uses for this kind of query, so I don't >> think it's worth worrying too much about its results until/unless >> someone comes up with a real use-case. > >> However, given that we currently support queries like "select distinct >> * from nocols" (albeit with rather odd results), I don't think we >> should start throwing new errors for them. Perhaps the actual risk of >> a backwards-compatibility break is small, but so too is any benefit >> from adding such new errors. > >> So +1 for the patch as-is, with no new errors. > > How about as-is in the back branches, and throw the new errors only > in HEAD? > Seems reasonable. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 13 December 2013 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> How about as-is in the back branches, and throw the new errors only >> in HEAD? > Seems reasonable. After further experimentation I've concluded that maybe we'd better not back-patch this change. The reason for my change of heart is that in 8.4, the patch made plpgsql stop complaining that "return ;" was missing an expression. While we could possibly fix that, or just decide not to patch 8.4, it occurs to me that there might be applications out there that are expecting "SELECT ;" to fail, too. So the risk of undesirable behavior changes seems a little larger than I'd previously believed, and I'm feeling that fixing this corner case in the back branches is not worth that risk. regards, tom lane