Обсуждение: Confusing order by error
I'm hoping someone can give us a little help understanding an error in the ORDER BY clause, because when I read https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-ORDERBYI just don't see anything that explains the behavior. This is with Pg-9.5.1 on Centos (not that I think the OS matters here). Consider this table and data, stripped down example of real code: CREATE TABLE IF NOT EXISTS test_table ( pk INTEGER PRIMARY KEY, id INTEGER NOT NULL, name TEXT NOT NULL, ref INTEGER REFERENCES test_table ); INSERT INTO test_table ( pk, id, name, ref ) VALUES ( 1, 1000, 'fred', null ), ( 2, 2000, 'barney', 1 ), ( 3, 3000, 'betty', 2 ), ( 4, 4000, 'wilma', 1 ) ON CONFLICT DO NOTHING; select * from test_table; pk | id | name | ref ----+------+--------+----- 1 | 1000 | fred | 2 | 2000 | barney | 1 3 | 3000 | betty | 2 4 | 4000 | wilma | 1 (4 rows) So far so good, but when we try to use the data in a more meaningful way: SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display FROM test_table as t1 LEFT JOIN test_table as t2 ON t1.ref = t2.pk ORDER BY name; pk | name | ref | ref_display ----+--------+-----+------------- 2 | barney | 1 | 1000:fred 3 | betty | 2 | 2000:barney 1 | fred | | : 4 | wilma | 1 | 1000:fred (4 rows) That looks reasonable ... but if we change the ORDER BY clause to normalize should the name be mixed case: SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display FROM test_table as t1 LEFT JOIN test_table as t2 ON t1.ref = t2.pk ORDER BY UPPER(name); ERROR: column reference "name" is ambiguous LINE 4: ORDER BY UPPER(name); ^ Eh? The parser (or whatever phase) understood "ORDER BY name" in the first query, so why did that UPPER() string functionmake a difference in the second query? I can almost make sense of it in that when the result tuples are created as it works, there are 2 name fields present: t1.name& t2.name. In the first example they should have the same value but in the second they'd potentially have differentvalues (1 raw and 1 up-cased). But that also doesn't really make sense either as I'd think the first query shouldhave the same issue. I'd think (obviously incorrectly :) that we'd get either both working or both failing, not 1 workswhile the other fails. So what's going on here? Thanks, Kevin
From the docs you linked: "Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be an arbitrary expression formed from input-column values." The "name" in your order by is a reference to the output column. The following example shows the same with "foo" instead of name. Once you use UPPER() it is now an arbitrary expression where the 'name' you are referring to becomes ambiguous. SELECT t1.pk, t1.name as foo, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display FROM test_table as t1 LEFT JOIN test_table as t2 ON t1.ref = t2.pk ORDER BY foo; pk | foo | ref | ref_display ----+--------+-----+------------- 2 | barney | 1 | 1000:fred 3 | betty | 2 | 2000:barney 1 | fred | | : 4 | wilma | 1 | 1000:fred (4 rows) Someone may correct me if I'm wrong here, but since "name" matches an output column, it assumes that is what you mean and doesn't bother to consider that the output column happens to have the same name as a column in the source tables. On Fri, Mar 31, 2017 at 3:39 PM, <kbrannen@pwhome.com> wrote: > I'm hoping someone can give us a little help understanding an error in the ORDER BY clause, because when I read https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-ORDERBYI just don't see anything that explains the behavior. > > This is with Pg-9.5.1 on Centos (not that I think the OS matters here). > > Consider this table and data, stripped down example of real code: > > CREATE TABLE IF NOT EXISTS test_table ( > pk INTEGER PRIMARY KEY, > id INTEGER NOT NULL, > name TEXT NOT NULL, > ref INTEGER REFERENCES test_table > ); > > INSERT INTO test_table > ( pk, id, name, ref ) > VALUES > ( 1, 1000, 'fred', null ), > ( 2, 2000, 'barney', 1 ), > ( 3, 3000, 'betty', 2 ), > ( 4, 4000, 'wilma', 1 ) > ON CONFLICT DO NOTHING; > > select * from test_table; > > pk | id | name | ref > ----+------+--------+----- > 1 | 1000 | fred | > 2 | 2000 | barney | 1 > 3 | 3000 | betty | 2 > 4 | 4000 | wilma | 1 > (4 rows) > > So far so good, but when we try to use the data in a more meaningful way: > > SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display > FROM test_table as t1 > LEFT JOIN test_table as t2 ON t1.ref = t2.pk > ORDER BY name; > > pk | name | ref | ref_display > ----+--------+-----+------------- > 2 | barney | 1 | 1000:fred > 3 | betty | 2 | 2000:barney > 1 | fred | | : > 4 | wilma | 1 | 1000:fred > (4 rows) > > That looks reasonable ... but if we change the ORDER BY clause to normalize should the name be mixed case: > > SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display > FROM test_table as t1 > LEFT JOIN test_table as t2 ON t1.ref = t2.pk > ORDER BY UPPER(name); > > ERROR: column reference "name" is ambiguous > LINE 4: ORDER BY UPPER(name); > ^ > > Eh? The parser (or whatever phase) understood "ORDER BY name" in the first query, so why did that UPPER() string functionmake a difference in the second query? > > I can almost make sense of it in that when the result tuples are created as it works, there are 2 name fields present:t1.name & t2.name. In the first example they should have the same value but in the second they'd potentially havedifferent values (1 raw and 1 up-cased). But that also doesn't really make sense either as I'd think the first queryshould have the same issue. I'd think (obviously incorrectly :) that we'd get either both working or both failing, not1 works while the other fails. > > So what's going on here? > > Thanks, > Kevin > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
On Friday, March 31, 2017, <kbrannen@pwhome.com> wrote:
I'm hoping someone can give us a little help understanding an error in the ORDER BY clause, because when I read https://www.postgresql.org/docs/9.5/static/sql-select. html#SQL-ORDERBY I just don't see anything that explains the behavior.
SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display
FROM test_table as t1
LEFT JOIN test_table as t2 ON t1.ref = t2.pk
ORDER BY name;
References the output column due to inherent preference.
SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS ref_display
FROM test_table as t1
LEFT JOIN test_table as t2 ON t1.ref = t2.pk
ORDER BY UPPER(name);
ERROR: column reference "name" is ambiguous
LINE 4: ORDER BY UPPER(name);
^
Eh? The parser (or whatever phase) understood "ORDER BY name" in the first query, so why did that UPPER() string function make a difference in the second query?
This is an expression so names cannot be output columns. Attempts to pick an input column and finds two candidates and so emits the error.
Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be an arbitrary expression formed from input-column values.
David J.
Thank you Brian and David, I'll use David's reply to refine my question... --- david.g.johnston@gmail.com wrote: > From: "David G. Johnston" <david.g.johnston@gmail.com> > > On Friday, March 31, 2017, <kbrannen@pwhome.com> wrote: > > > > SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name ) AS > > ref_display > > FROM test_table as t1 > > LEFT JOIN test_table as t2 ON t1.ref = t2.pk > > ORDER BY UPPER(name); > > > > ERROR: column reference "name" is ambiguous > > LINE 4: ORDER BY UPPER(name); > > ^ > > > > Eh? The parser (or whatever phase) understood "ORDER BY name" in the first > > query, so why did that UPPER() string function make a difference in the > > second query? > This is an expression so names cannot be output columns. Attempts to pick > an input column and finds two candidates and so emits the error. > Each expression can be the name or ordinal number of an output column ( > SELECT list item), or it can be an arbitrary expression formed from > input-column values. So, what I get out of those explanations is that I should view it as: ORDER BY expr and if "expr" is in the output list (optionally prefixed with a table name), then that is used as is and explains the firstexample; but if "expr" is not in the output verbatim -- "upper(name)" in the example -- then it goes to the input list to satisfywhat it's ordering by. I can go with that now that I understand it (and I did not read the docs that way), but I guess I was expecting it to seethat "upper(name)" isn't there, so it should pull that expression apart, find "name" and see that in the output list beforeit starts searching the input list. After all, that's what happens when it wants to search the input list, it's tornthe expression apart to find a column name. So why did it not look for "name" in the output list but was willing to dothat for the input list. I guess that was really the heart of my question. :) If it makes it easier, this was my (expected) mental process model, thinking it would work thru the steps and stop when somethingis found: 1. look for expression in output list: upper(name) 2. look for column names in expression in output list: name 3. look for expression in input list: upper(name) 4. look for column names in expression in input list: name I expected it do that and stop at #2, but it seems to have no #2 and went to #4. :( Thanks, Kevin
I can go with that now that I understand it (and I did not read the docs that way), but I guess I was expecting it to see that "upper(name)" isn't there, so it should pull that expression apart, find "name" and see that in the output list before it starts searching the input list. After all, that's what happens when it wants to search the input list, it's torn the expression apart to find a column name. So why did it not look for "name" in the output list but was willing to do that for the input list. I guess that was really the heart of my question. :)
This probably comes up twice a year or so. The depth of detail of the answer depends on who decided to respond. No ones seem motivated enough to write a decent FAQ article...and enough people are willing to just answer to the (public/searchable)
Here's a decent context response by Tom Lane that I quickly found from a few years back.
David J.
"" <kbrannen@pwhome.com> writes: > I can go with that now that I understand it (and I did not read the docs that way), but I guess I was expecting it to seethat "upper(name)" isn't there, so it should pull that expression apart, find "name" and see that in the output list beforeit starts searching the input list. After all, that's what happens when it wants to search the input list, it's tornthe expression apart to find a column name. So why did it not look for "name" in the output list but was willing to dothat for the input list. I guess that was really the heart of my question. :) The short answer here is that the SQL committee changed their minds between SQL92 and later editions of the standard, and PG is attempting to split the difference and be reasonably compliant with both versions. SQL92 allowed only output column names or numbers (no expressions) in ORDER BY. Later editions essentially dropped that altogether, and say that ORDER BY items are expressions over the set of available table columns. The column-number case is not so hard to continue supporting, because "ORDER BY 1" is pretty pointless if you think that "1" means a literal constant 1 (and anyway ordering by a constant is disallowed in the later versions of the spec). But the column name case is problematic. We can't insist on uniquely identifiable names, since plain old "SELECT x FROM ... ORDER BY x" would break that --- x is then visible as both an input and output name. Worse, the user might do weird stuff like "SELECT x AS y" from a table that also contains a column y. We ended up with the rule of checking first for an output name (thus satisfying the SQL92 spec exactly) and failing that, or if the item is an expression, interpreting names as input names only. That meets the SQL99 rules as long as you don't do stuff like "SELECT x AS y ... ORDER BY y" and expect "y" to be resolved as some other column name. What it does *not* do is allow you to treat an output column name as something you can re-use in an expression. If you want that, use a sub-select, viz SELECT * FROM (SELECT x+y AS z, q FROM ...) ss ORDER BY z+q regards, tom lane
> --- david.g.johnston@gmail.com wrote: > > On Fri, Mar 31, 2017 at 2:45 PM, <kbrannen@pwhome.com> wrote: > > > I can go with that now that I understand it (and I did not read the docs > > that way), but I guess I was expecting it to see that "upper(name)" isn't > > there, so it should pull that expression apart, find "name" and see that in > > the output list before it starts searching the input list. After all, > > that's what happens when it wants to search the input list, it's torn the > > expression apart to find a column name. So why did it not look for "name" > > in the output list but was willing to do that for the input list. I guess > > that was really the heart of my question. :) > > > This probably comes up twice a year or so. The depth of detail of the > answer depends on who decided to respond. No ones seem motivated enough > to write a decent FAQ article...and enough people are willing to just > answer to the (public/searchable) > > Here's a decent context response by Tom Lane that I quickly found from a > few years back. > > https://www.postgresql.org/message-id/12023.1376506569%40sss.pgh.pa.us Thanks David! That explains it even more and how it got this way. I'll see about finding a few spare cycles to create a doc patch based on Tom's answer and maybe it'll get accepted and helpothers down the road. Kevin
> --- tgl@sss.pgh.pa.us wrote: > The short answer here is that the SQL committee changed their minds > between SQL92 and later editions of the standard, and PG is attempting > to split the difference and be reasonably compliant with both versions. > ... Thanks Tom! I'll use this as well to make a doc patch/suggestion. Kevin