Обсуждение: Confusing order by error

Поиск
Список
Период
Сортировка

Confusing order by error

От
""
Дата:
I'm hoping someone can give us a little help understanding an error in the ORDER BY clause, because when I read
https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-ORDERBYI just don't see anything that explains the
behavior.

This is with Pg-9.5.1 on Centos (not that I think the OS matters here).

Consider this table and data, stripped down example of real code:

CREATE TABLE IF NOT EXISTS test_table (
       pk    INTEGER PRIMARY KEY,
       id    INTEGER NOT NULL,
       name  TEXT    NOT NULL,
       ref   INTEGER REFERENCES test_table
);

INSERT INTO test_table
   ( pk, id, name, ref )
VALUES
   ( 1, 1000, 'fred',    null ),
   ( 2, 2000, 'barney',  1 ),
   ( 3, 3000, 'betty',   2 ),
   ( 4, 4000, 'wilma',   1 )
ON CONFLICT DO NOTHING;

select * from test_table;

 pk |  id  |  name  | ref
----+------+--------+-----
  1 | 1000 | fred   |
  2 | 2000 | barney |   1
  3 | 3000 | betty  |   2
  4 | 4000 | wilma  |   1
(4 rows)

So far so good, but when we try to use the data in a more meaningful way:

    SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS ref_display
      FROM test_table as t1
 LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
  ORDER BY name;

 pk |  name  | ref | ref_display
----+--------+-----+-------------
  2 | barney |   1 | 1000:fred
  3 | betty  |   2 | 2000:barney
  1 | fred   |     | :
  4 | wilma  |   1 | 1000:fred
(4 rows)

That looks reasonable ... but if we change the ORDER BY clause to normalize should the name be mixed case:

    SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS ref_display
      FROM test_table as t1
 LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
  ORDER BY UPPER(name);

ERROR:  column reference "name" is ambiguous
LINE 4:   ORDER BY UPPER(name);
                         ^

Eh? The parser (or whatever phase) understood "ORDER BY name" in the first query, so why did that UPPER() string
functionmake a difference in the second query? 

I can almost make sense of it in that when the result tuples are created as it works, there are 2 name fields present:
t1.name& t2.name. In the first example they should have the same value but in the second they'd potentially have
differentvalues (1 raw and 1 up-cased). But that also doesn't really make sense either as I'd think the first query
shouldhave the same issue. I'd think (obviously incorrectly :) that we'd get either both working or both failing, not 1
workswhile the other fails. 

So what's going on here?

Thanks,
Kevin


Re: Confusing order by error

От
Brian Dunavant
Дата:
From the docs you linked:

"Each expression can be the name or ordinal number of an output column
(SELECT list item), or it can be an arbitrary expression formed from
input-column values."

The "name" in your order by is a reference to the output column.  The
following example shows the same with "foo" instead of name.  Once you
use UPPER() it is now an arbitrary expression where the 'name' you are
referring to becomes ambiguous.

SELECT t1.pk, t1.name as foo, t1.ref, CONCAT( t2.id , ':', t2.name )
AS ref_display
      FROM test_table as t1
 LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
  ORDER BY foo;
 pk |  foo   | ref | ref_display
----+--------+-----+-------------
  2 | barney |   1 | 1000:fred
  3 | betty  |   2 | 2000:barney
  1 | fred   |     | :
  4 | wilma  |   1 | 1000:fred
(4 rows)

Someone may correct me if I'm wrong here, but since "name" matches an
output column, it assumes that is what you mean and doesn't bother to
consider that the output column happens to have the same name as a
column in the source tables.


On Fri, Mar 31, 2017 at 3:39 PM,  <kbrannen@pwhome.com> wrote:
> I'm hoping someone can give us a little help understanding an error in the ORDER BY clause, because when I read
https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-ORDERBYI just don't see anything that explains the
behavior.
>
> This is with Pg-9.5.1 on Centos (not that I think the OS matters here).
>
> Consider this table and data, stripped down example of real code:
>
> CREATE TABLE IF NOT EXISTS test_table (
>        pk    INTEGER PRIMARY KEY,
>        id    INTEGER NOT NULL,
>        name  TEXT    NOT NULL,
>        ref   INTEGER REFERENCES test_table
> );
>
> INSERT INTO test_table
>    ( pk, id, name, ref )
> VALUES
>    ( 1, 1000, 'fred',    null ),
>    ( 2, 2000, 'barney',  1 ),
>    ( 3, 3000, 'betty',   2 ),
>    ( 4, 4000, 'wilma',   1 )
> ON CONFLICT DO NOTHING;
>
> select * from test_table;
>
>  pk |  id  |  name  | ref
> ----+------+--------+-----
>   1 | 1000 | fred   |
>   2 | 2000 | barney |   1
>   3 | 3000 | betty  |   2
>   4 | 4000 | wilma  |   1
> (4 rows)
>
> So far so good, but when we try to use the data in a more meaningful way:
>
>     SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS ref_display
>       FROM test_table as t1
>  LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
>   ORDER BY name;
>
>  pk |  name  | ref | ref_display
> ----+--------+-----+-------------
>   2 | barney |   1 | 1000:fred
>   3 | betty  |   2 | 2000:barney
>   1 | fred   |     | :
>   4 | wilma  |   1 | 1000:fred
> (4 rows)
>
> That looks reasonable ... but if we change the ORDER BY clause to normalize should the name be mixed case:
>
>     SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS ref_display
>       FROM test_table as t1
>  LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
>   ORDER BY UPPER(name);
>
> ERROR:  column reference "name" is ambiguous
> LINE 4:   ORDER BY UPPER(name);
>                          ^
>
> Eh? The parser (or whatever phase) understood "ORDER BY name" in the first query, so why did that UPPER() string
functionmake a difference in the second query? 
>
> I can almost make sense of it in that when the result tuples are created as it works, there are 2 name fields
present:t1.name & t2.name. In the first example they should have the same value but in the second they'd potentially
havedifferent values (1 raw and 1 up-cased). But that also doesn't really make sense either as I'd think the first
queryshould have the same issue. I'd think (obviously incorrectly :) that we'd get either both working or both failing,
not1 works while the other fails. 
>
> So what's going on here?
>
> Thanks,
> Kevin
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: Confusing order by error

От
"David G. Johnston"
Дата:
On Friday, March 31, 2017, <kbrannen@pwhome.com> wrote:
I'm hoping someone can give us a little help understanding an error in the ORDER BY clause, because when I read https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-ORDERBY I just don't see anything that explains the behavior.

    SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS ref_display
      FROM test_table as t1
 LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
  ORDER BY name;


References the output column due to inherent preference.
 
  
    SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS ref_display
      FROM test_table as t1
 LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
  ORDER BY UPPER(name);

ERROR:  column reference "name" is ambiguous
LINE 4:   ORDER BY UPPER(name);
                         ^

Eh? The parser (or whatever phase) understood "ORDER BY name" in the first query, so why did that UPPER() string function make a difference in the second query?


This is an expression so names cannot be output columns.  Attempts to pick an input column and finds two candidates and so emits the error. 

Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be an arbitrary expression formed from input-column values.

David J.

Re: Confusing order by error

От
""
Дата:
Thank you Brian and David, I'll use David's reply to refine my question...

--- david.g.johnston@gmail.com wrote:

> From: "David G. Johnston" <david.g.johnston@gmail.com>

> > On Friday, March 31, 2017, <kbrannen@pwhome.com> wrote:

> >
> >     SELECT t1.pk, t1.name, t1.ref, CONCAT( t2.id , ':', t2.name )  AS
> > ref_display
> >       FROM test_table as t1
> >  LEFT JOIN test_table as t2 ON  t1.ref = t2.pk
> >   ORDER BY UPPER(name);
> >
> > ERROR:  column reference "name" is ambiguous
> > LINE 4:   ORDER BY UPPER(name);
> >                          ^
> >
> > Eh? The parser (or whatever phase) understood "ORDER BY name" in the first
> > query, so why did that UPPER() string function make a difference in the
> > second query?

> This is an expression so names cannot be output columns.  Attempts to pick
> an input column and finds two candidates and so emits the error.

> Each expression can be the name or ordinal number of an output column (
> SELECT list item), or it can be an arbitrary expression formed from
> input-column values.

So, what I get out of those explanations is that I should view it as:

    ORDER BY expr

and if "expr" is in the output list (optionally prefixed with a table name), then that is used as is and explains the
firstexample; 
but if "expr" is not in the output verbatim -- "upper(name)" in the example -- then it goes to the input list to
satisfywhat it's ordering by. 

I can go with that now that I understand it (and I did not read the docs that way), but I guess I was expecting it to
seethat "upper(name)" isn't there, so it should pull that expression apart, find "name" and see that in the output list
beforeit starts searching the input list. After all, that's what happens when it wants to search the input list, it's
tornthe expression apart to find a column name. So why did it not look for "name" in the output list but was willing to
dothat for the input list. I guess that was really the heart of my question. :) 

If it makes it easier, this was my (expected) mental process model, thinking it would work thru the steps and stop when
somethingis found: 
1. look for expression in output list: upper(name)
2. look for column names in expression in output list: name
3. look for expression in input list: upper(name)
4. look for column names in expression in input list: name

I expected it do that and stop at #2, but it seems to have no #2 and went to #4. :(

Thanks,
Kevin




Re: Confusing order by error

От
"David G. Johnston"
Дата:
On Fri, Mar 31, 2017 at 2:45 PM, <kbrannen@pwhome.com> wrote:

I can go with that now that I understand it (and I did not read the docs that way), but I guess I was expecting it to see that "upper(name)" isn't there, so it should pull that expression apart, find "name" and see that in the output list before it starts searching the input list. After all, that's what happens when it wants to search the input list, it's torn the expression apart to find a column name. So why did it not look for "name" in the output list but was willing to do that for the input list. I guess that was really the heart of my question. :)


​This probably comes up twice a year or so.  The depth of detail of the answer depends ​on who decided to respond.  No ones seem motivated enough to write a decent FAQ article...and enough people are willing to just answer to the (public/searchable)

Here's a decent context response by Tom Lane that I quickly found from a few years back.


David J.

Re: Confusing order by error

От
Tom Lane
Дата:
"" <kbrannen@pwhome.com> writes:
> I can go with that now that I understand it (and I did not read the docs that way), but I guess I was expecting it to
seethat "upper(name)" isn't there, so it should pull that expression apart, find "name" and see that in the output list
beforeit starts searching the input list. After all, that's what happens when it wants to search the input list, it's
tornthe expression apart to find a column name. So why did it not look for "name" in the output list but was willing to
dothat for the input list. I guess that was really the heart of my question. :) 

The short answer here is that the SQL committee changed their minds
between SQL92 and later editions of the standard, and PG is attempting
to split the difference and be reasonably compliant with both versions.

SQL92 allowed only output column names or numbers (no expressions) in
ORDER BY.  Later editions essentially dropped that altogether, and
say that ORDER BY items are expressions over the set of available
table columns.

The column-number case is not so hard to continue supporting, because
"ORDER BY 1" is pretty pointless if you think that "1" means a literal
constant 1 (and anyway ordering by a constant is disallowed in the
later versions of the spec).  But the column name case is problematic.
We can't insist on uniquely identifiable names, since plain old
"SELECT x FROM ... ORDER BY x" would break that --- x is then visible
as both an input and output name.  Worse, the user might do weird stuff
like "SELECT x AS y" from a table that also contains a column y.
We ended up with the rule of checking first for an output name (thus
satisfying the SQL92 spec exactly) and failing that, or if the item
is an expression, interpreting names as input names only.  That meets
the SQL99 rules as long as you don't do stuff like "SELECT x AS y ...
ORDER BY y" and expect "y" to be resolved as some other column name.

What it does *not* do is allow you to treat an output column name as
something you can re-use in an expression.  If you want that, use a
sub-select, viz

    SELECT * FROM (SELECT x+y AS z, q FROM ...) ss ORDER BY z+q

            regards, tom lane


Re: Confusing order by error

От
""
Дата:
> --- david.g.johnston@gmail.com wrote:
>
> On Fri, Mar 31, 2017 at 2:45 PM, <kbrannen@pwhome.com> wrote:
>
> > I can go with that now that I understand it (and I did not read the docs
> > that way), but I guess I was expecting it to see that "upper(name)" isn't
> > there, so it should pull that expression apart, find "name" and see that in
> > the output list before it starts searching the input list. After all,
> > that's what happens when it wants to search the input list, it's torn the
> > expression apart to find a column name. So why did it not look for "name"
> > in the output list but was willing to do that for the input list. I guess
> > that was really the heart of my question. :)
> >
> This probably comes up twice a year or so.  The depth of detail of the
> answer depends on who decided to respond.  No ones seem motivated enough
> to write a decent FAQ article...and enough people are willing to just
> answer to the (public/searchable)
>
> Here's a decent context response by Tom Lane that I quickly found from a
> few years back.
>
> https://www.postgresql.org/message-id/12023.1376506569%40sss.pgh.pa.us

Thanks David! That explains it even more and how it got this way.

I'll see about finding a few spare cycles to create a doc patch based on Tom's answer and maybe it'll get accepted and
helpothers down the road. 

Kevin



Re: Confusing order by error

От
""
Дата:
> --- tgl@sss.pgh.pa.us wrote:

> The short answer here is that the SQL committee changed their minds
> between SQL92 and later editions of the standard, and PG is attempting
> to split the difference and be reasonably compliant with both versions.
> ...

Thanks Tom! I'll use this as well to make a doc patch/suggestion.

Kevin