Обсуждение: GROUP BY column alias?
Maybe I'm getting too old to SQL anymore, but I ran across something yesterday in a machine generated query that took meover an hour to figure out. Here's a little testcase. Maybe somebody can explain why the last "Not Expected" case does what it does. select version(); PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build5646), 64-bit create table foo(day timestamp); insert into foo values (now()); insert into foo values (now()); Expected: select day, count(*) from foo group by day; day | count ----------------------------+------- 2010-02-18 15:41:37.335357 | 1 2010-02-18 15:41:39.471746 | 1 (2 rows) Expected: select day::date, count(*) from foo group by day; day | count ------------+------- 2010-02-18 | 1 2010-02-18 | 1 (2 rows) Expected: select day::date, count(*) from foo group by day::date; day | count ------------+------- 2010-02-18 | 2 (1 row) Expected: select day::date as bar, count(*) from foo group by bar; bar | count ------------+------- 2010-02-18 | 2 (1 row) Not Expected: select day::date as day, count(*) from foo group by day; day | count ------------+------- 2010-02-18 | 1 2010-02-18 | 1 (2 rows) Note in the last case, the "day" column is aliased as "day", but the group by using the physical "day" column, not the alias. That boggles my mind, especially when you consider the case above it, where "day" is aliased as "bar" and groupingby "bar" works as expected. eric
Eric B. Ridge wrote: > Maybe I'm getting too old to SQL anymore, but I ran across something yesterday in a machine generated query that took meover an hour to figure out. > > Here's a little testcase. Maybe somebody can explain why the last "Not Expected" case does what it does. > > select version(); > PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build5646), 64-bit > > create table foo(day timestamp); > insert into foo values (now()); > insert into foo values (now()); > > Expected: > select day, count(*) from foo group by day; > day | count > ----------------------------+------- > 2010-02-18 15:41:37.335357 | 1 > 2010-02-18 15:41:39.471746 | 1 > (2 rows) > > Expected: > select day::date, count(*) from foo group by day; > day | count > ------------+------- > 2010-02-18 | 1 > 2010-02-18 | 1 > (2 rows) > > Expected: > select day::date, count(*) from foo group by day::date; > day | count > ------------+------- > 2010-02-18 | 2 > (1 row) > > Expected: > select day::date as bar, count(*) from foo group by bar; > bar | count > ------------+------- > 2010-02-18 | 2 > (1 row) > > Not Expected: > select day::date as day, count(*) from foo group by day; > day | count > ------------+------- > 2010-02-18 | 1 > 2010-02-18 | 1 > (2 rows) > > Note in the last case, the "day" column is aliased as "day", but the group by using the physical "day" column, not thealias. That boggles my mind, especially when you consider the case above it, where "day" is aliased as "bar" and groupingby "bar" works as expected. > > eric I'm not sure why you would be surprised by that behavior. You are grouping by a timestamp, so any microsecond difference will be a new group. If you want to make that work try: SELECT day::date, --no need to alias as same name COUNT(*) FROM foo GROUP BY day::date; Scott
On Feb 18, 2010, at 4:31 PM, Scott Bailey wrote: > I'm not sure why you would be surprised by that behavior. You are grouping by a timestamp, so any microsecond differencewill be a new group. I get that. ;) Hence the ::date. This is what doesn't make sense: Expected: select day::date as bar, count(*) from foo group by bar; Not Expected: select day::date as day, count(*) from foo group by day; If I alias it to something other than the actual column name, it does what I expect. But if I alias it to the column name,it doesn't. I would have thought that the precedence rules would resolve the alias first, then the column name, but that doesn't seemto be the case. > If you want to make that work try: > SELECT day::date, --no need to alias as same name The "no need to alias as same name" isn't true in my case because the queries I'm dealing with are machine generated, andthat's what the generator does, in all cases. eric
Eric B. Ridge wrote: > On Feb 18, 2010, at 4:31 PM, Scott Bailey wrote: >> I'm not sure why you would be surprised by that behavior. You are grouping by a timestamp, so any microsecond differencewill be a new group. > > I get that. ;) Hence the ::date. This is what doesn't make sense: > > Expected: select day::date as bar, count(*) from foo group by bar; > Not Expected: select day::date as day, count(*) from foo group by day; > > If I alias it to something other than the actual column name, it does what I expect. But if I alias it to the column name,it doesn't. > > I would have thought that the precedence rules would resolve the alias first, then the column name, but that doesn't seemto be the case. SQL name resolution rules are that column names have higher precedence than aliases and variables. So it will always bind to the column not the alias. Scott
On Feb 18, 2010, at 5:52 PM, Scott Bailey wrote: > SQL name resolution rules are that column names have higher precedence than aliases and variables. So it will always bindto the column not the alias. That explains it. Thanks. Breaks the rule of least surprise, but it is SQL. eric
Scott Bailey wrote: >> SQL name resolution rules are that column names >> have higher precedence than aliases and variables. >> So it will always bind to the column not the alias. Eric B. Ridge wrote: > That explains it. Thanks. Breaks the rule of least surprise, but it is SQL. I don't think it does break the rule of least surprise. How would one expect the column or the alias to have precedence without knowledge of the rule from documentation? The only thing I would expect if I were unaware of the rule is that there might be a conflict and that I'd have to look up the precedence. -- Lew
Lew <noone@lwsc.ehost-services.com> writes: > Eric B. Ridge wrote: >> That explains it. Thanks. Breaks the rule of least surprise, but it is SQL. > I don't think it does break the rule of least surprise. How would one > expect the column or the alias to have precedence without knowledge of > the rule from documentation? The only thing I would expect if I were > unaware of the rule is that there might be a conflict and that I'd > have to look up the precedence. There's actually a good deal more there than meets the eye. Our SELECT reference page spells it out, if you read the fine print. In particular it's worth noting this bit in the Compatibility section: Namespace Available to GROUP BY and ORDER BY In the SQL-92 standard, an ORDER BY clause can only use output column names or numbers, while a GROUP BY clause can only use expressions based on input column names. PostgreSQL extends each of these clauses to allow the other choice as well (but it uses the standard's interpretation if there is ambiguity). PostgreSQL also allows both clauses to specify arbitrary expressions. Note that names appearing in an expression will always be taken as input-column names, not as output-column names. SQL:1999 and later use a slightly different definition which is not entirely upward compatible with SQL-92. In most cases, however, PostgreSQL will interpret an ORDER BY or GROUP BY expression the same way SQL:1999 does. regards, tom lane
On Fri, Feb 19, 2010 at 12:07:42AM -0500, Tom Lane wrote: > Lew <noone@lwsc.ehost-services.com> writes: > > Eric B. Ridge wrote: > >> That explains it. Thanks. Breaks the rule of least surprise, > >> but it is SQL. > > SQL:1999 and later use a slightly different definition which is not > entirely upward compatible with SQL-92. In most cases, however, > PostgreSQL will interpret an ORDER BY or GROUP BY expression the > same way SQL:1999 does. The current SQL standard *supersedes* all previous ones. There isn't a hierarchy in the sense of "higher levels of compliance" that our docs implicitly and falsely assume in many spots, and we need to make them stop including this idea. The only standard actually worth citing today is SQL:2008, and the day the next one comes out, we need to change all our references to cite it. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Fri, Feb 19, 2010 at 12:07:42AM -0500, Tom Lane wrote: >> SQL:1999 and later use a slightly different definition which is not >> entirely upward compatible with SQL-92. In most cases, however, >> PostgreSQL will interpret an ORDER BY or GROUP BY expression the >> same way SQL:1999 does. > The current SQL standard *supersedes* all previous ones. That is the opinion of the SQL committee, all right, but it has got precious little to do with the real world. regards, tom lane