Обсуждение: Re: pg, mysql comparison with "group by" clause

Поиск
Список
Период
Сортировка

Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
You're 100% correct, this is a bug in mysql.

Sadly, they tout this as a feature!

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Rick Schumeyer
Sent: Tuesday, October 11, 2005 5:12 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] pg, mysql comparison with "group by" clause

I'm not sure what I was thinking, but I tried the following query in pg:

SELECT * FROM t GROUP BY state;

pg returns an error.

Mysql, OTOH, returns the first row for each state.  (The first row with
"AK", the first row with "PA", etc.)

I'm no SQL expert, but it seems to me that the pg behavior is correct,
and
the mysql result is just weird.  Am I correct?


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
              http://www.postgresql.org/docs/faq


Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
> Not always, but I'd rather get the right answer with difficulty than
the
wrong one with ease. :)

agreed.

I made it a point to mention this so called "feature" in my book.

This is a bug they never fixed and they decided to call it a feature.

It is, imo, *ridiculous*.

Regards, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Scott Marlowe
Sent: Wednesday, October 12, 2005 6:25 PM
To: Greg Stark
Cc: Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

On Wed, 2005-10-12 at 16:54, Greg Stark wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>
> > On Tue, 11 Oct 2005, Rick Schumeyer wrote:
> >
> > > I'm not sure what I was thinking, but I tried the following query
in pg:
> > >
> > > SELECT * FROM t GROUP BY state;
> > >
> > > pg returns an error.
> > >
> > > Mysql, OTOH, returns the first row for each state.  (The first row
with
> > > "AK", the first row with "PA", etc.)
> > >
> > > I'm no SQL expert, but it seems to me that the pg behavior is
correct, and
> > > the mysql result is just weird.  Am I correct?
> >
> > In your case, it sounds like the mysql result is wrong. I believe
SQL99
> > would allow it if the other columns were functionally dependant upon
state
> > (as there'd by definition only be one value for the other columns
per
> > group).
>
> I believe this is a documented feature.

Hehe.  When I turn on my windshield wipers and my airbag deploys, is it
a documented "feature" if the dealership told me about this behaviour
ahead of time?  In much the same way, while this behaviour may be
documented by MySQL, I can't imagine it really being called a feature.
But at least this misbehaviour is documented.  However, I think most
people in the MySQL universe just stumble onto it by accident when they
try it and it works.  I'd at least prefer it to throw a warning or
notice or something.

> MySQL treats "select a,b from t group by a" equivalently to Postgres's

> "select distinct on (a) a,b from t"
>
> I suppose "equivalent" isn't quite true. It's more general since it
allows
> aggregate functions as well. The equivalently general Postgres syntax
is to
> have a first() aggregate function and do "select a,first(b) from t
group by a".

A Subselect would let you do such a thing as well, and while it's more
complicated to write, it is likely to be easier to tell just what it's
doing.

> I'm sure it's very convenient.

Not always, but I'd rather get the right answer with difficulty than the
wrong one with ease. :)

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
              http://archives.postgresql.org


Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
> I don't see why you think people stumble on this by accident. I think
it's
> actually an extremely common need.

I can't imagine how that's a common need at all.

It makes no sense.

When you add an additional column in the select, it must be included in
the group by as it changes the meaning of the query.

Consider:

select deptno, count(*) from emp group by deptno;
   DEPTNO   COUNT(*)
---------- ----------       10          3       20          5       30          6


the query above counts the number of employees in each department.


Now consider the following query:

select deptno,job,count(*) from emp group by deptno,job;
   DEPTNO JOB         COUNT(*)
---------- --------- ----------       10 CLERK              1       10 MANAGER            1       10 PRESIDENT
1      20 CLERK              2       20 ANALYST            2       20 MANAGER            1       30 CLERK
1      30 MANAGER            1       30 SALESMAN           4 


the query above counts the number of different job types in each
department.

In mysql, you would be allowed to do the following:

select deptno,job,count(*) from emp group by deptno;

but it makes no sense. What value would it return and what does it mean?
How can that possibly represent reliable data? What would the result set
above look like? It would be meaningless in a production system.
honestly.


It's a silly bug that mysql has touted has a feature
and I can't imagine why people think it's useful.


> I think most MySQL users don't stumble on it, they learn it as the way
to
> handle the common use case when you join a master table against a
detail
> table and then want to aggregate all the detail records.

Huh? I don't follow that at all....
Perhaps your confusing the concept with window functions that neither pg
nor mysql have implemented yet?
For example, using window functions allows you to return aggregate and
detailed info simultaneously:

select ename,      deptno,      job,      count(*)over(partition by deptno) as emps_per_dept,
count(*)over(partitionby deptno,job) as job_per_dept,      count(*)over() as total  from emp 

ENAME  DEPTNO JOB       EMPS_PER_DEPT JOB_PER_DEPT TOTAL
------ ------ --------- ------------- ------------ -----
MILLER     10 CLERK                 3            1    14
CLARK      10 MANAGER               3            1    14
KING       10 PRESIDENT             3            1    14
SCOTT      20 ANALYST               5            2    14
FORD       20 ANALYST               5            2    14
SMITH      20 CLERK                 5            2    14
ADAMS      20 CLERK                 5            2    14
JONES      20 MANAGER               5            1    14
JAMES      30 CLERK                 6            1    14
BLAKE      30 MANAGER               6            1    14
ALLEN      30 SALESMAN              6            4    14
MARTIN     30 SALESMAN              6            4    14
TURNER     30 SALESMAN              6            4    14
WARD       30 SALESMAN              6            4    14


But this is not a group by, this is aggregating and windowing,
which is quite different from mysql adding that nasty little bug
and calling it a feature.
- a

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Greg Stark
Sent: Wednesday, October 12, 2005 9:13 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Hehe.  When I turn on my windshield wipers and my airbag deploys, is
it
> a documented "feature" if the dealership told me about this behaviour
> ahead of time?

Well it's more like my car where the dashboard dims when I turn on my
headlights which annoys me to no end since I learned to always put my
headlights on even in the day.

> In much the same way, while this behaviour may be documented by MySQL,
I
> can't imagine it really being called a feature. But at least this
> misbehaviour is documented. However, I think most people in the MySQL
> universe just stumble onto it by accident when they try it and it
works. I'd
> at least prefer it to throw a warning or notice or something.

I don't see why you think people stumble on this by accident. I think
it's
actually an extremely common need. So common that Postgres has the same
feature (though less general) and invented a whole syntax to handle it.

I think most MySQL users don't stumble on it, they learn it as the way
to
handle the common use case when you join a master table against a detail
table
and then want to aggregate all the detail records. In standard SQL you
have to
write GROUP BY ... and list every single column you need from the master
table. Forcing the database to do a lot of redundant comparisons and
sort on
uselessly long keys where in fact you only really need it to sort and
group by
the primary key.

Remember, most MySQL users learn MySQL first, and only later learn what
is
standard SQL and what isn't.

> A Subselect would let you do such a thing as well, and while it's more
> complicated to write, it is likely to be easier to tell just what it's
> doing.

Subselects have their own problems here. Mainly Postgres's optimizer, as
good
as it is, doesn't treat them with the same code paths as joins and can't
find
all the same plans for them. But in any case you cannot always write a
subselect that's equivalent to an arbitrary join.


--
greg


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
              http://archives.postgresql.org


Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Tom, Don't do it man... It's not the same. The standard has it wrong (that or I misunderstood
your explanation). But this statement:

> you shouldn't have to explicitly list the other columns from that
> table --- they are guaranteed to have unique values per group anyway

...is way off.
By changing the values in the select/group by you are changing
Group! How can you arbitrarily add or exclude a column?
You can't do it.

Look, all it takes is one example to prove the standard wrong...



Consider this simple example (in oracle, but the same in pg):


create table foo (id number primary key, name varchar2(10));

insert into foo values (1,'joe');

insert into foo values (2,'joe');

insert into foo values (3,'jim');


select * from foo;
       ID NAME
---------- ----------        1 joe        2 joe        3 jim


ok, lets get the count by ID (which is unique):

select id, count(*) from foo group by id;

ID   COUNT(*)
-- ----------1          12          13          1


makes sense...

the values for ID, 1,2,3 are our groups
and the count aggregate op shows that there's
one member per group.


now let's get the count by name:

select name, count(*) from foo group by name;

NAME         COUNT(*)
---------- ----------
jim                 1
joe                 2

so far so good. The queries and results are representative of the actual
data.  The values for name, "jim" and "joe" are our groups
and the count aggregate op shows that there's 1 member in the "jim"
group and two members in the "joe" group.


But, as soon as we add id to the group by...

select name, count(*) from foo group by name,id;

NAME         COUNT(*)
---------- ----------
jim                 1
joe                 1
joe                 1

it changes the query! Even tho id is unique, it changes the query.
The group by it's definition (it's in the group by) is no longer
Name or id, it's both name and id.
How can you simply remove id? Which result set should it return
The first or second? Makes no sense...

If it's aint obvious why, simply plug id into the select:

select id, name, count(*) from foo group by name,id;

ID NAME         COUNT(*)
--- ---------- ---------- 3 jim                 1 1 joe                 1 2 joe                 1



so, how can it be that you don't have to list the other columns
(in this case "name")? it makes no sense because if you remove the
either column from the group by (id or name) it changes the meaning of
the
query.
The way you guys do it now is correct. Mysql has it wrong.
And if the standard states that you can leave out the pk from a group
by,
They are wrong too, as the simple examples above prove.

Fyi, Oracle just bought innodb, so, I'd not be too concerned with mysql
and they so called "features" anyway.


Regards, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, October 12, 2005 9:51 PM
To: Greg Stark
Cc: Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Greg Stark <gsstark@mit.edu> writes:
> I think most MySQL users don't stumble on it, they learn it as the way
> to handle the common use case when you join a master table against a
> detail table and then want to aggregate all the detail records. In
> standard SQL you have to write GROUP BY ... and list every single
> column you need from the master table. Forcing the database to do a
> lot of redundant comparisons and sort on uselessly long keys where in
> fact you only really need it to sort and group by the primary key.

Actually, if you're grouping by a table's primary key, the SQL99 spec
says you shouldn't have to explicitly list the other columns from that
table --- they are guaranteed to have unique values per group anyway.
This is a single case in the "functional dependency" stuff.  That
verbiage is incredibly dense and I don't think we want to tackle all of
it any time soon, but the primary-key case probably wouldn't be very
hard to implement.  We really ought to have this in TODO ... I'm sure
it's been discussed before.
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's
datatypesdo not      match 


Re: pg, mysql comparison with "group by" clause

От
Greg Stark
Дата:
"Anthony Molinaro" <amolinaro@wgen.net> writes:

> By changing the values in the select/group by you are changing 
> Group! How can you arbitrarily add or exclude a column?
> You can't do it.

Go back and reread the previous posts again. You missed the whole point.

-- 
greg



Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Greg, You'll have to pardon me...
I saw this comment:

"I don't see why you think people stumble on this by accident.
I think it's actually an extremely common need."

Which, if referring to the ability to have items in the select that do
not
need to be included in the group, (excluding constants and the like)
is just silly.

OTOH, if you're all poking fun at a mysql bug that they try to pass
off as a feature, then yes, I'm a clod and I missed that the first time
around :)

Regards, Anthony

-----Original Message-----
From: gsstark@mit.edu [mailto:gsstark@mit.edu]
Sent: Wednesday, October 12, 2005 11:43 PM
To: Anthony Molinaro
Cc: Tom Lane; Greg Stark; Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

"Anthony Molinaro" <amolinaro@wgen.net> writes:

> By changing the values in the select/group by you are changing
> Group! How can you arbitrarily add or exclude a column?
> You can't do it.

Go back and reread the previous posts again. You missed the whole point.

--
greg



Re: pg, mysql comparison with "group by" clause

От
Greg Stark
Дата:
"Anthony Molinaro" <amolinaro@wgen.net> writes:

> Greg,
>   You'll have to pardon me...
>  
> I saw this comment:
> 
> "I don't see why you think people stumble on this by accident. 
> I think it's actually an extremely common need."
> 
> Which, if referring to the ability to have items in the select that do not
> need to be included in the group, (excluding constants and the like) is just
> silly.

Well the "constants and the like" are precisely the point. There are plenty of
cases where adding the column to the GROUP BY is unnecessary and since
Postgres makes no attempt to prune them out, inefficient. And constants aren't
the only such case. The most common case is columns that are coming from a
table where the primary key is already included in the GROUP BY list.

In the case of columns coming from a table where the primary key is already in
the GROUP BY list it's possible for the database to deduce that it's
unnecessary to group on that column. 

But it's also possible to have cases where the programmer has out of band
knowledge that it's unnecessary but the database doesn't have that knowledge.
The most obvious case that comes to mind is a denormalized data model that
includes a redundant column.
 select dept_id, dept_name, count(*) from employee_list

For example if dept_name is guaranteed to be the same for all records with the
same dept_id. Of course that's generally considered poor design but it doesn't
mean there aren't thousands of databases out there with data models like that.

-- 
greg



Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Greg,

Ok, I think I see where you're going (I don't agree, but I think
I get you now).

So, using your example of:
"dept_name is guaranteed to be the same for all records with the
same dept_id."

Here:

select d.deptno,d.dname  from emp  e, dept d where e.deptno=d.deptno

DEPTNO DNAME
------ --------------   10 ACCOUNTING   10 ACCOUNTING   10 ACCOUNTING   20 RESEARCH   20 RESEARCH   20 RESEARCH   20
RESEARCH  20 RESEARCH   30 SALES   30 SALES   30 SALES   30 SALES   30 SALES   30 SALES 

ok, so there's your scenario.

And you're suggesting that one should be able to
Do the following query?

select d.deptno,d.dname,count(*)  from emp  e, dept d where e.deptno=d.deptno


if that's what you suggest, then we'll just have to agree to disagree.

That query needs a group by. What you're suggesting is, imo,
a wholly unnecessary shortcut (almost as bad as that ridiculous "natural
join" - whoever came up with that should be tarred and feathered).

I think I see your point now, I just disagree.
Your depending on syntax to work based on data integrity?
Hmmm.... don't think I like that idea
What performance improvement do you get from leaving group by out?
Look at the query above, doesn't a count of distinct deptno,dname pairs
have
to take place anyway? What do you save by excluding group by?
Are you suggesting COUNT be computed for each row (windowed) or that
COUNT is computed for each group?

If you want repeating rows, then you want windowing.
For example:

select d.deptno,d.dname,count(*)over(partition by d.deptno,d.dname) cnt  from emp  e, dept d where e.deptno=d.deptno

DEPTNO DNAME          CNT
------ -------------- ---   10 ACCOUNTING       3   10 ACCOUNTING       3   10 ACCOUNTING       3   20 RESEARCH
5  20 RESEARCH         5   20 RESEARCH         5   20 RESEARCH         5   20 RESEARCH         5   30 SALES
6  30 SALES            6   30 SALES            6   30 SALES            6   30 SALES            6   30 SALES
6



if you want "groups", then use group by:

select d.deptno,d.dname,count(*) cnt  from emp  e, dept dwhere e.deptno=d.deptno group by d.deptno,d.dname

DEPTNO DNAME          CNT
------ -------------- ---   10 ACCOUNTING       3   20 RESEARCH         5   30 SALES            6


what your suggesting doesn't seem to fit in at all,
particularly when pg implements window functions.

If you're suggesting the pg optimizer isn't doing the right thing
with group by queries, then this is an optimizer issue and
that should be hacked, not group by. If you're suggesting certain
rows be ditched or shortcuts be taken, then the optimizer should do
that, not the programmer writing sql.

Db2 and oracle have no problem doing these queries, I don't see
why pg should have a problem.

imo, the only items that should not be listed in the group by
are:

1. constants and deterministic functions
2. scalar subqueries
3. window functions

1 - because the value is same for each row
2&3 - because they are evaluated after the grouping takes place

regards, Anthony

-----Original Message-----
From: gsstark@mit.edu [mailto:gsstark@mit.edu]
Sent: Thursday, October 13, 2005 12:25 AM
To: Anthony Molinaro
Cc: gsstark@mit.edu; Tom Lane; Scott Marlowe; Stephan Szabo; Rick
Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

"Anthony Molinaro" <amolinaro@wgen.net> writes:

> Greg,
>   You'll have to pardon me...
>
> I saw this comment:
>
> "I don't see why you think people stumble on this by accident.
> I think it's actually an extremely common need."
>
> Which, if referring to the ability to have items in the select that do
not
> need to be included in the group, (excluding constants and the like)
is just
> silly.

Well the "constants and the like" are precisely the point. There are
plenty of
cases where adding the column to the GROUP BY is unnecessary and since
Postgres makes no attempt to prune them out, inefficient. And constants
aren't
the only such case. The most common case is columns that are coming from
a
table where the primary key is already included in the GROUP BY list.

In the case of columns coming from a table where the primary key is
already in
the GROUP BY list it's possible for the database to deduce that it's
unnecessary to group on that column.

But it's also possible to have cases where the programmer has out of
band
knowledge that it's unnecessary but the database doesn't have that
knowledge.
The most obvious case that comes to mind is a denormalized data model
that
includes a redundant column.
 select dept_id, dept_name, count(*) from employee_list

For example if dept_name is guaranteed to be the same for all records
with the
same dept_id. Of course that's generally considered poor design but it
doesn't
mean there aren't thousands of databases out there with data models like
that.

--
greg



Re: pg, mysql comparison with "group by" clause

От
Andrew Sullivan
Дата:
On Thu, Oct 13, 2005 at 12:24:55AM -0400, Greg Stark wrote:

> Well the "constants and the like" are precisely the point. There
> are plenty of cases where adding the column to the GROUP BY is
> unnecessary and since Postgres makes no attempt to prune them out,
> inefficient.

But inefficient pruning is an optimiser problem, not something that
should lead one to invent a whole syntax change that (a) violates the
standard and (b) is at least somewhat opaque in meaning.  The right
thing to do is surely to make the optimiser smarter, no?  (Think,
"What does DB2 have that we don't?")

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.     --Dennis Ritchie


Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Greg,
"
You would prefer:

select user_id,       any(username) as username, any(firstname) as firstname,       any(lastname) as lastname,
any(address)as address,      any(city) as city, any(street) as street, any(phone) as phone,      any(last_update) as
last_update,any(last_login) as last_login,      any(referrer_id) as referrer_id, any(register_date) as 
register_date,      ...      sum(money) as balance,      count(money) as num_txns from user join user_money using
(user_id)group by user_id 
"

yes, that's right!

Guess what? It's been that way for years. Why change it now?

You're arguing something that works perfectly
and has been understood for years.
Changing the syntax cuz pg doesn't optimize it the way you like is
ridiculous.

Perhaps this change would make the newbies happy but I cant imagine
an experienced developer asking for this, let alone argue for it.

> I'm pretty unsympathetic to the "we should make a language less
powerful
> and more awkward because someone might use it wrong" argument.

More awkward? What *you're* suggesting is more awkward. You realize that
right?
How can syntax that is understood and accepted for years be more
awkward?

Again, you're asking for changes that no one but a newbie would ask
for....

I'm not at all suggesting you are/aren't a newbie
(so don't take offense to this :),
all I'm saying is that for experienced developers,
we'd hope that the source code developers for pg/oracle/db2 etc
are focusing on more important things, not rewriting things that already
work because something doesn't wanna type out column names...

regards, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Greg Stark
Sent: Thursday, October 13, 2005 2:27 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause


Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Sorry, but it's worse than that.  It is quite possible that two people
> could run this query at the same time and get different data from the
> same set and the same point in time.  That shouldn't happen
accidentally
> in SQL, you should know it's coming.

I'm pretty unsympathetic to the "we should make a language less powerful
and
more awkward because someone might use it wrong" argument.

> > In standard SQL you have to
> > write GROUP BY ... and list every single column you need from the
master
> > table. Forcing the database to do a lot of redundant comparisons and
sort on
> > uselessly long keys where in fact you only really need it to sort
and group by
> > the primary key.
>
> But again, you're getting whatever row the database feels like giving
> you.  A use of a simple, stupid aggregate like an any() aggregate
would
> be fine here, and wouldn't require a lot of overhead, and would meet
the
> SQL spec.

Great, so I have a user table with, oh, say, 40 columns. And I want to
return
all those columns plus their current account balance in a single query.

The syntax under discussion would be:

select user.*, sum(money) from user join user_money using (user_id)
group by user_id

You would prefer:

select user_id,       any(username) as username, any(firstname) as firstname,       any(lastname) as lastname,
any(address)as address,      any(city) as city, any(street) as street, any(phone) as phone,      any(last_update) as
last_update,any(last_login) as last_login,      any(referrer_id) as referrer_id, any(register_date) as 
register_date,      ...      sum(money) as balance,      count(money) as num_txns from user join user_money using
(user_id)group by user_id 


Having a safeties is fine but when I have to disengage the safety for
every
single column it starts to get more than a little annoying.

Note that you cannot write the above as a subquery since there are two
aggregates. You could write it as a join against a view but don't expect
to
get the same plans from Postgres for that.


> Actually, for things like aggregates, I've often been able to improve
> performance with sub selects in PostgreSQL.

If your experience is like mine it's a case of two wrongs cancelling
each
other out. The optimizer underestimates the efficiency of nested loops
which
is another problem. Since subqueries' only eligible plan is basically a
nested
loop it often turns out to be faster than the more exotic plans a join
can
reach.

In an ideal world subqueries would be transformed into the equivalent
join (or
some more general join structure that can cover both sets of semantics)
and
then planned through the same code path. In an ideal world the user
should be
guaranteed that equivalent queries would always result in the same plan
regardless of how they're written.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
              http://www.postgresql.org/docs/faq


Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Well...

An additional gripe is that this isn't a good feature (standard or not).
Oracle doesn't do it. Db2 doesn't do it. I strongly suggest you guys
don't
do it.

If you wanna do the optimizations under the covers, cool, but I can't
imagine how this would be useful other than for saving some typing...

Seems more trouble than it's worth and changes a concept that's tried
and true for many years.

Regards, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Thursday, October 13, 2005 2:50 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

>>>> In standard SQL you have to
>>>> write GROUP BY ... and list every single column you need from the
master
>>>> table.

This thread seems to have gone off on a tangent that depends on the
assumption that the above is a correct statement.  It's not.  It *was*
true, in SQL92, but SQL99 lets you omit unnecessary GROUP BY columns.

The gripe against mysql, I think, is that they don't enforce the
conditions that guarantee the query will give a unique result.

The gripe against postgres is that we haven't implemented the SQL99
semantics yet.
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
              http://archives.postgresql.org


Re: pg, mysql comparison with "group by" clause

От
Tom Lane
Дата:
"Anthony Molinaro" <amolinaro@wgen.net> writes:
> An additional gripe is that this isn't a good feature (standard or not).
> Oracle doesn't do it. Db2 doesn't do it.

You sure about that?  It's hard to believe that the SQL committee would
put a feature into the spec that neither Oracle nor IBM intended to
implement.  Those two pretty much control the committee after all ...
        regards, tom lane


Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Tom,
 I'm sure there's all sorts of cool optimizations under the covers  to perhaps maybe to this short circuiting,   but as
thesql goes, yeah, I'm sure. 

Here's an example on oracle 10g release 2 (copy paste from my screen so
you can see the error messages and all):



SQL> create table foo(id number primary key, name varchar2(10));

Table created.

SQL> insert into foo values (1,'sam');

1 row created.

SQL> insert into foo values (2,'sam');

1 row created.

SQL> commit;

Commit complete.

SQL> select id, count(*) from foo;
select id, count(*) from foo      *
ERROR at line 1:
ORA-00937: not a single-group group function


SQL> select name, count(*) from foo;
select name, count(*) from foo      *
ERROR at line 1:
ORA-00937: not a single-group group function


SQL> select name, count(*) from foo group by id;
select name, count(*) from foo group by id      *
ERROR at line 1:
ORA-00979: not a GROUP BY expression


SQL> select name, count(*) from foo group by name;

NAME         COUNT(*)
---------- ----------
sam                 2

SQL> select name, count(*) from foo group by name,id;

NAME         COUNT(*)
---------- ----------
sam                 1
sam                 1

SQL>


I can't imagine Oracle making a change such as the one we're discussing
at this point. Perhaps in 8.1.6, ~7 years ago, when *tons* of sql
changes were implemented (analytics, CASE, ordering in inline views,
CUBE, ROLLUP),
but not now...

then again, oracle is 100% completely driven by money, so, if enough
customers ask for it, it will happen eventually. I just can't imagine
anyone asking for this feature when we're paying 40k per cpu just to
run oracle; there are much more important things for them to be workin
on...


Regards, Anthony

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, October 13, 2005 3:17 PM
To: Anthony Molinaro
Cc: Scott Marlowe; Greg Stark; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

"Anthony Molinaro" <amolinaro@wgen.net> writes:
> An additional gripe is that this isn't a good feature (standard or
not).
> Oracle doesn't do it. Db2 doesn't do it.

You sure about that?  It's hard to believe that the SQL committee would
put a feature into the spec that neither Oracle nor IBM intended to
implement.  Those two pretty much control the committee after all ...
        regards, tom lane


Re: pg, mysql comparison with "group by" clause

От
Greg Stark
Дата:
"Anthony Molinaro" <amolinaro@wgen.net> writes:

> More awkward? What *you're* suggesting is more awkward. You realize that
> right? How can syntax that is understood and accepted for years be more
> awkward?

Well gosh, I would say that that's something only a newbie could say about
SQL of all things...

I had a whole thing written after that but I just deleted it. I grow tired of
this thread.

I am pretty happy to hear that the SQL standard endorsed the idea having the
right thing happen if the primary key is present in the grouping list. That
would be a wonderful feature for Postgres.

-- 
greg



Re: pg, mysql comparison with "group by" clause

От
"Anthony Molinaro"
Дата:
Sheesh...

I work with 2 MIT guys, and man, you guys will never admit you're wrong.
Must be something in the water down there ;)

Hey man, you know what, to each his own
(but Apostol is one of my favorites, so maybe have that in common? :)

You apparently like this shortcut, so be it.

I'll say this tho, Oracle and db2 don't do it even tho
it's in the standard and their optimizer is already doing
the right thing. That's gotta tell ya something, no?

don't look for this feature to be something you can do everywhere.
If for some reason postgres implements it, it will be the only
vendor to do so (though, this seems like a very MySQL-ish thing to do
so maybe not just Postgres)

take care, Anthony

-----Original Message-----
From: gsstark@mit.edu [mailto:gsstark@mit.edu]
Sent: Friday, October 14, 2005 3:21 AM
To: Anthony Molinaro
Cc: Greg Stark; Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause


"Anthony Molinaro" <amolinaro@wgen.net> writes:

> More awkward? What *you're* suggesting is more awkward. You realize
that
> right? How can syntax that is understood and accepted for years be
more
> awkward?

Well gosh, I would say that that's something only a newbie could say
about
SQL of all things...

I had a whole thing written after that but I just deleted it. I grow
tired of
this thread.

I am pretty happy to hear that the SQL standard endorsed the idea having
the
right thing happen if the primary key is present in the grouping list.
That
would be a wonderful feature for Postgres.

--
greg



Re: pg, mysql comparison with "group by" clause

От
Chris Travers
Дата:
Greg Stark wrote:

>"Anthony Molinaro" <amolinaro@wgen.net> writes:
>
>  
>
>>Greg,
>>  You'll have to pardon me...
>> 
>>I saw this comment:
>>
>>"I don't see why you think people stumble on this by accident. 
>>I think it's actually an extremely common need."
>>
>>Which, if referring to the ability to have items in the select that do not
>>need to be included in the group, (excluding constants and the like) is just
>>silly.
>>    
>>
>
>Well the "constants and the like" are precisely the point. There are plenty of
>cases where adding the column to the GROUP BY is unnecessary and since
>Postgres makes no attempt to prune them out, inefficient. And constants aren't
>the only such case. The most common case is columns that are coming from a
>table where the primary key is already included in the GROUP BY list.
>  
>
I sort of see what you are saying.... but you have yet to convince me....

>In the case of columns coming from a table where the primary key is already in
>the GROUP BY list it's possible for the database to deduce that it's
>unnecessary to group on that column. 
>  
>
Well....  The question is really whether two things should be true:
1)  whether you want to assume that the programmer is going to know 
about Single/Multi Value Dependency issues per column.  IMO, this is 
more of a DB design issue than a client app issue.  And I would *not* 
want to make that assumption because for higher normal forms where this 
is likely to be a consideration, you are likely to have denormalized 
access via VIEWs anyway.

2)  Whether you are willing to rely on looking at the data first to 
determine whether the query is valid

Alternatively we are back to the ability to get the wrong answer with 
ease and in very difficult to debug ways.  I suspect that MySQL places 
an implicit MIN() around columns not included in the group by 
statement.  I fail to see why this is not an appropriate answer to his 
concern.

>But it's also possible to have cases where the programmer has out of band
>knowledge that it's unnecessary but the database doesn't have that knowledge.
>The most obvious case that comes to mind is a denormalized data model that
>includes a redundant column.
>
>  select dept_id, dept_name, count(*) from employee_list
>  
>
Ok.  You have a few choices:
SELECT MIN(dept_id), dept_name, count(*) FROM employee_list GROUP BY 
dept_name;
SELECT dept_id, dept_name, count(*) FROM employee_list GROUP BY 
dept_name, dept_id;
SELECT dept_id, MIN(dept_name), count(*) FROM employee_list GROUP BY 
dept_id;

And yes, it is bad design in every case I can think of.//
Why is this a problem?

Best Wishes,
Chris Travers
Metatron Technology Consulting