Обсуждение: Why does "group by" need to match select fields?

Поиск
Список
Период
Сортировка

Why does "group by" need to match select fields?

От
Omar Eljumaily
Дата:
Sorry if this isn't exactly postgresql specific.  I periodically run
into this problem, and I'm running into it now.  I'm wondering if
there's something about "group by" that I don't understand.  As an
example what I'd want to do is return the "id" value for the check to
each payee that has the highest amount.  It seems like there's no
problem with ambiguity in logic, but postgresql + other sql servers balk
at it.  The group by fields need to explicitly match the select fields
with the exception of the aggregate function(s?).

create table checks
{
    id serial,
    payee text,
    amount double
};

select max(amount), payee, id from checks group by payee;

Why won't the above work?  Is there another way to get the id for the
record with the highest amount for each payee?

Thanks.


Re: Why does "group by" need to match select fields?

От
Bill Moran
Дата:
Omar Eljumaily <omar2@omnicode.com> wrote:
>
> Sorry if this isn't exactly postgresql specific.  I periodically run
> into this problem, and I'm running into it now.  I'm wondering if
> there's something about "group by" that I don't understand.  As an
> example what I'd want to do is return the "id" value for the check to
> each payee that has the highest amount.  It seems like there's no
> problem with ambiguity in logic, but postgresql + other sql servers balk
> at it.  The group by fields need to explicitly match the select fields
> with the exception of the aggregate function(s?).
>
> create table checks
> {
>     id serial,
>     payee text,
>     amount double
> };
>
> select max(amount), payee, id from checks group by payee;
>
> Why won't the above work?  Is there another way to get the id for the
> record with the highest amount for each payee?

Because it's ambiguous.  If you're grabbing max() for amount, which
id tuple do you want?

Perhaps the way you're storing your data, those answers aren't ambiguous,
but the database doesn't know that.  Take this query as an example:

select max(amount), max(checknumber), payee from checks group by payee;

In that case, the highest checknumber and the highest check amount
probably won't come from the same tuple.  If you were to throw in
there:

select max(amount), max(checknumber), payee, id from checks group by payee;

Which id does it give you?  The one that matches max(amount) or the one
that matches max(checknumber)?

--
Bill Moran
Collaborative Fusion Inc.

Re: Why does "group by" need to match select fields?

От
Omar Eljumaily
Дата:
OK, I see what's going on.  I can have more than one max(amount) with
the same amount and payee.  Thanks so much.  Like I said, it's sort of
dogged me off and on many times.

Thanks.


Bill Moran wrote:
> Omar Eljumaily <omar2@omnicode.com> wrote:
>
>> Sorry if this isn't exactly postgresql specific.  I periodically run
>> into this problem, and I'm running into it now.  I'm wondering if
>> there's something about "group by" that I don't understand.  As an
>> example what I'd want to do is return the "id" value for the check to
>> each payee that has the highest amount.  It seems like there's no
>> problem with ambiguity in logic, but postgresql + other sql servers balk
>> at it.  The group by fields need to explicitly match the select fields
>> with the exception of the aggregate function(s?).
>>
>> create table checks
>> {
>>     id serial,
>>     payee text,
>>     amount double
>> };
>>
>> select max(amount), payee, id from checks group by payee;
>>
>> Why won't the above work?  Is there another way to get the id for the
>> record with the highest amount for each payee?
>>
>
> Because it's ambiguous.  If you're grabbing max() for amount, which
> id tuple do you want?
>
> Perhaps the way you're storing your data, those answers aren't ambiguous,
> but the database doesn't know that.  Take this query as an example:
>
> select max(amount), max(checknumber), payee from checks group by payee;
>
> In that case, the highest checknumber and the highest check amount
> probably won't come from the same tuple.  If you were to throw in
> there:
>
> select max(amount), max(checknumber), payee, id from checks group by payee;
>
> Which id does it give you?  The one that matches max(amount) or the one
> that matches max(checknumber)?
>
>


Re: Why does "group by" need to match select fields?

От
Erik Jones
Дата:
On Feb 28, 2007, at 6:57 PM, Omar Eljumaily wrote:

OK, I see what's going on.  I can have more than one max(amount) with the same amount and payee.  Thanks so much.  Like I said, it's sort of dogged me off and on many times.

Thanks.


Bill Moran wrote:
Omar Eljumaily <omar2@omnicode.com> wrote:

  

Sorry if this isn't exactly postgresql specific.  I periodically run into this problem, and I'm running into it now.  I'm wondering if there's something about "group by" that I don't understand.  As an example what I'd want to do is return the "id" value for the check to each payee that has the highest amount.  It seems like there's no problem with ambiguity in logic, but postgresql + other sql servers balk at it.  The group by fields need to explicitly match the select fields with the exception of the aggregate function(s?).

create table checks
{
    id serial,
    payee text,
    amount double
};

select max(amount), payee, id from checks group by payee;

Why won't the above work?  Is there another way to get the id for the record with the highest amount for each payee?

    


Because it's ambiguous.  If you're grabbing max() for amount, which
id tuple do you want?

Perhaps the way you're storing your data, those answers aren't ambiguous,
but the database doesn't know that.  Take this query as an example:

select max(amount), max(checknumber), payee from checks group by payee;

In that case, the highest checknumber and the highest check amount
probably won't come from the same tuple.  If you were to throw in
there:

select max(amount), max(checknumber), payee, id from checks group by payee;

Which id does it give you?  The one that matches max(amount) or the one
that matches max(checknumber)?

Omar, note that in many cases you can do:

select c.amount, c.payee, c.id
from checks c
where c.amount = (select max(amount)
                                   from checks
   where payee=c.payee);

erik jones <erik@myemma.com>
sofware developer
615-296-0838
emma(r)



Re: Why does "group by" need to match select fields?

От
Martijn van Oosterhout
Дата:
On Wed, Feb 28, 2007 at 04:57:08PM -0800, Omar Eljumaily wrote:
> OK, I see what's going on.  I can have more than one max(amount) with
> the same amount and payee.  Thanks so much.  Like I said, it's sort of
> dogged me off and on many times.

Note that in the special case of max/min, you can use things like ORDER
BY/LIMIT to acheive the effect you want. And DISTINCT ON () is also
useful in such situations.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: Why does "group by" need to match select fields?

От
Martijn van Oosterhout
Дата:
On Wed, Feb 28, 2007 at 04:57:08PM -0800, Omar Eljumaily wrote:
> OK, I see what's going on.  I can have more than one max(amount) with
> the same amount and payee.  Thanks so much.  Like I said, it's sort of
> dogged me off and on many times.

Note that in the case of min/max you can use ORDER BY/LIMIT or DISTINCT
ON () to acheive the effect you want.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: Why does "group by" need to match select fields?

От
Bruno Wolff III
Дата:
On Wed, Feb 28, 2007 at 16:19:02 -0800,
  Omar Eljumaily <omar2@omnicode.com> wrote:
> select max(amount), payee, id from checks group by payee;
>
> Why won't the above work?  Is there another way to get the id for the
> record with the highest amount for each payee?

While the DISTINCT ON approach is probably best if you can live with a
Postgres specific solution, the general way to do this is use the group by
query to get a set of primary keys with aggregates and then you join this
back to the original table to get the other data.

Some databases will also recognize that you are grouping by a candidate
key and allow you to specify normal columns since they must all have the
same value for rows with the same candidate key value. Unfortunately Postgres
doesn't do that now.