Обсуждение: DISTINCT ON without ORDER BY

Поиск
Список
Период
Сортировка

DISTINCT ON without ORDER BY

От
Martijn van Oosterhout
Дата:
Hi,

I was going through the queries of an SQL application and came across
queries like:

SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);

I thought, here's a case where this could be better written using
DISTINCT ON, since then you avoid the self-join:

SELECT DISTINCT ON (bar) * FROM
ORDER BY bar, id DESC;

However, this was slower because the original query could use a hash
aggregate whereas the new query needed to do a sort. The way DISTINCT
ON is defined it requires an ORDER BY whereas semantically the ordering
on the first attribute is just a by product of the old implementation.

Is there a way to acheive the above result without a sort and without a
self-join?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: DISTINCT ON without ORDER BY

От
Gregory Stark
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:

> SELECT * FROM foo
> WHERE id in (SELECT max(id) FROM foo GROUP BY bar);
>
> Is there a way to acheive the above result without a sort and without a
> self-join?

Something like

SELECT bar, (magic_agg_func(foo)).* FROM foo GROUP BY bar

where you define an aggregate function magic_agg_func to remember the whole
record for the largest value of id. Something like:

postgres=# create function magic_transition(a,a) returns a as 'select case when $1.aid > $2.aid then $1 else $2 end'
languagesql; 
postgres=# create aggregate magic (a) (sfunc = magic_transition, stype = a);

Not sure it'll be faster though.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: DISTINCT ON without ORDER BY

От
Jasen Betts
Дата:
On 2009-04-19, Martijn van Oosterhout <kleptog@svana.org> wrote:
> Hi,
>
> I was going through the queries of an SQL application and came across
> queries like:
>
> SELECT * FROM foo
> WHERE id in (SELECT max(id) FROM foo GROUP BY bar);
>
> I thought, here's a case where this could be better written using
> DISTINCT ON, since then you avoid the self-join:
>
> SELECT DISTINCT ON (bar) * FROM
> ORDER BY bar, id DESC;
>
> However, this was slower because the original query could use a hash
> aggregate whereas the new query needed to do a sort. The way DISTINCT
> ON is defined it requires an ORDER BY whereas semantically the ordering
> on the first attribute is just a by product of the old implementation.
>
> Is there a way to acheive the above result without a sort and without a
> self-join?

anyway you could possibly write an agregate function that returns a
copy of the row with the highest id?



Re: DISTINCT ON without ORDER BY

От
Martijn van Oosterhout
Дата:
On Tue, Apr 21, 2009 at 12:11:26PM +0000, Jasen Betts wrote:
> > Is there a way to acheive the above result without a sort and without a
> > self-join?
>
> anyway you could possibly write an agregate function that returns a
> copy of the row with the highest id?

Put that way it sounds like something for a window function. Not sure
if they can use a HashAggrgate though.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Вложения