Re: [PATCH] Lazy hashaggregate when no aggregation is needed

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [PATCH] Lazy hashaggregate when no aggregation is needed
Дата
Msg-id CA+TgmoamkSz1wyfU1CF8URChNJDCXTFrO23p9tsdCWiB5i2CYw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Lazy hashaggregate when no aggregation is needed  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Ответы Re: [PATCH] Lazy hashaggregate when no aggregation is needed  (Ants Aasma <ants@cybertec.at>)
Список pgsql-hackers
On Fri, Jun 15, 2012 at 6:55 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> A user complained on pgsql-performance that SELECT col FROM table
>>>> GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
>>>> to return tuples from hash-aggregate as they are found when no
>>>> aggregate functions are in use. Attached is a first shot at that.
>>>
>>>
>>> As I commented in the other thread, the user would be a lot better off
>>> if he'd had an index on the column in question. I'm not sure it's worth
>>> complicating the hashagg logic when an indexscan + groupagg would
>>> address the case better.
>>
>> Would this patch help in the case where "table" is actually a
>> set-returning function, and thus can't have an index?
>
> ISTM that in many cases, the result size of a set-returning function is not
> so large compared with that of a full plain table scan.  So, in such a case
> a full hash aggregation is not so time consuming.  Am I wrong?

This query is a little unusual in that it involves both an aggregate
and a limit.

Now, sorted aggregates work pretty well with limit, because you can be
sure upon seeing the beginning of the next group that you are done
with the previous group.  But in a hash aggregate, you normally can't
start returning results until you've read the entire input, so it
doesn't work so well with limit.

However, as Ants points out, we could make it work better for the
special case where we're not actually doing any aggregation, because
in that case we can emit the row for each group when the group is
created, rather than waiting until end-of-input.  This is only going
to help when there is a LIMIT, though.  Moreover, if there happens to
be an ORDER BY, then the data will have to be pre-sorted, in which
case you may as well use a sorted aggregate.  So the use case for this
optimization is basically DISTINCT plus LIMIT but not ORDER BY.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Honza Horak
Дата:
Сообщение: Re: Ability to listen on two unix sockets
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [COMMITTERS] pgsql: New SQL functons pg_backup_in_progress() and pg_backup_start_tim