Re: Group by more efficient than distinct?

Поиск

Список

Период

Сортировка

От	PFC
Тема	Re: Group by more efficient than distinct?
Дата	18 апреля 2008 г. 10:30:40
Msg-id	op.t9sycqo6cigqcu@apollo13.peufeu.com обсуждение исходный текст
Ответ на	Re: Group by more efficient than distinct? (Gregory Stark <stark@enterprisedb.com>)
Список	pgsql-performance

Дерево обсуждения

On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark <stark@enterprisedb.com>
wrote:

> "Francisco Reyes" <lists@stringsutils.com> writes:
>
>> Is there any dissadvantage of using "group by" to obtain a unique list?
>>
>> On a small dataset the difference was about 20% percent.
>>
>> Group by
>> HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual
>> time=76.641..85.167 rows=2890 loops=1)

    Basically :

    - If you process up to some percentage of your RAM worth of data, hashing
is going to be a lot faster
    - If the size of the hash grows larger than your RAM, hashing will fail
miserably and sorting will be much faster since PG's disksort is really
good
    - GROUP BY knows this and acts accordingly
    - DISTINCT doesn't know this, it only knows sorting, so it sorts
    - If you need DISTINCT x ORDER BY x, sorting may be faster too (depending
on the % of distinct rows)
    - If you need DISTINCT ON, well, you're stuck with the Sort
    - So, for the time being, you can replace DISTINCT with GROUP BY...

В списке pgsql-performance по дате отправления:

Предыдущее

От: Gregory Stark
Дата: 18 апреля 2008 г., 09:36:27
Сообщение: Re: Group by more efficient than distinct?

Следующее

От: Matthew
Дата: 18 апреля 2008 г., 11:01:27
Сообщение: Re: Strange behavior: pgbench and new Linux kernels

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Group by more efficient than distinct?

Предыдущее

Следующее