Re: Group by more efficient than distinct?

From	Mark Mielke
Subject	Re: Group by more efficient than distinct?
Date	22 April 2008, 10:04:35
Msg-id	480DE25E.4080507@mark.mielke.cc
In response to	Re: Group by more efficient than distinct? (Matthew Wakeling <matthew@flymine.org>)
List	pgsql-performance


Matthew Wakeling wrote:
> On Tue, 22 Apr 2008, Mark Mielke wrote:
>> The poster I responded to said that the memory required for a hash
>> join was relative to the number of distinct values, not the number of
>> rows. They gave an example of millions of rows, but only a few
>> distinct values. Above, you agree with me that it would include
>> the rows (or at least references to the rows) as well. If it stores
>> rows, or references to rows, then memory *is* relative to the number
>> of rows, and millions of records would require millions of rows (or
>> row references).
>
> Yeah, I think we're talking at cross-purposes, due to hash tables
> being used in two completely different places in Postgres. Firstly,
> you have hash joins, where Postgres loads the references to the actual
> rows, and puts those in the hash table. For that situation, you want a
> small number of rows. Secondly, you have hash aggregates, where
> Postgres stores an entry for each "group" in the hash table, and does
> not store the actual rows. For that situation, you can have a
> bazillion individual rows, but only a small number of distinct groups.

That makes sense with my reality. :-)
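
To make the distinction above concrete, here is a rough toy model in Python. It is only a sketch of the idea, not how Postgres implements either node: the function names and the sample data are made up for illustration. The join-style table keeps a reference to every input row, so it grows with the row count; the aggregate-style table keeps one running state per group, so it grows with the number of distinct values.

```python
from collections import defaultdict


def build_hash_join_table(rows, key):
    """Toy build side of a hash join: every row reference is kept."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def hash_aggregate_count(rows, key):
    """Toy hash aggregate: one counter per group, rows are not kept."""
    groups = {}
    for row in rows:
        groups[row[key]] = groups.get(row[key], 0) + 1
    return groups


# Hypothetical data: many rows, only three distinct values.
colours = ("red", "green", "blue")
rows = [{"id": i, "colour": colours[i % 3]} for i in range(300_000)]

join_table = build_hash_join_table(rows, "colour")
agg_table = hash_aggregate_count(rows, "colour")

# Entries held by each structure.
stored_in_join = sum(len(bucket) for bucket in join_table.values())
stored_in_agg = len(agg_table)

print(stored_in_join, stored_in_agg)  # prints: 300000 3
```

With the same input, the join-style table holds 300,000 row references while the aggregate-style table holds 3 entries, which is the difference Matthew describes.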

Thanks,
mark

--
Mark Mielke <mark@mielke.cc>
