Re: Group by more efficient than distinct?

From: Mark Mielke
Subject: Re: Group by more efficient than distinct?
Message-ID: 480DE25E.4080507@mark.mielke.cc
In reply to: Re: Group by more efficient than distinct? (Matthew Wakeling <matthew@flymine.org>)
List: pgsql-performance
Matthew Wakeling wrote:
> On Tue, 22 Apr 2008, Mark Mielke wrote:
>> The poster I responded to said that the memory required for a hash
>> join was relative to the number of distinct values, not the number of
>> rows. They gave an example of millions of rows, but only a few
>> distinct values. Above, you agree with me that it would include
>> the rows (or at least references to the rows) as well. If it stores
>> rows, or references to rows, then memory *is* relative to the number
>> of rows, and millions of records would require millions of rows (or
>> row references).
>
> Yeah, I think we're talking at cross-purposes, due to hash tables
> being used in two completely different places in Postgres. Firstly,
> you have hash joins, where Postgres loads the references to the actual
> rows, and puts those in the hash table. For that situation, you want a
> small number of rows. Secondly, you have hash aggregates, where
> Postgres stores an entry for each "group" in the hash table, and does
> not store the actual rows. For that situation, you can have a
> bazillion individual rows, but only a small number of distinct groups.

That makes sense with my reality. :-)
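
To make the distinction concrete, here is a toy sketch in Python. It is not how Postgres implements either node; the function names and the sample data are made up for illustration. It only shows why the build side of a hash join grows with the number of rows, while a hash aggregate grows with the number of distinct groups.

```python
from collections import defaultdict


def build_hash_join_table(rows, key):
    """Build side of a hash join: every input row is kept, bucketed by key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)  # one stored reference per input row
    return table


def hash_aggregate_count(rows, key):
    """Hash aggregate (COUNT per group): one counter per distinct key."""
    groups = {}
    for row in rows:
        # Only the running count is kept; the row itself is not stored.
        groups[row[key]] = groups.get(row[key], 0) + 1
    return groups


# Hypothetical data: 300,000 rows, but only 3 distinct values.
colours = ("red", "green", "blue")
rows = [{"id": i, "colour": colours[i % 3]} for i in range(300_000)]

join_table = build_hash_join_table(rows, "colour")
agg_table = hash_aggregate_count(rows, "colour")

stored_by_join = sum(len(bucket) for bucket in join_table.values())
stored_by_agg = len(agg_table)

print(stored_by_join)  # 300000 row references held by the join table
print(stored_by_agg)   # 3 entries held by the aggregate table
```

Both tables have three keys, but the join table still holds a reference to every row, whereas the aggregate table holds three counters no matter how many rows pass through it.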

Thanks,
mark

--
Mark Mielke <mark@mielke.cc>

