Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id CAH2-Wznidojad-zbObnFOzDA5RTCS0JLsqcZkDNu+ou1NGYQYQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Sat, Jul 25, 2020 at 4:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I wrote a quick patch to use HyperLogLog to estimate the number of
> groups contained in a spill file. It seems to reduce the
> overpartitioning effect, and is a more principled approach than what I
> was doing before.

This pretty much fixes the issue that I observed with overparitioning.
At least in the sense that the number of partitions grows more
predictably -- even when the number of partitions planned is reduced
the change in the number of batches seems smooth-ish. It "looks nice".

> It does seem to hurt the runtime slightly when spilling to disk in some
> cases. I haven't narrowed down whether this is because we end up
> recursing multiple times, or if it's just more efficient to
> overpartition, or if the cost of doing the HLL itself is significant.

I'm glad that this better principled approach is possible. It's hard
to judge how much of a problem this really is, though. We'll need to
think about this aspect some more.

Thanks
-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk