Re: Default setting for enable_hashagg_disk

From: Jeff Davis
Subject: Re: Default setting for enable_hashagg_disk
Msg-id: d5af7930265b8c4bbda5381364d7e21955597538.camel@j-davis.com
In reply to: Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>)
Replies: Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB looks like this:

...

>          Planned Partitions: 128  Peak Memory Usage: 6161kB  Disk Usage: 2478080kB  HashAgg Batches: 128

...

>          Planned Partitions: 128  Peak Memory Usage: 5393kB  Disk Usage: 2482152kB  HashAgg Batches: 11456

...

> My guess is that this is because the recursive hash aggregation
> misbehaves in a self-similar fashion once a certain tipping point
> has been reached.

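It looks like it might be fairly easy to use HyperLogLog to estimate
the number of distinct groups in each spilled partition, and to use
that estimate when choosing the number of partitions for the recursive
step. That should reduce the overpartitioning, which I believe is the
cause of this discontinuity.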
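A minimal Python sketch of the idea, not the actual PostgreSQL code. It assumes that, without a cardinality estimate, the recursive step treats every spilled tuple as a distinct group when sizing the next round of partitions. The names `HyperLogLog` and `choose_num_partitions`, and the constants `PARTITION_FACTOR`, `MIN_PARTITIONS`, `MAX_PARTITIONS`, `ENTRY_BYTES` and `MEM_BYTES`, are hypothetical, only loosely modeled on the partition-count logic in `nodeAgg.c`.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative only)."""

    def __init__(self, b: int = 10) -> None:
        self.b = b                      # number of index bits
        self.m = 1 << b                 # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value) -> None:
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")          # 64-bit hash
        idx = h >> (64 - self.b)                   # top b bits pick a register
        rest = h & ((1 << (64 - self.b)) - 1)      # remaining 64 - b bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.b) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            e = self.m * math.log(self.m / zeros)
        return e


# Hypothetical sizing constants, loosely modeled on nodeAgg.c.
PARTITION_FACTOR = 1.5
MIN_PARTITIONS, MAX_PARTITIONS = 4, 1024


def choose_num_partitions(est_groups: float, entry_bytes: int, mem_bytes: int) -> int:
    """Pick a power-of-two partition count so each partition's groups fit in memory."""
    mem_wanted = PARTITION_FACTOR * est_groups * entry_bytes
    n = 1 + int(mem_wanted / mem_bytes)
    n = max(MIN_PARTITIONS, min(MAX_PARTITIONS, n))
    return 1 << (n - 1).bit_length()   # round up to a power of two


# A spilled partition: 200,000 tuples but only 20,000 distinct group keys.
spilled_keys = [i % 20_000 for i in range(200_000)]
hll = HyperLogLog()
for key in spilled_keys:
    hll.add(key)

ENTRY_BYTES, MEM_BYTES = 64, 1 << 20   # hypothetical per-group size and work_mem
naive = choose_num_partitions(len(spilled_keys), ENTRY_BYTES, MEM_BYTES)
with_hll = choose_num_partitions(hll.estimate(), ENTRY_BYTES, MEM_BYTES)
print(f"HLL estimate: {hll.estimate():.0f} groups")
print(f"partitions (tuple count): {naive}, partitions (HLL): {with_hll}")
```

With the tuple count standing in for the group count, this sketch picks 32 partitions for the spilled batch; with the HyperLogLog estimate (close to 20,000) it picks the minimum of 4. Applied at every recursion level, that gap is the kind of overpartitioning that can multiply the number of batches.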

It's not clear to me that overpartitioning is a real problem in this
case -- but I think the fact that it's causing confusion is enough
reason to see if we can fix it.

Regards,
    Jeff Davis
