Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan
Subject: Re: Default setting for enable_hashagg_disk
Date:
Msg-id: CAH2-WzkhNFqSeTmzfYKNUi-6hXxH1uL=1yZLh2uEO18wWWn94g@mail.gmail.com
In reply to: Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Sat, Jul 25, 2020 at 10:23 AM Jeff Davis <pgsql@j-davis.com> wrote:
> There's also another effect at work that can cause the total number of
> batches to be higher for larger work_mem values: when we do recurse, we
> again need to estimate the number of partitions needed. Right now, we
> overestimate the number of partitions needed (to be conservative),
> which leads to a wider fan-out and lots of tiny partitions, and
> therefore more batches.
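
(To illustrate the quoted effect with made-up numbers, purely for
illustration: if a spilled partition's input would actually fit in 4
sub-partitions, but we conservatively plan 16, each sub-partition ends
up roughly a quarter full -- and since every partition becomes at
least one batch, we pay 4x the batches to process the same data.)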

What worries me a bit is the sharp discontinuities when spilling with
significantly less work_mem than the "optimal" amount. For example,
with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
that setting work_mem to 6MB looks like this:

   ->  HashAggregate  (cost=2700529.47..3020654.22 rows=1815500 width=40) (actual time=21039.788..32278.703 rows=2000000 loops=1)
         Output: lineitem.l_partkey, (0.2 * avg(lineitem.l_quantity))
         Group Key: lineitem.l_partkey
         Planned Partitions: 128  Peak Memory Usage: 6161kB  Disk Usage: 2478080kB  HashAgg Batches: 128

(And with higher work_mem settings we get a sensible-looking number
of batches that matches the number of planned partitions, too.)
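
For reference, the query can be inferred from the plan's Output and
Group Key lines (it looks like the inner aggregate of TPC-H Q17).
Something along these lines should reproduce the first plan -- this
is my reconstruction, assuming a loaded lineitem table, not copied
from Tomas' post:

    -- hypothetical repro; scale factor is whatever dataset you have loaded
    SET work_mem = '6MB';
    EXPLAIN (ANALYZE, VERBOSE)
    SELECT l_partkey, 0.2 * avg(l_quantity)
    FROM lineitem
    GROUP BY l_partkey;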

However, if I set work_mem to 5MB (or less), it looks like this:

   ->  HashAggregate  (cost=2700529.47..3020654.22 rows=1815500 width=40) (actual time=20849.490..37027.533 rows=2000000 loops=1)
         Output: lineitem.l_partkey, (0.2 * avg(lineitem.l_quantity))
         Group Key: lineitem.l_partkey
         Planned Partitions: 128  Peak Memory Usage: 5393kB  Disk Usage: 2482152kB  HashAgg Batches: 11456

So the number of partitions is still 128, but the number of batches
explodes to 11,456 all at once. My guess is that this is because the
recursive hash aggregation misbehaves in a self-similar fashion once a
certain tipping point has been reached. I expect that the exact nature
of that tipping point is very complicated, and generally dependent on
the dataset, clustering, etc. But I don't think that this kind of
effect will be uncommon.
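
(Rough arithmetic, mine: 11,456 batches across 128 top-level
partitions works out to ~89.5 batches per partition, which is
consistent with most of the 128 partitions themselves spilling and
fanning out again -- exactly the self-similar recursion I'm guessing
at.)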

(FWIW this example requires ~620MB work_mem to complete without
spilling at all -- so it's kind of extreme, though not quite as
extreme as many of the similar test results from Tomas.)
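
(Again rough arithmetic, mine: ~620MB for the ~2 million output
groups implies on the order of 300 bytes of hash table memory per
group, which seems plausible once per-group transition state and
bucket overhead are counted.)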

-- 
Peter Geoghegan


