Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan
Subject: Re: Default setting for enable_hashagg_disk
Msg-id: CAH2-Wzkv7bnoRTkniW4h4RVWbn6mGVySjik7opUiYsc5RGrfBA@mail.gmail.com
In reply to: Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Fri, Jul 24, 2020 at 12:16 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Maybe, but we're nowhere close to these limits. See this table which I
> posted earlier:
>
>        2MB       Planned Partitions:  64    HashAgg Batches:  4160
>        4MB       Planned Partitions: 128    HashAgg Batches: 16512
>        8MB       Planned Partitions: 256    HashAgg Batches: 21488
>       64MB       Planned Partitions:  32    HashAgg Batches:  2720
>      256MB       Planned Partitions:   8    HashAgg Batches:     8
>
> This is from the non-parallel runs on the i5 machine with 32GB data set,
> the first column is work_mem. We're nowhere near the 1024 limit, and the
> cardinality estimates are pretty good.
>
> OTOH the number of batches is much higher, so clearly there was some
> recursive spilling happening. What I find strange is that this grows
> with work_mem and only starts dropping after 64MB.

Could that be caused by clustering in the data?

If the input data is in totally random order then we have a good
chance of never having to spill skewed "common" values. That is, we're
bound to encounter common values before entering spill mode, and so
those common values will continue to be usefully aggregated until
we're done with the initial groups (i.e. until the in-memory hash
table is cleared in order to process spilled input tuples). This is
great because the common values get aggregated without ever spilling,
and most of the work is done before we even begin with spilled tuples.

If, on the other hand, the common values are concentrated together in
the input...

Assuming that I have this right, then I would also expect simply
having more memory to ameliorate the problem. If you only have/need 4
or 8 partitions then you can fit a higher proportion of the total
number of groups for the whole dataset in the hash table (at the point
when you first enter spill mode). I think it follows that the "nailed"
hash table entries/groupings will "better characterize" the dataset as
a whole.
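
To make that concrete, here's a toy simulation (ordinary standalone C, not
PostgreSQL code; the group counts, duplicate counts, and "capacity" values
are made-up stand-ins for work_mem sizes). The in-memory hash table admits
new groups until it hits a capacity limit; after that, tuples belonging to
already-admitted ("nailed") groups are still aggregated in place, and
everything else is counted as spilled. With the same skewed data, random
input order spills far fewer tuples than clustered input order -- the gap
is exactly the duplicates of the common values -- and raising the capacity
shrinks the spill in both cases:

/*
 * Toy model of hash aggregation entering "spill mode": the in-memory hash
 * table admits new groups until it reaches its capacity; after that, tuples
 * belonging to groups already in the table are aggregated in memory, while
 * everything else is counted as spilled.
 *
 * Hypothetical parameters, not taken from the thread: 10 common group keys
 * with 500 duplicates each, plus 9,990 rare keys seen once each.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define NCOMMON      10      /* heavily-duplicated group keys */
#define DUPS         500     /* duplicates per common key */
#define NRARE        9990    /* group keys seen exactly once */
#define NGROUPS      (NCOMMON + NRARE)
#define NTUPLES      (NCOMMON * DUPS + NRARE)

/* Return how many tuples spill for a given input order and table capacity. */
static long
simulate(const int *keys, int capacity)
{
    bool   *in_table = calloc(NGROUPS, sizeof(bool));
    int     ngroups_in_table = 0;
    long    spilled = 0;

    for (int i = 0; i < NTUPLES; i++)
    {
        int key = keys[i];

        if (in_table[key])
            continue;               /* aggregated in memory, no I/O */
        if (ngroups_in_table < capacity)
        {
            in_table[key] = true;   /* admit a new group */
            ngroups_in_table++;
        }
        else
            spilled++;              /* spill mode: would be written out */
    }
    free(in_table);
    return spilled;
}

int
main(void)
{
    int *keys = malloc(NTUPLES * sizeof(int));
    int  n = 0;

    /* "Clustered" order: every rare key first, then the common duplicates. */
    for (int k = NCOMMON; k < NGROUPS; k++)
        keys[n++] = k;
    for (int k = 0; k < NCOMMON; k++)
        for (int d = 0; d < DUPS; d++)
            keys[n++] = k;

    /* "Random" order: a shuffled copy of the same tuples. */
    int *shuffled = malloc(NTUPLES * sizeof(int));
    for (int i = 0; i < NTUPLES; i++)
        shuffled[i] = keys[i];
    srand(42);
    for (int i = NTUPLES - 1; i > 0; i--)   /* Fisher-Yates shuffle */
    {
        int j = rand() % (i + 1);
        int tmp = shuffled[i];
        shuffled[i] = shuffled[j];
        shuffled[j] = tmp;
    }

    int capacities[] = {500, 2000, 8000};   /* stand-ins for work_mem sizes */
    for (int c = 0; c < 3; c++)
        printf("capacity %5d groups: random order spills %6ld, clustered order spills %6ld\n",
               capacities[c],
               simulate(shuffled, capacities[c]),
               simulate(keys, capacities[c]));

    free(keys);
    free(shuffled);
    return 0;
}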

> Also, how could the amount of I/O be almost constant in all these cases?
> Surely more recursive spilling should do more I/O, but the Disk Usage
> reported by explain analyze does not show anything like ...

Not sure, but might that just be because logtape.c can recycle disk
space?
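
Roughly speaking (this is just a sketch of the recycling idea, not the
real logtape.c, and the block counts are invented): space released when a
spilled block has been read back goes on a freelist and is reused by the
next write, so the on-disk footprint tracks the largest single pass rather
than the total number of blocks ever written.

/*
 * Sketch of the disk-space recycling idea (not the real logtape.c): blocks
 * freed after being read back go on a freelist and are preferred for the
 * next write, so the file's footprint tracks the peak working set rather
 * than the cumulative number of blocks ever written.
 */
#include <stdio.h>

#define MAX_BLOCKS 1024

static long freelist[MAX_BLOCKS];
static int  nfree = 0;
static long next_block = 0;        /* next never-used block number */
static long peak_blocks = 0;       /* high-water mark of the "file" size */

static long
alloc_block(void)
{
    if (nfree > 0)
        return freelist[--nfree];  /* reuse space freed by earlier reads */
    if (next_block + 1 > peak_blocks)
        peak_blocks = next_block + 1;
    return next_block++;           /* otherwise extend the file */
}

static void
release_block(long blkno)
{
    if (nfree < MAX_BLOCKS)
        freelist[nfree++] = blkno;
}

int
main(void)
{
    long total_writes = 0;

    /* Three "passes": write 100 blocks, then read them all back. */
    for (int pass = 0; pass < 3; pass++)
    {
        long blocks[100];

        for (int i = 0; i < 100; i++)
        {
            blocks[i] = alloc_block();
            total_writes++;
        }
        for (int i = 0; i < 100; i++)
            release_block(blocks[i]);   /* reading a block frees its space */
    }
    printf("blocks written: %ld, peak file size: %ld blocks\n",
           total_writes, peak_blocks);  /* 300 written, peak stays at 100 */
    return 0;
}

So repeated recursive passes can push total writes well past the disk
usage that ends up being reported.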

As I said in my last e-mail, it's pretty reasonable to assume that the
vast majority of external sorts are one-pass. It follows that disk
usage can be thought of as almost the same thing as total I/O for
tuplesort. But the same heuristic isn't reasonable when thinking about
hash agg. Hash agg might write out much less data than the total
memory used for the equivalent "peak optimal nospill" hash agg case --
or much more. (Again, reiterating what I said in my last e-mail.)
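
A back-of-the-envelope illustration of that difference (invented numbers,
not measurements from your runs): with a one-pass sort each byte is written
once and read once, so the reported disk usage is about half of the total
I/O; with recursive hash agg spilling, block recycling keeps the footprint
pinned at the largest pass even as total I/O keeps growing.

/*
 * Made-up numbers, only to contrast "disk usage" with "total I/O".
 */
#include <stdio.h>

int
main(void)
{
    double input_gb = 10.0;

    /* One-pass tuplesort: every byte written once and read once. */
    double sort_disk = input_gb;
    double sort_io = 2 * input_gb;

    /*
     * Hypothetical recursive hash agg: each pass respills half of what the
     * previous pass wrote, for three passes.
     */
    double hash_disk = 0, hash_io = 0, spill = input_gb;
    for (int pass = 0; pass < 3; pass++)
    {
        hash_io += 2 * spill;               /* write, then re-read, the spill */
        if (spill > hash_disk)
            hash_disk = spill;              /* recycling: footprint = largest pass */
        spill /= 2;
    }

    printf("sort:     disk %.1f GB, total I/O %.1f GB\n", sort_disk, sort_io);
    printf("hash agg: disk %.1f GB, total I/O %.1f GB\n", hash_disk, hash_io);
    return 0;
}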

-- 
Peter Geoghegan


