Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id CAH2-Wz=osB4oi_nH8MnosYhVVSNOm5q3=exGe-b3q6gWOgf98w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I agree larger work_mem for hashagg (and thus less spilling) may mean
> lower work_mem for so some other nodes that are less sensitive to this.
> But I think this needs to be formulated as a cost-based decision,
> although I don't know how to do that for the reasons I explained before
> (bottom-up plan construction vs. distributing the memory budget).

Why do you think that it needs to be formulated as a cost-based
decision? That's probably true of a scheme that allocates memory very
intelligently, but what about an approach that's slightly better than
work_mem?

What problems do you foresee (if any) with adding a hash_mem GUC that
gets used for both planning and execution for hash aggregate and hash
join nodes, in about the same way as work_mem is now?

> FWIW some databases already do something like this - SQL Server has
> something called "memory grant" which I think mostly does what you
> described here.

Same is true of Oracle. But Oracle also has simple work_mem-like
settings for sorting and hashing. People don't really use them
anymore, but apparently it was once common for the DBA to explicitly
give over more memory to hashing -- much like the hash_mem setting I
asked about. IIRC the same is true of DB2.

> The difference between sort and hashagg spills is that for sorts there
> is no behavior change. Plans that did (not) spill in v12 will behave the
> same way on v13, modulo some random perturbation. For hashagg that's not
> the case - some queries that did not spill before will spill now.
>
> So even if the hashagg spills are roughly equal to sort spills, both are
> significantly more expensive than not spilling.

Just to make sure we're on the same page: both are significantly more
expensive than a hash aggregate not spilling *specifically*. OTOH, a
group aggregate may not be much slower when it spills compared to an
in-memory sort group aggregate. It may even be noticeably faster, due
to caching effects, as you mentioned at one point upthread.

This is the property that makes hash aggregate special, and justifies
giving it more memory than other nodes on a system-wide basis (the
same thing applies to hash join). This could even work as a multiplier
of work_mem.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Review for GetWALAvailability()