Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan
Subject: Re: Default setting for enable_hashagg_disk
Date:
Msg-id: CAH2-WzmbycnZnxFTYnwkSkCPTVQbRoLiuEvo8gkZCKiQTL8GAQ@mail.gmail.com
In reply to: Re: Default setting for enable_hashagg_disk  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Default setting for enable_hashagg_disk  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Thu, Jul 9, 2020 at 3:58 PM Stephen Frost <sfrost@snowman.net> wrote:
> > That's not the only justification. The other justification is that
> > it's generally reasonable to prefer giving hash aggregate more memory.
>
> Sure, and it's generally reasonable to prefer giving Sorts more memory
> too... as long as you've got it available.

Did you actually read any of the discussion?

The value of doing a hash aggregate all in memory is generally far
greater than the value of doing a sort all in memory. They're just
very different situations, owing to the fundamental laws-of-physics
principles that apply in each case, so much so that an external sort
can sometimes actually be slightly *faster* than an in-memory sort
(it's not really supposed to be, but it is). Try it out yourself.
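
If you want to see it for yourself, something along these lines is a
starting point (table name, sizes, and settings are made up for
illustration; which plan you get by default depends on the planner's
estimates and on the server version):

  -- Throwaway data set with many groups; numbers are arbitrary.
  CREATE TABLE hashagg_demo AS
      SELECT (random() * 1000000)::int AS grp, random() AS val
      FROM generate_series(1, 10000000) g;
  ANALYZE hashagg_demo;

  SET work_mem = '4MB';           -- small on purpose

  -- Plan A: sort-based aggregation (GroupAggregate over an external sort).
  SET enable_hashagg = off;
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT grp, sum(val) FROM hashagg_demo GROUP BY grp;

  -- Plan B: hash aggregation. Before v13 the executor keeps the whole hash
  -- table in memory even when it blows past work_mem; with the v13 work the
  -- node spills to disk instead. Compare runtime and memory/disk use with A.
  SET enable_hashagg = on;
  SET enable_sort = off;          -- nudge the planner toward HashAgg
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT grp, sum(val) FROM hashagg_demo GROUP BY grp;
  RESET enable_sort;
  RESET enable_hashagg;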

I'm not going to repeat this in full again. The thread is already long enough.

> > This would even be true in a world where all grouping estimates were
> > somehow magically accurate. These two justifications coincide in a way
> > that may seem a bit too convenient to truly be an accident of history.
> > And if they do: I agree. It's no accident.
>
> I disagree that the lack of HashAgg's ability to spill to disk was
> because, for this one particular node, we should always just give it
> however much memory it needs, regardless of whether it's anywhere near
> how much we thought it'd need or not.

This is a straw man.

It's possible to give hash agg an amount of memory that exceeds
work_mem, but is less than infinity. That's more or less what I
propose to enable by inventing a new hash_mem GUC, in fact.
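
To make the shape of that concrete: hash_mem does not exist as a GUC
today, and the name and value below are only taken from this proposal,
which is why that line is commented out.

  SET work_mem = '4MB';        -- existing setting; sorts and everything else
                               -- stay bound by it
  -- SET hash_mem = '64MB';    -- proposed: a separate budget for hash-based
  --                           -- nodes, larger than work_mem but still a
  --                           -- finite, deliberately chosen ceiling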

There is also the separate "escape hatch" idea that David Rowley
proposed, that I consider to be a plausible way of resolving the
problem. That wouldn't "always give it [hash agg] however much memory
it asks for", either. It would only do that when the setting indicated
that hash agg should be given however much memory it asks for.

> I disagree that we were reasonably happy with this bug or that it
> somehow makes sense to retain it.

While we're far from resolving this open item, I think that you'll
find that most people agree that it's reasonable to think of hash agg
as special -- at least in some contexts. The central disagreement
seems to be on the question of how to maintain some kind of continuity
with the old behavior, how ambitious our approach should be in
Postgres 13, etc.

> > Yes, we were concerned about the risk of OOM for many years, but it
> > was considered a risk worth taking. We knew what the trade-off was. We
> > never quite admitted it, but what does it matter?
>
> This is not some well-designed feature of HashAgg that had a lot of
> thought put into it, whereby the community agreed that we should just
> let it be and hope no one noticed or got bit by it; I certainly have
> managed to kill servers with a HashAgg gone bad, and I seriously doubt
> I'm alone in that.

I was talking about the evolutionary pressures that led to this
curious state of affairs, where hashagg's overuse of memory was often
actually quite desirable. I understand that it also sometimes causes
OOMs, and that OOMs are bad. Both beliefs are compatible, just as a
design that takes both into account is possible.

If it isn't practical to do that in Postgres 13, then an escape hatch
is highly desirable, if not essential.

-- 
Peter Geoghegan


