Re: Default setting for enable_hashagg_disk

Поиск

Список

Период

Сортировка

От	Peter Geoghegan
Тема	Re: Default setting for enable_hashagg_disk
Дата	10 июля 2020 г. 01:32:20
Msg-id	CAH2-WzmB04dxgRgjDGvDbugEFfHxMi2pyuJqsYr30x1RAEqSXg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Default setting for enable_hashagg_disk (Stephen Frost <sfrost@snowman.net>)
Ответы	Re: Default setting for enable_hashagg_disk (Stephen Frost <sfrost@snowman.net>)
Список	pgsql-hackers

Дерево обсуждения

On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost <sfrost@snowman.net> wrote:
> > The one for the planner is already there, and it looks like we need one
> > for the executor as well (to tell HashAgg to ignore the memory limit
> > just like v12).
>
> No, ignoring the limit set was, as agreed above, a bug, and I don't
> think it makes sense to add some new user tunable for this.

It makes more sense than simply ignoring what our users will see as a
simple regression. (Though I still lean towards fixing the problem by
introducing hash_mem, which at least tries to fix the problem head
on.)

> If folks
> want to let HashAgg use more memory then they can set work_mem higher,
> just the same as if they want a Sort node to use more memory or a
> HashJoin.  Yes, that comes with potential knock-on effects about other
> nodes (possibly) using more memory but that's pretty well understood for
> all the other cases and I don't think that it makes sense to have a
> special case for HashAgg when the only justification is that "well, you
> see, it used to have this bug, so...".

That's not the only justification. The other justification is that
it's generally reasonable to prefer giving hash aggregate more memory.
This would even be true in a world where all grouping estimates were
somehow magically accurate. These two justifications coincide in a way
that may seem a bit too convenient to truly be an accident of history.
And if they do: I agree. It's no accident.

It seems likely that we have been "complicit" in enabling
"applications that live beyond their means", work_mem-wise. We knew
that hash aggregate had this "bug" forever, and yet we were reasonably
happy to have it be the common case for years. It's very fast, and
didn't actually explode most of the time (even though grouping
estimates are often pretty poor). Hash agg was and is the common case.
Yes, we were concerned about the risk of OOM for many years, but it
was considered a risk worth taking. We knew what the trade-off was. We
never quite admitted it, but what does it matter?

Our own tacit attitude towards hash agg + work_mem mirrors that of our
users (or at least the users that will be affected by this issue, of
which there will be plenty). Declaring this behavior a bug with no
redeeming qualities now seems a bit rich.

-- 
Peter Geoghegan

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Alvaro Herrera
Дата: 10 июля 2020 г., 01:20:04
Сообщение: Re: min_safe_lsn column in pg_replication_slots view

Следующее

От: Tom Lane
Дата: 10 июля 2020 г., 01:46:20
Сообщение: Re: Intermittent BRIN failures on hyrax and lousyjack

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Default setting for enable_hashagg_disk

Предыдущее

Следующее