Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id 8bff2e4e8020c3caa16b61a46918d21b573eaf78.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Default setting for enable_hashagg_disk  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Re: Default setting for enable_hashagg_disk  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Tue, 2020-06-09 at 21:15 -0500, Justin Pryzby wrote:
> The behavior of the GUC is inconsistent with the other GUCs, which is
> confusing.  See also Robert's comments in this thread.
> 
https://www.postgresql.org/message-id/20200407223900.GT2228%40telsasoft.com

enable_* GUCs are planner GUCs, so it would be confusing to me if they
affected execution-time behavior.

I think the point of confusion is that it's not enabling/disabling an
entire execution node; it only "disables" HashAgg if it thinks it will
spill. I agree that is a difference with the other GUCs, and could
cause confusion.

Stepping back, I was trying to solve two problems with these GUCs:

1. Testing the spilling of hashed grouping sets: I'm inclined to just
get rid of enable_groupingsets_hash_disk and use Melanie's stats-
hacking approach instead.

2. Trying to provide an escape hatch for someone who experiences a
performance regression and wants something like the old behavior back.
There are two aspects of the old behavior that a user could potentially
want back:
  a. Don't choose HashAgg if it's expected to have more groups than fit
into a work_mem-sized hashtable.
  b. If executing HashAgg, and the hash table exceeds work_mem, just
keep going.

The behavior in v13 master is, by default, analagous to Sort or
anything else that adapts at runtime to spill. If we had spillable
HashAgg the whole time, we wouldn't be worried about #2 at all. But,
out of conservatism, I am trying to accommodate users who want an
escape hatch, at least for a release or two until users feel more
comfortable with disk-based HashAgg.

Setting enable_hash_disk=false implements 2(a). This name apparently
causes confusion, but it's hard to come up with a better one because
the v12 behavior has nuance that's hard to express succinctly. I don't
think the names you suggested quite fit, but the idea to use a more
interesting GUC value might help express the behavior. Perhaps making
enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
"reject" is too definite for the planner, which is working with
imperfect information.

In master, there is no explicit way to get 2(b), but you can just set
work_mem higher in a lot of cases. If enough people want 2(b), I can
add it easily. Perhaps hashagg_overflow=on|off, which would control
execution time behavior?

Regards,
    Jeff Davis





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexey Kondratov
Дата:
Сообщение: Re: Physical replication slot advance is not persistent
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk