Re: Default setting for enable_hashagg_disk (hash_mem)

From: David Rowley
Subject: Re: Default setting for enable_hashagg_disk (hash_mem)
Date:
Msg-id: CAApHDvocm8dm5vBBy+uSr88-athC=41SW3HAevNOXKeMHtTbEQ@mail.gmail.com
In reply to: Re: Default setting for enable_hashagg_disk (hash_mem)  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Default setting for enable_hashagg_disk (hash_mem)  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Default setting for enable_hashagg_disk (hash_mem)  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Tue, 7 Jul 2020 at 16:57, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Sun, 2020-07-05 at 16:47 -0700, Peter Geoghegan wrote:
> > Where does that leave the hash_mem idea (or some other similar
> > proposal)?
>
> hash_mem is acceptable to me if the consensus is moving toward that,
> but I'm not excited about it.

FWIW, I'm not a fan of the hash_mem idea. It was my impression that we
aimed to provide an escape hatch for people who have become accustomed
to the <= PG12 behaviour, and hash_mem sounds like it's not that. Surely
a GUC by that name would control what Hash Join does too?  Otherwise,
it would be called hashagg_mem. I'd say changing the behaviour of Hash
Join is not well aligned with the goal of allowing users to get
something closer to what PG12 did.

I know there has been talk over the years about improving how work_mem
works. I see Tomas mentioned memory grants on the other thread [1]. I
do imagine this is the long-term solution to the problem where users
must choose very conservative values for work_mem. We're certainly not
going to get that for PG13, so I do think what we need here is just a
simple escape hatch. I mentioned my thoughts in [2], so I won't go over
them again here. Once we've improved the situation in some future
version of Postgres, perhaps along the lines of what Tomas mentioned,
then we can get rid of the escape hatch.

Here are my reasons for not liking the hash_mem idea:

1. If it also increases the amount of memory that Hash Join can use,
then that makes the partition-wise hash join problem of hash_mem *
npartitions even bigger when users choose to set hash_mem higher than
work_mem to get Hash Aggregate doing what they're used to.
2. Someone will one day ask for sort_mem and then materialize_mem.
Maybe then cte_mem. Once those are done we might as well just add a
GUC to control every executor node that uses work_mem.
3. I'm working on a Result Cache node [3]. It uses a hash table
internally. Should it constrain its memory consumption according to
hash_mem or work_mem? It's not really obvious to people that it
internally uses a hash table, since "Hash" does not appear in the node
name. Would people need to look that up in the documentation?
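To make point 1 concrete, here's a rough sketch of the arithmetic (the partition count and memory values are illustrative assumptions, not figures from the thread):

```sql
-- Illustrative only: a partition-wise hash join over a table with
-- 32 partitions, where each per-partition Hash node builds its own
-- hash table bounded by whichever memory GUC is in effect.
SET enable_partitionwise_join = on;
SET work_mem = '64MB';
-- Today the worst case is roughly 64MB * 32 = ~2GB of hash tables.
-- If a separate hash_mem were set to, say, 256MB to restore the
-- pre-PG13 Hash Aggregate behaviour, and it also applied to Hash
-- Join, the same plan could instead use up to 256MB * 32 = ~8GB.
```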

David

[1] https://www.postgresql.org/message-id/20200626235850.gvl3lpfyeobu4evi@development
[2] https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=UKZKq1_FzHy+XzmUzAJiNj6RWyTHH4UfA@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com


