Re: Default setting for enable_hashagg_disk (hash_mem)

From Justin Pryzby
Subject Re: Default setting for enable_hashagg_disk (hash_mem)
Date
Msg-id 20200703145620.GK4107@telsasoft.com
In reply to Re: Default setting for enable_hashagg_disk (hash_mem)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Fri, Jul 03, 2020 at 10:08:08AM -0400, Bruce Momjian wrote:
> On Thu, Jul  2, 2020 at 08:35:40PM -0700, Peter Geoghegan wrote:
> > But the problem isn't really the hashaggs-that-spill patch itself.
> > Rather, the problem is the way that work_mem is supposed to behave in
> > general, and the impact that that has on hash aggregate now that it
> > has finally been brought into line with every other kind of executor
> > node. There just isn't much reason to think that we should give the
> > same amount of memory to a groupagg + sort as a hash aggregate. The
> > patch more or less broke an existing behavior that is itself
> > officially broken. That is, the problem that we're trying to fix here
> > is only a problem to the extent that the previous scheme isn't really
> > operating as intended (because grouping estimates are inherently very
> > hard). A revert doesn't seem like it helps anyone.
> > 
> > I accept that the idea of inventing hash_mem to fix this problem now
> > is unorthodox. In a certain sense it solves problems beyond the
> > problems that we're theoretically obligated to solve now. But any
> > "more conservative" approach that I can think of seems like a big
> > mess.
> 
> Well, the bottom line is that we are designing features during beta.
> People are supposed to be testing PG 13 behavior during beta, including
> optimizer behavior.  We don't even have a user report yet of a
> regression compared to PG 12, or one that can't be fixed by increasing
> work_mem.
> 
> If we add a new behavior to PG 13, we then have the pre-PG 13 behavior,
> the pre-patch behavior, and the post-patch behavior.  How are people
> supposed to test all of that?  Add to that that some don't even feel we
> need a new behavior, which is delaying any patch from being applied.

If we default hash_mem=-1, the post-patch behavior would by default be the
same as the pre-patch behavior.

Actually, another reason it should be -1 is simply to keep down the minimum,
essential set of GUCs everyone has to change or review on a new install of a
dedicated or nontrivial instance: shared_buffers, max_wal_size,
checkpoint_timeout, effective_cache_size, work_mem.

I don't think anybody said it before, but now it occurs to me that one
advantage of making hash_mem a multiplier (I'm thinking of
hash_mem_scale_factor) rather than an absolute is that one wouldn't need to
remember to increase hash_mem every time they increase work_mem.  Otherwise,
this is something of a foot-gun: hash_mem would default to 16MB, and people
experiencing poor performance would increase work_mem to 256MB, as they've
been doing for decades, and see no effect.  Or someone would increase work_mem
from 4MB to 256MB, which exceeds the hash_mem default of 16MB, so then (if
Peter has his way) hash_mem is ignored.
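To make the foot-gun concrete, here is a small sketch (illustrative Python,
not PostgreSQL source; the function names are my own) of how the effective
memory limit for hash-based nodes would behave under the two proposals:

```python
# Illustrative only: compares the "absolute hash_mem GUC" proposal with the
# "hash_mem_scale_factor multiplier" proposal. Sizes are in kB, as GUCs are.

def hash_limit_absolute(work_mem_kb, hash_mem_kb):
    """hash_mem as an absolute setting; -1 means 'fall back to work_mem'."""
    return work_mem_kb if hash_mem_kb == -1 else hash_mem_kb

def hash_limit_scaled(work_mem_kb, scale_factor):
    """hash_mem_scale_factor: hash nodes get a multiple of work_mem."""
    return int(work_mem_kb * scale_factor)

# The foot-gun: with hash_mem defaulting to 16MB, raising work_mem from
# 4MB to 256MB does nothing for hash nodes under the absolute scheme...
assert hash_limit_absolute(256 * 1024, 16 * 1024) == 16 * 1024
# ...but the limit tracks work_mem automatically under the multiplier scheme.
assert hash_limit_scaled(256 * 1024, 1) == 256 * 1024
# And hash_mem=-1 reproduces the pre-patch behavior (limit == work_mem).
assert hash_limit_absolute(4 * 1024, -1) == 4 * 1024
```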

Due to these behaviors, I'll retract my previous preference:
| "I feel it should same as work_mem, as it's written, and not a multiplier."

I think the better ideas are:
 - hash_mem=-1
 - hash_mem_scale_factor=1 ?

Maybe as a separate patch we'd set the default hash_mem_scale_factor=4,
possibly only in master and not in v13.
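For concreteness, here is the arithmetic that a default of 4 would imply
(hypothetical; hash_mem_scale_factor is a proposed GUC, not an existing one):

```python
# Hypothetical: with the stock work_mem of 4MB and a proposed default of
# hash_mem_scale_factor = 4, hash-based nodes would be allowed 16MB.
work_mem_kb = 4 * 1024          # PostgreSQL's stock work_mem default
hash_mem_scale_factor = 4       # proposed default, per this message
hash_limit_kb = work_mem_kb * hash_mem_scale_factor
assert hash_limit_kb == 16 * 1024
```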

-- 
Justin


