Re: Default setting for enable_hashagg_disk

From Robert Haas
Subject Re: Default setting for enable_hashagg_disk
Msg-id CA+TgmoY4-zSMFf8XbpO6uRsMX4vNPPLXvtajAZpG3f9eoEyjdA@mail.gmail.com
In response to Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Mon, Jun 22, 2020 at 1:30 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2020-06-22 at 10:52 -0400, Robert Haas wrote:
> > So I feel like the really important thing here is to fix the cases
> > that don't come out well with default settings.
>
> ...with the caveat that perfection is not something to expect from our
> planner.

+1.

> >  If we can't do that,
> > then the feature is half-baked and maybe should not have been
> > committed in the first place.
>
> HashAgg started out half-baked at the dawn of time, and stayed that way
> through version 12. Disk-based HashAgg was designed to fix it.
>
> Other major planner features generally offer a way to turn them off
> (e.g. parallelism, JIT), and we don't call those half-baked.

Sure, and I'm not calling this half-baked either, but there is a
difference. JIT and parallelism are discrete features to a far greater
extent than this is. I think we can explain to people the pros and
cons of those things and ask them to make an intelligent choice about
whether they want them. You can say things like "well, JIT is liable
to make your queries run faster once they get going, but it adds to
the startup time and creates a dependency on LLVM" and the user can
decide whether they want that or not. At least to me, something like
this isn't so easy to consider as a separate feature. As you say:

> I agree that the single GUC added in v13 (hashagg_avoid_disk_plan) is
> weird because it's half of a disable switch. But it's not weird because
> of my changes in v13; it's weird because the planner behavior in v12
> was weird. I hope not many people need to set it, and I hope we can
> remove it soon.

The weirdness is the problem here, at least for me. Generally, I don't
like GUCs of the form give_me_the_old_strange_behavior=true, because
they tend to be either unnecessary (because nobody wants the
old strange behavior) or hard to eliminate (because the new behavior
is also strange and is not categorically better).

> If you think we will never be able to remove the GUC, then we should
> think a little harder about whether we really need it. I am open to
> that discussion, but I don't think the presence of this GUC implies
> that disk-based hashagg is half-baked.

I don't think it necessarily implies that either. I do however have
some concerns about people using the GUC as a crutch. I am slightly
worried that this is going to have hard-to-fix problems and that we'll
be stuck with the GUC for that reason. Now if that is the case, is
removing the GUC any better? Maybe not. These decisions are hard, and
I am not trying to pretend like I have all the answers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


