Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id 6292f5a766198d5745eb8cc17ac736366056671d.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Default setting for enable_hashagg_disk  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Default setting for enable_hashagg_disk
Список pgsql-hackers
On Thu, 2020-04-09 at 15:26 -0400, Robert Haas wrote:
> I think it's actually pretty different. All of the other enable_*
> GUCs
> disable an entire type of plan node, except for cases where that
> would
> otherwise result in planning failure. This just disables a portion of
> the planning logic for a certain kind of node, without actually
> disabling the whole node type. I'm not sure that's a bad idea, but it
> definitely seems to be inconsistent with what we've done in the past.

The patch adds two GUCs. Both are slightly weird, to be honest, but let
me explain the reasoning. I am open to other suggestions.

1. enable_hashagg_disk (default true):

This is essentially there just to get some of the old behavior back, to
give people an escape hatch if they see bad plans while we are tweaking
the costing. The old behavior was weird, so this GUC is also weird.

Perhaps we can make this a compatibility GUC that we eventually drop? I
don't necessarily think this GUC would make sense, say, 5 versions from
now. I'm just trying to be conservative because I know that, even if
the plans are faster for 90% of people, the other 10% will be unhappy
and want a way to work around it.

2. enable_groupingsets_hash_disk (default false):

This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then see how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?

Regards,
    Jeff Davis





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Parallel copy
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: where should I stick that backup?