Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan
Subject: Re: Default setting for enable_hashagg_disk
Date:
Msg-id: CAH2-WznP_v5dO-vC=GKXoDSDF6KyVR_La4dJVdr=1KxR_TbpMg@mail.gmail.com
In reply to: Re: Default setting for enable_hashagg_disk  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Jul 24, 2020 at 8:19 AM Robert Haas <robertmhaas@gmail.com> wrote:
> This is all really good analysis, I think, but this seems like the key
> finding. It seems like we don't really understand what's actually
> getting written. Whether we use hash or sort doesn't seem like it
> should have this kind of impact on how much data gets written, and
> whether we use CP_SMALL_TLIST or project when needed doesn't seem like
> it should matter like this either.

Isn't this more or less the expected behavior in the event of
partitions that are spilled recursively? The cases that Tomas tested
were mostly ones where work_mem was tiny relative to the data being
aggregated.

The following is an extract from commit 1f39bce0215 showing some stuff
added to the beginning of nodeAgg.c:

+ * We also specify a min and max number of partitions per spill. Too few might
+ * mean a lot of wasted I/O from repeated spilling of the same tuples. Too
+ * many will result in lots of memory wasted buffering the spill files (which
+ * could instead be spent on a larger hash table).
+ */
+#define HASHAGG_PARTITION_FACTOR 1.50
+#define HASHAGG_MIN_PARTITIONS 4
+#define HASHAGG_MAX_PARTITIONS 1024

-- 
Peter Geoghegan


