Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id 20200407223900.GT2228@telsasoft.com
обсуждение исходный текст
Ответ на Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Default setting for enable_hashagg_disk
Список pgsql-hackers
On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:
> The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
> costing. If false, it only generates a HashAgg path if it thinks it will fit
> in work_mem, similar to the old behavior (though it wlil now spill to disk if
> the planner was wrong about it fitting in work_mem).  The current default is
> true.

Are there any other GUCs that behave like that ?  It's confusing to me when I
see "Disk Usage: ... kB", despite setting it to "disable", and without the
usual disable_cost.  I realize that postgres chose the plan on the hypothesis
that it would *not* exceed work_mem, and that spilling to disk is considered
preferable to ignoring the setting, and that "going back" to planning phase
isn't a possibility.

template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ;
 HashAggregate (actual time=1370.945..2877.250 rows=999999 loops=1)
   Group Key: a
   Peak Memory Usage: 5017 kB
   Disk Usage: 22992 kB
   HashAgg Batches: 84
   ->  Function Scan on generate_series a (actual time=314.507..741.517 rows=999999 loops=1)

A previous version of the docs said this, which I thought was confusing, and you removed it.
But I guess this is the behavior it was trying to .. explain.

+      <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
+        ... This only affects the planner choice;
+        execution time may still require using disk-based hash
+        aggregation. The default is <literal>on</literal>.

I suggest that should be reworded and then re-introduced, unless there's some
further behavior change allowing the previous behavior of
might-exceed-work-mem.

"This setting determines whether the planner will elect to use a hash plan
which it expects will exceed work_mem and spill to disk.  During execution,
hash nodes which exceed work_mem will spill to disk even if this setting is
disabled.  To avoid spilling to disk, either increase work_mem (or set
enable_hashagg=off)."

For sure the release notes should recommend re-calibrating work_mem.

-- 
Justin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: proposal \gcsv
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] Restricting maximum keep segments by repslots