Re: Default setting for enable_hashagg_disk

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Default setting for enable_hashagg_disk
Дата
Msg-id CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Default setting for enable_hashagg_disk  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers


On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
2. enable_groupingsets_hash_disk (default false):

This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then see how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?


So, I was catching up on email and noticed the last email in this
thread.

I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?

Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through
hashagg_spill_tuple() code path and has number of batches reported in
Explain) in a MixedAgg plan for a grouping sets query even with
enable_groupingsets_hash_disk set to false. You don't have the exact
query I tried (below) in the test suite, but it is basically what is
already there, so I must be missing something.

set enable_hashagg_disk = true;
SET enable_groupingsets_hash_disk = false;
SET work_mem='64kB';
set enable_hashagg = true;
set jit_above_cost = 0;
drop table if exists gs_hash_1;
create table gs_hash_1 as
select g1000, g100, g10, sum(g::numeric), count(*), max(g::text) from
  (select g%1000 as g1000, g%100 as g100, g%10 as g10, g
   from generate_series(0,199999) g) s
group by cube (g1000,g100,g10);

explain (analyze, costs off, timing off)
select g1000, g100, g10
from gs_hash_1 group by cube (g1000,g100,g10);

                          QUERY PLAN                          
--------------------------------------------------------------
 MixedAggregate (actual rows=9648 loops=1)
   Hash Key: g10
   Hash Key: g10, g1000
   Hash Key: g100
   Hash Key: g100, g10
   Group Key: g1000, g100, g10
   Group Key: g1000, g100
   Group Key: g1000
   Group Key: ()
   Peak Memory Usage: 233 kB
   Disk Usage: 1600 kB
   HashAgg Batches: 2333
   ->  Sort (actual rows=4211 loops=1)
         Sort Key: g1000, g100, g10
         Sort Method: external merge  Disk: 384kB
         ->  Seq Scan on gs_hash_1 (actual rows=4211 loops=1)

Anyway, when I throw in the stats trick that is used in join_hash.sql:

alter table gs_hash_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname = 'gs_hash_1';

I get a MixedAgg plan that doesn't have any Sort below and uses much
more disk.

                        QUERY PLAN                        
----------------------------------------------------------
 MixedAggregate (actual rows=4211 loops=1)
   Hash Key: g1000, g100, g10
   Hash Key: g1000, g100
   Hash Key: g1000
   Hash Key: g100, g10
   Hash Key: g100
   Hash Key: g10, g1000
   Hash Key: g10
   Group Key: ()
   Peak Memory Usage: 405 kB
   Disk Usage: 59712 kB
   HashAgg Batches: 4209
   ->  Seq Scan on gs_hash_1 (actual rows=200000 loops=1)

I'm not sure if this is more what you were looking for--or maybe I am
misunderstanding the guc.
 
--
Melanie Plageman

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: elog(DEBUG2 in SpinLocked section.
Следующее
От: torikoshia
Дата:
Сообщение: Re: Is it useful to record whether plans are generic or custom?