[HACKERS] Hash support for grouping sets

Поиск
Список
Период
Сортировка
От Andrew Gierth
Тема [HACKERS] Hash support for grouping sets
Дата
Msg-id 87vatszyhj.fsf@news-spur.riddles.org.uk
обсуждение исходный текст
Ответы Re: [HACKERS] Hash support for grouping sets  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Hash support for grouping sets  (Thom Brown <thom@linux.com>)
Список pgsql-hackers
Herewith a patch for doing grouping sets via hashing or mixed hashing
and sorting.

The principal objective is to pick whatever combination of grouping sets
has an estimated size that fits in work_mem, and minimizes the number of
sorting passes we need to do over the data, and hash those.  (Yes, this
is a knapsack problem.)

This is achieved by adding another strategy to the Agg node; AGG_MIXED
means that both hashing and (sorted or plain) grouping happens.  In
addition, support for multiple hash tables in AGG_HASHED mode is added.

Sample plans look like this:

explain select two,ten from onek group by cube(two,ten);
                          QUERY PLAN                          
--------------------------------------------------------------
 MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
   Hash Key: two, ten
   Hash Key: two
   Hash Key: ten
   Group Key: ()
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)

explain select two,ten from onek group by two, cube(ten,twenty);
                          QUERY PLAN                           
---------------------------------------------------------------
 HashAggregate  (cost=86.50..100.62 rows=162 width=12)
   Hash Key: two, ten, twenty
   Hash Key: two, ten
   Hash Key: two
   Hash Key: two, twenty
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)

set work_mem='64kB';
explain select count(*) from tenk1
  group by grouping sets (unique1,thousand,hundred,ten,two);
                               QUERY PLAN                               
------------------------------------------------------------------------
 MixedAggregate  (cost=1535.39..3023.89 rows=11112 width=28)
   Hash Key: two
   Hash Key: ten
   Hash Key: hundred
   Group Key: unique1
   Sort Key: thousand
     Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
         Sort Key: unique1
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
(10 rows)

Columns with unhashable or unsortable data types are handled
appropriately, though obviously every individual grouping set must end
up containing compatible column types.

There is one current weakness which I don't see a good solution for: the
planner code still has to pick a single value for group_pathkeys before
planning the input path.  This means that we sometimes can't choose a
minimal set of sorts, because we may have chosen a sort order for a
grouping set that should have been hashed, for example:

explain select count(*) from tenk1
  group by grouping sets (two,unique1,thousand,hundred,ten);
                               QUERY PLAN                               
------------------------------------------------------------------------
 MixedAggregate  (cost=1535.39..4126.28 rows=11112 width=28)
   Hash Key: ten
   Hash Key: hundred
   Group Key: two
   Sort Key: unique1
     Group Key: unique1
   Sort Key: thousand
     Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
         Sort Key: two
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)

(3 sorts, vs. 2 in the previous example that listed the same grouping
sets in different order)

Current status of the work: probably still needs cleanup, maybe some
better regression tests, and Andres has expressed some concern over the
performance impact of additional complexity in advance_aggregates; it
would be useful if this could be tested.  But it should be reviewable
as-is.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: Re: [HACKERS] Supporting huge pages on Windows
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: [HACKERS] Faster methods for getting SPI results