Parallel aggregates in PG 16.1

Поиск
Список
Период
Сортировка
От ZIMANYI Esteban
Тема Parallel aggregates in PG 16.1
Дата
Msg-id DB9P190MB1419B520B63F120567BD61918DAEA@DB9P190MB1419.EURP190.PROD.OUTLOOK.COM
обсуждение исходный текст
Ответы Re: Parallel aggregates in PG 16.1
Список pgsql-hackers
In MobilityDB
https://github.com/MobilityDB/MobilityDB
we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.

We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as follows

CREATE FUNCTION tstzspan_union_finalfn(internal)
  RETURNS tstzspanset
  AS 'MODULE_PATHNAME', 'Span_union_finalfn'
  LANGUAGE C IMMUTABLE PARALLEL SAFE;

CREATE AGGREGATE span_union(tstzspan) (
  SFUNC = array_agg_transfn,
  STYPE = internal,
  COMBINEFUNC = array_agg_combine,
  SERIALFUNC = array_agg_serialize,
  DESERIALFUNC = array_agg_deserialize,
  FINALFUNC = tstzspan_union_finalfn
);

As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does similar work as the range_agg_finalfn to merge the overlapping spans.

I am testing the parallel aggregate features of PG 16.1

test=# select version();
                                                version
-------------------------------------------------------------------------------------------------------
 PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

I create a table with 1M random spans and another one with the same data converted to tstzrange

CREATE TABLE tbl_tstzspan_1M AS
SELECT k, random_tstzspan('2001-01-01', '2002-12-31', 10) AS t
FROM generate_series(1, 1e6) AS k;

CREATE TABLE tbl_tstzrange_1M AS
SELECT k, t::tstzrange
FROM tbl_tstzspan_1M;

test=# analyze;
ANALYZE
test=#

The tstzrange DOES NOT support parallel aggregates

test=# EXPLAIN
SELECT k%10, range_agg(t) AS t
FROM tbl_tstzrange_1M
group by k%10
order by k%10;

                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=66706.17..203172.65 rows=1000000 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Gather Merge  (cost=66706.17..183172.65 rows=1000000 width=54)
         Workers Planned: 2
         ->  Sort  (cost=65706.15..66747.81 rows=416667 width=54)
               Sort Key: ((k % '10'::numeric))
               ->  Parallel Seq Scan on tbl_tstzrange_1m  (cost=0.00..12568.33 rows=416667 width=54)
(7 rows)

The array_agg function supports parallel aggregates

test=# EXPLAIN
SELECT k%10, array_agg(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=66706.17..193518.60 rows=1000000 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Gather Merge  (cost=66706.17..172268.60 rows=833334 width=64)
         Workers Planned: 2
         ->  Partial GroupAggregate  (cost=65706.15..75081.15 rows=416667 width=64)
               Group Key: ((k % '10'::numeric))
               ->  Sort  (cost=65706.15..66747.81 rows=416667 width=56)
                     Sort Key: ((k % '10'::numeric))
                     ->  Parallel Seq Scan on tbl_tstzspan_1m  (cost=0.00..12568.33 rows=416667 width=56)
(9 rows)

We are not able to make span_union aggregate support parallel aggregates

test=# EXPLAIN
SELECT k%10, span_union(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 GroupAggregate  (cost=187879.84..210379.84 rows=1000000 width=64)
   Group Key: ((k % '10'::numeric))
   ->  Sort  (cost=187879.84..190379.84 rows=1000000 width=56)
         Sort Key: ((k % '10'::numeric))
         ->  Seq Scan on tbl_tstzspan_1m  (cost=0.00..19860.00 rows=1000000 width=56)

Any suggestion?

Thanks

Esteban

 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: A recent message added to pg_upgade
Следующее
От: "Drouvot, Bertrand"
Дата:
Сообщение: Re: Synchronizing slots from primary to standby