Parallel aggregates in PG 16.1
От | ZIMANYI Esteban |
---|---|
Тема | Parallel aggregates in PG 16.1 |
Дата | |
Msg-id | DB9P190MB1419B520B63F120567BD61918DAEA@DB9P190MB1419.EURP190.PROD.OUTLOOK.COM обсуждение исходный текст |
Ответы |
Re: Parallel aggregates in PG 16.1
|
Список | pgsql-hackers |
In MobilityDB
https://github.com/MobilityDB/MobilityDB
we have defined a tstzspan type which is a fixed-size equivalent of the tstzrange type in PostgreSQL.
We have a span_union aggregate function which is the equivalent of the range_agg function in PostgreSQL defined as follows
CREATE FUNCTION tstzspan_union_finalfn(internal)
RETURNS tstzspanset
AS 'MODULE_PATHNAME', 'Span_union_finalfn'
LANGUAGE C IMMUTABLE PARALLEL SAFE;
CREATE AGGREGATE span_union(tstzspan) (
SFUNC = array_agg_transfn,
STYPE = internal,
COMBINEFUNC = array_agg_combine,
SERIALFUNC = array_agg_serialize,
DESERIALFUNC = array_agg_deserialize,
FINALFUNC = tstzspan_union_finalfn
);
As can be seen, we reuse the array_agg function to accumulate the values in an array and the final function just does similar work as the range_agg_finalfn to merge the overlapping spans.
I am testing the parallel aggregate features of PG 16.1
test=# select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
I create a table with 1M random spans and another one with the same data converted to tstzrange
CREATE TABLE tbl_tstzspan_1M AS
SELECT k, random_tstzspan('2001-01-01', '2002-12-31', 10) AS t
FROM generate_series(1, 1e6) AS k;
CREATE TABLE tbl_tstzrange_1M AS
SELECT k, t::tstzrange
FROM tbl_tstzspan_1M;
test=# analyze;
ANALYZE
test=#
The tstzrange DOES NOT support parallel aggregates
test=# EXPLAIN
SELECT k%10, range_agg(t) AS t
FROM tbl_tstzrange_1M
group by k%10
order by k%10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
GroupAggregate (cost=66706.17..203172.65 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Gather Merge (cost=66706.17..183172.65 rows=1000000 width=54)
Workers Planned: 2
-> Sort (cost=65706.15..66747.81 rows=416667 width=54)
Sort Key: ((k % '10'::numeric))
-> Parallel Seq Scan on tbl_tstzrange_1m (cost=0.00..12568.33 rows=416667 width=54)
(7 rows)
The array_agg function supports parallel aggregates
test=# EXPLAIN
SELECT k%10, array_agg(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=66706.17..193518.60 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Gather Merge (cost=66706.17..172268.60 rows=833334 width=64)
Workers Planned: 2
-> Partial GroupAggregate (cost=65706.15..75081.15 rows=416667 width=64)
Group Key: ((k % '10'::numeric))
-> Sort (cost=65706.15..66747.81 rows=416667 width=56)
Sort Key: ((k % '10'::numeric))
-> Parallel Seq Scan on tbl_tstzspan_1m (cost=0.00..12568.33 rows=416667 width=56)
(9 rows)
We are not able to make span_union aggregate support parallel aggregates
test=# EXPLAIN
SELECT k%10, span_union(t) AS t
FROM tbl_tstzspan_1M
group by k%10
order by k%10;
QUERY PLAN
--------------------------------------------------------------------------------------
GroupAggregate (cost=187879.84..210379.84 rows=1000000 width=64)
Group Key: ((k % '10'::numeric))
-> Sort (cost=187879.84..190379.84 rows=1000000 width=56)
Sort Key: ((k % '10'::numeric))
-> Seq Scan on tbl_tstzspan_1m (cost=0.00..19860.00 rows=1000000 width=56)
Any suggestion?
Thanks
Esteban
В списке pgsql-hackers по дате отправления: