No partition pruning when initializing query plan with LATERAL JOINand aggregates

Поиск
Список
Период
Сортировка
От Marcin Barczyński
Тема No partition pruning when initializing query plan with LATERAL JOINand aggregates
Дата
Msg-id CAP3o3PdcYTbHHX9sz7K6EsWnf2L64omM_de6r1eana0=go3YEw@mail.gmail.com
обсуждение исходный текст
Список pgsql-general
I wonder why partition pruning doesn't work with LATERAL JOIN and aggregates.
Below is my example tested on PostgreSQL 12.1 (Ubuntu 12.1-1.pgdg18.04+1):

CREATE TABLE demo(key BIGINT) PARTITION BY LIST (key);
CREATE TABLE demo_key_1 PARTITION OF demo FOR VALUES IN (1);
CREATE TABLE demo_key_2 PARTITION OF demo FOR VALUES IN (2);
INSERT INTO demo(key) VALUES (1), (2);
ANALYZE demo;

CREATE TABLE demo2(key BIGINT) PARTITION BY LIST (key);
CREATE TABLE demo2_key_1 PARTITION OF demo2 FOR VALUES IN (1);
CREATE TABLE demo2_key_2 PARTITION OF demo2 FOR VALUES IN (2);
INSERT INTO demo2(key) VALUES (1), (2);
ANALYZE demo2;

Now, if there are no aggregates in SELECT under LATERAL JOIN, everything works as expected - only a single partition of each table is scanned:

EXPLAIN ANALYZE
         SELECT * FROM demo
         JOIN LATERAL (
                  SELECT key AS key2
                  FROM demo2
                  WHERE demo2.key = demo.key
         ) d ON TRUE
         WHERE demo.key = 1;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2.03 rows=1 width=16) (actual time=0.007..0.008 rows=1 loops=1)
   ->  Seq Scan on demo_key_1  (cost=0.00..1.01 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
         Filter: (key = 1)
   ->  Seq Scan on demo2_key_1  (cost=0.00..1.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
         Filter: (key = 1)
 Planning Time: 0.191 ms
 Execution Time: 0.025 ms
(7 rows)

However, when I try a very similar query that contains an aggregate function, partitions of demo2 are not pruned from the query plan:

EXPLAIN ANALYZE
         SELECT * FROM demo
         JOIN LATERAL (
                  SELECT sum(demo2.key) AS sum2
                  FROM demo2
                  WHERE demo2.key = demo.key
         ) d ON TRUE
         WHERE demo.key = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.03..3.07 rows=1 width=40) (actual time=0.018..0.018 rows=1 loops=1)
   ->  Seq Scan on demo_key_1  (cost=0.00..1.01 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=1)
         Filter: (key = 1)
   ->  Aggregate  (cost=2.03..2.04 rows=1 width=32) (actual time=0.011..0.011 rows=1 loops=1)
         ->  Append  (cost=0.00..2.03 rows=2 width=8) (actual time=0.004..0.005 rows=1 loops=1)
               ->  Seq Scan on demo2_key_1  (cost=0.00..1.01 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
                     Filter: (key = demo_key_1.key)
               ->  Seq Scan on demo2_key_2  (cost=0.00..1.01 rows=1 width=8) (never executed)
                     Filter: (key = demo_key_1.key)
 Planning Time: 0.174 ms
 Execution Time: 0.082 ms
(11 rows)

Of course, Seq Scan on demo2_key_2 was never executed, but why wasn't it pruned from the query plan? More complex queries with hundreds of partitions are affected badly by that.

The workaround is to add a redundant condition to the subquery:

EXPLAIN ANALYZE
         SELECT * FROM demo
         JOIN LATERAL (
                  SELECT sum(demo2.key) AS sum2
                  FROM demo2
                  WHERE demo2.key = 1 AND demo2.key = demo.key
         ) d ON TRUE
         WHERE demo.key = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.01..2.05 rows=1 width=40) (actual time=0.011..0.011 rows=1 loops=1)
   ->  Seq Scan on demo_key_1  (cost=0.00..1.01 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=1)
         Filter: (key = 1)
   ->  Aggregate  (cost=1.01..1.02 rows=1 width=32) (actual time=0.006..0.006 rows=1 loops=1)
         ->  Result  (cost=0.00..1.01 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
               One-Time Filter: (demo_key_1.key = 1)
               ->  Seq Scan on demo2_key_1  (cost=0.00..1.01 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=1)
                     Filter: (key = 1)
 Planning Time: 0.079 ms
 Execution Time: 0.031 ms
(10 rows)

-- 
M.B.

В списке pgsql-general по дате отправления:

Предыдущее
От: dagamier
Дата:
Сообщение: Re: pg_repack in cluster
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: pg_repack in cluster