Re: force partition pruning

Поиск

Список

Период

Сортировка

От	Vijaykumar Jain
Тема	Re: force partition pruning
Дата	10 мая 2021 г. 15:47:02
Msg-id	CAM+6J96NB=w33TpMc_ikMEtU1R716rjYNyweA1Zei+abZF=gtA@mail.gmail.com обсуждение исходный текст
Ответ на	force partition pruning (Niels Jespersen <NJN@dst.dk>)
Ответы	Re: force partition pruning (Vijaykumar Jain <vijaykumarjain.github@gmail.com>)
Список	pgsql-general

Дерево обсуждения

I do not know how to put this in words,

but see below when the predicate is explicitly applied to the main table with partition.

postgres=# \d+ prt1

Partitioned table "public.prt1"

--------+-------------------+-----------+----------+---------+----------+--------------+-------------

Partition key: RANGE (a)

Partitions: prt1_p1 FOR VALUES FROM (0) TO (250),

prt1_p2 FOR VALUES FROM (250) TO (500),

prt1_p3 FOR VALUES FROM (500) TO (600)

(failed reverse-i-search)`': ^C

postgres=# \d+ b

Table "public.b"

--------+---------+-----------+----------+---------+---------+--------------+-------------

Indexes:

"b_id_idx" btree (id)

Access method: heap

postgres=# table b;

-----

200

400

(2 rows)

-- basically if the table is joined and predicate can be applied to the outer table which has constraints matching,

partition pruning takes place.

I do not know the theory, or even what i did is correct, but just FYI.

postgres=# explain analyze select prt1.* from prt1 where a in ( select id from b where id in (1, 100, 200) );

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------

Hash Semi Join (cost=1.05..9.36 rows=2 width=13) (actual time=0.034..0.074 rows=1 loops=1)

Hash Cond: (prt1.a = b.id)

-> Append (cost=0.00..7.50 rows=300 width=13) (actual time=0.006..0.043 rows=300 loops=1)

-> Seq Scan on prt1_p1 prt1_1 (cost=0.00..2.25 rows=125 width=13) (actual time=0.005..0.013 rows=125 loops=1)

-> Seq Scan on prt1_p2 prt1_2 (cost=0.00..2.25 rows=125 width=13) (actual time=0.003..0.009 rows=125 loops=1)

-> Seq Scan on prt1_p3 prt1_3 (cost=0.00..1.50 rows=50 width=13) (actual time=0.002..0.004 rows=50 loops=1)

-> Hash (cost=1.03..1.03 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on b (cost=0.00..1.03 rows=2 width=4) (actual time=0.003..0.003 rows=1 loops=1)

Filter: (id = ANY ('{1,100,200}'::integer[]))

Rows Removed by Filter: 1

Planning Time: 0.181 ms

Execution Time: 0.089 ms

(13 rows)

postgres=# explain analyze select prt1.* from prt1 where a in ( select id from b where b.id = prt1.a) and a in (1, 100, 200);

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Index Scan using iprt1_p1_a on prt1_p1 prt1 (cost=0.14..14.03 rows=2 width=13) (actual time=0.024..0.025 rows=1 loops=1)

Index Cond: (a = ANY ('{1,100,200}'::integer[]))

Filter: (SubPlan 1)

Rows Removed by Filter: 1

SubPlan 1

-> Seq Scan on b (cost=0.00..1.02 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=2)

Filter: (id = prt1.a)

Rows Removed by Filter: 1

Planning Time: 0.120 ms

Execution Time: 0.041 ms

(10 rows)

postgres=# explain analyze select prt1.* from prt1 where exists ( select 1 from b where b.id = prt1.a) and a in (1, 100, 200);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------

Hash Semi Join (cost=1.04..3.79 rows=2 width=13) (actual time=0.024..0.028 rows=1 loops=1)

Hash Cond: (prt1.a = b.id)

-> Seq Scan on prt1_p1 prt1 (cost=0.00..2.72 rows=3 width=13) (actual time=0.011..0.017 rows=2 loops=1)

Filter: (a = ANY ('{1,100,200}'::integer[]))

Rows Removed by Filter: 123

-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.004..0.004 rows=2 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)

Planning Time: 0.192 ms

Execution Time: 0.043 ms

(10 rows)

postgres=# explain analyze select prt1.* from prt1 inner join b on prt1.a = b.id where a in (1, 100, 200);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------

Hash Join (cost=1.04..3.79 rows=2 width=13) (actual time=0.024..0.028 rows=1 loops=1)

Hash Cond: (prt1.a = b.id)

-> Seq Scan on prt1_p1 prt1 (cost=0.00..2.72 rows=3 width=13) (actual time=0.012..0.018 rows=2 loops=1)

Filter: (a = ANY ('{1,100,200}'::integer[]))

Rows Removed by Filter: 123

-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.003..0.004 rows=2 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)

Planning Time: 0.181 ms

Execution Time: 0.043 ms

(10 rows)

On Mon, 10 May 2021 at 17:09, Niels Jespersen <NJN@dst.dk> wrote:

Hi all

I need a litte advice on how to

Postgres 13.2

A metadata query pulls partition keys:

select m.period_version from register.register_metadata m where m.current_version and m.period between '201712' and '201912' ;

A query using these in an in-list easily makes the planner do partition pruning.

select * from register.register d where d.period_version in ('201712_1', '201812_1', '201912_1');

However combining the metadataquery into the dataquery makes the planner decide to scan all partitions.

select * from register.register d where d.period_version in (select m.period_version from register.register_metadata m where m.current_version and m.period between '201712' and '201912');

I am quite aware that the latter query requires partition pruning to take place during execution not during planning.

My question here is how do I package the two-step proces into an interface that analysts can actually use?

One possibility is to have a prepare step that creates a temporary view with the hard-coded values built-in. And then query data via the temp view. This works ok, but there is an issue with possible naming conflicts on the temp view (not that this could not be worked around).

Ideally I would like a function to figure out the query and then return the data from that dynamically executed query. Complicating matters is the fact that there are more than one set of data/metatable tables and each datatable has a different set of columns. This excludes a table returning function since that must list the columns present.

Best regards

Niels Jespersen
Chief Adviser
IT Center

Mobile phone:+45 42 42 93 73
Email: njn@dst.dk

Statistics Denmark, Sejrøgade 11, DK-2100 Copenhagen
www.dst.dk/en | Twitter | LinkedIn | Facebook