Обсуждение: Is it worth pushing conditions to sublink/subplan?

Поиск
Список
Период
Сортировка

Is it worth pushing conditions to sublink/subplan?

От
Wenjing
Дата:
Hi Hackers,

Recently, a issue has been bothering me, This is about conditional push-down in SQL.
I use cases from regression testing as an example.
I found that the conditions  (B =1)  can be pushed down into the subquery, However, it cannot be pushed down to sublink/subplan.
If a sublink/subplan clause contains a partition table, it can be useful to get the conditions for pruning.
So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);


postgres-# explain (costs off)
postgres-# select temp.b  from 
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all 
postgres(# (values(1,1)) 
postgres(# ) temp,
postgres-# ab y
postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
                    QUERY PLAN                     
---------------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The conditions  (B =1)  can be pushed down into the subquery.

postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan in targetlist.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y 
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions  (B=1 and A=1)  cannot be pushed down to sublink/subplan in where clause.



Вложения

Re: Is it worth pushing conditions to sublink/subplan?

От
Wenjing
Дата:


2021年8月16日 17:15,Wenjing <wenjing.zwj@alibaba-inc.com> 写道:

Hi Hackers,

Recently, a issue has been bothering me, This is about conditional push-down in SQL.
I use cases from regression testing as an example.
I found that the conditions  (B =1)  can be pushed down into the subquery, However, it cannot be pushed down to sublink/subplan.
If a sublink/subplan clause contains a partition table, it can be useful to get the conditions for pruning.
So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);
Sorry, I messed up the structure of the table.
It is should be:
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);




postgres-# explain (costs off)
postgres-# select temp.b  from 
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all 
postgres(# (values(1,1)) 
postgres(# ) temp,
postgres-# ab y
postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
                    QUERY PLAN                     
---------------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The conditions  (B =1)  can be pushed down into the subquery.

postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan in targetlist.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y 
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions  (B=1 and A=1)  cannot be pushed down to sublink/subplan in where clause.




Вложения

Re: Is it worth pushing conditions to sublink/subplan?

От
li jie
Дата:
Indeed, this may be useful for partition pruning.
I am also curious about why this has not been achieved.

Wenjing <wenjing.zwj@alibaba-inc.com> 于2021年8月23日周一 上午10:46写道:
Hi Hackers,

Recently, a issue has been bothering me, This is about conditional push-down in SQL.
I use cases from regression testing as an example.
I found that the conditions  (B =1)  can be pushed down into the subquery, However, it cannot be pushed down to sublink/subplan.
If a sublink/subplan clause contains a partition table, it can be useful to get the conditions for pruning.
So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);


postgres-# explain (costs off)
postgres-# select temp.b  from 
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all 
postgres(# (values(1,1)) 
postgres(# ) temp,
postgres-# ab y
postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
                    QUERY PLAN                     
---------------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The conditions  (B =1)  can be pushed down into the subquery.

postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan in targetlist.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y 
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions  (B=1 and A=1)  cannot be pushed down to sublink/subplan in where clause.



Re: Is it worth pushing conditions to sublink/subplan?

От
shawn wang
Дата:
I tested it the way you said and increased the number of sub-tables.
I created a hash partition table of 1000 sub-tables.
Test according to your first SQL, the optimizer cuts the unnecessary sub-tables well.
You can see the plan:

postgres=# explain analyze

postgres-# select temp.p1  from 

postgres-# (

postgres(# select p1,p2 from test1.test1hashtable x where x.p1 = '1'

postgres(# union all 

postgres(# (values('1','1')) 

postgres(# ) temp,

postgres-# test1.test1hashtable y

postgres-# where  y.p2 = temp.p2 and y.p1 = '1' and y.p1='1';

                                                     QUERY PLAN                                                      

---------------------------------------------------------------------------------------------------------------------

 Nested Loop  (cost=0.00..25.55 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)

   Join Filter: (x.p2 = y.p2)

   ->  Seq Scan on test1hashtable826 y  (cost=0.00..12.75 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)

         Filter: (p1 = '1'::text)

   ->  Append  (cost=0.00..12.78 rows=2 width=64) (never executed)

         ->  Seq Scan on test1hashtable826 x  (cost=0.00..12.75 rows=1 width=64) (never executed)

               Filter: (p1 = '1'::text)

         ->  Result  (cost=0.00..0.01 rows=1 width=64) (never executed)

 Planning Time: 0.158 ms

 Execution Time: 0.022 ms

(10 rows)


But when the second one runs, the planning time reaches 13.942ms.
The plan:


postgres=# explain analyze

postgres-# select

postgres-# y.p1,

postgres-# (Select x.p2 from test1.test1hashtable x where y.p1 =x.p1 and y.p2=x.p2) as b

postgres-# from test1.test1hashtable y where p1 = '1' and p2 = '1';

                                                    QUERY PLAN                                                    

------------------------------------------------------------------------------------------------------------------

 Seq Scan on test1hashtable826 y  (cost=0.00..13318.30 rows=1 width=64) (actual time=0.004..0.047 rows=0 loops=1)

   Filter: ((p1 = '1'::text) AND (p2 = '1'::text))

   SubPlan 1

     ->  Append  (cost=0.00..13305.00 rows=1000 width=32) (never executed)

           ->  Seq Scan on test1hashtable1 x_1  (cost=0.00..13.30 rows=1 width=32) (never executed)

                 Filter: ((y.p1 = p1) AND (y.p2 = p2))

           ->  Seq Scan on test1hashtable1000 x_1000  (cost=0.00..13.30 rows=1 width=32) (never executed)

                 Filter: ((y.p1 = p1) AND (y.p2 = p2))

 Planning Time: 13.942 ms

 Execution Time: 4.899 ms

(2006 rows)


This is a very worthwhile thing to do. In a relatively large business system, a large number of partition tables and high concurrency are often used. If the planning time is too long, this will greatly affect the business.


regards,

Shawn.


Wenjing <wenjing.zwj@alibaba-inc.com> 于2021年8月17日周二 上午10:31写道:


2021年8月16日 17:15,Wenjing <wenjing.zwj@alibaba-inc.com> 写道:

Hi Hackers,

Recently, a issue has been bothering me, This is about conditional push-down in SQL.
I use cases from regression testing as an example.
I found that the conditions  (B =1)  can be pushed down into the subquery, However, it cannot be pushed down to sublink/subplan.
If a sublink/subplan clause contains a partition table, it can be useful to get the conditions for pruning.
So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);
Sorry, I messed up the structure of the table.
It is should be:
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);




postgres-# explain (costs off)
postgres-# select temp.b  from 
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all 
postgres(# (values(1,1)) 
postgres(# ) temp,
postgres-# ab y
postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
                    QUERY PLAN                     
---------------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The conditions  (B =1)  can be pushed down into the subquery.

postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan in targetlist.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y 
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions  (B=1 and A=1)  cannot be pushed down to sublink/subplan in where clause.




回复:Re: Is it worth pushing conditions to sublink/subplan?

От
"曾文旌(义从)"
Дата:
Hi Hackers

For my previous proposal, I developed a prototype and passed regression testing.
It works similarly to subquery's qual pushdown. We know that sublink expands
at the beginning of each level of query. At this stage, The query's conditions and
equivalence classes are not processed. But after generate_base_implied_equalities
the conditions are processed,  which is why qual can push down to subquery but sublink not.

My POC implementation chose to delay the sublink expansion in the SELECT clause (targetList)
and where clause. Specifically, it is delayed after generate_base_implied_equalities. Thus,
the equivalent conditions already established in the Up level query can be easily obtained
in the sublink expansion process (make_subplan). 

For example, if the up level query has a.id = 10 and the sublink query has a.id = b.id, then
we get b.id = 10 and push it down to the sublink quey. If b is a partitioned table and is
partitioned by id, then a large number of unrelated subpartitions are pruned out, 
This optimizes a significant amount of Planner and SQL execution time, especially
if the partitioned table has a large number of subpartitions and is what I want.

Currently, There were two SQL failures in the regression test, because the expansion
order of sublink was changed, which did not affect the execution result of SQL.

Look forward to your suggestions on this proposal.

Thanks

Wenjing



------------------原始邮件 ------------------
发件人:shawn wang <shawn.wang.pg@gmail.com>
发送时间:Wed Sep 1 10:54:50 2021
收件人:曾文旌(义从) <wenjing.zwj@alibaba-inc.com>
抄送:PostgreSQL Hackers <pgsql-hackers@postgresql.org>, wjzeng <wjzeng2012@gmail.com>
主题:Re: Is it worth pushing conditions to sublink/subplan?
I tested it the way you said and increased the number of sub-tables.
I created a hash partition table of 1000 sub-tables.
Test according to your first SQL, the optimizer cuts the unnecessary sub-tables well.
You can see the plan:

postgres=# explain analyze

postgres-# select temp.p1  from 

postgres-# (

postgres(# select p1,p2 from test1.test1hashtable x where x.p1 = '1'

postgres(# union all 

postgres(# (values('1','1')) 

postgres(# ) temp,

postgres-# test1.test1hashtable y

postgres-# where  y.p2 = temp.p2 and y.p1 = '1' and y.p1='1';

                                                     QUERY PLAN                                                      

---------------------------------------------------------------------------------------------------------------------

 Nested Loop  (cost=0.00..25.55 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)

   Join Filter: (x.p2 = y.p2)

   ->  Seq Scan on test1hashtable826 y  (cost=0.00..12.75 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)

         Filter: (p1 = '1'::text)

   ->  Append  (cost=0.00..12.78 rows=2 width=64) (never executed)

         ->  Seq Scan on test1hashtable826 x  (cost=0.00..12.75 rows=1 width=64) (never executed)

               Filter: (p1 = '1'::text)

         ->  Result  (cost=0.00..0.01 rows=1 width=64) (never executed)

 Planning Time: 0.158 ms

 Execution Time: 0.022 ms

(10 rows)


But when the second one runs, the planning time reaches 13.942ms.
The plan:


postgres=# explain analyze

postgres-# select

postgres-# y.p1,

postgres-# (Select x.p2 from test1.test1hashtable x where y.p1 =x.p1 and y.p2=x.p2) as b

postgres-# from test1.test1hashtable y where p1 = '1' and p2 = '1';

                                                    QUERY PLAN                                                    

------------------------------------------------------------------------------------------------------------------

 Seq Scan on test1hashtable826 y  (cost=0.00..13318.30 rows=1 width=64) (actual time=0.004..0.047 rows=0 loops=1)

   Filter: ((p1 = '1'::text) AND (p2 = '1'::text))

   SubPlan 1

     ->  Append  (cost=0.00..13305.00 rows=1000 width=32) (never executed)

           ->  Seq Scan on test1hashtable1 x_1  (cost=0.00..13.30 rows=1 width=32) (never executed)

                 Filter: ((y.p1 = p1) AND (y.p2 = p2))

           ->  Seq Scan on test1hashtable1000 x_1000  (cost=0.00..13.30 rows=1 width=32) (never executed)

                 Filter: ((y.p1 = p1) AND (y.p2 = p2))

 Planning Time: 13.942 ms

 Execution Time: 4.899 ms

(2006 rows)


This is a very worthwhile thing to do. In a relatively large business system, a large number of partition tables and high concurrency are often used. If the planning time is too long, this will greatly affect the business.


regards,

Shawn.


Wenjing <wenjing.zwj@alibaba-inc.com> 于2021年8月17日周二 上午10:31写道:


2021年8月16日 17:15,Wenjing <wenjing.zwj@alibaba-inc.com> 写道:

Hi Hackers,

Recently, a issue has been bothering me, This is about conditional push-down in SQL.
I use cases from regression testing as an example.
I found that the conditions  (B =1)  can be pushed down into the subquery, However, it cannot be pushed down to sublink/subplan.
If a sublink/subplan clause contains a partition table, it can be useful to get the conditions for pruning.
So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);
Sorry, I messed up the structure of the table.
It is should be:
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);




postgres-# explain (costs off)
postgres-# select temp.b  from 
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all 
postgres(# (values(1,1)) 
postgres(# ) temp,
postgres-# ab y
postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
                    QUERY PLAN                     
---------------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The conditions  (B =1)  can be pushed down into the subquery.

postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan in targetlist.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y 
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions  (B=1 and A=1)  cannot be pushed down to sublink/subplan in where clause.




Вложения

Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
Tomas Vondra
Дата:
Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers
> 
> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery's qual pushdown.
> We know that sublink expands at the beginning of each level of
> query. At this stage, The query's conditions and equivalence classes
> are not processed. But after generate_base_implied_equalities the
> conditions are processed,  which is why qual can push down to 
> subquery but sublink not.
> 
> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and where clause. Specifically, it is
> delayed after generate_base_implied_equalities. Thus, the equivalent
> conditions already established in the Up level query can be easily
> obtained in the sublink expansion process (make_subplan).
> 
> For example, if the up level query has a.id = 10 and the sublink
> query has a.id = b.id, then we get b.id = 10 and push it down to the
> sublink quey. If b is a partitioned table and is partitioned by id,
> then a large number of unrelated subpartitions are pruned out, This
> optimizes a significant amount of Planner and SQL execution time, 
> especially if the partitioned table has a large number of
> subpartitions and is what I want.
> 
> Currently, There were two SQL failures in the regression test,
> because the expansion order of sublink was changed, which did not
> affect the execution result of SQL.
> 
> Look forward to your suggestions on this proposal.
> 

I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.


1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).


2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x000000000077b412 in adjust_appendrel_attrs_mutator
      (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470          Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c


4) generate_base_implied_equalities

   shouldn't this

        if (ec->ec_processed)
            ;

   really be?

        if (ec->ec_processed)
            continue;

5) I'm not sure why we need the new ec_processed flag.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

回复:Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
"曾文旌(义从)"
Дата:


------------------原始邮件 ------------------
发件人:Tomas Vondra <tomas.vondra@enterprisedb.com>
发送时间:Wed Dec 8 11:26:35 2021
收件人:曾文旌(义从) <wenjing.zwj@alibaba-inc.com>, shawn wang <shawn.wang.pg@gmail.com>, ggysxcq@gmail.com <ggysxcq@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
抄送:wjzeng <wjzeng2012@gmail.com>
主题:Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers

> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery's qual pushdown.
> We know that sublink expands at the beginning of each level of
> query. At this stage, The query's conditions and equivalence classes
> are not processed. But after generate_base_implied_equalities the
> conditions are processed,  which is why qual can push down to 
> subquery but sublink not.

> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and where clause. Specifically, it is
> delayed after generate_base_implied_equalities. Thus, the equivalent
> conditions already established in the Up level query can be easily
> obtained in the sublink expansion process (make_subplan).

> For example, if the up level query has a.id = 10 and the sublink
> query has a.id = b.id, then we get b.id = 10 and push it down to the
> sublink quey. If b is a partitioned table and is partitioned by id,
> then a large number of unrelated subpartitions are pruned out, This
> optimizes a significant amount of Planner and SQL execution time, 
> especially if the partitioned table has a large number of
> subpartitions and is what I want.

> Currently, There were two SQL failures in the regression test,
> because the expansion order of sublink was changed, which did not
> affect the execution result of SQL.

> Look forward to your suggestions on this proposal.


I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several samples to the regress(qual_pushdown_to_sublink.sql). 
and I used the partition table to show the plan status of qual being pushed down into sublink.
Hopefully this will help you understand the details of this patch. Later, I will add more cases.

2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x000000000077b412 in adjust_appendrel_attrs_mutator
      (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470          Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

For the patch attached in the last email, I passed all the tests under src/test/regress.
As you pointed out, there was a problem with regression under contrib(in contrib/postgres_fdw). 
This time I fixed it and the current patch (V2) can pass the check-world.

3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

        if (ec->ec_processed)
            ;

   really be?

        if (ec->ec_processed)
            continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two generate_base_implied_equalities calls
1) I need the base equivalent expression generated after generate_base_implied_equalities,
which is used to pushdown qual to sublink(lazy_process_sublinks)
2) The expansion of sublink may result in an equivalent expression with parameters, such as a = $1,
which needs to deal with the equivalence classes again.
3) So, I added ec_processed and asked to process it again (generate_base_implied_equalities)
after the equivalence class changed (add_eq_member/process_equivalence).

Maybe you have a better suggestion, please let me know.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

Yes, not all sublink is delayed. Let me explain this:
1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, all lazy process sublink will not happen,
qual pushdown to sublink depend on lazy procee sublink, which means no quals will be pushed down.
2) Even  if enable_lazy_process_sublink = true If Query in this level contains some complex features,
sublink in this level query will not try do qual pushdown. (see function query_has_sublink_try_pushdown_qual).
I want to support a minimum subset first. Then consider complex features such as CTE/DML.
3) Finally, under conditions 1 and 2, all kinds of sublink contained in the SELECT clause or
WHERE clause will delays expansion and try pushdown qual. The sublink elsewhere in the SQL statement
does not delay process.

The current status meets my requirements for now. Of course, after this scheme is proved to be feasible, maybe
we can discuss that all sublinks are processed by overall delay, just like qual pushdown to subquery.

thanks

Wenjing



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
Zhihong Yu
Дата:


On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> wrote:


------------------原始邮件 ------------------
发件人:Tomas Vondra <tomas.vondra@enterprisedb.com>
发送时间:Wed Dec 8 11:26:35 2021
抄送:wjzeng <wjzeng2012@gmail.com>
主题:Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers

> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery's qual pushdown.
> We know that sublink expands at the beginning of each level of
> query. At this stage, The query's conditions and equivalence classes
> are not processed. But after generate_base_implied_equalities the
> conditions are processed,  which is why qual can push down to 
> subquery but sublink not.

> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and where clause. Specifically, it is
> delayed after generate_base_implied_equalities. Thus, the equivalent
> conditions already established in the Up level query can be easily
> obtained in the sublink expansion process (make_subplan).

> For example, if the up level query has a.id = 10 and the sublink
> query has a.id = b.id, then we get b.id = 10 and push it down to the
> sublink quey. If b is a partitioned table and is partitioned by id,
> then a large number of unrelated subpartitions are pruned out, This
> optimizes a significant amount of Planner and SQL execution time, 
> especially if the partitioned table has a large number of
> subpartitions and is what I want.

> Currently, There were two SQL failures in the regression test,
> because the expansion order of sublink was changed, which did not
> affect the execution result of SQL.

> Look forward to your suggestions on this proposal.


I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several samples to the regress(qual_pushdown_to_sublink.sql). 
and I used the partition table to show the plan status of qual being pushed down into sublink.
Hopefully this will help you understand the details of this patch. Later, I will add more cases.

2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x000000000077b412 in adjust_appendrel_attrs_mutator
      (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470          Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

For the patch attached in the last email, I passed all the tests under src/test/regress.
As you pointed out, there was a problem with regression under contrib(in contrib/postgres_fdw). 
This time I fixed it and the current patch (V2) can pass the check-world.

3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

        if (ec->ec_processed)
            ;

   really be?

        if (ec->ec_processed)
            continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two generate_base_implied_equalities calls
1) I need the base equivalent expression generated after generate_base_implied_equalities,
which is used to pushdown qual to sublink(lazy_process_sublinks)
2) The expansion of sublink may result in an equivalent expression with parameters, such as a = $1,
which needs to deal with the equivalence classes again.
3) So, I added ec_processed and asked to process it again (generate_base_implied_equalities)
after the equivalence class changed (add_eq_member/process_equivalence).

Maybe you have a better suggestion, please let me know.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

Yes, not all sublink is delayed. Let me explain this:
1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, all lazy process sublink will not happen,
qual pushdown to sublink depend on lazy procee sublink, which means no quals will be pushed down.
2) Even  if enable_lazy_process_sublink = true If Query in this level contains some complex features,
sublink in this level query will not try do qual pushdown. (see function query_has_sublink_try_pushdown_qual).
I want to support a minimum subset first. Then consider complex features such as CTE/DML.
3) Finally, under conditions 1 and 2, all kinds of sublink contained in the SELECT clause or
WHERE clause will delays expansion and try pushdown qual. The sublink elsewhere in the SQL statement
does not delay process.

The current status meets my requirements for now. Of course, after this scheme is proved to be feasible, maybe
we can discuss that all sublinks are processed by overall delay, just like qual pushdown to subquery.

thanks

Wenjing



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

+       /* The outer var could exist in any of the upper-level queries to find these roots */

to find these roots -> so find these roots 

+           if (has_unexpand_sublink(root) && checkExprHasSubLink(node))

has_unexpand_sublink -> has_unexpanded_sublink

+   if (enable_lazy_process_sublink)
+       return true;

The above can be simplified to:

  return enable_lazy_process_sublink;

+           if (checkExprHasSubLink(qual))
+           {
+               qual = lazy_process_sublink_qual(root, qual);
+               newquals = lappend(newquals, qual);
+           }
+           else
+               newquals = lappend(newquals, qual);

Since the lappend() is common to both branches, you can remove the else clause. In the if block, only call lazy_process_sublink_qual().

+       /* under lazy process sublink, parent root may have some data that child not need, so set it to NULL */
+       subroot->join_info_list = NIL;

minor correction to the comment above:
  under lazy process sublink, parent root may have some data that child does not need, so set it to NIL

+void
+preprocess_qual_conditions(PlannerInfo *root, Node *jtnode, bool istop)

Please add a comment explaining the meaning of istop.

+       if (istop)
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, false);
+       else
+           f->quals = preprocess_expression(root, f->quals, EXPRKIND_QUAL);

I think the code would be more readable if you replace the preprocess_expression() call in else branch with call to preprocess_expression_ext().

+           context->root->unexpand_sublink_counter++;

unexpand_sublink_counter -> unexpanded_sublink_counter++

For sublink_query_push_qual(), the return at the end is not needed.

For condition_is_safe_pushdown_to_sublink, you can initialize context this way :

+   equal_expr_info_context context = {0};

+   if (cvar && cvar->varattno > 0 && equal(cvar, var))
+       return true;

The last few lines of condition_is_safe_pushdown_to_sublink() can be written as:

  return cvar && cvar->varattno > 0 && equal(cvar, var);

+       if (equal_expr_safety_check(node, &context))
+       {
+           /* It needs to be something like outer var = inner var */
+           if (context.inner_var &&

The nested if blocks can be merged into one if block.

Cheers

回复:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
"曾文旌(义从)"
Дата:


------------------原始邮件 ------------------
发件人:Zhihong Yu <zyu@yugabyte.com>
发送时间:Sun Dec 12 01:13:11 2021
收件人:曾文旌(义从) <wenjing.zwj@alibaba-inc.com>
抄送:Tomas Vondra <tomas.vondra@enterprisedb.com>, wjzeng <wjzeng2012@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, shawn wang <shawn.wang.pg@gmail.com>, ggysxcq@gmail.com <ggysxcq@gmail.com>
主题:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?


On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> wrote:


------------------原始邮件 ------------------
发件人:Tomas Vondra <tomas.vondra@enterprisedb.com>
发送时间:Wed Dec 8 11:26:35 2021
抄送:wjzeng <wjzeng2012@gmail.com>
主题:Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers

> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery's qual pushdown.
> We know that sublink expands at the beginning of each level of
> query. At this stage, The query's conditions and equivalence classes
> are not processed. But after generate_base_implied_equalities the
> conditions are processed,  which is why qual can push down to 
> subquery but sublink not.

> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and where clause. Specifically, it is
> delayed after generate_base_implied_equalities. Thus, the equivalent
> conditions already established in the Up level query can be easily
> obtained in the sublink expansion process (make_subplan).

> For example, if the up level query has a.id = 10 and the sublink
> query has a.id = b.id, then we get b.id = 10 and push it down to the
> sublink quey. If b is a partitioned table and is partitioned by id,
> then a large number of unrelated subpartitions are pruned out, This
> optimizes a significant amount of Planner and SQL execution time, 
> especially if the partitioned table has a large number of
> subpartitions and is what I want.

> Currently, There were two SQL failures in the regression test,
> because the expansion order of sublink was changed, which did not
> affect the execution result of SQL.

> Look forward to your suggestions on this proposal.


I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several samples to the regress(qual_pushdown_to_sublink.sql). 
and I used the partition table to show the plan status of qual being pushed down into sublink.
Hopefully this will help you understand the details of this patch. Later, I will add more cases.

2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x000000000077b412 in adjust_appendrel_attrs_mutator
      (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470          Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

For the patch attached in the last email, I passed all the tests under src/test/regress.
As you pointed out, there was a problem with regression under contrib(in contrib/postgres_fdw). 
This time I fixed it and the current patch (V2) can pass the check-world.

3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

        if (ec->ec_processed)
            ;

   really be?

        if (ec->ec_processed)
            continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two generate_base_implied_equalities calls
1) I need the base equivalent expression generated after generate_base_implied_equalities,
which is used to pushdown qual to sublink(lazy_process_sublinks)
2) The expansion of sublink may result in an equivalent expression with parameters, such as a = $1,
which needs to deal with the equivalence classes again.
3) So, I added ec_processed and asked to process it again (generate_base_implied_equalities)
after the equivalence class changed (add_eq_member/process_equivalence).

Maybe you have a better suggestion, please let me know.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

Yes, not all sublink is delayed. Let me explain this:
1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, all lazy process sublink will not happen,
qual pushdown to sublink depend on lazy procee sublink, which means no quals will be pushed down.
2) Even  if enable_lazy_process_sublink = true If Query in this level contains some complex features,
sublink in this level query will not try do qual pushdown. (see function query_has_sublink_try_pushdown_qual).
I want to support a minimum subset first. Then consider complex features such as CTE/DML.
3) Finally, under conditions 1 and 2, all kinds of sublink contained in the SELECT clause or
WHERE clause will delays expansion and try pushdown qual. The sublink elsewhere in the SQL statement
does not delay process.

The current status meets my requirements for now. Of course, after this scheme is proved to be feasible, maybe
we can discuss that all sublinks are processed by overall delay, just like qual pushdown to subquery.

thanks

Wenjing



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

+       /* The outer var could exist in any of the upper-level queries to find these roots */

to find these roots -> so find these roots 

+           if (has_unexpand_sublink(root) && checkExprHasSubLink(node))

has_unexpand_sublink -> has_unexpanded_sublink

+   if (enable_lazy_process_sublink)
+       return true;

The above can be simplified to:

  return enable_lazy_process_sublink;

+           if (checkExprHasSubLink(qual))
+           {
+               qual = lazy_process_sublink_qual(root, qual);
+               newquals = lappend(newquals, qual);
+           }
+           else
+               newquals = lappend(newquals, qual);

Since the lappend() is common to both branches, you can remove the else clause. In the if block, only call lazy_process_sublink_qual().

+       /* under lazy process sublink, parent root may have some data that child not need, so set it to NULL */
+       subroot->join_info_list = NIL;

minor correction to the comment above:
  under lazy process sublink, parent root may have some data that child does not need, so set it to NIL

+void
+preprocess_qual_conditions(PlannerInfo *root, Node *jtnode, bool istop)

Please add a comment explaining the meaning of istop.

+       if (istop)
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, false);
+       else
+           f->quals = preprocess_expression(root, f->quals, EXPRKIND_QUAL);

I think the code would be more readable if you replace the preprocess_expression() call in else branch with call to preprocess_expression_ext().

+           context->root->unexpand_sublink_counter++;

unexpand_sublink_counter -> unexpanded_sublink_counter++

For sublink_query_push_qual(), the return at the end is not needed.

For condition_is_safe_pushdown_to_sublink, you can initialize context this way :

+   equal_expr_info_context context = {0};
I don't understand the benefits of doing this. Please give me some hints.
We can also see a number of memset initializations, such as get_range_partbound_string()

+   if (cvar && cvar->varattno > 0 && equal(cvar, var))
+       return true;

The last few lines of condition_is_safe_pushdown_to_sublink() can be written as:

  return cvar && cvar->varattno > 0 && equal(cvar, var);

+       if (equal_expr_safety_check(node, &context))
+       {
+           /* It needs to be something like outer var = inner var */
+           if (context.inner_var &&

The nested if blocks can be merged into one if block.

Cheers

HI Zhihong Yu

Thank you for your attention.
Every suggestion you make makes the patch better.
I have completed the v3 patch according to your suggestions.
Looking forward to your feedback.


Wenjing

Вложения

回复:回复:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
"曾文旌(义从)"
Дата:

Fixed a bug found during testing.


Wenjing


------------------原始邮件 ------------------
发件人:曾文旌(义从) <wenjing.zwj@alibaba-inc.com>
发送时间:Sun Dec 12 20:51:08 2021
收件人:Zhihong Yu <zyu@yugabyte.com>
抄送:Tomas Vondra <tomas.vondra@enterprisedb.com>, wjzeng <wjzeng2012@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, shawn wang <shawn.wang.pg@gmail.com>, ggysxcq@gmail.com <ggysxcq@gmail.com>
主题:回复:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?


------------------原始邮件 ------------------
发件人:Zhihong Yu <zyu@yugabyte.com>
发送时间:Sun Dec 12 01:13:11 2021
收件人:曾文旌(义从) <wenjing.zwj@alibaba-inc.com>
抄送:Tomas Vondra <tomas.vondra@enterprisedb.com>, wjzeng <wjzeng2012@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, shawn wang <shawn.wang.pg@gmail.com>, ggysxcq@gmail.com <ggysxcq@gmail.com>
主题:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?


On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> wrote:


------------------原始邮件 ------------------
发件人:Tomas Vondra <tomas.vondra@enterprisedb.com>
发送时间:Wed Dec 8 11:26:35 2021
抄送:wjzeng <wjzeng2012@gmail.com>
主题:Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers

> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery's qual pushdown.
> We know that sublink expands at the beginning of each level of
> query. At this stage, The query's conditions and equivalence classes
> are not processed. But after generate_base_implied_equalities the
> conditions are processed,  which is why qual can push down to 
> subquery but sublink not.

> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and where clause. Specifically, it is
> delayed after generate_base_implied_equalities. Thus, the equivalent
> conditions already established in the Up level query can be easily
> obtained in the sublink expansion process (make_subplan).

> For example, if the up level query has a.id = 10 and the sublink
> query has a.id = b.id, then we get b.id = 10 and push it down to the
> sublink quey. If b is a partitioned table and is partitioned by id,
> then a large number of unrelated subpartitions are pruned out, This
> optimizes a significant amount of Planner and SQL execution time, 
> especially if the partitioned table has a large number of
> subpartitions and is what I want.

> Currently, There were two SQL failures in the regression test,
> because the expansion order of sublink was changed, which did not
> affect the execution result of SQL.

> Look forward to your suggestions on this proposal.


I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several samples to the regress(qual_pushdown_to_sublink.sql). 
and I used the partition table to show the plan status of qual being pushed down into sublink.
Hopefully this will help you understand the details of this patch. Later, I will add more cases.

2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x000000000077b412 in adjust_appendrel_attrs_mutator
      (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470          Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

For the patch attached in the last email, I passed all the tests under src/test/regress.
As you pointed out, there was a problem with regression under contrib(in contrib/postgres_fdw). 
This time I fixed it and the current patch (V2) can pass the check-world.

3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

        if (ec->ec_processed)
            ;

   really be?

        if (ec->ec_processed)
            continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two generate_base_implied_equalities calls
1) I need the base equivalent expression generated after generate_base_implied_equalities,
which is used to pushdown qual to sublink(lazy_process_sublinks)
2) The expansion of sublink may result in an equivalent expression with parameters, such as a = $1,
which needs to deal with the equivalence classes again.
3) So, I added ec_processed and asked to process it again (generate_base_implied_equalities)
after the equivalence class changed (add_eq_member/process_equivalence).

Maybe you have a better suggestion, please let me know.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

Yes, not all sublink is delayed. Let me explain this:
1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, all lazy process sublink will not happen,
qual pushdown to sublink depend on lazy procee sublink, which means no quals will be pushed down.
2) Even  if enable_lazy_process_sublink = true If Query in this level contains some complex features,
sublink in this level query will not try do qual pushdown. (see function query_has_sublink_try_pushdown_qual).
I want to support a minimum subset first. Then consider complex features such as CTE/DML.
3) Finally, under conditions 1 and 2, all kinds of sublink contained in the SELECT clause or
WHERE clause will delays expansion and try pushdown qual. The sublink elsewhere in the SQL statement
does not delay process.

The current status meets my requirements for now. Of course, after this scheme is proved to be feasible, maybe
we can discuss that all sublinks are processed by overall delay, just like qual pushdown to subquery.

thanks

Wenjing



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

+       /* The outer var could exist in any of the upper-level queries to find these roots */

to find these roots -> so find these roots 

+           if (has_unexpand_sublink(root) && checkExprHasSubLink(node))

has_unexpand_sublink -> has_unexpanded_sublink

+   if (enable_lazy_process_sublink)
+       return true;

The above can be simplified to:

  return enable_lazy_process_sublink;

+           if (checkExprHasSubLink(qual))
+           {
+               qual = lazy_process_sublink_qual(root, qual);
+               newquals = lappend(newquals, qual);
+           }
+           else
+               newquals = lappend(newquals, qual);

Since the lappend() is common to both branches, you can remove the else clause. In the if block, only call lazy_process_sublink_qual().

+       /* under lazy process sublink, parent root may have some data that child not need, so set it to NULL */
+       subroot->join_info_list = NIL;

minor correction to the comment above:
  under lazy process sublink, parent root may have some data that child does not need, so set it to NIL

+void
+preprocess_qual_conditions(PlannerInfo *root, Node *jtnode, bool istop)

Please add a comment explaining the meaning of istop.

+       if (istop)
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, false);
+       else
+           f->quals = preprocess_expression(root, f->quals, EXPRKIND_QUAL);

I think the code would be more readable if you replace the preprocess_expression() call in else branch with call to preprocess_expression_ext().

+           context->root->unexpand_sublink_counter++;

unexpand_sublink_counter -> unexpanded_sublink_counter++

For sublink_query_push_qual(), the return at the end is not needed.

For condition_is_safe_pushdown_to_sublink, you can initialize context this way :

+   equal_expr_info_context context = {0};
I don't understand the benefits of doing this. Please give me some hints.
We can also see a number of memset initializations, such as get_range_partbound_string()

+   if (cvar && cvar->varattno > 0 && equal(cvar, var))
+       return true;

The last few lines of condition_is_safe_pushdown_to_sublink() can be written as:

  return cvar && cvar->varattno > 0 && equal(cvar, var);

+       if (equal_expr_safety_check(node, &context))
+       {
+           /* It needs to be something like outer var = inner var */
+           if (context.inner_var &&

The nested if blocks can be merged into one if block.

Cheers

HI Zhihong Yu

Thank you for your attention.
Every suggestion you make makes the patch better.
I have completed the v3 patch according to your suggestions.
Looking forward to your feedback.


Wenjing

Вложения

Re: 回复:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
Zhihong Yu
Дата:


On Thu, Dec 23, 2021 at 3:52 AM 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> wrote:

Fixed a bug found during testing.


Wenjing

Hi,
+           if (condition_is_safe_pushdown_to_sublink(rinfo, expr_info->outer))
+           {
+               /* replace qual expr from outer var = const to var = const and push down to sublink query */
+               sublink_query_push_qual(subquery, (Node *)copyObject(rinfo->clause), expr_info->outer, expr_info->inner);

Since sublink_query_push_qual() is always guarded by condition_is_safe_pushdown_to_sublink(), it seems sublink_query_push_qual() can be folded into condition_is_safe_pushdown_to_sublink().

For generate_base_implied_equalities():

+       if (ec->ec_processed)
+       {
+           ec_index++;
+           continue;
+       }
+       else if (list_length(ec->ec_members) > 1)

Minor comment: the keyword `else` can be omitted (due to `continue` above).

+            * Since there may be an unexpanded sublink in the targetList,
+            * we'll skip it for now.

Since there may be an -> If there is an

+       {"lazy_process_sublink", PGC_USERSET, QUERY_TUNING_METHOD,
+           gettext_noop("enable lazy process sublink."),

Looking at existing examples from src/backend/utils/misc/guc.c, enable_lazy_sublink_processing seems to be consistent with existing guc variable naming.

+lazy_process_sublinks(PlannerInfo *root, bool single_result_rte)

lazy_process_sublinks -> lazily_process_sublinks

+   else
+   {
    /* There shouldn't be any OJ info to translate, as yet */
    Assert(subroot->join_info_list == NIL);

Indentation for the else block is off.

+       if (istop)
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, false);
+       else
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, true);

The above can be written as:

+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, !istop);

For find_equal_conditions_contain_uplevelvar_in_sublink_query():
+               context.has_unexpected_expr == false &&
`!context.has_unexpected_expr` should suffice

equal_expr_safety_check -> is_equal_expr_safe

Cheers

Re: 回复:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?

От
wenjing
Дата:
I corrected it according to your suggestion.

thanks

Wenjing.


Zhihong Yu <zyu@yugabyte.com> 于2021年12月25日周六 02:26写道:


On Thu, Dec 23, 2021 at 3:52 AM 曾文旌(义从) <wenjing.zwj@alibaba-inc.com> wrote:

Fixed a bug found during testing.


Wenjing

Hi,
+           if (condition_is_safe_pushdown_to_sublink(rinfo, expr_info->outer))
+           {
+               /* replace qual expr from outer var = const to var = const and push down to sublink query */
+               sublink_query_push_qual(subquery, (Node *)copyObject(rinfo->clause), expr_info->outer, expr_info->inner);

Since sublink_query_push_qual() is always guarded by condition_is_safe_pushdown_to_sublink(), it seems sublink_query_push_qual() can be folded into condition_is_safe_pushdown_to_sublink().

For generate_base_implied_equalities():

+       if (ec->ec_processed)
+       {
+           ec_index++;
+           continue;
+       }
+       else if (list_length(ec->ec_members) > 1)

Minor comment: the keyword `else` can be omitted (due to `continue` above).

+            * Since there may be an unexpanded sublink in the targetList,
+            * we'll skip it for now.

Since there may be an -> If there is an

+       {"lazy_process_sublink", PGC_USERSET, QUERY_TUNING_METHOD,
+           gettext_noop("enable lazy process sublink."),

Looking at existing examples from src/backend/utils/misc/guc.c, enable_lazy_sublink_processing seems to be consistent with existing guc variable naming.

+lazy_process_sublinks(PlannerInfo *root, bool single_result_rte)

lazy_process_sublinks -> lazily_process_sublinks

+   else
+   {
    /* There shouldn't be any OJ info to translate, as yet */
    Assert(subroot->join_info_list == NIL);

Indentation for the else block is off.

+       if (istop)
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, false);
+       else
+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, true);

The above can be written as:

+           f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, !istop);

For find_equal_conditions_contain_uplevelvar_in_sublink_query():
+               context.has_unexpected_expr == false &&
`!context.has_unexpected_expr` should suffice

equal_expr_safety_check -> is_equal_expr_safe

Cheers

Вложения