Implement predicate propagation for non-equivalence clauses

Поиск
Список
Период
Сортировка
От Richard Guo
Тема Implement predicate propagation for non-equivalence clauses
Дата
Msg-id CAN_9JTykHxHqzUFiDNrXDO+u+uq9v9Ewcb=hVDAbDb33eoL7Yw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Implement predicate propagation for non-equivalence clauses  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: Implement predicate propagation for non-equivalence clauses  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

As we know, current planner will generate additional restriction clauses from
equivalence clauses. This will generally lower the total cost because some of
tuples may be filtered out before joins.

In this patch, we are trying to do the similar deduction, from non-equivalence
clauses, that is, A=B AND f(A) implies A=B AND f(A) and f(B), under some
restrictions on f. It can be turned on/off by GUC enable_predicate_propagation.
This is ported from Greenplum database.

The idea is to collect the non-equivalence clauses in a dedicated list. Then
for each RestrictInfo in the list, we replace its expression with another
equivalent one and make a new RestrictInfo. This is done after all equivalence
classes have been built. For example, if a RestrictInfo stands for A>10, and
if A and B are in the same equivalence class, we generate a new RestrictInfo by
replacing A with B, so we get B>10.

This patch will introduce extra cost for relation scan, due to the cost of
evaluating the new implied quals. Meanwhile, since the extra filter may reduce
the number of tuples returned by the scan, it may lower the cost of following
joins. So, whether we will get a better plan depends on the selectivity of the
implied quals.

Below is a simple demo, in which this patch will deduce an addition qual (b.i <
10 in the first query and b.i > 10 in the second query).

create table a (i int);
create table b (i int);
insert into a select generate_series(1,10000);
insert into b select generate_series(1,10000);
analyze a;
analyze b;


-- low selectivity

set enable_predicate_propagation to off;
explain select * from a,b where a.i = b.i and a.i < 10;
                          QUERY PLAN
---------------------------------------------------------------
 Hash Join  (cost=170.11..352.70 rows=9 width=8)
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=4)
   ->  Hash  (cost=170.00..170.00 rows=9 width=4)
         ->  Seq Scan on a  (cost=0.00..170.00 rows=9 width=4)
               Filter: (i < 10)
(6 rows)

set enable_predicate_propagation to on;
gpadmin=# explain select * from a,b where a.i = b.i and a.i < 10;
                          QUERY PLAN
---------------------------------------------------------------
 Nested Loop  (cost=0.00..341.24 rows=1 width=8)
   Join Filter: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..170.00 rows=9 width=4)
         Filter: (i < 10)
   ->  Materialize  (cost=0.00..170.04 rows=9 width=4)
         ->  Seq Scan on b  (cost=0.00..170.00 rows=9 width=4)
               Filter: (i < 10)
(7 rows)



-- high selectivity

set enable_predicate_propagation to off;
gpadmin=# explain select * from a,b where a.i = b.i and a.i > 10;
                            QUERY PLAN
-------------------------------------------------------------------
 Hash Join  (cost=270.00..577.36 rows=9990 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..170.00 rows=9990 width=4)
         Filter: (i > 10)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
         ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=4)
(6 rows)

set enable_predicate_propagation to on;
gpadmin=# explain select * from a,b where a.i = b.i and a.i > 10;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=294.88..602.14 rows=9980 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..170.00 rows=9990 width=4)
         Filter: (i > 10)
   ->  Hash  (cost=170.00..170.00 rows=9990 width=4)
         ->  Seq Scan on b  (cost=0.00..170.00 rows=9990 width=4)
               Filter: (i > 10)
(7 rows)

In general, if the selectivity of the implied qual is low, the extra cost for
relation scan is more likely to be compromised by the cost reduced in join, and
vise versa.

Thanks
Richard
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Victor Wagner
Дата:
Сообщение: Re: Bug fix for glibc broke freebsd build in REL_11_STABLE
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Implement predicate propagation for non-equivalence clauses