How to hint two columns IS NOT DISTINCT FROM each other in a join

Поиск
Список
Период
Сортировка
От Kim Rose Carlsen
Тема How to hint two columns IS NOT DISTINCT FROM each other in a join
Дата
Msg-id AM4PR0501MB26109031E72CA34CF811E403C7AA0@AM4PR0501MB2610.eurprd05.prod.outlook.com
обсуждение исходный текст
Ответы Re: How to hint two columns IS NOT DISTINCT FROM each other in a join  (Kim Rose Carlsen <krc@hiper.dk>)
Список pgsql-general

Hi


I was wondering if there is a way to hint that two columns in two different tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if table_a.key = 'test' THEN table_b.key = 'test' .


The equals operator already does this but it does not handle NULLS very well (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't establish the same inference rules as equals.


Example:


CREATE TABLE a (

  id INTEGER PRIMARY KEY,

  key VARCHAR,

  value VARCHAR

);


CREATE INDEX ON a (key);


INSERT INTO a 

  VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 'qoz');


CREATE VIEW view_a AS (

  SELECT table_a.id,

         table_a.key, 

         table_a.value,

         table_b.key as b_key

    FROM a AS table_a

    JOIN a AS table_b

      ON table_a.id = table_b.id

);


CREATE VIEW view_a_eq AS (

  SELECT table_a.id,

         table_a.key, 

         table_a.value,

         table_b.key as b_key

    FROM a AS table_a

    JOIN a AS table_b

      ON table_a.id = table_b.id

     AND table_a.key = table_b.key

);


CREATE VIEW view_a_distinct AS (

  SELECT table_a.id,

         table_a.key, 

         table_a.value,

         table_b.key as b_key

    FROM a AS table_a

    JOIN a AS table_b

      ON table_a.id = table_b.id

     AND table_a.key IS NOT DISTINCT FROM table_b.key

);


EXPLAIN SELECT * FROM view_a WHERE key = 'test';

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Hash Join  (cost=12.69..34.42 rows=4 width=100)
   Hash Cond: (table_b.id = table_a.id)
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
         ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
               Recheck Cond: ((key)::text = 'test'::text)
               ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
                     Index Cond: ((key)::text = 'test'::text)


We only get index scan on table_a


EXPLAIN SELECT * FROM view_a_eq WHERE key = 'test';

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Nested Loop  (cost=8.36..25.53 rows=1 width=100)
   Join Filter: (table_a.id = table_b.id)
   ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
         Recheck Cond: ((key)::text = 'test'::text)
         ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
               Index Cond: ((key)::text = 'test'::text)
   ->  Materialize  (cost=4.18..12.66 rows=4 width=36)
         ->  Bitmap Heap Scan on a table_b  (cost=4.18..12.64 rows=4 width=36)
               Recheck Cond: ((key)::text = 'test'::text)
               ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
                     Index Cond: ((key)::text = 'test'::text)
We get index scan on both tables and the where clause is pushed all the way down

EXPLAIN SELECT * FROM view_a_distinct WHERE key = 'test';

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Hash Join  (cost=12.69..34.43 rows=1 width=100)
   Hash Cond: (table_b.id = table_a.id)
   Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
         ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
               Recheck Cond: ((key)::text = 'test'::text)
               ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
                     Index Cond: ((key)::text = 'test'::text)
Same as the first example

In these examples it really doesn't matter which plan is used. But for larger view it might be a great hint for the optimizer to know that since we are joining on a primary key we could hint that all other columns for the table is actually NOT DISTINCT from each other. This will result in the planner being able to push the WHERE condition down into the other joined tables.

It works well for the = operator, but it is easy to see the it fails as soon as we asks about nulls.

SELECT * FROM view_a_eq WHERE key IS NULL;
 id | key | value | b_key 
----+-----+-------+-------
(0 rows)

Maybe my examples are too simple, but I hope you can verify the same will hold for a much larger table with 2 different views on top of them.

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: libpq heartbeat
Следующее
От: Francisco Olarte
Дата:
Сообщение: Re: libpq heartbeat