How to hint two columns IS NOT DISTINCT FROM each other in a join

Поиск

Список

Период

Сортировка

От	Kim Rose Carlsen
Тема	How to hint two columns IS NOT DISTINCT FROM each other in a join
Дата	27 октября 2016 г. 19:35:07
Msg-id	AM4PR0501MB26109031E72CA34CF811E403C7AA0@AM4PR0501MB2610.eurprd05.prod.outlook.com обсуждение исходный текст
Ответы	Re: How to hint two columns IS NOT DISTINCT FROM each other in a join (Kim Rose Carlsen <krc@hiper.dk>)
Список	pgsql-general

Дерево обсуждения

I was wondering if there is a way to hint that two columns in two different tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if table_a.key = 'test' THEN table_b.key = 'test' .

The equals operator already does this but it does not handle NULLS very well (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't establish the same inference rules as equals.

Example:

CREATE TABLE a (

id INTEGER PRIMARY KEY,

key VARCHAR,

value VARCHAR

);

CREATE INDEX ON a (key);

INSERT INTO a

VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 'qoz');

CREATE VIEW view_a AS (

SELECT table_a.id,

table_a.key,

table_a.value,

table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

ON table_a.id = table_b.id

);

CREATE VIEW view_a_eq AS (

SELECT table_a.id,

table_a.key,

table_a.value,

table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

ON table_a.id = table_b.id

AND table_a.key = table_b.key

);

CREATE VIEW view_a_distinct AS (

SELECT table_a.id,

table_a.key,

table_a.value,

table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

ON table_a.id = table_b.id

AND table_a.key IS NOT DISTINCT FROM table_b.key

);

EXPLAIN SELECT * FROM view_a WHERE key = 'test';

QUERY PLAN

------------------------------------------------------------------------------------

Hash Join (cost=12.69..34.42 rows=4 width=100)

Hash Cond: (table_b.id = table_a.id)

-> Seq Scan on a table_b (cost=0.00..18.50 rows=850 width=36)

-> Hash (cost=12.64..12.64 rows=4 width=68)

-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)

Recheck Cond: ((key)::text = 'test'::text)

-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)

Index Cond: ((key)::text = 'test'::text)

We only get index scan on table_a

EXPLAIN SELECT * FROM view_a_eq WHERE key = 'test';

QUERY PLAN

------------------------------------------------------------------------------------

Nested Loop (cost=8.36..25.53 rows=1 width=100)

Join Filter: (table_a.id = table_b.id)

-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)

Recheck Cond: ((key)::text = 'test'::text)

-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)

Index Cond: ((key)::text = 'test'::text)

-> Materialize (cost=4.18..12.66 rows=4 width=36)

-> Bitmap Heap Scan on a table_b (cost=4.18..12.64 rows=4 width=36)

Recheck Cond: ((key)::text = 'test'::text)

-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)

Index Cond: ((key)::text = 'test'::text)

We get index scan on both tables and the where clause is pushed all the way down

EXPLAIN SELECT * FROM view_a_distinct WHERE key = 'test';

QUERY PLAN

------------------------------------------------------------------------------------

Hash Join (cost=12.69..34.43 rows=1 width=100)

Hash Cond: (table_b.id = table_a.id)

Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))

-> Seq Scan on a table_b (cost=0.00..18.50 rows=850 width=36)

-> Hash (cost=12.64..12.64 rows=4 width=68)

-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)

Recheck Cond: ((key)::text = 'test'::text)

-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)

Index Cond: ((key)::text = 'test'::text)

Same as the first example

In these examples it really doesn't matter which plan is used. But for larger view it might be a great hint for the optimizer to know that since we are joining on a primary key we could hint that all other columns for the table is actually NOT DISTINCT from each other. This will result in the planner being able to push the WHERE condition down into the other joined tables.

It works well for the = operator, but it is easy to see the it fails as soon as we asks about nulls.

SELECT * FROM view_a_eq WHERE key IS NULL;

id | key | value | b_key

----+-----+-------+-------

(0 rows)

Maybe my examples are too simple, but I hope you can verify the same will hold for a much larger table with 2 different views on top of them.

В списке pgsql-general по дате отправления:

Предыдущее

От: Tom Lane
Дата: 27 октября 2016 г., 19:33:09
Сообщение: Re: libpq heartbeat

Следующее

От: Francisco Olarte
Дата: 27 октября 2016 г., 19:42:41
Сообщение: Re: libpq heartbeat

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

How to hint two columns IS NOT DISTINCT FROM each other in a join

Предыдущее

Следующее