Обсуждение: lost "left join"

Поиск
Список
Период
Сортировка

lost "left join"

От
Олег Самойлов
Дата:
Hi, all.

I got some mystic behaviour of PostgreSQL, perhaps this is a bug.

Tables is a model of a simple graph where parents and children related as «many to many». And there is a field «valid»
tovalidate for current moment. 

set search_path = 'left_join';
begin;
drop schema if exists left_join cascade;
create schema left_join;
create table node (
   node_id integer primary key generated always as identity,
   valid daterange not null
);
create table link (
   parent integer not null references node(node_id),
   child integer not null references node(node_id),
   valid daterange not null,
   primary key (parent,child)
);
insert into node (node_id,valid) values (default,daterange('2019-01-01','2020-01-01'));
commit;

Now I want to get a one node and all children, if they exists:

=> explain select * from node as parent left join link on parent.node_id=link.parent left join node as child on
link.child=child.node_idwhere parent.node_id=1; 
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=4.50..26.76 rows=1 width=112)
   ->  Nested Loop Left Join  (cost=4.35..21.91 rows=1 width=76)
         Join Filter: (parent.node_id = link.parent)
         ->  Index Scan using node_pkey on node parent  (cost=0.15..8.17 rows=1 width=36)
               Index Cond: (node_id = 1)
         ->  Bitmap Heap Scan on link  (cost=4.20..13.67 rows=6 width=40)
               Recheck Cond: (parent = 1)
               ->  Bitmap Index Scan on link_pkey  (cost=0.00..4.20 rows=6 width=0)
                     Index Cond: (parent = 1)
   ->  Index Scan using node_pkey on node child  (cost=0.15..4.84 rows=1 width=36)
         Index Cond: (link.child = node_id)
(11 rows)

All fine, there is «Left Join» in the planner. And the result is exactly what I want:

=> select * from node as parent left join link on parent.node_id=link.parent left join node as child on
link.child=child.node_idwhere parent.node_id=1; 
 node_id |          valid          | parent | child | valid | node_id | valid
---------+-------------------------+--------+-------+-------+---------+-------
       1 | [2019-01-01,2020-01-01) |        |       |       |         |
(1 row)

But things begin be strange if I add validation by time.

=> explain select * from node as parent left join link on parent.node_id=link.parent left join node as child on
link.child=child.node_idwhere parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and
current_date<@ child.valid; 
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.50..32.35 rows=1 width=112)
   ->  Nested Loop  (cost=4.35..21.88 rows=1 width=76)
         ->  Index Scan using node_pkey on node parent  (cost=0.15..8.18 rows=1 width=36)
               Index Cond: (node_id = 1)
               Filter: (CURRENT_DATE <@ valid)
         ->  Bitmap Heap Scan on link  (cost=4.20..13.70 rows=1 width=40)
               Recheck Cond: (parent = 1)
               Filter: (CURRENT_DATE <@ valid)
               ->  Bitmap Index Scan on link_pkey  (cost=0.00..4.20 rows=6 width=0)
                     Index Cond: (parent = 1)
   ->  Index Scan using node_pkey on node child  (cost=0.15..8.18 rows=1 width=36)
         Index Cond: (node_id = link.child)
         Filter: (CURRENT_DATE <@ valid)
(13 rows)

«Left Join»’s are lost. And in the result too:

=> select * from node as parent left join link on parent.node_id=link.parent left join node as child on
link.child=child.node_idwhere parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and
current_date<@ child.valid; 
 node_id | valid | parent | child | valid | node_id | valid
---------+-------+--------+-------+-------+---------+-------
(0 rows)



Sv: lost "left join"

От
Andreas Joseph Krogh
Дата:
På onsdag 16. januar 2019 kl. 11:54:21, skrev Олег Самойлов <splarv@ya.ru>:
Hi, all.

I got some mystic behaviour of PostgreSQL, perhaps this is a bug.
 
[snip]

But things begin be strange if I add validation by time.

=> explain select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_id where parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and current_date <@ child.valid;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.50..32.35 rows=1 width=112)
   ->  Nested Loop  (cost=4.35..21.88 rows=1 width=76)
         ->  Index Scan using node_pkey on node parent  (cost=0.15..8.18 rows=1 width=36)
               Index Cond: (node_id = 1)
               Filter: (CURRENT_DATE <@ valid)
         ->  Bitmap Heap Scan on link  (cost=4.20..13.70 rows=1 width=40)
               Recheck Cond: (parent = 1)
               Filter: (CURRENT_DATE <@ valid)
               ->  Bitmap Index Scan on link_pkey  (cost=0.00..4.20 rows=6 width=0)
                     Index Cond: (parent = 1)
   ->  Index Scan using node_pkey on node child  (cost=0.15..8.18 rows=1 width=36)
         Index Cond: (node_id = link.child)
         Filter: (CURRENT_DATE <@ valid)
(13 rows)

«Left Join»’s are lost. And in the result too:

=> select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_id where parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and current_date <@ child.valid;
 node_id | valid | parent | child | valid | node_id | valid
---------+-------+--------+-------+-------+---------+-------
(0 rows)
 
The moment you involve columns on "left joined" relations this way in the WHERE-clause, it effectively becomes a right join.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Вложения

Re: lost "left join"

От
Andrew Gierth
Дата:
>>>>> "Олег" == Олег Самойлов <splarv@ya.ru> writes:

 Олег> Hi, all.
 Олег> I got some mystic behaviour of PostgreSQL, perhaps this is a bug.

Feature, actually.

 Олег> But things begin be strange if I add validation by time.

 Олег> => explain select * from node as parent left join link on
 Олег> parent.node_id=link.parent left join node as child on
 Олег> link.child=child.node_id where parent.node_id=1 and current_date
 Олег> <@ parent.valid and current_date <@ link.valid and current_date
 Олег> <@ child.valid;

The problem here is that (for example) child.valid is null if there was
no matching child row in the join, and the planner knows that x <@ NULL
is not true (since the chosen <@ operator is defined as strict), and
therefore it knows that the left join is unnecessary and can be reduced
to an inner join.

At least 90% of the time when you refer to values from the nullable side
of a join in a WHERE clause, you're making a mistake (e.g. the condition
should have been in the ON clause instead). The other 10% or less of the
time, you have to make sure you use non-strict conditions, i.e. take
account of the fact that the values might be null.

--
Andrew (irc:RhodiumToad)