Re: NOT IN subquery optimization

Поиск
Список
Период
Сортировка
От Li, Zheng
Тема Re: NOT IN subquery optimization
Дата
Msg-id DBD4724F-5466-40FB-AF28-4CEC9D4B6D1F@amazon.com
обсуждение исходный текст
Ответ на Re: NOT IN subquery optimization  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: NOT IN subquery optimization
Список pgsql-hackers
The current transformation would not add "or s.a is NULL" in the example provided since it is non-nullable. You will be
comparingthese two cases in terms of the transformation:
 

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 51.416 ms

select count(*) from big b where a not in (select a from s);
Time: 890.088 ms

 But if s.a is nullable, yes, you have proved my previous statement is false... I should have used almost always.
However, if s.a is nullable, we would do this transformation:
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);

It's possible to stop the nested loop join early during execution once we find an inner Null entry because every outer
tupleis
 
going to evaluate to true on the join condition.

Zheng

On 3/1/19, 6:17 PM, "David Rowley" <david.rowley@2ndquadrant.com> wrote:

    On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > "Li, Zheng" <zhelli@amazon.com> writes:
    > > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join,
itis always faster compared to the original plan.
 
    >
    > TBH, I am *really* skeptical of sweeping claims like that.  The existing
    > code will typically produce a hashed-subplan plan, which ought not be
    > that awful as long as the subquery result doesn't blow out memory.
    > It certainly is going to beat a naive nested loop.
    
    It's pretty easy to show the claim is false using master and NOT EXISTS.
    
    create table small(a int not null);
    create table big (a int not null);
    insert into small select generate_Series(1,1000);
    insert into big select x%1000+1 from generate_Series(1,1000000) x;
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a);
    Time: 178.575 ms
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);
    Time: 38049.969 ms (00:38.050)
    
    
    -- 
     David Rowley                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chapman Flack
Дата:
Сообщение: Re: Infinity vs Error for division by zero
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Infinity vs Error for division by zero