Re: NOT IN subquery optimization

Поиск

Список

Период

Сортировка

От	Li, Zheng
Тема	Re: NOT IN subquery optimization
Дата	2 марта 2019 г. 02:39:05
Msg-id	DBD4724F-5466-40FB-AF28-4CEC9D4B6D1F@amazon.com обсуждение исходный текст
Ответ на	Re: NOT IN subquery optimization (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: NOT IN subquery optimization
Список	pgsql-hackers

Дерево обсуждения

The current transformation would not add "or s.a is NULL" in the example provided since it is non-nullable. You will be
comparingthese two cases in terms of the transformation:
 

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 51.416 ms

select count(*) from big b where a not in (select a from s);
Time: 890.088 ms

 But if s.a is nullable, yes, you have proved my previous statement is false... I should have used almost always.
However, if s.a is nullable, we would do this transformation:
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);

It's possible to stop the nested loop join early during execution once we find an inner Null entry because every outer
tupleis
 
going to evaluate to true on the join condition.

Zheng

On 3/1/19, 6:17 PM, "David Rowley" <david.rowley@2ndquadrant.com> wrote:

    On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > "Li, Zheng" <zhelli@amazon.com> writes:
    > > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join,
itis always faster compared to the original plan.
 
    >
    > TBH, I am *really* skeptical of sweeping claims like that.  The existing
    > code will typically produce a hashed-subplan plan, which ought not be
    > that awful as long as the subquery result doesn't blow out memory.
    > It certainly is going to beat a naive nested loop.
    
    It's pretty easy to show the claim is false using master and NOT EXISTS.
    
    create table small(a int not null);
    create table big (a int not null);
    insert into small select generate_Series(1,1000);
    insert into big select x%1000+1 from generate_Series(1,1000000) x;
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a);
    Time: 178.575 ms
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);
    Time: 38049.969 ms (00:38.050)
    
    
    -- 
     David Rowley                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Chapman Flack
Дата: 02 марта 2019 г., 02:38:39
Сообщение: Re: Infinity vs Error for division by zero

Следующее

От: Tom Lane
Дата: 02 марта 2019 г., 02:46:43
Сообщение: Re: Infinity vs Error for division by zero

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: NOT IN subquery optimization

Предыдущее

Следующее