Обсуждение: NOT IN subquery optimization

Поиск
Список
Период
Сортировка

NOT IN subquery optimization

От
Jim Finnerty
Дата:
The semantics of NOT IN (SELECT ...) are subtly different from the semantics
of NOT EXISTS (SELECT ...).  These differences center on how NULLs are
treated, and in general can result in statements that are harder to optimize
and slower to execute than the apparently similar NOT EXISTS statement.

A little over a year ago, Christian Antognini authored the blog "/How Well a
Query Optimizer Handles Subqueries?/" summarizing his findings about the
performance of PostgreSQL, MySQL, and Oracle on various subqueries:

   
https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/

His position was that you can classify the optimizations as correct or
incorrect, and based on that he provided the following comparison summary
(see below).  In short, PostgreSQL was the worst of the three systems:

    "Summary

        The number of queries that the query optimizers handle correctly are
the following:

        Oracle Database 12.2: 72 out of 80
        MySQL 8.0.3: 67 out of 80
        PostgreSQL 10.0: 60 out of 80

    Since not all queries are handled correctly, for best performance it is
sometimes necessary to rewrite them."

The subqueries that were found to be optimized "incorrectly" were almost
entirely due to poor or absent NOT IN subquery optimization.

The PostgreSQL community has been aware of the deficiencies in NOT IN
optimization for quite some time.  Based on an analysis of
psgsql-performance posts between 2013 and 2015, Robert Haas identified NOT
IN optimization as one of the common root causes of performance problems.

We have been working on improved optimization of NOT IN, and we would like
to share this optimizaton with the community.  With respect to the test
cases mentioned in the blog post mentioned above, it will elevate PostgreSQL
from "worst" to "first".  Generally the performance gains are large when the
optimization applies, though we have found one test case where performance
is worse.  We are investigating this now to see if we can disable the
optimization in that case.

We would like to include a patch for this change in the current commitfest. 
This thread can be used to track comments about this optimization.




-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
Jim Finnerty <jfinnert@amazon.com> writes:
> We have been working on improved optimization of NOT IN, and we would like
> to share this optimizaton with the community.

The idea that's been kicked around in the past is to detect whether the
subselect's output column(s) can be proved NOT NULL, and if so, convert
to an antijoin just like NOT EXISTS.  Is that what you're doing, or
something different?

> We would like to include a patch for this change in the current commitfest. 

Generally, people just send patches, they don't ask for permission first
;-)

Having said that, we have a general policy that we don't like complex
patches that first show up for the last commitfest of a dev cycle.
So unless this is a pretty small patch, it's probably going to get
delayed to v13.  Still, we'd like to have it in the queue, so submit
away ...

            regards, tom lane


Re: NOT IN subquery optimization

От
Jim Finnerty
Дата:
re: The idea that's been kicked around in the past is to detect whether the
subselect's output column(s) can be proved NOT NULL, and if so, convert
to an antijoin just like NOT EXISTS

basically, yes.  this will handle nullability of both the outer and inner
correlated expression(s), multiple expressions, presence or absence of
predicates in the WHERE clause, and whether the correlated expressions are
on the null-padded side of an outer join.  If it is judged to be more
efficient, then it transforms the NOT IN sublink into an anti-join.

some complications enter into the decision to transform NOT IN to anti-join
based on whether a bitmap plan will/not be used, or whether it will/not be
eligible for PQ.



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
Jim Finnerty <jfinnert@amazon.com> writes:
> re: The idea that's been kicked around in the past is to detect whether the
> subselect's output column(s) can be proved NOT NULL, and if so, convert
> to an antijoin just like NOT EXISTS

> basically, yes.  this will handle nullability of both the outer and inner
> correlated expression(s), multiple expressions, presence or absence of
> predicates in the WHERE clause, and whether the correlated expressions are
> on the null-padded side of an outer join.  If it is judged to be more
> efficient, then it transforms the NOT IN sublink into an anti-join.

Hmm, that seems overcomplicated ...

> some complications enter into the decision to transform NOT IN to anti-join
> based on whether a bitmap plan will/not be used, or whether it will/not be
> eligible for PQ.

... and that even more so, considering that this decision really needs
to be taken long before cost estimates would be available.

As far as I can see, there should be no situation where we'd not want
to transform to antijoin if we can prove it's semantically valid to
do so.  If there are cases where that comes out as a worse plan,
that indicates a costing error that would be something to address
separately (because it'd also be a problem for other antijoin cases).
Also, as long as it nearly always wins, I'm not going to cry too hard
if there are corner cases where it makes the wrong choice.  That's not
something that's possible to avoid completely.

            regards, tom lane


Re: NOT IN subquery optimization

От
Jim Finnerty
Дата:
We can always correctly transform a NOT IN to a correlated NOT EXISTS.  In
almost all cases it is more efficient to do so.  In the one case that we've
found that is slower it does come down to a more general costing issue, so
that's probably the right way to think about it.



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Thu, 21 Feb 2019 at 16:27, Jim Finnerty <jfinnert@amazon.com> wrote:
> We can always correctly transform a NOT IN to a correlated NOT EXISTS.  In
> almost all cases it is more efficient to do so.  In the one case that we've
> found that is slower it does come down to a more general costing issue, so
> that's probably the right way to think about it.

I worked on this over 4 years ago [1]. I think the patch there is not
completely broken and seems just to need a few things fixed. I rebased
it on top of current master and looked at it. I think the main
remaining issue is fixing the code that ensures the outer side join
quals can't be NULL.  The code that's there looks broken still since
it attempts to use quals from any inner joined rel for proofs that
NULLs will be removed.  That might not work so well in a case like:
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b NOT IN(select b
from t3),  however, I'd need to think harder about that since if there
was such a qual then the planner should convert the left join into an
inner join. But anyway, the function expressions_are_not_nullable()
was more intended to work with targetlists to ensure exprs there can't
be NULL. I just had done a poor job of trying to modify that into
allowing it to take exprs from any random place, likely that should be
a new function and expressions_are_not_nullable() should be put back
to what Tom ended up with.

I've attached the rebased and still broken version.

[1] https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Fri, 22 Feb 2019 at 09:44, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached the rebased and still broken version.

I set about trying to make a less broken version of this.

A quick reminder of the semantics of NOT IN:

1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table);

If table is non-empty:
will filter out rows where <nullable_column> is NULL
and only show values that are not in <not null column>

If table is empty:
Filters nothing.

2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table);

If table contains NULLs in the <null column> no records will match.

The previous patch handled #2 correctly but neglected to do anything
about #1.  For #1 the only way we can implement this as a planner only
change is to insist that the outer side expressions also are not null.
If we could have somehow tested if "table" was non-empty then we could
have added a IS NOT NULL clause to the outer query and converted to an
anti-join, but ... can't know that during planning and can't add the
IS NOT NULL regardless as, if "table" is empty we will filter NULLs
when we shouldn't.

In the attached, I set about fixing #1 by determining if the outer
expressions could be NULL by checking

1. If expression is a Var from an inner joined relation it can't be
NULL if there's a NOT NULL constraint on the column; or
2. If expression is a Var from an inner joined relation and there is a
strict WHERE/ON clause, the expression can't be NULL; or
3. If expression is a Var from an outer joined relation check for
quals that were specified in the same syntactical level as the NOT IN
for proofs that NULL will be filtered.

An example of #3 is:

SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL
AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in
planning, but...
or;
SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3);

In the latter of the two, the t1.a = t2.a join conditions ensures that
NULLs can't exist where the NOT IN is evaluated.

I implemented #3 by passing the quals down to
pull_up_sublinks_qual_recurse().  At the top level call 'node' and
'notnull_proofs' are the same, but that changes in recursive calls
like the one we make inside the is_andclause() condition.

Comments welcome.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
I'm attaching a working patch following the discussion.

You can also find the following patch description is the commit message:
NOT IN to ANTI JOIN transformation

    The semantics of ANTI JOIN were created to match the semantics of NOT
    EXISTS, which enables NOT EXISTS subqueries to be efficiently executed
    as a kind of join.  NOT IN subqueries have different NULL semantics than
    NOT EXISTS, but since there is no special join operator for NOT IN it is
    generally executed as a nested sub-plan.  It is possible, however, to
    transform NOT IN to a correlated NOT EXISTS so that it can be executed
    it as an ANTI JOIN with additional correlated predicates.

    A general transformation from NOT IN to NOT EXISTS for the
    single-expression case (the multi-expression case is just ANDs of the
    single-expressions) is:
        t1.x NOT IN (SELECT t2.y from t2 where p) <=> NOT EXISTS (select 1
                from t2 where (y=x or y is NULL or x is NULL) and p).

    If x or y is non-nullable, we can safely remove the predicate "x
    is NULL" or "y is NULL", and if there is no predicate p,
    then "x is NULL" may be factored out of the subquery.
    Experiments show that if we can remove one or the other ORed
    predicates, or if we can factor out the "x is NULL", then
    execution is typically much faster.

    Basically, for the single expression case (we also handle the multi
    expression case), we try to do the following transformation:
    When p does not exist:
        t1.x not in (t2.y) => ANTI JOIN t1(Filter: x is not null), t2 on
        join condition: t1.x=t2.y or t2.y is null.
    When x is non-nullable:
        t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition:
        (t1.x=t2.y or t2.y is null) and p.

    We implemented a nullability test routine is_node_nonnullable().
    Currently it handles Var, TargetEntry, CoalesceExpr and Const. Outer
    joins are taken into consideration in the nullability test.

    We adjust and apply reduce_outer_joins() before the transformation so 
    that the outer joins have an opportunity to be converted to inner joins
    prior to the transformation.

    Using this transformation, we measured performance improvements of
    two to three orders of magnitude on most queries in a development
    environment. In our performance experiments, table s (small) has 11
    rows, table l (large) has 1 million rows. s.n and l.n have NULL
    values. s.nn and l.nn are NOT NULL.

        s.n not in (l.n)          1150 ms -> 0.49 ms
        s.nn not in (l.nn)      1120 ms -> 0.45 ms
        l.n not in (l.n)   over   20 min -> 1700 ms
        l.nn not in (l.nn) over 20 min -> 220 ms
        l.n not in (s.n)               63 ms -> 750 ms
        l.nn not in (s.nn)          58 ms -> 46 ms

    For the only case that performance drops - l.n not in (s.n). It is
    likely to be resolved by ending the nested loop anti join early as
    soon as we find NULL inner tuple entry/entries that satisfies the
    join condition during execution. This is still under investigation.

Comments are welcome.

With Regards,
---
Zheng Li, AWS, Amazon Aurora PostgreSQL

On 2/25/19, 7:32 AM, "David Rowley" <david.rowley@2ndquadrant.com> wrote:

    On Fri, 22 Feb 2019 at 09:44, David Rowley <david.rowley@2ndquadrant.com> wrote:
    > I've attached the rebased and still broken version.
    
    I set about trying to make a less broken version of this.
    
    A quick reminder of the semantics of NOT IN:
    
    1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table);
    
    If table is non-empty:
    will filter out rows where <nullable_column> is NULL
    and only show values that are not in <not null column>
    
    If table is empty:
    Filters nothing.
    
    2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table);
    
    If table contains NULLs in the <null column> no records will match.
    
    The previous patch handled #2 correctly but neglected to do anything
    about #1.  For #1 the only way we can implement this as a planner only
    change is to insist that the outer side expressions also are not null.
    If we could have somehow tested if "table" was non-empty then we could
    have added a IS NOT NULL clause to the outer query and converted to an
    anti-join, but ... can't know that during planning and can't add the
    IS NOT NULL regardless as, if "table" is empty we will filter NULLs
    when we shouldn't.
    
    In the attached, I set about fixing #1 by determining if the outer
    expressions could be NULL by checking
    
    1. If expression is a Var from an inner joined relation it can't be
    NULL if there's a NOT NULL constraint on the column; or
    2. If expression is a Var from an inner joined relation and there is a
    strict WHERE/ON clause, the expression can't be NULL; or
    3. If expression is a Var from an outer joined relation check for
    quals that were specified in the same syntactical level as the NOT IN
    for proofs that NULL will be filtered.
    
    An example of #3 is:
    
    SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL
    AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in
    planning, but...
    or;
    SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3);
    
    In the latter of the two, the t1.a = t2.a join conditions ensures that
    NULLs can't exist where the NOT IN is evaluated.
    
    I implemented #3 by passing the quals down to
    pull_up_sublinks_qual_recurse().  At the top level call 'node' and
    'notnull_proofs' are the same, but that changes in recursive calls
    like the one we make inside the is_andclause() condition.
    
    Comments welcome.
    
    -- 
     David Rowley                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    


Вложения

Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Resend the patch with a whitespace removed so that "git apply patch" works directly.

---
Zheng Li, AWS, Amazon Aurora PostgreSQL

On 2/25/19, 12:39 PM, "Li, Zheng" <zhelli@amazon.com> wrote:

    I'm attaching a working patch following the discussion.
    
    You can also find the following patch description is the commit message:
    NOT IN to ANTI JOIN transformation
    
        The semantics of ANTI JOIN were created to match the semantics of NOT
        EXISTS, which enables NOT EXISTS subqueries to be efficiently executed
        as a kind of join.  NOT IN subqueries have different NULL semantics than
        NOT EXISTS, but since there is no special join operator for NOT IN it is
        generally executed as a nested sub-plan.  It is possible, however, to
        transform NOT IN to a correlated NOT EXISTS so that it can be executed
        it as an ANTI JOIN with additional correlated predicates.
    
        A general transformation from NOT IN to NOT EXISTS for the
        single-expression case (the multi-expression case is just ANDs of the
        single-expressions) is:
            t1.x NOT IN (SELECT t2.y from t2 where p) <=> NOT EXISTS (select 1
                    from t2 where (y=x or y is NULL or x is NULL) and p).
    
        If x or y is non-nullable, we can safely remove the predicate "x
        is NULL" or "y is NULL", and if there is no predicate p,
        then "x is NULL" may be factored out of the subquery.
        Experiments show that if we can remove one or the other ORed
        predicates, or if we can factor out the "x is NULL", then
        execution is typically much faster.
    
        Basically, for the single expression case (we also handle the multi
        expression case), we try to do the following transformation:
        When p does not exist:
            t1.x not in (t2.y) => ANTI JOIN t1(Filter: x is not null), t2 on
            join condition: t1.x=t2.y or t2.y is null.
        When x is non-nullable:
            t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition:
            (t1.x=t2.y or t2.y is null) and p.
    
        We implemented a nullability test routine is_node_nonnullable().
        Currently it handles Var, TargetEntry, CoalesceExpr and Const. Outer
        joins are taken into consideration in the nullability test.
    
        We adjust and apply reduce_outer_joins() before the transformation so 
        that the outer joins have an opportunity to be converted to inner joins
        prior to the transformation.
    
        Using this transformation, we measured performance improvements of
        two to three orders of magnitude on most queries in a development
        environment. In our performance experiments, table s (small) has 11
        rows, table l (large) has 1 million rows. s.n and l.n have NULL
        values. s.nn and l.nn are NOT NULL.
    
            s.n not in (l.n)          1150 ms -> 0.49 ms
            s.nn not in (l.nn)      1120 ms -> 0.45 ms
            l.n not in (l.n)   over   20 min -> 1700 ms
            l.nn not in (l.nn) over 20 min -> 220 ms
            l.n not in (s.n)               63 ms -> 750 ms
            l.nn not in (s.nn)          58 ms -> 46 ms
    
        For the only case that performance drops - l.n not in (s.n). It is
        likely to be resolved by ending the nested loop anti join early as
        soon as we find NULL inner tuple entry/entries that satisfies the
        join condition during execution. This is still under investigation.
    
    Comments are welcome.
    
    With Regards,
    ---
    Zheng Li, AWS, Amazon Aurora PostgreSQL
    
    On 2/25/19, 7:32 AM, "David Rowley" <david.rowley@2ndquadrant.com> wrote:
    
        On Fri, 22 Feb 2019 at 09:44, David Rowley <david.rowley@2ndquadrant.com> wrote:
        > I've attached the rebased and still broken version.
        
        I set about trying to make a less broken version of this.
        
        A quick reminder of the semantics of NOT IN:
        
        1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table);
        
        If table is non-empty:
        will filter out rows where <nullable_column> is NULL
        and only show values that are not in <not null column>
        
        If table is empty:
        Filters nothing.
        
        2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table);
        
        If table contains NULLs in the <null column> no records will match.
        
        The previous patch handled #2 correctly but neglected to do anything
        about #1.  For #1 the only way we can implement this as a planner only
        change is to insist that the outer side expressions also are not null.
        If we could have somehow tested if "table" was non-empty then we could
        have added a IS NOT NULL clause to the outer query and converted to an
        anti-join, but ... can't know that during planning and can't add the
        IS NOT NULL regardless as, if "table" is empty we will filter NULLs
        when we shouldn't.
        
        In the attached, I set about fixing #1 by determining if the outer
        expressions could be NULL by checking
        
        1. If expression is a Var from an inner joined relation it can't be
        NULL if there's a NOT NULL constraint on the column; or
        2. If expression is a Var from an inner joined relation and there is a
        strict WHERE/ON clause, the expression can't be NULL; or
        3. If expression is a Var from an outer joined relation check for
        quals that were specified in the same syntactical level as the NOT IN
        for proofs that NULL will be filtered.
        
        An example of #3 is:
        
        SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL
        AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in
        planning, but...
        or;
        SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3);
        
        In the latter of the two, the t1.a = t2.a join conditions ensures that
        NULLs can't exist where the NOT IN is evaluated.
        
        I implemented #3 by passing the quals down to
        pull_up_sublinks_qual_recurse().  At the top level call 'node' and
        'notnull_proofs' are the same, but that changes in recursive calls
        like the one we make inside the is_andclause() condition.
        
        Comments welcome.
        
        -- 
         David Rowley                   http://www.2ndQuadrant.com/
         PostgreSQL Development, 24x7 Support, Training & Services
        
    
    


Вложения

Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Tue, 26 Feb 2019 at 11:51, Li, Zheng <zhelli@amazon.com> wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works directly.

I had a quick look at this and it seems to be broken for the empty
table case I mentioned up thread.

Quick example:

Setup:

create table t1 (a int);
create table t2 (a int not null);
insert into t1 values(NULL),(1),(2);

select * from t1 where a not in(select a from t2);

Patched:
 a
---
 1
 2
(2 rows)

Master:
 a
---

 1
 2
(3 rows)

This will be due to the fact you're adding an a IS NOT NULL qual to
the scan of a:

postgres=# explain select * from t1 where a not in(select a from t2);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Anti Join  (cost=67.38..152.18 rows=1268 width=4)
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1  (cost=0.00..35.50 rows=2537 width=4)
         Filter: (a IS NOT NULL)
   ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
         ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)
(6 rows)

but as I mentioned, you can't do that as t2 might be empty and there's
no way to know that during planning.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Richard Guo
Дата:
Greenplum Database does this optimization. The idea is to use a new join
type, let's call it JOIN_LASJ_NOTIN, and its semantic regarding NULL is
defined as below:

1. If there is a NULL in the outer side, and the inner side is empty, the
   NULL should be part of the outputs.

2. If there is a NULL in the outer side, and the inner side is not empty,
   the NULL should not be part of the outputs.

3. If there is a NULL in the inner side, no outputs should be produced.

An example plan looks like:

gpadmin=# explain (costs off)  select * from t1 where a not in(select a from t2);
            QUERY PLAN
-----------------------------------
 Hash Left Anti Semi (Not-In) Join
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2
(5 rows)

Thanks
Richard

On Tue, Feb 26, 2019 at 7:20 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 26 Feb 2019 at 11:51, Li, Zheng <zhelli@amazon.com> wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works directly.

I had a quick look at this and it seems to be broken for the empty
table case I mentioned up thread.

Quick example:

Setup:

create table t1 (a int);
create table t2 (a int not null);
insert into t1 values(NULL),(1),(2);

select * from t1 where a not in(select a from t2);

Patched:
 a
---
 1
 2
(2 rows)

Master:
 a
---

 1
 2
(3 rows)

This will be due to the fact you're adding an a IS NOT NULL qual to
the scan of a:

postgres=# explain select * from t1 where a not in(select a from t2);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Anti Join  (cost=67.38..152.18 rows=1268 width=4)
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1  (cost=0.00..35.50 rows=2537 width=4)
         Filter: (a IS NOT NULL)
   ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
         ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)
(6 rows)

but as I mentioned, you can't do that as t2 might be empty and there's
no way to know that during planning.

--
 David Rowley                   https://urldefense.proofpoint.com/v2/url?u=http-3A__www.2ndQuadrant.com_&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5r3cnfZPUDOHrMiXq8Mq2g&m=dE1nglE17x3nD-oH_BrF0r4SLaFnQKzwwJBJGpDoaaA&s=dshupMomMvkDAd92918cU21AJ1E1s7QwbrxIGSRxZA8&e=
 PostgreSQL Development, 24x7 Support, Training & Services

Re: NOT IN subquery optimization

От
Jim Finnerty
Дата:
The problem is that the special optimization that was proposed for the case
where the subquery has no WHERE clause isn't valid when the subquery returns
no rows.  That additional optimization needs to be removed, and preferably
replaced with some sort of efficient run-time test.





-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Wed, 27 Feb 2019 at 03:07, Jim Finnerty <jfinnert@amazon.com> wrote:
>
> The problem is that the special optimization that was proposed for the case
> where the subquery has no WHERE clause isn't valid when the subquery returns
> no rows.  That additional optimization needs to be removed, and preferably
> replaced with some sort of efficient run-time test.

That is one option, however, the join type that Richard mentions, to
satisfy point #3, surely only can work for Hash joins and perhaps
Merge joins that required a sort, assuming there's some way for the
sort to communicate about if it found NULLs or not. Either way, we
need to have looked at the entire inner side to ensure there are no
nulls. Probably it would be possible to somehow team that up with a
planner check to see if the inner exprs could be NULL then just
implement points #1 and #2 for other join methods.

If you're proposing to do that for this thread then I can take my
planner only patch somewhere else.  I only posted my patch as I pretty
much already had what I thought you were originally talking about.
However, please be aware there are current patents around adding
execution time smarts in this area, so it's probably unlikely you'll
find a way to do this in the executor that does not infringe on those.
Probably no committer would want to touch it.  I think my patch covers
a good number of use cases and as far as I understand, does not go
near any current patents.

Please let me know if I should move my patch to another thread. I
don't want to hi-jack this if you're going in another direction.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
I agree we will need some runtime smarts (such as a new anti join type as pointed out by Richard) to "ultimately"
accountfor all the cases of NOT IN queries.
 

However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out
(whichI was not aware of before), we would not move that direction at the moment.
 

I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. The two
patchesare for the same purpose and similar. However, they differ in the following ways as far as I can tell:
 

Nullability Test:
-David's patch uses strict predicates for nullability test.
-Our patch doesn't use strict predicates, but it accounts for COALESCE and null-padded rows from outer join. In
addition,we made reduce_outer_joins() work before the transformation which makes the nullability test more accurate.
 

Anti Join Transformation:
-Dvaid's patch does the transformation when both inner and outer outputs are non-nullable.
-With the latest fix (for the empty table case), our patch does the transformation as long as the outer is non-nullable
regardlessof the inner nullability, experiments show that the results are always faster.
 

David, please let me know what you think. If you would like to collaborate, I'll start merging with your code on using
strictpredicates to make a better Nullability Test.
 

Thanks,
Zheng


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
"Li, Zheng" <zhelli@amazon.com> writes:
> However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out
(whichI was not aware of before), we would not move that direction at the moment. 

> I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF.

TBH, I think it's very unlikely that any patch for this will be seriously
considered for commit in v12.  It would be against project policy, and
spending a lot of time reviewing the patch would be quite unfair to other
patches that have been in the queue longer.  Therefore, I'd suggest that
you not bend things out of shape just to have some patch to submit before
March 1.  It'd be better to work with the goal of having a coherent patch
ready for the first v13 CF, probably July-ish.

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Wed, 27 Feb 2019 at 13:05, Li, Zheng <zhelli@amazon.com> wrote:
> -With the latest fix (for the empty table case), our patch does the transformation as long as the outer is
non-nullableregardless of the inner nullability, experiments show that the results are always faster.
 

Hi Zheng,

I'm interested to know how this works without testing for inner
nullability.  If any of the inner side's join exprs are NULL then no
records can match. What do you propose to work around that?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Wed, 27 Feb 2019 at 13:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "Li, Zheng" <zhelli@amazon.com> writes:
> > However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out
(whichI was not aware of before), we would not move that direction at the moment.
 
>
> > I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF.
>
> TBH, I think it's very unlikely that any patch for this will be seriously
> considered for commit in v12.  It would be against project policy, and
> spending a lot of time reviewing the patch would be quite unfair to other
> patches that have been in the queue longer.  Therefore, I'd suggest that
> you not bend things out of shape just to have some patch to submit before
> March 1.  It'd be better to work with the goal of having a coherent patch
> ready for the first v13 CF, probably July-ish.

FWIW, I did add this to the March CF, but I set the target version to
13.  I wasn't considering this for PG12. I see Zheng was, but I agree
with you on PG13 being the target for this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
I'm totally fine with setting the target to PG13.

--
I'm interested to know how this works without testing for inner
nullability.  If any of the inner side's join exprs are NULL then no
records can match. What do you propose to work around that?
--

We still check for inner side's nullability, when it is nullable we
append a "var is NULL" to the anti join condition. So every outer
tuple is going to evaluate to true on the join condition when there
is indeed a null entry in the inner. 
Actually I think the nested loop anti join can end early in this case,
but I haven't find a way to do it properly, this may be one other reason
why we need a new join type for NOT IN.

e.g.
explain select count(*) from s where u not in (select n from l);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=2892.88..2892.89 rows=1 width=8)
   ->  Nested Loop Anti Join  (cost=258.87..2892.88 rows=1 width=0)
         ->  Seq Scan on s  (cost=0.00..1.11 rows=11 width=4)
         ->  Bitmap Heap Scan on l  (cost=258.87..262.88 rows=1 width=4)
               Recheck Cond: ((s.u = n) OR (n IS NULL))
               ->  BitmapOr  (cost=258.87..258.87 rows=1 width=0)
                     ->  Bitmap Index Scan on l_n  (cost=0.00..4.43 rows=1 width=0)
                           Index Cond: (s.u = n)
                     ->  Bitmap Index Scan on l_n  (cost=0.00..4.43 rows=1 width=0)
                           Index Cond: (n IS NULL)

Zheng


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Wed, 27 Feb 2019 at 13:41, Li, Zheng <zhelli@amazon.com> wrote:
> We still check for inner side's nullability, when it is nullable we
> append a "var is NULL" to the anti join condition. So every outer
> tuple is going to evaluate to true on the join condition when there
> is indeed a null entry in the inner.

That's possible, at least providing the var is NULL is an OR
condition, but the problem there is that you force the plan into a
nested loop join. That's unfortunately not going to perform very well
when the number of rows to process is large.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Richard Guo
Дата:

On Wed, Feb 27, 2019 at 4:52 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Wed, 27 Feb 2019 at 03:07, Jim Finnerty <jfinnert@amazon.com> wrote:

If you're proposing to do that for this thread then I can take my
planner only patch somewhere else.  I only posted my patch as I pretty
much already had what I thought you were originally talking about.
However, please be aware there are current patents around adding
execution time smarts in this area, so it's probably unlikely you'll
find a way to do this in the executor that does not infringe on those.
Probably no committer would want to touch it.  I think my patch covers
a good number of use cases and as far as I understand, does not go
near any current patents.

Thanks for pointing out the patent concerns. I was not aware of that before.
Could you please provide some clue where I can find more info about the patents?

Thanks
Richard 

Re: NOT IN subquery optimization

От
Richard Guo
Дата:


On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng <zhelli@amazon.com> wrote:
Resend the patch with a whitespace removed so that "git apply patch" works directly.



Hi Zheng,

I have reviewed your patch. Good job except two issues I can find:

1. The patch would give wrong results when the inner side is empty. In this
case, the whole data from outer side should be in the outputs. But with the
patch, we will lose the NULLs from outer side.

2. Because of the new added predicate 'OR (var is NULL)', we cannot use hash
join or merge join to do the ANTI JOIN.  Nested loop becomes the only choice,
which is low-efficency.

Thanks
Richard

Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Fri, 1 Mar 2019 at 15:27, Richard Guo <riguo@pivotal.io> wrote:
> I have reviewed your patch. Good job except two issues I can find:
>
> 1. The patch would give wrong results when the inner side is empty. In this
> case, the whole data from outer side should be in the outputs. But with the
> patch, we will lose the NULLs from outer side.
>
> 2. Because of the new added predicate 'OR (var is NULL)', we cannot use hash
> join or merge join to do the ANTI JOIN.  Nested loop becomes the only choice,
> which is low-efficency.

Yeah. Both of these seem pretty fundamental, so setting the patch to
waiting on author.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Andres Freund
Дата:
Hi,

On March 1, 2019 4:53:03 AM PST, David Rowley <david.rowley@2ndquadrant.com> wrote:
>On Fri, 1 Mar 2019 at 15:27, Richard Guo <riguo@pivotal.io> wrote:
>> I have reviewed your patch. Good job except two issues I can find:
>>
>> 1. The patch would give wrong results when the inner side is empty.
>In this
>> case, the whole data from outer side should be in the outputs. But
>with the
>> patch, we will lose the NULLs from outer side.
>>
>> 2. Because of the new added predicate 'OR (var is NULL)', we cannot
>use hash
>> join or merge join to do the ANTI JOIN.  Nested loop becomes the only
>choice,
>> which is low-efficency.
>
>Yeah. Both of these seem pretty fundamental, so setting the patch to
>waiting on author.

I've not checked, but could we please make sure these cases are covered in the regression tests today with a single
liner?Seems people had to rediscover them a number of times now, and unless this thread results in an integrated
featuresoonish, it seems likely other people will again. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On March 1, 2019 4:53:03 AM PST, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On Fri, 1 Mar 2019 at 15:27, Richard Guo <riguo@pivotal.io> wrote:
>>> 1. The patch would give wrong results when the inner side is empty.
>>> 2. Because of the new added predicate 'OR (var is NULL)', we cannot
>>> use hash join or merge join to do the ANTI JOIN.

> I've not checked, but could we please make sure these cases are covered
> in the regression tests today with a single liner?

I'm not sure if the second one is actually a semantics bug or just a
misoptimization?  But yeah, +1 for putting in some simple tests for
corner cases right now.  Anyone want to propose a specific patch?

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sat, 2 Mar 2019 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > I've not checked, but could we please make sure these cases are covered
> > in the regression tests today with a single liner?
>
> I'm not sure if the second one is actually a semantics bug or just a
> misoptimization?  But yeah, +1 for putting in some simple tests for
> corner cases right now.  Anyone want to propose a specific patch?

The second is just reducing the planner's flexibility to produce a
good plan.  The first is a bug. Proposed regression test attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Sat, 2 Mar 2019 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not sure if the second one is actually a semantics bug or just a
>> misoptimization?  But yeah, +1 for putting in some simple tests for
>> corner cases right now.  Anyone want to propose a specific patch?

> The second is just reducing the planner's flexibility to produce a
> good plan.  The first is a bug. Proposed regression test attached.

LGTM, pushed.

            regards, tom lane


Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Thanks all for the feedbacks! I'm working on a refined patch.

Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it is
alwaysfaster compared to the original plan. In order to enable the transformation from NOT IN to anti join when the
inner/outeris nullable, we have to add some NULL test to the join condition.
 

We could make anti join t1, t2 on (t1.x = t2.y or t2.y IS NULL) eligible for hashjoin, it would require changes in
allowingthis special join quals for hash join as well as changes in hash join executor to handle NULL accordingly for
thecase.
 

Another option of transformation is to add "is not false" on top of the join condition.

Regards,
Zheng
On 3/1/19, 5:28 PM, "David Rowley" <david.rowley@2ndquadrant.com> wrote:

    On Sat, 2 Mar 2019 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > Andres Freund <andres@anarazel.de> writes:
    > > I've not checked, but could we please make sure these cases are covered
    > > in the regression tests today with a single liner?
    >
    > I'm not sure if the second one is actually a semantics bug or just a
    > misoptimization?  But yeah, +1 for putting in some simple tests for
    > corner cases right now.  Anyone want to propose a specific patch?
    
    The second is just reducing the planner's flexibility to produce a
    good plan.  The first is a bug. Proposed regression test attached.
    
    -- 
     David Rowley                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
"Li, Zheng" <zhelli@amazon.com> writes:
> Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it is
alwaysfaster compared to the original plan. 

TBH, I am *really* skeptical of sweeping claims like that.  The existing
code will typically produce a hashed-subplan plan, which ought not be
that awful as long as the subquery result doesn't blow out memory.
It certainly is going to beat a naive nested loop.

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "Li, Zheng" <zhelli@amazon.com> writes:
> > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it
isalways faster compared to the original plan.
 
>
> TBH, I am *really* skeptical of sweeping claims like that.  The existing
> code will typically produce a hashed-subplan plan, which ought not be
> that awful as long as the subquery result doesn't blow out memory.
> It certainly is going to beat a naive nested loop.

It's pretty easy to show the claim is false using master and NOT EXISTS.

create table small(a int not null);
create table big (a int not null);
insert into small select generate_Series(1,1000);
insert into big select x%1000+1 from generate_Series(1,1000000) x;

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 178.575 ms

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a or s.a is null);
Time: 38049.969 ms (00:38.050)


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
The current transformation would not add "or s.a is NULL" in the example provided since it is non-nullable. You will be
comparingthese two cases in terms of the transformation:
 

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 51.416 ms

select count(*) from big b where a not in (select a from s);
Time: 890.088 ms

 But if s.a is nullable, yes, you have proved my previous statement is false... I should have used almost always.
However, if s.a is nullable, we would do this transformation:
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);

It's possible to stop the nested loop join early during execution once we find an inner Null entry because every outer
tupleis
 
going to evaluate to true on the join condition.

Zheng

On 3/1/19, 6:17 PM, "David Rowley" <david.rowley@2ndquadrant.com> wrote:

    On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > "Li, Zheng" <zhelli@amazon.com> writes:
    > > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join,
itis always faster compared to the original plan.
 
    >
    > TBH, I am *really* skeptical of sweeping claims like that.  The existing
    > code will typically produce a hashed-subplan plan, which ought not be
    > that awful as long as the subquery result doesn't blow out memory.
    > It certainly is going to beat a naive nested loop.
    
    It's pretty easy to show the claim is false using master and NOT EXISTS.
    
    create table small(a int not null);
    create table big (a int not null);
    insert into small select generate_Series(1,1000);
    insert into big select x%1000+1 from generate_Series(1,1000000) x;
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a);
    Time: 178.575 ms
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);
    Time: 38049.969 ms (00:38.050)
    
    
    -- 
     David Rowley                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sat, 2 Mar 2019 at 12:39, Li, Zheng <zhelli@amazon.com> wrote:
> However, if s.a is nullable, we would do this transformation:
>     select count(*) from big b where not exists(select 1 from small s
>     where s.a = b.a or s.a is null);

I understand you're keen to make this work, but you're assuming again
that forcing the planner into a nested loop plan is going to be a win
over the current behaviour. It may well be in some cases, but it's
very simple to show cases where it's a significant regression.

Using the same tables from earlier, and again with master:

alter table small alter column a drop not null;
select * from big where a not in(select a from small);
Time: 430.283 ms

Here's what you're proposing:

select * from big b where not exists(select 1 from small s where s.a =
b.a or s.a is null);
Time: 37419.646 ms (00:37.420)

about 80 times slower. Making "small" a little less small would likely
see that gap grow even further.

I think you're fighting a losing battle here with adding OR quals to
the join condition. This transformation happens so early in planning
that you really can't cost it out either.  I think the only way that
could be made to work satisfactorily would be with some execution
level support for it.  Short of that, you're left with just adding
checks that either side of the join cannot produce NULL values...
That's what I've proposed in [1].

[1] https://www.postgresql.org/message-id/CAKJS1f_OA5VeZx8A8H8mkj3uqEgOtmHBGCUA6%2BxqgmUJ6JQURw%40mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I think you're fighting a losing battle here with adding OR quals to
> the join condition.

Yeah --- that has a nontrivial risk of making things significantly worse,
which makes it a hard sell.  I think the most reasonable bet here is
simply to not perform the transformation if we can't prove the inner side
NOT NULL.  That's going to catch most of the useful cases anyway IMO.

            regards, tom lane


Re: NOT IN subquery optimization

От
Jim Finnerty
Дата:
Folks - I was away on vacation for the month of February, and can give this
my attention again.

I agree with Tom's comment above - when the cost of the NOT IN is dominated
by the cost of the outer scan (i.e. when the cardinality of the outer
relation is large relative to the cardinality of the subquery), and if the
inner cardinality is small enough to fit in memory, then the current
implementation does an efficient hash lookup into an in-memory structure,
and that's a very fast way to do the NOT IN.  It almost achieves the
lower-bound cost of scanning the outer relation.  It can also parallelizes
easily, whether or not we currently can do that.  In these cases, the
current plan is the preferred plan, and we should keep it.

preferred in-memory hash lookup plan:  https://explain.depesz.com/s/I1kN

This is a case that we would want to avoid the transform, because when both
the inner and outer are nullable and the outer is large and the inner is
small, the transformed plan would Scan and Materialize the inner for each
row of the outer row, which is very slow compared to the untransformed plan:

slow case for the transformation: https://explain.depesz.com/s/0CBB

However, if the inner is too large to fit into memory, then the transformed
plan is faster on all of our other test cases, although our test cases are
far from complete.  If the current solution supports parallel scan of the
outer, for example, then PQ could have lower elapsed time than the non-PQ
nested loop solution.

Also, remember that the issue with the empty inner was just a bug that was
the result of trying to do an additional optimization in the case where
there is no WHERE clause in the subquery.  That bug has been fixed.  The
general case transformation described in the base note produces the correct
result in all cases, including the empty subquery case.







-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sat, 2 Mar 2019 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > I think you're fighting a losing battle here with adding OR quals to
> > the join condition.
>
> Yeah --- that has a nontrivial risk of making things significantly worse,
> which makes it a hard sell.  I think the most reasonable bet here is
> simply to not perform the transformation if we can't prove the inner side
> NOT NULL.  That's going to catch most of the useful cases anyway IMO.

Did you mean outer side NOT NULL?   The OR col IS NULL was trying to
solve the outer side nullability problem when the inner side is empty.
  Of course, the inner side needs to not produce NULLs either, but
that's due to the fact that if a NULL exists in the inner side then
the anti-join should not produce any records.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Sat, 2 Mar 2019 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah --- that has a nontrivial risk of making things significantly worse,
>> which makes it a hard sell.  I think the most reasonable bet here is
>> simply to not perform the transformation if we can't prove the inner side
>> NOT NULL.  That's going to catch most of the useful cases anyway IMO.

> Did you mean outer side NOT NULL?

Sorry, sloppy thinking.

>   Of course, the inner side needs to not produce NULLs either, but
> that's due to the fact that if a NULL exists in the inner side then
> the anti-join should not produce any records.

Right.  So the path of least resistance is to transform to antijoin
only if we can prove both of those things (and maybe we need to check
that the join operator is strict, too?  -ENOCAFFEINE).  The question
before us is what is the cost-benefit ratio of trying to cope with
additional cases.  I'm skeptical that it's attractive: the cost
certainly seems high, and I don't know that there are very many
real-world cases where we'd get a win.

Hmm ... thinking about the strictness angle some more: what we really
need to optimize NOT IN, IIUC, is an assumption that the join operator
will never return NULL.  While not having NULL inputs is certainly a
*necessary* condition for that (assuming a strict operator) it's not a
*sufficient* condition.  Any Postgres function/operator is capable
of returning NULL whenever it feels like it.  So checking strictness
does not lead to a mathematically correct optimization.

My initial thought about plugging that admittedly-academic point is
to insist that the join operator be both strict and a member of a
btree opclass (hash might be OK too; less sure about other index types).
The system already contains assumptions that btree comparators never
return NULL.  I doubt that this costs us any real-world applicability,
because if the join operator can neither merge nor hash, we're screwed
anyway for finding a join plan that's better than nested-loop.

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sun, 3 Mar 2019 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... thinking about the strictness angle some more: what we really
> need to optimize NOT IN, IIUC, is an assumption that the join operator
> will never return NULL.  While not having NULL inputs is certainly a
> *necessary* condition for that (assuming a strict operator) it's not a
> *sufficient* condition.  Any Postgres function/operator is capable
> of returning NULL whenever it feels like it.  So checking strictness
> does not lead to a mathematically correct optimization.

That's something I didn't think of.

> My initial thought about plugging that admittedly-academic point is
> to insist that the join operator be both strict and a member of a
> btree opclass (hash might be OK too; less sure about other index types).
> The system already contains assumptions that btree comparators never
> return NULL.  I doubt that this costs us any real-world applicability,
> because if the join operator can neither merge nor hash, we're screwed
> anyway for finding a join plan that's better than nested-loop.

Why strict? If both inputs are non-NULL, then what additional
guarantees does strict give us?

I implemented a btree opfamily check in my version of the patch. Not
so sure about hash, can you point me in the direction of a mention of
how this is guarantees for btree?

The attached v1.2 does this adds a regression test using the LINE
type. This has an operator named '=', but no btree opfamily. A few
other types are in this boat too, per:

select typname from pg_type t where not exists(select 1 from pg_amop
where amoplefttype = t.oid and amopmethod=403) and exists (select 1
from pg_operator where oprleft = t.oid and oprname = '=');

The list of builtin types that have a hash opfamily but no btree
opfamily that support NOT IN are not very exciting, so doing the same
for hash might not be worth the extra code.

select typname from pg_type t where exists(select 1 from pg_amop where
amoplefttype = t.oid and amopmethod=405) and exists (select 1 from
pg_operator where oprleft = t.oid and oprname = '=') and not
exists(select 1 from pg_amop where amoplefttype = t.oid and
amopmethod=403);
 typname
---------
 xid
 cid
 aclitem
(3 rows)

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Sun, 3 Mar 2019 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My initial thought about plugging that admittedly-academic point is
>> to insist that the join operator be both strict and a member of a
>> btree opclass (hash might be OK too; less sure about other index types).

> Why strict? If both inputs are non-NULL, then what additional
> guarantees does strict give us?

Yeah, if we're verifying the inputs are non-null, I think that probably
doesn't matter.

> I implemented a btree opfamily check in my version of the patch. Not
> so sure about hash, can you point me in the direction of a mention of
> how this is guarantees for btree?

https://www.postgresql.org/docs/devel/btree-support-funcs.html
quoth

    The comparison function must take two non-null values A and B and
    return an int32 value that is < 0, 0, or > 0 when A < B, A = B, or A >
    B, respectively. A null result is disallowed: all values of the data
    type must be comparable.

(At the code level, this is implicit in the fact that the comparison
function will be called via FunctionCall2Coll or a sibling, and those
all throw an error if the called function returns NULL.)

Now, it doesn't say in so many words that the comparison operators
have to yield results consistent with the comparison support function,
but I think that's pretty obvious ...

For hash, the equivalent constraint is that the hash function has to
work for every possible input value.  I suppose it's possible that
the associated equality operator would sometimes return null, but
it's hard to think of a practical reason for doing so.

I've not dug in the code, but I wouldn't be too surprised if
nodeMergejoin.c or nodeHashjoin.c, or the stuff related to hash
grouping or hash aggregation, also contain assumptions about
the equality operators not returning null.

> The list of builtin types that have a hash opfamily but no btree
> opfamily that support NOT IN are not very exciting, so doing the same
> for hash might not be worth the extra code.

Agreed for builtin types, but there might be some extensions out there
where this doesn't hold.  It's not terribly hard to imagine a data type
that hasn't got a linear sort order but is amenable to hashing.
(The in-core xid type is an example, actually.)

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sun, 3 Mar 2019 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (At the code level, this is implicit in the fact that the comparison
> function will be called via FunctionCall2Coll or a sibling, and those
> all throw an error if the called function returns NULL.)
>
> Now, it doesn't say in so many words that the comparison operators
> have to yield results consistent with the comparison support function,
> but I think that's pretty obvious ...

Ah okay. I can get it to misbehave by setting fcinfo->isnull = true in
the debugger from int4eq(). I see the NULL result there is not
verified as that's just translated into "false" by ExecInterpExpr()'s
EEOP_QUAL case.  If you're saying something doing that is
fundamentally broken, then I guess we're okay.

> David Rowley <david.rowley@2ndquadrant.com> writes:
> > The list of builtin types that have a hash opfamily but no btree
> > opfamily that support NOT IN are not very exciting, so doing the same
> > for hash might not be worth the extra code.
>
> Agreed for builtin types, but there might be some extensions out there
> where this doesn't hold.  It's not terribly hard to imagine a data type
> that hasn't got a linear sort order but is amenable to hashing.

On reflection, it seems pretty easy to add this check, so I've done so
in the attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Sun, 3 Mar 2019 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (At the code level, this is implicit in the fact that the comparison
>> function will be called via FunctionCall2Coll or a sibling, and those
>> all throw an error if the called function returns NULL.)

> Ah okay. I can get it to misbehave by setting fcinfo->isnull = true in
> the debugger from int4eq(). I see the NULL result there is not
> verified as that's just translated into "false" by ExecInterpExpr()'s
> EEOP_QUAL case.  If you're saying something doing that is
> fundamentally broken, then I guess we're okay.

No, what I'm thinking of is this bit in _bt_compare:

            result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
                                                     scankey->sk_collation,
                                                     datum,
                                                     scankey->sk_argument));

You absolutely will get errors during btree insertions and searches
if a datatype's btree comparison functions ever return NULL (for
non-NULL inputs).

For hash indexes, that kind of restriction only directly applies to
hash-calculation functions, which perhaps are not as tightly tied to the
opclass's user-visible operators as is the case for btree opclasses.
But I think you might be able to find places in hash join or grouping
that are calling the actual equality operator and not allowing for it
to return NULL.

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Mon, 4 Mar 2019 at 04:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On Sun, 3 Mar 2019 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> (At the code level, this is implicit in the fact that the comparison
> >> function will be called via FunctionCall2Coll or a sibling, and those
> >> all throw an error if the called function returns NULL.)
>
> > Ah okay. I can get it to misbehave by setting fcinfo->isnull = true in
> > the debugger from int4eq(). I see the NULL result there is not
> > verified as that's just translated into "false" by ExecInterpExpr()'s
> > EEOP_QUAL case.  If you're saying something doing that is
> > fundamentally broken, then I guess we're okay.
>
> No, what I'm thinking of is this bit in _bt_compare:
>
>             result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
>                                                      scankey->sk_collation,
>                                                      datum,
>                                                      scankey->sk_argument));
>
> You absolutely will get errors during btree insertions and searches
> if a datatype's btree comparison functions ever return NULL (for
> non-NULL inputs).

I understand this is the case if an index happens to be used, but
there's no guarantee that's going to be the case. I was looking at the
case where an index was not used.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Mon, 4 Mar 2019 at 04:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You absolutely will get errors during btree insertions and searches
>> if a datatype's btree comparison functions ever return NULL (for
>> non-NULL inputs).

> I understand this is the case if an index happens to be used, but
> there's no guarantee that's going to be the case. I was looking at the
> case where an index was not used.

Not following your point?  An index opclass is surely not going to be
designed on the assumption that it can never be used in an index.
Therefore, its support functions can't return NULL unless the index AM
allows that.

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Mon, 4 Mar 2019 at 11:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On Mon, 4 Mar 2019 at 04:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> You absolutely will get errors during btree insertions and searches
> >> if a datatype's btree comparison functions ever return NULL (for
> >> non-NULL inputs).
>
> > I understand this is the case if an index happens to be used, but
> > there's no guarantee that's going to be the case. I was looking at the
> > case where an index was not used.
>
> Not following your point?  An index opclass is surely not going to be
> designed on the assumption that it can never be used in an index.
> Therefore, its support functions can't return NULL unless the index AM
> allows that.

I agree that it makes sense that the behaviour of the two match. I was
trying to hint towards that when I said:

> If you're saying something doing that is
> fundamentally broken, then I guess we're okay.

but likely I didn't make that very clear.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Re: NOT IN subquery optimization

От
David Steele
Дата:
On 2/27/19 2:26 AM, David Rowley wrote:
> On Wed, 27 Feb 2019 at 13:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> "Li, Zheng" <zhelli@amazon.com> writes:
>>> However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out
(whichI was not aware of before), we would not move that direction at the moment.
 
>>
>>> I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF.
>>
>> TBH, I think it's very unlikely that any patch for this will be seriously
>> considered for commit in v12.  It would be against project policy, and
>> spending a lot of time reviewing the patch would be quite unfair to other
>> patches that have been in the queue longer.  Therefore, I'd suggest that
>> you not bend things out of shape just to have some patch to submit before
>> March 1.  It'd be better to work with the goal of having a coherent patch
>> ready for the first v13 CF, probably July-ish.
> 
> FWIW, I did add this to the March CF, but I set the target version to
> 13.  I wasn't considering this for PG12. I see Zheng was, but I agree
> with you on PG13 being the target for this.

Looks like the target version of 13 was removed but I have added it back.

Regards,
-- 
-David
david@pgmasters.net


Re: Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Tue, 5 Mar 2019 at 21:21, David Steele <david@pgmasters.net> wrote:
>
> On 2/27/19 2:26 AM, David Rowley wrote:
> > FWIW, I did add this to the March CF, but I set the target version to
> > 13.  I wasn't considering this for PG12. I see Zheng was, but I agree
> > with you on PG13 being the target for this.
>
> Looks like the target version of 13 was removed but I have added it back.

The problem seems to be that there are now 2 CF entries for this
thread. I originally added [1], but later Zheng added [2].  From what
Jim mentioned when he opened this thread I had the idea that no patch
existed yet, so I posted the one I already had written for this 4
years ago thinking that might be useful to base new work on.  I guess
Zheng's patch already exists when Jim opened this thread as a patch
appeared fairly quickly afterwards.  If that's true then I understand
that they wouldn't want to drop the work they'd already done in favour
of picking mine up.

I'm not all that sure what do to about this. It's going to cause quite
a bit of confusion having two patches in one thread. Part of me feels
that I've hijacked this thread and that I should just drop my patch
altogether and help review Zheng's patch, but I'm struggling a bit to
do that as I've not managed to find problems with my version, but a
few have been pointed out with the other patch  (of course, there may
be some yet undiscovered issues with my version too).

Alternatively, I could take my patch to another thread, but there does
not seem to be much sense in that. It might not solve the confusion
problem.  The best thing would be that if we could work together to
make this work, however, we both seem to have fairly different ideas
on how it should work. Tom and I both agree that Zheng and Jim's
proposal to add OR x IS NULL clauses to the join condition is most
likely a no go area due to it disallowing hash and merge anti-joins.
The last I can understand from Jim is that they appear to disagree
with that and want to do the transformation based on costs.  Perhaps
they're working on some new ideas to make that more feasible. I'm
interested to hear the latest on this.

[1] https://commitfest.postgresql.org/22/2020/
[2] https://commitfest.postgresql.org/22/2023/

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
David Steele
Дата:
On 3/5/19 10:53 AM, David Rowley wrote:
> On Tue, 5 Mar 2019 at 21:21, David Steele <david@pgmasters.net> wrote:
>>
>> On 2/27/19 2:26 AM, David Rowley wrote:
>>> FWIW, I did add this to the March CF, but I set the target version to
>>> 13.  I wasn't considering this for PG12. I see Zheng was, but I agree
>>> with you on PG13 being the target for this.
>>
>> Looks like the target version of 13 was removed but I have added it back.
> 
> The problem seems to be that there are now 2 CF entries for this
> thread. I originally added [1], but later Zheng added [2].  From what
> Jim mentioned when he opened this thread I had the idea that no patch
> existed yet, so I posted the one I already had written for this 4
> years ago thinking that might be useful to base new work on.

Yeah, I just figured this out when I got to your patch which was 
properly marked as PG13 and then saw they were pointing at the same thread.

At the very least one of the patch entries should be closed, or moved to 
a new thread.

I'm not sure if I have an issue with competing patches on the same 
thread.  I've seen that before and it can lead to a good outcome.  It 
case, as you say, also lead to confusion.

Regards,
-- 
-David
david@pgmasters.net


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Sun, 3 Mar 2019 at 01:34, Jim Finnerty <jfinnert@amazon.com> wrote:
> I agree with Tom's comment above - when the cost of the NOT IN is dominated
> by the cost of the outer scan (i.e. when the cardinality of the outer
> relation is large relative to the cardinality of the subquery), and if the
> inner cardinality is small enough to fit in memory, then the current
> implementation does an efficient hash lookup into an in-memory structure,
> and that's a very fast way to do the NOT IN.  It almost achieves the
> lower-bound cost of scanning the outer relation.  It can also parallelizes
> easily, whether or not we currently can do that.  In these cases, the
> current plan is the preferred plan, and we should keep it.

If you do the conversion to anti-join (without hacking at the join
quals and assuming no nulls are possible), then the planner can decide
what's best.  The planner may choose a hash join which is not hugely
different from a hashed subplan, however from the testing I've done
the Hash Join is a bit faster. I imagine there's been more motivation
over the years to optimise that over hashed subplans.  As far as I
know, there's no parallel query support for hashed subplans, but I
know there is for hash joins.  In short, I don't think it makes any
sense to not translate into an anti-join (when possible). I think the
best anti-join plan will always be a win over the subquery. The
planner could make a mistake of course, but that's a different issue.
We certainly don't consider keeping the subquery around for NOT
EXISTS.

> This is a case that we would want to avoid the transform, because when both
> the inner and outer are nullable and the outer is large and the inner is
> small, the transformed plan would Scan and Materialize the inner for each
> row of the outer row, which is very slow compared to the untransformed plan:
>
> slow case for the transformation: https://explain.depesz.com/s/0CBB

Well, that's because you're forcing the planner into a corner in
regards to the join condition. It has no choice but to nested loop
that join.

> However, if the inner is too large to fit into memory, then the transformed
> plan is faster on all of our other test cases, although our test cases are
> far from complete.  If the current solution supports parallel scan of the
> outer, for example, then PQ could have lower elapsed time than the non-PQ
> nested loop solution.

I'm having a little trouble understanding this.  From what I
understand the code adds an OR .. IS NULL clause to the join
condition. Is this still the case with what you've been testing here?
If so, I'm surprised to hear all your test cases are faster. If
there's an OR clause in the join condition then the planner has no
choice but to use a nested loop join, so it's very surprising that you
would find that faster with larger data sets.

Or does the code your testing implement this a different way? Perhaps
with some execution level support?

> Also, remember that the issue with the empty inner was just a bug that was
> the result of trying to do an additional optimization in the case where
> there is no WHERE clause in the subquery.  That bug has been fixed.  The
> general case transformation described in the base note produces the correct
> result in all cases, including the empty subquery case.

I'm not sure why lack of WHERE clause in the subquery counts for
anything here.  The results set from the subquery can be empty or not
empty with or without a WHERE clause.  The only way you'll know it's
empty during planning is if some gating qual says so, but that's yet
to be determined by the time the transformation should be done.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
David Steele <david@pgmasters.net> writes:
> I'm not sure if I have an issue with competing patches on the same 
> thread.  I've seen that before and it can lead to a good outcome.  It 
> case, as you say, also lead to confusion.

It's a bit of a shame that the cfbot will only be testing one of them
at a time if we leave it like this.  I kind of lean towards the
two-thread, two-CF-entry approach because of that.  The amount of
confusion is a constant.

            regards, tom lane


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Wed, 6 Mar 2019 at 03:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Steele <david@pgmasters.net> writes:
> > I'm not sure if I have an issue with competing patches on the same
> > thread.  I've seen that before and it can lead to a good outcome.  It
> > case, as you say, also lead to confusion.
>
> It's a bit of a shame that the cfbot will only be testing one of them
> at a time if we leave it like this.  I kind of lean towards the
> two-thread, two-CF-entry approach because of that.  The amount of
> confusion is a constant.

That sounds fine. I'll take mine elsewhere since I didn't start this thread.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
David Rowley
Дата:
On Wed, 6 Mar 2019 at 12:25, David Rowley <david.rowley@2ndquadrant.com> wrote:
> That sounds fine. I'll take mine elsewhere since I didn't start this thread.

Moved to https://www.postgresql.org/message-id/CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw%40mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Hey, here is our latest patch. Major changes in this patch include:
1. Use the original hashed subplan if the inner fits in memory as decided by subplan_is_hashable().
2. Fixed the inner relation empty case by adding an inner relation existence check when we pull x out as a filter on
theouter (see details below).
 
3. Integrate David Rowley's routine to use strict predicates and inner join conditions when checking nullability of a
Var.

Detailed description of the patch:

    NOT IN to ANTI JOIN transformation
    
    If the NOT IN subquery is not eligible for hashed subplan as decided by
    subplan_is_hashable(), do the following NOT IN to ANTI JOIN
    transformation:
    
    Single expression:
        When x is nullable:
        t1.x not in (t2.y where p) =>
        ANTI JOIN
        t1 (Filter: t1.x IS NOT NULL or NOT EXISTS (select 1 from t2 where p)),
        t2 on join condition (t1.x=t2.y or t2.y IS NULL)
        and p.
        The predicate "t2.y IS NULL" can be removed if y is non-nullable.
    
        When x is non-nullable:
        t1.x not in (t2.y where p) =>
        ANTI JOIN t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL)
        and p.
        The predicate "t2.y IS NULL" can be removed if y is non-nullable.
    
    Multi expression:
        If all xi's are nullable:
        (x1, x2, ... xn) not in (y1, y2, ... yn ) =>
        ANTI JOIN t1, t2 on join condition:
        ((t1.x1 = t2.y1) and ... (t1.xi = t2.yi) ... and
         (t1.xn = t2.yn)) IS NOT FALSE.
    
        If at least one xi is non-nuallable:
        (x1, x2, ... xn) not in (y1, y2, ... yn ) =>
        ANTI JOIN t1, t2 on join condition:
        (t1.x1 = t2.y1 or t2.y1 IS NULL or t1.x1 IS NULL) and ...
        (t1.xi = t2.yi or t2.yi IS NULL) ... and
        (t1.xn = t2.yn or t2.yn IS NULL or t1.xn IS NULL).
    
    Add nullability testing routine is_node_nonnullable(), currently it
    handles Var, TargetEntry, CoalesceExpr and Const. It uses strict
    predicates, inner join conditions and NOT NULL constraint to check
    the nullability of a Var.
    
    Adjust and apply reduce_outer_joins() before the transformation so
    that the outer joins have an opportunity to be converted to inner joins
    prior to the transformation.
    
    We measured performance improvements of two to five orders of magnitude
    on most queries in a development environment. In our performance experiments,
    table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n
    have NULL value. s.nn and l.nn are NOT NULL. Index is created on each column.
    
    Cases using hash anti join:
    l.nn not in (l.nn) 21900s -> 235ms
    l.nn not in (l.nn where u>0) 22000s -> 240ms
    l.n not in (l.nn) 21900s -> 238ms
    l.n not in (l.nn where u>0) 22000s -> 248ms
    
    Cases using index nested loop anti join
    s.n not in (l.nn) 360ms -> 0.5ms
    s.n not in (l.nn where u>0) 390ms -> 0.6ms
    s.nn not in (l.nn) 365ms -> 0.5ms
    s.nn not in (l.nn where u>0) 390ms -> 0.5ms
    
    Cases using bitmap heap scan on the inner and nested loop anti join:
    s.n not in (l.n) 360ms -> 0.7ms
    l.n not in (l.n) 21900s ->  1.6s
    l.n not in (l.n where u>0) 22000s -> 1680ms
    s.nn not in (l.n) 360ms -> 0.5ms
    l.nn not in (l.n) 21900s -> 1650ms
    l.nn not in (l.n where u>0) 22000s -> 1660ms
    
    Cases using the original hashed subplan:
    l.n not in (s.n) 63ms -> 63ms
    l.nn not in (s.nn) 63ms -> 63ms
    l.n not in (s.n where u>0) 63ms -> 63ms

Comments are welcome.

Regards,
-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL
 


Вложения

Re: NOT IN subquery optimization

От
zhengli
Дата:
In our patch, we only proceed with the ANTI JOIN transformation if
subplan_is_hashable(subplan) is
false, it requires the subquery to be planned at this point.

To avoid planning the subquery again later on, I want to keep a pointer of
the subplan in SubLink so that we can directly reuse the subplan when
needed. However, this change breaks initdb for some reason and I'm trying to
figure it out.

I'll send the rebased patch in the following email since it's been a while.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: [UNVERIFIED SENDER] Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Rebased patch is attached.

Comments are welcome.

-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL
 

On 6/14/19, 5:39 PM, "zhengli" <zhelli@amazon.com> wrote:

    In our patch, we only proceed with the ANTI JOIN transformation if
    subplan_is_hashable(subplan) is
    false, it requires the subquery to be planned at this point.
    
    To avoid planning the subquery again later on, I want to keep a pointer of
    the subplan in SubLink so that we can directly reuse the subplan when
    needed. However, this change breaks initdb for some reason and I'm trying to
    figure it out.
    
    I'll send the rebased patch in the following email since it's been a while.
    
    
    
    --
    Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
    
    
    


Вложения

Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
-----
    To avoid planning the subquery again later on, I want to keep a pointer of
    the subplan in SubLink so that we can directly reuse the subplan when
    needed. However, this change breaks initdb for some reason and I'm trying to
    figure it out.
-----

"make clean" solved the initdb issue. This new patch keeps a pointer of the subplan
 in SubLink so that we can directly reuse the subplan when needed. When the subplan
is not hashable (too big to fit in work_mem), the NOT IN query will be flattened to
an ANTI JOIN and we won't need to use subplan again. However, when the subplan
is hashable, we don't do the conversion and will need to use subplan later, patch v2.1
avoids planning the subquery twice in this case.

-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL    
    


Вложения

Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Hi,

I'm submitting patch v2.2.

This version fixed an issue that involves CTE. Because we call subquery_planner before deciding whether to proceed with
thetransformation, we need to setup access to upper level CTEs at this point if the subquery contains any CTE
RangeTblEntry.

Also added more test cases of NOT IN accessing CTEs, including recursive CTE. It's nice that CTE can use index now!

Let me know if you have any comments.

Regards,
-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL
 


Вложения

Re: NOT IN subquery optimization

От
Michael Paquier
Дата:
On Wed, Jun 26, 2019 at 09:26:16PM +0000, Li, Zheng wrote:
> Let me know if you have any comments.

I have one: the latest patch visibly applies, but fails to build
because of the recent API changes around lists in the backend code.
So a rebase is in order.  The discussion has not moved a iota in the
last couple of months, still as the latest patch has not received
reviews, I have moved it to next CF waiting on author.
--
Michael

Вложения

Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Hi Michael,

Here is the latest rebased patch.

Regards,
-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL
 

On 11/30/19, 10:43 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

    On Wed, Jun 26, 2019 at 09:26:16PM +0000, Li, Zheng wrote:
    > Let me know if you have any comments.
    
    I have one: the latest patch visibly applies, but fails to build
    because of the recent API changes around lists in the backend code.
    So a rebase is in order.  The discussion has not moved a iota in the
    last couple of months, still as the latest patch has not received
    reviews, I have moved it to next CF waiting on author.
    --
    Michael
    


Вложения

Re: NOT IN subquery optimization

От
Andrey Lepikhov
Дата:
At the top of the thread your co-author argued the beginning of this 
work with "findings about the performance of PostgreSQL, MySQL, and 
Oracle on various subqueries:"

https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/

I launched this test set with your "not_in ..." patch. Your optimization 
improves only results of D10-D13 queries. Nothing has changed for bad 
plans of the E20-E27 and F20-F27 queries.

For example, we can replace E20 query:
SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u = 
large.u); - execution time: 1370 ms, by
SELECT * FROM large WHERE EXISTS (SELECT n,u FROM small WHERE (small.u = 
large.u) AND (large.n = small.n
)) AND n IS NOT NULL; - execution time: 0.112 ms

E21 query:
SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u = 
large.u); - 1553 ms, by
SELECT * FROM large WHERE EXISTS (SELECT nn FROM small WHERE (small.u = 
large.u) AND (small.nn = large.n)); - 0.194 ms

F27 query:
SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu 
= large.u); - 1653.048 ms, by
SELECT * FROM large WHERE NOT EXISTS (SELECT nn,nu FROM small WHERE 
(small.nu = large.u) AND (small.nn = large.nn)); - 274.019 ms

Are you planning to make another patch for these cases?

Also i tried to increase work_mem up to 2GB: may be hashed subqueries 
can improve situation? But hashing is not improved execution time of the 
queries significantly.

On your test cases (from the comments of the patch) the subquery hashing 
has the same execution time with queries No.13-17. At the queries 
No.1-12 it is not so slow as without hashing, but works more slowly (up 
to 3 orders) than NOT IN optimization.

On 12/2/19 9:25 PM, Li, Zheng wrote:
> Here is the latest rebased patch.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Hi Andrey,

Thanks for the comment!

The unimproved cases you mentioned all fall into the category “correlated subquery”. This category is explicitly
disallowedby existing code to convert to join in convert_ANY_sublink_to_join:
 
    /*
     * The sub-select must not refer to any Vars of the parent query. (Vars of
     * higher levels should be okay, though.)
     */
    if (contain_vars_of_level((Node *) subselect, 1))
        return NULL;

I think this is also the reason why hashed subplan is not used for such subqueries.

It's probably not always safe to convert a correlated subquery to join. We need to find out/prove when it’s safe/unsafe
toconvert such ANY subquery if we were to do so.
 

Regards,
-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL
 

On 1/5/20, 1:12 AM, "Andrey Lepikhov" <a.lepikhov@postgrespro.ru> wrote:

    At the top of the thread your co-author argued the beginning of this 
    work with "findings about the performance of PostgreSQL, MySQL, and 
    Oracle on various subqueries:"
    
    https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/
    
    I launched this test set with your "not_in ..." patch. Your optimization 
    improves only results of D10-D13 queries. Nothing has changed for bad 
    plans of the E20-E27 and F20-F27 queries.
    
    For example, we can replace E20 query:
    SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u = 
    large.u); - execution time: 1370 ms, by
    SELECT * FROM large WHERE EXISTS (SELECT n,u FROM small WHERE (small.u = 
    large.u) AND (large.n = small.n
    )) AND n IS NOT NULL; - execution time: 0.112 ms
    
    E21 query:
    SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u = 
    large.u); - 1553 ms, by
    SELECT * FROM large WHERE EXISTS (SELECT nn FROM small WHERE (small.u = 
    large.u) AND (small.nn = large.n)); - 0.194 ms
    
    F27 query:
    SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu 
    = large.u); - 1653.048 ms, by
    SELECT * FROM large WHERE NOT EXISTS (SELECT nn,nu FROM small WHERE 
    (small.nu = large.u) AND (small.nn = large.nn)); - 274.019 ms
    
    Are you planning to make another patch for these cases?
    
    Also i tried to increase work_mem up to 2GB: may be hashed subqueries 
    can improve situation? But hashing is not improved execution time of the 
    queries significantly.
    
    On your test cases (from the comments of the patch) the subquery hashing 
    has the same execution time with queries No.13-17. At the queries 
    No.1-12 it is not so slow as without hashing, but works more slowly (up 
    to 3 orders) than NOT IN optimization.
    
    On 12/2/19 9:25 PM, Li, Zheng wrote:
    > Here is the latest rebased patch.
    
    -- 
    Andrey Lepikhov
    Postgres Professional
    https://postgrespro.com
    The Russian Postgres Company
    


Re: NOT IN subquery optimization

От
Andrey Lepikhov
Дата:

On 1/7/20 12:34 AM, Li, Zheng wrote:
> Hi Andrey,
> 
> Thanks for the comment!
> 
> The unimproved cases you mentioned all fall into the category “correlated subquery”. This category is explicitly
disallowedby existing code to convert to join in convert_ANY_sublink_to_join:
 
>      /*
>       * The sub-select must not refer to any Vars of the parent query. (Vars of
>       * higher levels should be okay, though.)
>       */
>      if (contain_vars_of_level((Node *) subselect, 1))
>          return NULL;
> 
> I think this is also the reason why hashed subplan is not used for such subqueries.
> 
> It's probably not always safe to convert a correlated subquery to join. We need to find out/prove when it’s
safe/unsafeto convert such ANY subquery if we were to do so.
 
> 

Maybe this part of code contains logical error?
You optimize only the special case of the "NOT IN" expression, equal to 
NOT EXISTS. The convert_EXISTS_sublink_to_join() routine can contain 
vars of the parent query.
May be you give an trivial example for this problem?

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: NOT IN subquery optimization

От
Tom Lane
Дата:
"Li, Zheng" <zhelli@amazon.com> writes:
> Here is the latest rebased patch.

I noticed that the cfbot is failing to test this because of some trivial
merge conflicts, so here's a re-rebased version.

I haven't reviewed this in any detail, but here's a couple of notes
from having quickly looked through the patch:

* I find it entirely unacceptable to stick some planner temporary
fields into struct SubLink.  If you need that storage you'll have
to find some other place to put it.  But in point of fact I don't
think you need it; it doesn't look to me to be critical to generate
the subquery's plan any earlier than make_subplan would have done it.
Moreover, you should really strive to *not* do that, because it's
likely to get in the way of other future optimizations.  As the
existing comment in make_subplan already suggests, we might want to
delay subplan planning even further than that in future.

* I'm also not too happy with the (undocumented) rearrangement of
reduce_outer_joins.  There's a specific sequence of processing that
that's involved in, as documented at the top of prepjointree.c, and
I doubt that you can just randomly call it from other places and expect
good results.  In particular, since JOIN alias var flattening won't have
happened yet when this code is being run from pull_up_sublinks, it's
unlikely that reduce_outer_joins will reliably get the same answers it
would get normally.  I also wonder whether it's safe to make the
parsetree changes it makes earlier than normal, and whether it will be
problematic to run it twice on the same tree, and whether its rather
indirect connection to distribute_qual_to_rels is going to misbehave.

* The proposed test additions seem to about triple the runtime of
subselect.sql.  This seems excessive.  I also wonder why it's necessary
for this test to build its own large tables; couldn't it re-use ones
that already exist in the regression database?

* Not really sure that we need a new planner GUC for this, but if we
do, it needs to be documented.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5da0528..16ce707 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -992,7 +992,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
      * preprocessing.
      */
     if (hasOuterJoins)
-        reduce_outer_joins(root);
+        reduce_outer_joins(parse);

     /*
      * If we have any RTE_RESULT relations, see if they can be deleted from
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3650e83..fa64281 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -39,6 +39,8 @@
 #include "utils/syscache.h"


+bool enable_not_in_transform;
+
 typedef struct convert_testexpr_context
 {
     PlannerInfo *root;
@@ -158,8 +160,7 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod,
  * subquery itself is in a resjunk tlist entry whose value is uninteresting).
  */
 static Node *
-make_subplan(PlannerInfo *root, Query *orig_subquery,
-             SubLinkType subLinkType, int subLinkId,
+make_subplan(PlannerInfo *root, SubLink *sublink,
              Node *testexpr, bool isTopQual)
 {
     Query       *subquery;
@@ -171,6 +172,9 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
     Plan       *plan;
     List       *plan_params;
     Node       *result;
+    Query      *orig_subquery = (Query *) sublink->subselect;
+    SubLinkType subLinkType = sublink->subLinkType;
+    int         subLinkId = sublink->subLinkId;

     /*
      * Copy the source Query node.  This is a quick and dirty kluge to resolve
@@ -216,24 +220,33 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
     /* plan_params should not be in use in current query level */
     Assert(root->plan_params == NIL);

-    /* Generate Paths for the subquery */
-    subroot = subquery_planner(root->glob, subquery,
-                               root,
-                               false, tuple_fraction);
+    if(sublink->subroot != NULL &&
+        sublink->subplan != NULL)
+    {
+        subroot = (PlannerInfo *) sublink->subroot;
+        plan = (Plan *) sublink->subplan;
+    }
+    else
+    {
+        /* Generate Paths for the subquery */
+        subroot = subquery_planner(root->glob, subquery,
+                                   root,
+                                   false, tuple_fraction);
+
+        /*
+         * Select best Path and turn it into a Plan.  At least for now, there
+         * seems no reason to postpone doing that.
+         */
+        final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
+        best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
+
+        plan = create_plan(subroot, best_path);
+    }

     /* Isolate the params needed by this specific subplan */
     plan_params = root->plan_params;
     root->plan_params = NIL;

-    /*
-     * Select best Path and turn it into a Plan.  At least for now, there
-     * seems no reason to postpone doing that.
-     */
-    final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
-    best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
-
-    plan = create_plan(subroot, best_path);
-
     /* And convert to SubPlan or InitPlan format. */
     result = build_subplan(root, plan, subroot, plan_params,
                            subLinkType, subLinkId,
@@ -1173,6 +1186,389 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
     return expression_tree_walker(node, inline_cte_walker, context);
 }

+/* Returns a List of Nodes from the testexpr of an Any SubLink */
+static List *
+getTestExpr(SubLink *sublink)
+{
+    Node * testexpr = sublink->testexpr;
+    Assert(testexpr);
+
+    /* single expression */
+    if(IsA(testexpr, OpExpr))
+    {
+        OpExpr    *opexpr = (OpExpr *) testexpr;
+        Node    *testnode = linitial(opexpr->args);
+
+        return list_make1(testnode);
+    }
+    /* multi-expression */
+    else if(IsA(testexpr, BoolExpr))
+    {
+        BoolExpr    *bexpr = (BoolExpr *) testexpr;
+        ListCell        *lc;
+        Node        *node;
+        List            *result = NULL;
+
+        foreach(lc, bexpr->args)
+        {
+            node = lfirst(lc);
+            if(IsA(node, OpExpr))
+            {
+                OpExpr *expr = (OpExpr *) node;
+                result = lappend(result, linitial(expr->args));
+            }
+            else
+            {
+                elog(ERROR, "unrecognized node type for testexpr: %d",
+                        (int) nodeTag(node));
+            }
+        }
+        return result;
+    }
+    else
+    {
+        elog(ERROR, "unrecognized node type for testexpr: %d",
+                (int) nodeTag(testexpr));
+    }
+}
+
+/* Try to reduce outer joins if there is one in the Query */
+static void
+reduce_outer_joins_NOT_IN(Query *parse)
+{
+    ListCell                *lc;
+    RangeTblEntry    *rte;
+
+    foreach(lc, parse->rtable)
+    {
+        rte = (RangeTblEntry *) lfirst(lc);
+        /* try to reduce outer joins if there is one */
+        if (rte->rtekind == RTE_JOIN &&
+                IS_OUTER_JOIN(rte->jointype))
+        {
+            reduce_outer_joins(parse);
+            return;
+        }
+    }
+}
+
+/*
+ * Make clause NOT EXISTS
+ * (select 1 from t2 where p) for the NOT IN to ANTI JOIN
+ * transformation.
+ */
+static
+Node *
+makeExistsTest_NOT_IN(Query *subselect)
+{
+    BoolExpr *notExpr = makeNode(BoolExpr);
+    SubLink *exists = makeNode(SubLink);
+    Query *selectOne =  copyObject(subselect);
+    Const *oneconst;
+    TargetEntry *dummyte;
+
+    /* modify subselect target list to contain a dummy const 1 */
+    oneconst = makeConst(INT4OID,
+                         -1,
+                         InvalidOid,
+                         sizeof(int32),
+                         Int32GetDatum(1),
+                         false, /* isnull */
+                         true); /* pass by value */
+    dummyte = makeTargetEntry((Expr *) oneconst,
+                              1,
+                              "one",
+                              false /* resjunk */ );
+    selectOne->targetList = list_make1(dummyte);
+
+    /* make EXISTS(select 1 from t2 where p) */
+    exists->subLinkType = EXISTS_SUBLINK;
+    exists->subLinkId = 0;
+    exists->subselect = (Node *) selectOne;
+    exists->location = -1;
+    exists->subroot = NULL;
+    exists->subplan = NULL;
+
+    /* make NOT EXISTS(select 1 from t2 where p) */
+    notExpr->boolop = NOT_EXPR;
+    notExpr->args = list_make1(exists);
+    notExpr->location = -1;
+
+    return (Node *) notExpr;
+}
+
+/*
+ *Allow transformation from NOT IN query to ANTI JOIN if ALL of the
+ * following conditions are true:
+ * 1. The GUC enable_not_in_transform is set to true.
+ * 2. the NOT IN subquery is not hashable, in which case an expensive
+ *        subplan will be generated if we don't transform.
+ * 3.. the subquery does not define any CTE.
+ */
+static bool
+allow_NOT_IN_transformation(PlannerInfo *root,
+                                                      SubLink *sublink)
+{
+    Query            *subselect = (Query *) sublink->subselect;
+    PlannerInfo     *subroot;
+    double             tuple_fraction;
+    RelOptInfo        *final_rel;
+    Path            *best_path;
+    Plan            *plan;
+
+    if(! enable_not_in_transform)
+        return false;
+
+    /*
+     * Can't flatten if it contains WITH.  (We could arrange to pull up the
+     * WITH into the parent query's cteList, but that risks changing the
+     * semantics, since a WITH ought to be executed once per associated query
+     * call.)  Note that convert_ANY_sublink_to_join doesn't have to reject
+     * this case, since it just produces a subquery RTE that doesn't have to
+     * get flattened into the parent query.
+     */
+    if(subselect->cteList)
+        return false;
+
+    /* For ALL and ANY subplans, we will be
+     * able to stop evaluating if the test condition fails or matches, so very
+     * often not all the tuples will be retrieved; for lack of a better idea,
+     * specify 50% retrieval.
+     */
+    tuple_fraction = 0.5;
+    /*
+     * Generate Paths for the subquery, use a copied version of the subquery
+     * so that the existing one doesn't get modified.
+     */
+    subroot = subquery_planner(root->glob, copyObject(subselect),
+                                                    root, false, tuple_fraction);
+
+    /* Select best Path and turn it into a Plan. */
+    final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
+    best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
+
+    plan = create_plan(subroot, best_path);
+
+    sublink->subroot = (Node *) subroot;
+    sublink->subplan = (Node *) plan;
+    /*
+     * Punt if subplan is hashable since using hashed subplan is almost like
+     * doing a Hash Anti Join, we probably can't do better than that.
+     */
+    if(subplan_is_hashable(plan))
+    {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * do the following NOT IN to ANTI JOIN conversions:
+ *
+ * When x is non-nullable:
+ * t1.x not in (t2.y where p) => ANTI JOIN
+ * t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL) and p.
+ * the above predicate "t2.y IS NULL" can be removed if y
+ * is also non-nullable.
+ *
+ * When x is nullable:
+ * t1.x not in (t2.y where p) => ANTI JOIN
+ * t1 (Filter: t1.x is not null or not exists (select 1 from t2 where p)),
+ * t2 on join condition (t1.x=t2.y or t2.y is null) and p.
+ * the above predicate "t2.y IS NULL" can be removed if y
+ * is also non-nullable.
+ *
+ * The multi-expression case is just ANDs of the single-
+ * expression case.
+ */
+static bool
+convert_NOT_IN_to_join(PlannerInfo *root, Node **quals,
+                                    SubLink *sublink, List *subquery_vars,
+                                    Node **pullout)
+{
+    Query            *parse = root->parse;
+    Query            *subselect = (Query *) sublink->subselect;
+    List            *testnodes = getTestExpr(sublink);
+    bool             outerNonNull;
+    bool             innerNonNull;
+    NullTest         *nt;
+
+    /*
+     * Try reduce_outer_joins since outer join affects the nullability test that's coming up next.
+     * We have to call reduce_outer_joins for outer and inner query separately because we
+     * don't have a global range table yet.
+     */
+    reduce_outer_joins_NOT_IN(parse);
+
+    reduce_outer_joins_NOT_IN(subselect);
+
+    Assert(testnodes);
+    outerNonNull =
+                    list_hasnonnullable(testnodes, parse);
+    innerNonNull =
+                    list_hasnonnullable(subselect->targetList, subselect);
+
+    /* Single-expression case, do the following:
+     * When x is non-nullable:
+     * t1.x not in (t2.y where p) => ANTI JOIN
+     * t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL) and p.
+     * the above predicate "t2.y IS NULL" can be removed if y
+     * is also non-nullable.
+     *
+     * When x is nullable:
+     * t1.x not in (t2.y where p) => ANTI JOIN
+     * t1 (Filter: t1.x is not null or not exists (select 1 from t2 where p)),
+     * t2 on join condition (t1.x=t2.y or t2.y is null) and p.
+     * the above predicate "t2.y IS NULL" can be removed if y
+     * is also non-nullable.
+     */
+    if(IsA(*quals, OpExpr))
+    {
+        /*  Add "OR y IS NULL" if y is nullable */
+        if(!innerNonNull)
+        {
+            /* make expr y IS NULL */
+            nt = makeNode(NullTest);
+            nt->arg = (Expr *)linitial(subquery_vars);
+            nt->nulltesttype = IS_NULL;
+            nt->argisrow = false;
+            nt->location = -1;
+
+            /* make orclause (x = y OR y IS NULL) */
+            *quals = (Node *)make_orclause(list_make2(*quals,
+                                (Node *)nt));
+        }
+
+        /*
+         * if x is nullable, make the following filter for t1 :
+         * x IS NOT NULL or NOT EXISTS (select 1 from t2 where p)
+         */
+        if(!outerNonNull)
+        {
+            Node * existsTest = NULL;
+
+            /* make expr x IS NOT NULL */
+            nt = makeNode(NullTest);
+            nt->arg = (Expr *) linitial(testnodes);
+            nt->nulltesttype = IS_NOT_NULL;
+            nt->argisrow = false;
+            nt->location = -1;
+
+            existsTest = makeExistsTest_NOT_IN(subselect);
+            /* make x IS NOT NULL OR NOT EXISTS (select 1 from t2 where p) */
+            *pullout = (Node *)make_orclause(list_make2(
+                                    (Node *) nt, existsTest));
+        }
+    }
+    /*
+     * Multi-expression case:
+     * If all xi's are nullable:
+     * (x1, x2, ... xn) not in (y1, y2, ... yn ) =>
+     * ANTI JOIN t1,
+     * t2 on join condition:
+     * ((t1.x1 = t2.y1) and ... (t1.xi = t2.yi) ... and
+     * (t1.xn = t2.yn)) is NOT FALSE.
+     *
+     * If at least one xi is non-nuallable:
+     * (x1, x2, ... xn) not in (y1, y2, ... yn ) =>
+     * ANTI JOIN t1,
+     * t2 on join condition:
+     * (t1.x1 = t2.y1 or t2.y1 is NULL) and ...
+     * (t1.xi = t2.yi or t2.yi is NULL or t1.xi is NULL) ... and
+     * (t1.xn = t2.yn or t2.yn is NULL).
+     */
+    else if(IsA(*quals, BoolExpr))
+    {
+        /*
+         * Add IS NOT FALSE on top of the join condition if ALL x_i's are nullable
+         */
+        if(!outerNonNull)
+        {
+            BooleanTest *btest;
+
+            btest = makeNode(BooleanTest);
+            btest->arg = (Expr *) *quals;
+            btest->booltesttype = IS_NOT_FALSE;
+            *quals = (Node *) btest;
+        }
+        else
+        {
+            ListCell            *qualc;
+            TargetEntry    *te;
+            ListCell            *xc = list_head(testnodes);
+            ListCell            *yc = list_head(subquery_vars);
+            ListCell            *ytlc = list_head(subselect->targetList);
+            List                *quallist = ((BoolExpr *)*quals)->args;
+            Node                *joinCondition = NULL;
+            bool                 xnonNull;
+            bool                 ynonNull;
+
+            /* Reconstruct quals in the loop */
+            *quals = NULL;
+            foreach(qualc, quallist)
+            {
+                te = (TargetEntry *)lfirst(ytlc);
+                /* x_i = y_i */
+                joinCondition = lfirst(qualc);
+                ynonNull = is_node_nonnullable((Node*)te, subselect);
+
+                /* append y_i IS NULL to x_i = y_i if y_i is non-nullable */
+                if(!ynonNull)
+                {
+                    /* make expr y_i IS NULL */
+                    nt = makeNode(NullTest);
+                    nt->arg = (Expr *)lfirst(yc);
+                    nt->nulltesttype = IS_NULL;
+                    nt->argisrow = false;
+                    nt->location = -1;
+
+                    /* make orclause (x_i = y_i OR y_i IS NULL) */
+                    joinCondition = (Node *)make_orclause(list_make2(joinCondition,
+                                            (Node *)nt));
+                }
+
+                /*
+                 * Append "OR x_i is null" to the join condition if x_i is nullable.
+                 * Notice at least one x_i should be non-nullable because the all
+                 * x_i's nullable case is handled earlier by adding "IS NOT FALSE"
+                 * on top of the join condition.
+                 */
+                xnonNull = is_node_nonnullable(lfirst(xc), parse);
+                if(!xnonNull)
+                {
+                    /* make expr x_i IS NULL */
+                    nt = makeNode(NullTest);
+                    nt->arg = (Expr *)lfirst(xc);
+                    nt->nulltesttype = IS_NULL;
+                    nt->argisrow = false;
+                    nt->location = -1;
+
+                    /* make orclause (x_i = y_i OR y_i IS NULL OR x_i IS NULL) */
+                    joinCondition = (Node *)make_orclause(list_make2(joinCondition,
+                                            (Node *)nt));
+                }
+
+                /*
+                 * Now append joinCondition to quals as one andclause.
+                 * (x_i = y_i OR y_i IS NULL OR x_i IS NULL) AND
+                 * (x_j = y_j OR y_j IS NULL OR x_j IS NULL)...
+                 */
+                *quals = (Node *)make_andclause(list_make2(*quals, joinCondition));
+                xc = lnext(testnodes, xc);
+                yc = lnext(subquery_vars, yc);
+                ytlc = lnext(subselect->targetList, ytlc);
+            }
+        }
+    }
+    /* quals should be either OpExpr or BoolExpr, otherwise don't convert */
+    else
+    {
+        return false;
+    }
+
+    return true;
+}

 /*
  * convert_ANY_sublink_to_join: try to convert an ANY SubLink to a join
@@ -1210,7 +1606,8 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
  */
 JoinExpr *
 convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
-                            Relids available_rels)
+                                            bool under_not, Node **pullout,
+                                            Relids available_rels)
 {
     JoinExpr   *result;
     Query       *parse = root->parse;
@@ -1254,6 +1651,12 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
     if (contain_volatile_functions(sublink->testexpr))
         return NULL;

+    if (under_not &&
+        ! allow_NOT_IN_transformation(root, sublink))
+    {
+        return NULL;
+    }
+
     /* Create a dummy ParseState for addRangeTableEntryForSubquery */
     pstate = make_parsestate(NULL);

@@ -1293,10 +1696,28 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
     quals = convert_testexpr(root, sublink->testexpr, subquery_vars);

     /*
+     * Try converting x NOT IN (y) to ANTI JOIN.
+     */
+    if(under_not &&
+            !convert_NOT_IN_to_join(root, &quals,
+                                sublink, subquery_vars, pullout))
+    {
+        /*
+         * In theory, we shouldn't get here since allow_NOT_IN_transformation()
+         * has already ruled out cases that shouldn't be transformed. In other words,
+         * I expect convert_NOT_IN_to_join to always return true, but just in case
+         * it fails, reset parse->rtable which has been changed a few lines above.
+         */
+        parse->rtable = list_delete(parse->rtable, rte);
+        return NULL;
+    }
+
+    /*
      * And finally, build the JoinExpr node.
      */
     result = makeNode(JoinExpr);
-    result->jointype = JOIN_SEMI;
+    /* NOT IN will be converted to ANTI JOIN */
+    result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI;
     result->isNatural = false;
     result->larg = NULL;        /* caller must fill this in */
     result->rarg = (Node *) rtr;
@@ -1885,9 +2306,7 @@ process_sublinks_mutator(Node *node, process_sublinks_context *context)
          * Now build the SubPlan node and make the expr to return.
          */
         return make_subplan(context->root,
-                            (Query *) sublink->subselect,
-                            sublink->subLinkType,
-                            sublink->subLinkId,
+                            sublink,
                             testexpr,
                             context->isTopQual);
     }
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 1452172..cd1557c 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -113,7 +113,7 @@ static Query *pullup_replace_vars_subquery(Query *query,
 static reduce_outer_joins_state *reduce_outer_joins_pass1(Node *jtnode);
 static void reduce_outer_joins_pass2(Node *jtnode,
                                      reduce_outer_joins_state *state,
-                                     PlannerInfo *root,
+                                     Query *parse,
                                      Relids nonnullable_rels,
                                      List *nonnullable_vars,
                                      List *forced_null_vars);
@@ -267,6 +267,13 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
         }
         /* Build the replacement FromExpr; no quals yet */
         newf = makeFromExpr(newfromlist, NULL);
+        /*
+         * Replace parse->jointree with newf now because we might modify join types
+         * during reduce_outer_joins() in convert_NOT_IN_to_join() which is called
+         * in pull_up_sublinks_qual_recurse() that's coming up next.
+         */
+        newf->quals = f->quals;
+        root->parse->jointree = newf;
         /* Set up a link representing the rebuilt jointree */
         jtlink = (Node *) newf;
         /* Now process qual --- all children are available for use */
@@ -399,7 +406,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
         /* Is it a convertible ANY or EXISTS clause? */
         if (sublink->subLinkType == ANY_SUBLINK)
         {
-            if ((j = convert_ANY_sublink_to_join(root, sublink,
+            if ((j = convert_ANY_sublink_to_join(root, sublink, false, NULL,
                                                  available_rels1)) != NULL)
             {
                 /* Yes; insert the new join node into the join tree */
@@ -425,7 +432,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
                 return NULL;
             }
             if (available_rels2 != NULL &&
-                (j = convert_ANY_sublink_to_join(root, sublink,
+                (j = convert_ANY_sublink_to_join(root, sublink, false, NULL,
                                                  available_rels2)) != NULL)
             {
                 /* Yes; insert the new join node into the join tree */
@@ -571,6 +578,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
                     return NULL;
                 }
             }
+            else if (sublink->subLinkType == ANY_SUBLINK)
+            {
+                Node *pullout = NULL;
+
+                if ((j = convert_ANY_sublink_to_join(root, sublink, true, &pullout,
+                                                     available_rels1)) != NULL)
+                {
+                    /* Yes; insert the new join node into the join tree */
+                    j->larg = *jtlink1;
+                    *jtlink1 = (Node *) j;
+                    /* Recursively process pulled-up jointree nodes */
+                    j->rarg = pull_up_sublinks_jointree_recurse(root,
+                                                                j->rarg,
+                                                                &child_rels);
+
+                    /*
+                     * Now recursively process the pulled-up quals.  Any inserted
+                     * joins can get stacked onto either j->larg or j->rarg,
+                     * depending on which rels they reference.
+                     */
+                    j->quals = pull_up_sublinks_qual_recurse(root,
+                                                             j->quals,
+                                                             &j->larg,
+                                                             available_rels1,
+                                                             &j->rarg,
+                                                             child_rels);
+                    /*
+                     * Return pullout predicate (x is NOT NULL) if it's not null,
+                     * otherwise return NULL representing constant TRUE.
+                     */
+                    return pullout? pullout : NULL;
+                }
+                if (available_rels2 != NULL &&
+                    (j = convert_ANY_sublink_to_join(root, sublink, true, &pullout,
+                                                     available_rels2)) != NULL)
+                {
+                    /* Yes; insert the new join node into the join tree */
+                    j->larg = *jtlink2;
+                    *jtlink2 = (Node *) j;
+                    /* Recursively process pulled-up jointree nodes */
+                    j->rarg = pull_up_sublinks_jointree_recurse(root,
+                                                                j->rarg,
+                                                                &child_rels);
+
+                    /*
+                     * Now recursively process the pulled-up quals.  Any inserted
+                     * joins can get stacked onto either j->larg or j->rarg,
+                     * depending on which rels they reference.
+                     */
+                    j->quals = pull_up_sublinks_qual_recurse(root,
+                                                             j->quals,
+                                                             &j->larg,
+                                                             available_rels2,
+                                                             &j->rarg,
+                                                             child_rels);
+                    /*
+                     * Return pullout predicate (x is NOT NULL) if it's not null,
+                     * otherwise return NULL representing constant TRUE.
+                     */
+                    return pullout? pullout : NULL;
+                }
+            }
         }
         /* Else return it unmodified */
         return node;
@@ -909,7 +978,13 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     subroot->parse = subquery;
     subroot->glob = root->glob;
     subroot->query_level = root->query_level;
-    subroot->parent_root = root->parent_root;
+    /*
+     * Keep a path to the top level root so that we can recursively access top level
+     * CTEs in root->parse->cteList, and CTE plans in root->init_plans. This hack
+     * won't change the original PlannerInfo tree structure because subroot is just
+     * a auxiliary PlannerInfo to help pulling up subquery.
+     */
+    subroot->parent_root = root;
     subroot->plan_params = NIL;
     subroot->outer_params = NULL;
     subroot->planner_cxt = CurrentMemoryContext;
@@ -2558,7 +2633,7 @@ flatten_simple_union_all(PlannerInfo *root)
  * alias-var expansion).
  */
 void
-reduce_outer_joins(PlannerInfo *root)
+reduce_outer_joins(Query *parse)
 {
     reduce_outer_joins_state *state;

@@ -2571,14 +2646,14 @@ reduce_outer_joins(PlannerInfo *root)
      * join(s) below each side of each join clause. The second pass examines
      * qual clauses and changes join types as it descends the tree.
      */
-    state = reduce_outer_joins_pass1((Node *) root->parse->jointree);
+    state = reduce_outer_joins_pass1((Node *) parse->jointree);

     /* planner.c shouldn't have called me if no outer joins */
     if (state == NULL || !state->contains_outer)
         elog(ERROR, "so where are the outer joins?");

-    reduce_outer_joins_pass2((Node *) root->parse->jointree,
-                             state, root, NULL, NIL, NIL);
+    reduce_outer_joins_pass2((Node *) parse->jointree,
+                             state, parse, NULL, NIL, NIL);
 }

 /*
@@ -2661,7 +2736,7 @@ reduce_outer_joins_pass1(Node *jtnode)
 static void
 reduce_outer_joins_pass2(Node *jtnode,
                          reduce_outer_joins_state *state,
-                         PlannerInfo *root,
+                         Query *parse,
                          Relids nonnullable_rels,
                          List *nonnullable_vars,
                          List *forced_null_vars)
@@ -2700,7 +2775,7 @@ reduce_outer_joins_pass2(Node *jtnode,
             reduce_outer_joins_state *sub_state = lfirst(s);

             if (sub_state->contains_outer)
-                reduce_outer_joins_pass2(lfirst(l), sub_state, root,
+                reduce_outer_joins_pass2(lfirst(l), sub_state, parse,
                                          pass_nonnullable_rels,
                                          pass_nonnullable_vars,
                                          pass_forced_null_vars);
@@ -2812,7 +2887,7 @@ reduce_outer_joins_pass2(Node *jtnode,
         /* Apply the jointype change, if any, to both jointree node and RTE */
         if (rtindex && jointype != j->jointype)
         {
-            RangeTblEntry *rte = rt_fetch(rtindex, root->parse->rtable);
+            RangeTblEntry *rte = rt_fetch(rtindex, parse->rtable);

             Assert(rte->rtekind == RTE_JOIN);
             Assert(rte->jointype == j->jointype);
@@ -2897,7 +2972,7 @@ reduce_outer_joins_pass2(Node *jtnode,
                     pass_nonnullable_vars = NIL;
                     pass_forced_null_vars = NIL;
                 }
-                reduce_outer_joins_pass2(j->larg, left_state, root,
+                reduce_outer_joins_pass2(j->larg, left_state, parse,
                                          pass_nonnullable_rels,
                                          pass_nonnullable_vars,
                                          pass_forced_null_vars);
@@ -2919,7 +2994,7 @@ reduce_outer_joins_pass2(Node *jtnode,
                     pass_nonnullable_vars = NIL;
                     pass_forced_null_vars = NIL;
                 }
-                reduce_outer_joins_pass2(j->rarg, right_state, root,
+                reduce_outer_joins_pass2(j->rarg, right_state, parse,
                                          pass_nonnullable_rels,
                                          pass_nonnullable_vars,
                                          pass_forced_null_vars);
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 0c6fe01..46a9877 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -38,10 +38,12 @@
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
+#include "optimizer/pathnode.h"
 #include "parser/analyze.h"
 #include "parser/parse_agg.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
@@ -2013,6 +2015,271 @@ find_forced_null_var(Node *node)
 }

 /*
+ * find_innerjoined_rels
+ *        Traverse jointree to locate non-outerjoined-rels and quals above them
+ *
+ * We fill innerjoined_rels with the relids of all rels that are not below
+ * the nullable side of any outer join (which would cause their Vars to be
+ * possibly NULL regardless of what's in the catalogs).  In the same scan,
+ * we locate all WHERE and JOIN/ON quals that constrain these rels add them to
+ * the usable_quals list (forming a list with implicit-AND semantics).
+ *
+ * Top-level caller must initialize innerjoined_rels/usable_quals to NULL/NIL.
+ */
+static void
+find_innerjoined_rels(Node *jtnode,
+                      Relids *innerjoined_rels, List **usable_quals)
+{
+    if (jtnode == NULL)
+        return;
+    if (IsA(jtnode, RangeTblRef))
+    {
+        int            varno = ((RangeTblRef *) jtnode)->rtindex;
+
+        *innerjoined_rels = bms_add_member(*innerjoined_rels, varno);
+    }
+    else if (IsA(jtnode, FromExpr))
+    {
+        FromExpr   *f = (FromExpr *) jtnode;
+        ListCell   *lc;
+
+        /* All elements of the FROM list are allowable */
+        foreach(lc, f->fromlist)
+            find_innerjoined_rels((Node *) lfirst(lc),
+                                  innerjoined_rels, usable_quals);
+        /* ... and its WHERE quals are too */
+        if (f->quals)
+            *usable_quals = lappend(*usable_quals, f->quals);
+    }
+    else if (IsA(jtnode, JoinExpr))
+    {
+        JoinExpr   *j = (JoinExpr *) jtnode;
+
+        switch (j->jointype)
+        {
+            case JOIN_INNER:
+                /* visit both children */
+                find_innerjoined_rels(j->larg,
+                                      innerjoined_rels, usable_quals);
+                find_innerjoined_rels(j->rarg,
+                                      innerjoined_rels, usable_quals);
+                /* and grab the ON quals too */
+                if (j->quals)
+                    *usable_quals = lappend(*usable_quals, j->quals);
+                break;
+
+            case JOIN_LEFT:
+            case JOIN_SEMI:
+            case JOIN_ANTI:
+
+                /*
+                 * Only the left input is possibly non-nullable; furthermore,
+                 * the quals of this join don't constrain the left input.
+                 * Note: we probably can't see SEMI or ANTI joins at this
+                 * point, but if we do, we can treat them like LEFT joins.
+                 */
+                find_innerjoined_rels(j->larg,
+                                      innerjoined_rels, usable_quals);
+                break;
+
+            case JOIN_RIGHT:
+                /* Reverse of the above case */
+                find_innerjoined_rels(j->rarg,
+                                      innerjoined_rels, usable_quals);
+                break;
+
+            case JOIN_FULL:
+                /* Neither side is non-nullable, so stop descending */
+                break;
+
+            case JOIN_UNIQUE_OUTER:
+            case JOIN_UNIQUE_INNER:
+                /* Don't think we will see JOIN_UNIQUE_OUTER or
+                 * JOIN_UNIQUE_INNER since they are only used internally in
+                 * the planner in a much later phase (in standard_join_search).
+                */
+                break;
+
+            default:
+                elog(ERROR, "unrecognized join type: %d",
+                     (int) j->jointype);
+        }
+    }
+    else
+        elog(ERROR, "unrecognized node type: %d",
+             (int) nodeTag(jtnode));
+}
+
+/*
+ * Returns true if the Node passed in is nonnullable. Currently handles Var,
+ * TargetEntry, CoaleseExpr and Const.
+ * TODO: Add more supporting cases.
+ * A Var is nonnullable if:
+ *    It does not appear on the null-padded side of an outer join and it has NOT NULL constraint,
+ *     Or if it's forced non-null by inner join or other strict predicates.
+ * A CoalesceExpr is nonnullable if it has a non-null argument.
+ */
+bool
+is_node_nonnullable(Node * node, Query *parse)
+{
+    AttrNumber     attno = InvalidAttrNumber;
+    Oid                 reloid;
+    Var                 *var = NULL;
+
+    /*
+     * If the query contains set operations, punt.  The set ops themselves
+     * couldn't introduce nulls that weren't in their inputs, but the tlist
+     * present in the top-level query is just dummy and won't give us useful
+     * info.  We could get an answer by recursing to examine each leaf query,
+     * but for the moment it doesn't seem worth the extra complication.
+     *
+     * Note that we needn't consider other top-level operators such as
+     * DISTINCT, GROUP BY, etc, as those will not introduce nulls either.
+     */
+    if (parse->setOperations)
+        return false;
+
+    switch (nodeTag(node))
+    {
+        case T_Var:
+        {
+            RangeTblEntry    *rte;
+
+            var = (Var *) node;
+            attno = var->varattno;
+            rte = rt_fetch(var->varno, parse->rtable);
+            reloid = rte->relid;
+            break;
+        }
+        case T_TargetEntry:
+        {
+            TargetEntry    *te = (TargetEntry *)node;
+            switch(nodeTag(te->expr))
+            {
+                case T_Var:
+                {
+                    var = (Var *) te->expr;
+                    attno = te->resorigcol;
+                    reloid = te->resorigtbl;
+                    break;
+                }
+                /* recurse into is_node_nonnullable for other types of Node */
+                default:
+                    return is_node_nonnullable((Node *)te->expr, parse);
+            }
+            break;
+        }
+        case T_CoalesceExpr:
+        {
+            ListCell               *arg;
+            CoalesceExpr    *cexpr = (CoalesceExpr *)node;
+
+            /* handle COALESCE Function by looking for non-null argument */
+            foreach(arg, cexpr->args)
+            {
+                Node    *e = lfirst(arg);
+
+                /* recurse into is_node_nonnullable */
+                if (is_node_nonnullable(e, parse))
+                {
+                    return true;
+                }
+            }
+            break;
+        }
+        case T_Const:
+        {
+            if(!((Const *) node)->constisnull)
+            {
+                return true;
+            }
+            break;
+        }
+        /*
+         * TODO: handle more cases to make the nullability test more accurate
+         * Assume unhandled cases are nullable.
+         */
+        default:
+            return false;
+    }
+
+    /*
+     * If we have found a Var, it is non-null if:
+     * not on NULL-padded side of an outer join and has NOT NULL constraint
+     * or forced NOT NULL by inner join conditions or by other strict predicates.
+     */
+    if(var &&
+            reloid != InvalidOid)
+    {
+        Relids         innerjoined_rels = NULL;
+        List               *innerjoined_useful_quals = NIL;
+        List            *nonnullable_innerjoined_vars = NIL;
+
+        find_innerjoined_rels((Node *) parse->jointree,
+                              &innerjoined_rels,
+                              &innerjoined_useful_quals);
+
+        /*
+         * Check if the Var is from an INNER JOINed rel, it's also guaranteed
+         * to not be on the null-padded side of an outer join.
+         */
+        if (bms_is_member(var->varno, innerjoined_rels))
+        {
+            /*
+             * If Var is from a plain relation and its column is marked
+             * NOT NULL according to the catalogs, it can't produce NULL.
+             */
+            if (get_attnotnull(reloid, attno))
+            {
+                return true;
+            }
+
+            /*
+             * Otherwise check for the existance of strict predicates which filter
+             * out NULL values for this Var.
+             */
+            nonnullable_innerjoined_vars =
+                find_nonnullable_vars((Node *) innerjoined_useful_quals);
+
+            if (list_member(nonnullable_innerjoined_vars, var))
+            {
+                return true;
+            }
+        }
+
+        /*
+         * If we get here it means -
+         * The var is on null-padded side of an outer-join, since we've already
+         * tried reduce_outer_joins(), there isn't any strict predicates to turn this
+         * outer join to inner join, then there will be no strict predicates to force
+         * this var non-null as well.
+         */
+    }
+
+    return false;
+}
+
+/*
+ * Returns true if at least one Node in the input list is non-nullable
+ */
+bool
+list_hasnonnullable(List * list, Query *parse)
+{
+    ListCell    *lc;
+    Node    *node;
+
+    foreach(lc, list)
+    {
+        node = lfirst(lc);
+        if(is_node_nonnullable(node, parse))
+        {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
  * Can we treat a ScalarArrayOpExpr as strict?
  *
  * If "falseOK" is true, then a "false" result can be considered strict,
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 27bbb58..61ecac7 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -934,6 +934,35 @@ get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok)
     return oid;
 }

+/*
+ * get_attnotnull
+ *        Given the relation id and the attribute number,
+ *        return the "attnotnull" field from the attribute relation.
+ */
+bool
+get_attnotnull(Oid relid, AttrNumber attnum)
+{
+    HeapTuple      tp;
+    Form_pg_attribute att_tup;
+
+    tp = SearchSysCache2(ATTNUM,
+            ObjectIdGetDatum(relid),
+            Int16GetDatum(attnum));
+
+    if (HeapTupleIsValid(tp))
+    {
+        bool result;
+
+        att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+        result = att_tup->attnotnull;
+        ReleaseSysCache(tp);
+
+        return result;
+    }
+    /* Assume att is nullable if no valid heap tuple is found */
+    return false;
+}
+
 /*                ---------- COLLATION CACHE ----------                     */

 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index af876d1..29e3fac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1122,6 +1122,15 @@ static struct config_bool ConfigureNamesBool[] =
         NULL, NULL, NULL
     },
     {
+        {"enable_not_in_transform", PGC_USERSET, QUERY_TUNING_METHOD,
+            gettext_noop("Enables the planner to transform NOT IN subquery to ANTI JOIN when possible."),
+            NULL
+        },
+        &enable_not_in_transform,
+        true,
+        NULL, NULL, NULL
+    },
+    {
         {"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
             gettext_noop("Enables genetic query optimization."),
             gettext_noop("This algorithm attempts to do planning without "
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index d73be2a..12ee3a7 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -657,6 +657,8 @@ typedef struct SubLink
     List       *operName;        /* originally specified operator name */
     Node       *subselect;        /* subselect as Query* or raw parsetree */
     int            location;        /* token location, or -1 if unknown */
+    Node       *subroot;        /* PlannerInfo for the subquery */
+    Node       *subplan;        /* best Plan for the subquery */
 } SubLink;

 /*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index b7456e3..5078cdd 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -45,6 +45,9 @@ extern List *find_nonnullable_vars(Node *clause);
 extern List *find_forced_null_vars(Node *clause);
 extern Var *find_forced_null_var(Node *clause);

+extern bool is_node_nonnullable(Node * node, Query *parse);
+extern bool list_hasnonnullable(List * list, Query *parse);
+
 extern bool is_pseudo_constant_clause(Node *clause);
 extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids);

diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 735ba09..d103415 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -60,6 +60,7 @@ extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
 extern PGDLLIMPORT bool enable_mergejoin;
 extern PGDLLIMPORT bool enable_hashjoin;
+extern PGDLLIMPORT bool enable_not_in_transform;
 extern PGDLLIMPORT bool enable_gathermerge;
 extern PGDLLIMPORT bool enable_partitionwise_join;
 extern PGDLLIMPORT bool enable_partitionwise_aggregate;
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 19c9230..7f807c4 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -26,7 +26,7 @@ extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
 extern void pull_up_subqueries(PlannerInfo *root);
 extern void flatten_simple_union_all(PlannerInfo *root);
-extern void reduce_outer_joins(PlannerInfo *root);
+extern void reduce_outer_joins(Query *parse);
 extern void remove_useless_result_rtes(PlannerInfo *root);
 extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins);
 extern Relids get_relids_for_join(Query *query, int joinrelid);
diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h
index d6a872b..a4d309e 100644
--- a/src/include/optimizer/subselect.h
+++ b/src/include/optimizer/subselect.h
@@ -19,6 +19,8 @@
 extern void SS_process_ctes(PlannerInfo *root);
 extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root,
                                              SubLink *sublink,
+                                             bool under_not,
+                                             Node **pullout,
                                              Relids available_rels);
 extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root,
                                                 SubLink *sublink,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4e646c5..1e585aa 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -90,6 +90,7 @@ extern char get_attgenerated(Oid relid, AttrNumber attnum);
 extern Oid    get_atttype(Oid relid, AttrNumber attnum);
 extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
                                   Oid *typid, int32 *typmod, Oid *collid);
+extern bool get_attnotnull(Oid relid, AttrNumber attnum);
 extern Oid    get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok);
 extern char *get_collation_name(Oid colloid);
 extern bool get_collation_isdeterministic(Oid colloid);
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 4c6cd5f..9a54f35 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1614,3 +1614,2789 @@ select * from x for update;
    Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3
 (2 rows)

+-- test NON IN to ANTI JOIN conversion
+CREATE TABLE s (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+insert into s (u, n, nn, p)
+    select
+    generate_series(1,3) as u,
+    generate_series(1,3) as n,
+    generate_series(1,3) as nn,
+    'foo' as p;
+insert into s values(1000002, 1000002, 1000002, 'foofoo');
+UPDATE s set n = NULL WHERE n = 3;
+analyze s;
+CREATE TABLE l (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+insert into l (u, n, nn, p)
+    select
+    generate_series(1,10000 ) as u,
+    generate_series(1,10000 ) as n,
+    generate_series(1,10000 ) as nn,
+    'bar' as p;
+UPDATE l set n = NULL WHERE n = 7;
+CREATE UNIQUE INDEX l_u ON l (u);
+CREATE INDEX l_n ON l (n);
+CREATE INDEX l_nn ON l (nn);
+analyze l;
+CREATE TABLE s1 (u INTEGER NOT NULL, n INTEGER NULL, n1 INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+insert into s1 (u, n, n1, nn, p)
+    select
+    generate_series(1,3) as u,
+    generate_series(1,3) as n,
+    generate_series(1,3) as n1,
+    generate_series(1,3) as nn,
+    'foo' as p;
+insert into s1 values(1000003, 1000003, 1000003, 1000003, 'foofoo');
+insert into s1 values(1003, 1003, 1003, 1003, 'foofoo');
+UPDATE s1 set n = NULL WHERE n = 3;
+UPDATE s1 set n1 = NULL WHERE n = 2;
+UPDATE s1 set n1 = NULL WHERE n1 = 3;
+analyze s1;
+CREATE TABLE empty (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+analyze empty;
+-- set work_mem to 64KB so that NOT IN to ANTI JOIN optimization will kick in
+set work_mem = 64;
+-- correctness test 1: inner empty, return every thing from outer including NULL
+explain (costs false) select * from s where n not in (select n from empty);
+             QUERY PLAN
+------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on empty
+(4 rows)
+
+select * from s where n not in (select n from empty);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+       2 |       2 |       2 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(4 rows)
+
+-- correctness test 2: inner has NULL, return empty result
+explain (costs false) select * from s where n not in (select n from l);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(12 rows)
+
+select * from s where n not in (select n from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+-- correctness test 3: inner non-null, result has no NULL
+explain (costs false) select * from s where n not in (select u from l);
+                  QUERY PLAN
+-----------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Index Only Scan using l_u on l
+         Index Cond: (u = s.n)
+(7 rows)
+
+select * from s where n not in (select u from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+-- correctness test 4: inner has predicate
+explain (costs false) select * from s where n not in (select n from l where u > 7);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+           Filter: (u > 7)
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         Filter: (u > 7)
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(14 rows)
+
+select * from s where n not in (select n from l where u > 7);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+       2 |       2 |       2 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+(3 rows)
+
+-- correctness test 5: multi-expression, (2, 2, null, 2, foo) should be in the result
+explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u >= 3);
+                           QUERY PLAN
+-----------------------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: (((s1.n = l.u) AND (s1.n1 = l.nn)) IS NOT FALSE)
+   ->  Seq Scan on s1
+   ->  Materialize
+         ->  Seq Scan on l
+               Filter: (u >= 3)
+(6 rows)
+
+select * from s1 where (n,n1) not in (select u,nn from l where u >= 3);
+    u    |    n    |   n1    |   nn    |   p
+---------+---------+---------+---------+--------
+       1 |       1 |       1 |       1 | foo
+ 1000003 | 1000003 | 1000003 | 1000003 | foofoo
+       2 |       2 |         |       2 | foo
+(3 rows)
+
+-- correctness test 6: multi-expression, (3, null, null, 3, foo) should not be in the result
+explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u > 0);
+                           QUERY PLAN
+-----------------------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: (((s1.n = l.u) AND (s1.n1 = l.nn)) IS NOT FALSE)
+   ->  Seq Scan on s1
+   ->  Materialize
+         ->  Seq Scan on l
+               Filter: (u > 0)
+(6 rows)
+
+select * from s1 where (n,n1) not in (select u,nn from l where u > 0);
+    u    |    n    |   n1    |   nn    |   p
+---------+---------+---------+---------+--------
+ 1000003 | 1000003 | 1000003 | 1000003 | foofoo
+(1 row)
+
+-- correctness test 6: multi-expression, (3, null, null, 3, foo) should be in the result
+explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u < 0);
+             QUERY PLAN
+------------------------------------
+ Seq Scan on s1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Index Scan using l_u on l
+           Index Cond: (u < 0)
+(5 rows)
+
+select * from s1 where (n,n1) not in (select u,nn from l where u < 0);
+    u    |    n    |   n1    |   nn    |   p
+---------+---------+---------+---------+--------
+       1 |       1 |       1 |       1 | foo
+ 1000003 | 1000003 | 1000003 | 1000003 | foofoo
+    1003 |    1003 |    1003 |    1003 | foofoo
+       2 |       2 |         |       2 | foo
+       3 |         |         |       3 | foo
+(5 rows)
+
+-- test using hashed subplan when inner fits in work_mem
+explain (costs false) select * from l where n not in (select n from s);
+             QUERY PLAN
+------------------------------------
+ Seq Scan on l
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on s
+(4 rows)
+
+select * from l where n not in (select n from s);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+-- test single expression
+explain (costs false) select * from s where n not in (select n from l);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(12 rows)
+
+select * from s where n not in (select n from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where u not in (select u from l);
+              QUERY PLAN
+--------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Only Scan using l_u on l
+         Index Cond: (u = s.u)
+(4 rows)
+
+select * from s where u not in (select u from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where 3*n not in (select n from l);
+                       QUERY PLAN
+--------------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: (((3 * n) IS NOT NULL) OR (NOT $0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: (((3 * s.n) = n) OR (n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = (3 * s.n))
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(12 rows)
+
+select * from s where 3*n not in (select n from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select 3*n from l);
+                        QUERY PLAN
+-----------------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: ((s.n = (3 * l.n)) OR ((3 * l.n) IS NULL))
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Materialize
+         ->  Seq Scan on l
+(8 rows)
+
+select * from s where n not in (select 3*n from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+-- test single expression with predicates
+explain (costs false) select * from s where n not in (select n from l where u > 0);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+           Filter: (u > 0)
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         Filter: (u > 0)
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(14 rows)
+
+select * from s where n not in (select n from l where u > 0);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select n from l where u > 100);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+           Filter: (u > 100)
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         Filter: (u > 100)
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(14 rows)
+
+select * from s where n not in (select n from l where u > 100);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+       2 |       2 |       2 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+(3 rows)
+
+-- test multi expression
+explain (costs false) select * from s where (n,u) not in (select n,u from l);
+                         QUERY PLAN
+-------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_u on l
+         Index Cond: (u = s.u)
+         Filter: ((s.n = n) OR (n IS NULL) OR (s.n IS NULL))
+(5 rows)
+
+select * from s where (n,u) not in (select n,u from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (u, nn) not in (select u, nn from l);
+            QUERY PLAN
+----------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+         Filter: (s.u = u)
+(5 rows)
+
+select * from s where (u, nn) not in (select u, nn from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (n,u) not in (select u,n from l);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.u = n) OR (n IS NULL))
+         Filter: ((s.n = u) OR (s.n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.u)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(10 rows)
+
+select * from s where (n,u) not in (select u,n from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l);
+                                  QUERY PLAN
+-------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+         Filter: (((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL)))
+(5 rows)
+
+select * from s where (n,u,nn) not in (select u,n,nn from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000);
+                                          QUERY PLAN
+----------------------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+         Filter: ((u > 1000) AND ((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL)))
+(5 rows)
+
+select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+       2 |       2 |       2 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(4 rows)
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0);
+                                        QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+         Filter: ((u > 0) AND ((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL)))
+(5 rows)
+
+select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1);
+                                        QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+         Filter: ((u > 1) AND ((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL)))
+(5 rows)
+
+select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+(2 rows)
+
+-- test multi-table
+explain (costs false) select count(*) from s, l where s.n not in (select n from l);
+                          QUERY PLAN
+--------------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Nested Loop
+         ->  Nested Loop Anti Join
+               ->  Seq Scan on s
+                     Filter: ((n IS NOT NULL) OR (NOT $0))
+               ->  Bitmap Heap Scan on l l_1
+                     Recheck Cond: ((s.n = n) OR (n IS NULL))
+                     ->  BitmapOr
+                           ->  Bitmap Index Scan on l_n
+                                 Index Cond: (n = s.n)
+                           ->  Bitmap Index Scan on l_n
+                                 Index Cond: (n IS NULL)
+         ->  Seq Scan on l
+(15 rows)
+
+select count(*) from s, l where s.n not in (select n from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s, l where s.nn not in (select nn from l);
+                      QUERY PLAN
+-------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         ->  Nested Loop Anti Join
+               ->  Seq Scan on s
+               ->  Index Only Scan using l_nn on l l_1
+                     Index Cond: (nn = s.nn)
+         ->  Seq Scan on l
+(7 rows)
+
+select count(*) from s, l where s.nn not in (select nn from l);
+ count
+-------
+ 10000
+(1 row)
+
+-- test null padded results from outer join
+explain (costs false) select * from s where n not in (select s.nn from l left join s on l.nn = s.nn);
+                     QUERY PLAN
+-----------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: ((s.n = s_1.nn) OR (s_1.nn IS NULL))
+   InitPlan 1 (returns $0)
+     ->  Nested Loop Left Join
+           Join Filter: (l_1.nn = s_2.nn)
+           ->  Seq Scan on l l_1
+           ->  Materialize
+                 ->  Seq Scan on s s_2
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s_1.nn)
+         ->  Seq Scan on l
+         ->  Hash
+               ->  Seq Scan on s s_1
+(15 rows)
+
+select * from s where n not in (select s.nn from l left join s on l.nn = s.nn);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select s.nn from s right join l on s.nn = l.nn);
+                     QUERY PLAN
+-----------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: ((s.n = s_1.nn) OR (s_1.nn IS NULL))
+   InitPlan 1 (returns $0)
+     ->  Nested Loop Left Join
+           Join Filter: (s_2.nn = l_1.nn)
+           ->  Seq Scan on l l_1
+           ->  Materialize
+                 ->  Seq Scan on s s_2
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s_1.nn)
+         ->  Seq Scan on l
+         ->  Hash
+               ->  Seq Scan on s s_1
+(15 rows)
+
+select * from s where n not in (select s.nn from s right join l on s.nn = l.nn);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s);
+                   QUERY PLAN
+------------------------------------------------
+ Aggregate
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s.nn)
+         ->  Seq Scan on l
+               Filter: (NOT (hashed SubPlan 1))
+               SubPlan 1
+                 ->  Seq Scan on s s_1
+         ->  Hash
+               ->  Seq Scan on s
+(9 rows)
+
+select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s);
+ count
+-------
+  9997
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s);
+                QUERY PLAN
+------------------------------------------
+ Aggregate
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s.nn)
+         Filter: (NOT (hashed SubPlan 1))
+         ->  Seq Scan on l
+         ->  Hash
+               ->  Seq Scan on s
+         SubPlan 1
+           ->  Seq Scan on s s_1
+(9 rows)
+
+select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join
son l.nn = s.nn); 
+                       QUERY PLAN
+--------------------------------------------------------
+ Aggregate
+   ->  Nested Loop Left Join
+         Join Filter: (s.nn = l.nn)
+         ->  Hash Anti Join
+               Hash Cond: (l.nn = l_1.nn)
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Hash Left Join
+                           Hash Cond: (l_1.nn = s_1.nn)
+                           ->  Seq Scan on l l_1
+                           ->  Hash
+                                 ->  Seq Scan on s s_1
+         ->  Seq Scan on s
+(13 rows)
+
+select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join s on l.nn = s.nn);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join
son l.nn = s.nn); 
+                         QUERY PLAN
+------------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Nested Loop Left Join
+           Join Filter: (l_2.nn = s_2.nn)
+           ->  Seq Scan on l l_2
+           ->  Materialize
+                 ->  Seq Scan on s s_2
+   ->  Nested Loop Anti Join
+         Join Filter: ((s.nn = s_1.nn) OR (s_1.nn IS NULL))
+         ->  Hash Left Join
+               Hash Cond: (l.nn = s.nn)
+               Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Seq Scan on s
+         ->  Materialize
+               ->  Hash Left Join
+                     Hash Cond: (l_1.nn = s_1.nn)
+                     ->  Seq Scan on l l_1
+                     ->  Hash
+                           ->  Seq Scan on s s_1
+(21 rows)
+
+select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join s on l.nn = s.nn);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn
froml); 
+                         QUERY PLAN
+-------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         ->  Nested Loop Left Join
+               Join Filter: (s.u = s1.u)
+               ->  Nested Loop Anti Join
+                     ->  Seq Scan on s
+                     ->  Index Only Scan using l_nn on l l_1
+                           Index Cond: (nn = s.nn)
+               ->  Seq Scan on s1
+         ->  Index Only Scan using l_u on l
+               Index Cond: (u = s.u)
+(11 rows)
+
+select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in
(selectnn from l); 
+                       QUERY PLAN
+--------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Hash Anti Join
+         Hash Cond: (s.nn = l_1.nn)
+         ->  Hash Left Join
+               Hash Cond: (l.u = s.u)
+               Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Hash Right Join
+                           Hash Cond: (s1.u = s.u)
+                           ->  Seq Scan on s1
+                           ->  Hash
+                                 ->  Seq Scan on s
+         ->  Hash
+               ->  Seq Scan on l l_1
+(17 rows)
+
+select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select
nnfrom l); 
+                    QUERY PLAN
+---------------------------------------------------
+ Aggregate
+   ->  Nested Loop Left Join
+         Join Filter: (s.u = s1.u)
+         ->  Nested Loop Anti Join
+               ->  Seq Scan on s
+               ->  Index Only Scan using l_nn on l
+                     Index Cond: (nn = s.nn)
+         ->  Seq Scan on s1
+(8 rows)
+
+select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select nn from l);
+ count
+-------
+     1
+(1 row)
+
+explain (costs false) select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn
froml); 
+                         QUERY PLAN
+-------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         ->  Nested Loop
+               Join Filter: (s.u = s1.u)
+               ->  Nested Loop Anti Join
+                     ->  Seq Scan on s
+                     ->  Index Only Scan using l_nn on l l_1
+                           Index Cond: (nn = s.nn)
+               ->  Seq Scan on s1
+         ->  Index Only Scan using l_u on l
+               Index Cond: (u = s.u)
+(11 rows)
+
+select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn
froml); 
+                       QUERY PLAN
+--------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Hash Anti Join
+         Hash Cond: (s.nn = l_1.nn)
+         ->  Hash Left Join
+               Hash Cond: (l.u = s.u)
+               Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Hash Join
+                           Hash Cond: (s1.u = s.u)
+                           ->  Seq Scan on s1
+                           ->  Hash
+                                 ->  Seq Scan on s
+         ->  Hash
+               ->  Seq Scan on l l_1
+(17 rows)
+
+select * from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l);
+ u | n | nn | p | u | n | n1 | nn | p | u | n | nn | p
+---+---+----+---+---+---+----+----+---+---+---+----+---
+(0 rows)
+
+explain (costs false) select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn
froml); 
+                         QUERY PLAN
+-------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         ->  Nested Loop Left Join
+               Join Filter: (s.u = s1.u)
+               ->  Nested Loop Anti Join
+                     ->  Seq Scan on s
+                     ->  Index Only Scan using l_nn on l l_1
+                           Index Cond: (nn = s.nn)
+               ->  Seq Scan on s1
+         ->  Index Only Scan using l_u on l
+               Index Cond: (u = s.u)
+(11 rows)
+
+select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn
froml); 
+                       QUERY PLAN
+--------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Hash Anti Join
+         Hash Cond: (s.nn = l_1.nn)
+         ->  Hash Full Join
+               Hash Cond: (l.u = s.u)
+               Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Hash Join
+                           Hash Cond: (s1.u = s.u)
+                           ->  Seq Scan on s1
+                           ->  Hash
+                                 ->  Seq Scan on s
+         ->  Hash
+               ->  Seq Scan on l l_1
+(17 rows)
+
+select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on
l.nn=s1.nn);
+                    QUERY PLAN
+---------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Nested Loop Left Join
+         Join Filter: (l.nn = s1.nn)
+         ->  Nested Loop Left Join
+               Join Filter: (l.nn = s_1.nn)
+               ->  Index Only Scan using l_nn on l
+                     Index Cond: (nn = s.nn)
+               ->  Seq Scan on s s_1
+         ->  Seq Scan on s1
+(10 rows)
+
+select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on l.nn=s1.nn);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on
l.nn=s1.nn);
+                     QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           Join Filter: (l.nn = s_1.nn)
+           ->  Nested Loop Left Join
+                 ->  Seq Scan on s1
+                 ->  Index Only Scan using l_nn on l
+                       Index Cond: (nn = s1.nn)
+           ->  Materialize
+                 ->  Seq Scan on s s_1
+(11 rows)
+
+select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on l.nn=s1.nn);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn);
+                                    QUERY PLAN
+-----------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Nested Loop Left Join
+         Join Filter: (l.nn = s_1.nn)
+         ->  Index Scan using l_nn on l
+               Index Cond: (nn = s.nn)
+               Filter: (((s.n = n) OR (n IS NULL) OR (s.n IS NULL)) AND (s.u = u))
+         ->  Seq Scan on s s_1
+(8 rows)
+
+select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l right join s on l.nn = s.nn);
+                QUERY PLAN
+-------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           ->  Seq Scan on s s_1
+           ->  Index Scan using l_nn on l
+                 Index Cond: (nn = s_1.nn)
+(7 rows)
+
+select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+--test reduce outer joins from outer query
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l);
+                       QUERY PLAN
+--------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Hash Anti Join
+         Hash Cond: (s.nn = l_1.nn)
+         ->  Hash Left Join
+               Hash Cond: (l.nn = s.nn)
+               Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Seq Scan on s
+         ->  Hash
+               ->  Seq Scan on l l_1
+(13 rows)
+
+select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and
s.u>0;
+                      QUERY PLAN
+-------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         ->  Nested Loop Anti Join
+               ->  Seq Scan on s
+                     Filter: (u > 0)
+               ->  Index Only Scan using l_nn on l l_1
+                     Index Cond: (nn = s.nn)
+         ->  Index Only Scan using l_nn on l
+               Index Cond: (nn = s.nn)
+(9 rows)
+
+select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and s.u>0;
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in
(selectnn from l); 
+                         QUERY PLAN
+-------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (s.u = s1.u)
+         ->  Nested Loop
+               ->  Nested Loop Anti Join
+                     ->  Seq Scan on s
+                     ->  Index Only Scan using l_nn on l l_1
+                           Index Cond: (nn = s.nn)
+               ->  Index Only Scan using l_nn on l
+                     Index Cond: (nn = s.nn)
+         ->  Seq Scan on s1
+(11 rows)
+
+select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in
(selectnn from l); 
+                          QUERY PLAN
+---------------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Nested Loop Anti Join
+         ->  Nested Loop Left Join
+               Join Filter: (s.u = s1.u)
+               Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on s1
+               ->  Materialize
+                     ->  Nested Loop
+                           ->  Seq Scan on s
+                           ->  Index Only Scan using l_nn on l
+                                 Index Cond: (nn = s.nn)
+         ->  Index Only Scan using l_nn on l l_1
+               Index Cond: (nn = s.nn)
+(15 rows)
+
+select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in
(selectnn from l); 
+                          QUERY PLAN
+--------------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Nested Loop Left Join
+         Join Filter: (s.u = s1.u)
+         ->  Hash Anti Join
+               Hash Cond: (s.nn = l_1.nn)
+               ->  Hash Left Join
+                     Hash Cond: (l.nn = s.nn)
+                     Filter: ((s.nn IS NOT NULL) OR (NOT $0))
+                     ->  Seq Scan on l
+                     ->  Hash
+                           ->  Seq Scan on s
+               ->  Hash
+                     ->  Seq Scan on l l_1
+         ->  Seq Scan on s1
+(16 rows)
+
+select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select nn from l);
+ count
+-------
+     0
+(1 row)
+
+--test reduce outer joins from subquery
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn);
+                  QUERY PLAN
+-----------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           ->  Seq Scan on s s_1
+           ->  Index Only Scan using l_nn on l
+                 Index Cond: (nn = s_1.nn)
+(7 rows)
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9);
+                QUERY PLAN
+-------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop
+           ->  Seq Scan on s s_1
+           ->  Index Scan using l_nn on l
+                 Index Cond: (nn = s_1.nn)
+                 Filter: (u > 9)
+(8 rows)
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+       2 |       2 |       2 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(4 rows)
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9);
+                  QUERY PLAN
+-----------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           ->  Seq Scan on s s_1
+                 Filter: (u > 9)
+           ->  Index Only Scan using l_nn on l
+                 Index Cond: (nn = s_1.nn)
+(8 rows)
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n =
s1.n);
+                      QUERY PLAN
+-------------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop
+           Join Filter: (l.n = s1.n)
+           ->  Seq Scan on s1
+           ->  Materialize
+                 ->  Nested Loop
+                       ->  Seq Scan on s s_1
+                       ->  Index Scan using l_nn on l
+                             Index Cond: (nn = s_1.nn)
+(11 rows)
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(2 rows)
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on
l.n= s1.n); 
+                      QUERY PLAN
+-------------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           Join Filter: (l.n = s1.n)
+           ->  Seq Scan on s1
+           ->  Materialize
+                 ->  Nested Loop
+                       ->  Seq Scan on s s_1
+                       ->  Index Scan using l_nn on l
+                             Index Cond: (nn = s_1.nn)
+(11 rows)
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on l.n = s1.n);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on
l.n= s1.n); 
+                   QUERY PLAN
+-------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           Join Filter: (l.n = s1.n)
+           ->  Nested Loop Left Join
+                 ->  Seq Scan on s s_1
+                 ->  Index Scan using l_nn on l
+                       Index Cond: (nn = s_1.nn)
+           ->  Materialize
+                 ->  Seq Scan on s1
+(11 rows)
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+--test reduce outer join on outer and sub-query
+explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in
(selectl.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); 
+                                    QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (s.u = s1.u)
+         ->  Seq Scan on s1
+         ->  Materialize
+               ->  Nested Loop
+                     ->  Seq Scan on s
+                           Filter: (NOT (hashed SubPlan 1))
+                           SubPlan 1
+                             ->  Nested Loop
+                                   Join Filter: (l_1.n = s1_1.n)
+                                   ->  Seq Scan on s1 s1_1
+                                   ->  Materialize
+                                         ->  Nested Loop
+                                               ->  Seq Scan on s s_1
+                                               ->  Index Scan using l_nn on l l_1
+                                                     Index Cond: (nn = s_1.nn)
+                     ->  Index Only Scan using l_nn on l
+                           Index Cond: (nn = s.nn)
+(19 rows)
+
+select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select l.nn from l right
joins on l.nn = s.nn join s1 on l.n = s1.n); 
+ count
+-------
+     1
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in
(selectl.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n); 
+                        QUERY PLAN
+----------------------------------------------------------
+ Aggregate
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s.nn)
+         Filter: (NOT (hashed SubPlan 1))
+         ->  Seq Scan on l
+         ->  Hash
+               ->  Hash Right Join
+                     Hash Cond: (s1.u = s.u)
+                     ->  Seq Scan on s1
+                     ->  Hash
+                           ->  Seq Scan on s
+         SubPlan 1
+           ->  Nested Loop Left Join
+                 Join Filter: (l_1.n = s1_1.n)
+                 ->  Nested Loop Left Join
+                       ->  Seq Scan on s s_1
+                       ->  Index Scan using l_nn on l l_1
+                             Index Cond: (nn = s_1.nn)
+                 ->  Materialize
+                       ->  Seq Scan on s1 s1_1
+(20 rows)
+
+select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select l.nn from l
rightjoin s on l.nn = s.nn left join s1 on l.n = s1.n); 
+ count
+-------
+     0
+(1 row)
+
+-- test union all
+explain (costs false) select * from s as t where not exists
+(select 1 from (select n as y from l union all
+                select u as y from s union all
+                select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null;
+                       QUERY PLAN
+--------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s t
+         Filter: (n IS NOT NULL)
+   ->  Append
+         ->  Bitmap Heap Scan on l
+               Recheck Cond: ((t.n = n) OR (n IS NULL))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n = t.n)
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n IS NULL)
+         ->  Seq Scan on s
+               Filter: ((t.n = u) OR (u IS NULL))
+         ->  Seq Scan on s s_1
+               Filter: ((t.n = nn) OR (nn IS NULL))
+(15 rows)
+
+select * from s as t where not exists
+(select 1 from (select n as y from l union all
+                select u as y from s union all
+                select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null;
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+                       QUERY PLAN
+--------------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Result
+           ->  Append
+                 ->  Seq Scan on l l_1
+                 ->  Seq Scan on s s_3
+                 ->  Seq Scan on s s_4
+   ->  Seq Scan on s
+         Filter: ((n IS NOT NULL) OR (NOT $0))
+   ->  Append
+         ->  Bitmap Heap Scan on l
+               Recheck Cond: ((s.n = n) OR (n IS NULL))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n = s.n)
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n IS NULL)
+         ->  Seq Scan on s s_1
+               Filter: ((s.n = u) OR (u IS NULL))
+         ->  Seq Scan on s s_2
+               Filter: ((s.n = nn) OR (nn IS NULL))
+(21 rows)
+
+select * from s where n not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select nn from l);
+                        QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Hash Anti Join
+         Hash Cond: (s.n = l_1.nn)
+         ->  Append
+               ->  Seq Scan on s
+                     Filter: ((n IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+                     Filter: ((u IS NOT NULL) OR (NOT $0))
+         ->  Hash
+               ->  Seq Scan on l l_1
+(12 rows)
+
+select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select nn from l);
+ count
+-------
+     1
+(1 row)
+
+explain (costs false) select count(*) from
+(select n as x from s union all select n as x from l) t where t.x not in
+(select nn from empty);
+                   QUERY PLAN
+------------------------------------------------
+ Aggregate
+   ->  Append
+         ->  Seq Scan on s
+               Filter: (NOT (hashed SubPlan 1))
+               SubPlan 1
+                 ->  Seq Scan on empty
+         ->  Seq Scan on l
+               Filter: (NOT (hashed SubPlan 1))
+(8 rows)
+
+select count(*) from
+(select n as x from s union all select n as x from l) t where t.x not in
+(select nn from empty);
+ count
+-------
+ 10004
+(1 row)
+
+explain (costs false) select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+                                QUERY PLAN
+--------------------------------------------------------------------------
+ Finalize Aggregate
+   InitPlan 1 (returns $0)
+     ->  Result
+           ->  Append
+                 ->  Seq Scan on l l_2
+                 ->  Seq Scan on s s_3
+                 ->  Seq Scan on s s_4
+   ->  Gather
+         Workers Planned: 2
+         Params Evaluated: $0
+         ->  Partial Aggregate
+               ->  Nested Loop Anti Join
+                     ->  Parallel Append
+                           ->  Parallel Seq Scan on l
+                                 Filter: ((u IS NOT NULL) OR (NOT $0))
+                           ->  Parallel Seq Scan on s
+                                 Filter: ((n IS NOT NULL) OR (NOT $0))
+                     ->  Append
+                           ->  Bitmap Heap Scan on l l_1
+                                 Recheck Cond: ((l.u = n) OR (n IS NULL))
+                                 ->  BitmapOr
+                                       ->  Bitmap Index Scan on l_n
+                                             Index Cond: (n = l.u)
+                                       ->  Bitmap Index Scan on l_n
+                                             Index Cond: (n IS NULL)
+                           ->  Seq Scan on s s_1
+                                 Filter: ((l.u = u) OR (u IS NULL))
+                           ->  Seq Scan on s s_2
+                                 Filter: ((l.u = nn) OR (nn IS NULL))
+(29 rows)
+
+select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+ count
+-------
+     0
+(1 row)
+
+-- test multi-levels of NOT IN
+explain (costs false) select * from s where n not in (select n from s where n not in (select n from l));
+                         QUERY PLAN
+------------------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 2))
+   SubPlan 2
+     ->  Nested Loop Anti Join
+           InitPlan 1 (returns $0)
+             ->  Seq Scan on l
+           ->  Seq Scan on s s_1
+                 Filter: ((n IS NOT NULL) OR (NOT $0))
+           ->  Bitmap Heap Scan on l l_1
+                 Recheck Cond: ((s_1.n = n) OR (n IS NULL))
+                 ->  BitmapOr
+                       ->  Bitmap Index Scan on l_n
+                             Index Cond: (n = s_1.n)
+                       ->  Bitmap Index Scan on l_n
+                             Index Cond: (n IS NULL)
+(15 rows)
+
+select * from s where n not in (select n from s where n not in (select n from l));
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+       1 |       1 |       1 | foo
+       2 |       2 |       2 | foo
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(4 rows)
+
+explain (costs false) select * from s where n not in (select n from s where n not in (select u from l));
+                      QUERY PLAN
+-------------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 2))
+   SubPlan 2
+     ->  Nested Loop Anti Join
+           InitPlan 1 (returns $0)
+             ->  Seq Scan on l
+           ->  Seq Scan on s s_1
+                 Filter: ((n IS NOT NULL) OR (NOT $0))
+           ->  Index Only Scan using l_u on l l_1
+                 Index Cond: (u = s_1.n)
+(10 rows)
+
+select * from s where n not in (select n from s where n not in (select u from l));
+ u | n | nn |  p
+---+---+----+-----
+ 1 | 1 |  1 | foo
+ 2 | 2 |  2 | foo
+(2 rows)
+
+explain (costs false) select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+                               QUERY PLAN
+-------------------------------------------------------------------------
+ Aggregate
+   ->  Seq Scan on s
+         Filter: (NOT (SubPlan 2))
+         SubPlan 2
+           ->  Result
+                 One-Time Filter: (NOT $2)
+                 InitPlan 1 (returns $2)
+                   ->  Nested Loop Anti Join
+                         ->  Seq Scan on s1
+                               Filter: (n = s.n)
+                         ->  Bitmap Heap Scan on l
+                               Recheck Cond: ((s1.u = n) OR (n IS NULL))
+                               ->  BitmapOr
+                                     ->  Bitmap Index Scan on l_n
+                                           Index Cond: (n = s1.u)
+                                     ->  Bitmap Index Scan on l_n
+                                           Index Cond: (n IS NULL)
+                 ->  Seq Scan on s1 s1_1
+(18 rows)
+
+select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in
(selectnn from s1); 
+                                           QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Seq Scan on s
+   Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (NOT (hashed SubPlan 3)))
+   SubPlan 1
+     ->  Seq Scan on s1
+   SubPlan 2
+     ->  Seq Scan on s1 s1_1
+   SubPlan 3
+     ->  Seq Scan on s1 s1_2
+(8 rows)
+
+select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from s1);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in
(selectnn from l); 
+                               QUERY PLAN
+-------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+         Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))
+         SubPlan 1
+           ->  Seq Scan on s1
+         SubPlan 2
+           ->  Seq Scan on s1 s1_1
+   ->  Index Only Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+(9 rows)
+
+select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n))
+and nn not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+                                QUERY PLAN
+---------------------------------------------------------------------------
+ Aggregate
+   ->  Seq Scan on s
+         Filter: ((NOT (SubPlan 2)) AND (NOT (SubPlan 4)))
+         SubPlan 2
+           ->  Result
+                 One-Time Filter: (NOT $2)
+                 InitPlan 1 (returns $2)
+                   ->  Nested Loop Anti Join
+                         ->  Seq Scan on s1
+                               Filter: (n = s.n)
+                         ->  Bitmap Heap Scan on l
+                               Recheck Cond: ((s1.u = n) OR (n IS NULL))
+                               ->  BitmapOr
+                                     ->  Bitmap Index Scan on l_n
+                                           Index Cond: (n = s1.u)
+                                     ->  Bitmap Index Scan on l_n
+                                           Index Cond: (n IS NULL)
+                 ->  Seq Scan on s1 s1_1
+         SubPlan 4
+           ->  Result
+                 One-Time Filter: (NOT $6)
+                 InitPlan 3 (returns $6)
+                   ->  Nested Loop Anti Join
+                         ->  Seq Scan on s1 s1_2
+                               Filter: (n = s.n)
+                         ->  Bitmap Heap Scan on l l_1
+                               Recheck Cond: ((s1_2.u = n) OR (n IS NULL))
+                               ->  BitmapOr
+                                     ->  Bitmap Index Scan on l_n
+                                           Index Cond: (n = s1_2.u)
+                                     ->  Bitmap Index Scan on l_n
+                                           Index Cond: (n IS NULL)
+                 ->  Seq Scan on s1 s1_3
+(33 rows)
+
+select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n))
+and nn not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+ count
+-------
+     0
+(1 row)
+
+--test COALESCE
+explain (costs false) select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l);
+                                 QUERY PLAN
+----------------------------------------------------------------------------
+ Hash Anti Join
+   Hash Cond: (COALESCE(s.n, '-1'::integer) = COALESCE(l.n, '-1'::integer))
+   ->  Seq Scan on s
+   ->  Hash
+         ->  Seq Scan on l
+(5 rows)
+
+select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l);
+                                 QUERY PLAN
+----------------------------------------------------------------------------
+ Hash Anti Join
+   Hash Cond: (COALESCE(s.n, '-1'::integer) = COALESCE(l.n, '-1'::integer))
+   ->  Seq Scan on s
+   ->  Hash
+         ->  Seq Scan on l
+(5 rows)
+
+select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l);
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: ((COALESCE(s.n) = COALESCE(l.n)) OR (COALESCE(l.n) IS NULL))
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: ((COALESCE(n) IS NOT NULL) OR (NOT $0))
+   ->  Materialize
+         ->  Seq Scan on l
+(8 rows)
+
+select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l);
+                        QUERY PLAN
+----------------------------------------------------------
+ Hash Anti Join
+   Hash Cond: (COALESCE(s.n, s.nn) = COALESCE(l.n, l.nn))
+   ->  Seq Scan on s
+   ->  Hash
+         ->  Seq Scan on l
+(5 rows)
+
+select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l);
+                   QUERY PLAN
+------------------------------------------------
+ Hash Anti Join
+   Hash Cond: (COALESCE(s.nn) = COALESCE(l.nn))
+   ->  Seq Scan on s
+   ->  Hash
+         ->  Seq Scan on l
+(5 rows)
+
+select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn,
COALESCE(n,u) from l); 
+                                                       QUERY PLAN
  

+-------------------------------------------------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Scan using l_nn on l
+         Index Cond: (nn = s.nn)
+         Filter: ((COALESCE(s.n, '-1'::integer) = COALESCE(n, '-1'::integer)) AND (COALESCE(s.n, s.u) = COALESCE(n,
u)))
+(5 rows)
+
+select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn, COALESCE(n, u) from
l);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(2 rows)
+
+-- test miscellaneous outer nullable cases
+explain (costs false) select * from s where (n,n) not in (select n,n from l);
+                         QUERY PLAN
+-------------------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: (((s.n = l.n) AND (s.n = l.n)) IS NOT FALSE)
+   ->  Seq Scan on s
+   ->  Materialize
+         ->  Seq Scan on l
+(5 rows)
+
+select * from s where (n,n) not in (select n,n from l);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l);
+                                     QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop Anti Join
+   Join Filter: (((s.n = l_1.n) AND (s.u = l_1.u) AND (s.nn = l_1.nn)) IS NOT FALSE)
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s.nn)
+         ->  Seq Scan on l
+         ->  Hash
+               ->  Seq Scan on s
+   ->  Materialize
+         ->  Seq Scan on l l_1
+(9 rows)
+
+select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l);
+ u | n | nn | p | u | n | nn | p
+---+---+----+---+---+---+----+---
+(0 rows)
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn
froml where u < 0); 
+                 QUERY PLAN
+---------------------------------------------
+ Aggregate
+   ->  Hash Left Join
+         Hash Cond: (l.nn = s.nn)
+         Filter: (NOT (hashed SubPlan 1))
+         ->  Seq Scan on l
+         ->  Hash
+               ->  Seq Scan on s
+         SubPlan 1
+           ->  Index Scan using l_u on l l_1
+                 Index Cond: (u < 0)
+(10 rows)
+
+select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l where u < 0);
+ count
+-------
+ 10000
+(1 row)
+
+explain (costs false) select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by
n;
+                     QUERY PLAN
+-----------------------------------------------------
+ Sort
+   Sort Key: s.n
+   ->  Seq Scan on s
+         Filter: (NOT (hashed SubPlan 1))
+         SubPlan 1
+           ->  Limit
+                 ->  Unique
+                       ->  Index Scan using l_n on l
+                             Filter: (u > 0)
+(9 rows)
+
+select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by n;
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+--test outer has strict predicate or inner join
+explain (costs false) select * from s where n not in (select n from l) and n > 0;
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+         Filter: (n > 0)
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(10 rows)
+
+select * from s where n not in (select n from l) and n > 0;
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select n from l) and u > 0;
+                         QUERY PLAN
+-------------------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Seq Scan on s
+         Filter: (((n IS NOT NULL) OR (NOT $0)) AND (u > 0))
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(12 rows)
+
+select * from s where n not in (select n from l) and u > 0;
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select n from l) and n is not null;
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+         Filter: (n IS NOT NULL)
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(10 rows)
+
+select * from s where n not in (select n from l) and n is not null;
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s join l on s.n = l.n where s.n not in (select n from l);
+                       QUERY PLAN
+--------------------------------------------------------
+ Nested Loop
+   ->  Nested Loop Anti Join
+         ->  Seq Scan on s
+         ->  Bitmap Heap Scan on l l_1
+               Recheck Cond: ((s.n = n) OR (n IS NULL))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n = s.n)
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n IS NULL)
+   ->  Index Scan using l_n on l
+         Index Cond: (n = s.n)
+(12 rows)
+
+select * from s join l on s.n = l.n where s.n not in (select n from l);
+ u | n | nn | p | u | n | nn | p
+---+---+----+---+---+---+----+---
+(0 rows)
+
+explain (costs false) select count(*) from s right join l on s.n = l.n where s.n not in (select n from l);
+                       QUERY PLAN
+--------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Nested Loop Anti Join
+         ->  Hash Left Join
+               Hash Cond: (l.n = s.n)
+               Filter: ((s.n IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on l
+               ->  Hash
+                     ->  Seq Scan on s
+         ->  Bitmap Heap Scan on l l_1
+               Recheck Cond: ((s.n = n) OR (n IS NULL))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n = s.n)
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n IS NULL)
+(17 rows)
+
+select count(*) from s right join l on s.n = l.n where s.n not in (select n from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select
nfrom l); 
+                             QUERY PLAN
+--------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (s.u = s1.u)
+         ->  Nested Loop
+               ->  Nested Loop Anti Join
+                     ->  Seq Scan on s
+                     ->  Bitmap Heap Scan on l l_1
+                           Recheck Cond: ((s.n = n) OR (n IS NULL))
+                           ->  BitmapOr
+                                 ->  Bitmap Index Scan on l_n
+                                       Index Cond: (n = s.n)
+                                 ->  Bitmap Index Scan on l_n
+                                       Index Cond: (n IS NULL)
+               ->  Index Only Scan using l_n on l
+                     Index Cond: (n = s.n)
+         ->  Seq Scan on s1
+(16 rows)
+
+select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select n from l);
+ count
+-------
+     0
+(1 row)
+
+explain (costs false) select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select
nfrom l); 
+                          QUERY PLAN
+--------------------------------------------------------------
+ Aggregate
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_2
+   ->  Nested Loop Anti Join
+         ->  Nested Loop Left Join
+               Join Filter: (s.u = s1.u)
+               Filter: ((s.n IS NOT NULL) OR (NOT $0))
+               ->  Seq Scan on s1
+               ->  Materialize
+                     ->  Nested Loop
+                           ->  Seq Scan on s
+                           ->  Index Only Scan using l_n on l
+                                 Index Cond: (n = s.n)
+         ->  Bitmap Heap Scan on l l_1
+               Recheck Cond: ((s.n = n) OR (n IS NULL))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n = s.n)
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n IS NULL)
+(20 rows)
+
+select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select n from l);
+ count
+-------
+     0
+(1 row)
+
+--test inner has strict predicate or inner join
+explain (costs false) select * from s where u not in (select n from l where n > 0);
+                 QUERY PLAN
+---------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Only Scan using l_n on l
+         Index Cond: ((n = s.u) AND (n > 0))
+(4 rows)
+
+select * from s where u not in (select n from l where n > 0);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where u not in (select n from l where u > 0);
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.u = n) OR (n IS NULL))
+         Filter: (u > 0)
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.u)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(10 rows)
+
+select * from s where u not in (select n from l where u > 0);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where u not in (select n from l where n is not null);
+                     QUERY PLAN
+-----------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+   ->  Index Only Scan using l_n on l
+         Index Cond: ((n = s.u) AND (n IS NOT NULL))
+(4 rows)
+
+select * from s where u not in (select n from l where n is not null);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n);
+                  QUERY PLAN
+----------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop
+           ->  Seq Scan on s s_1
+           ->  Index Only Scan using l_n on l
+                 Index Cond: (n = s_1.n)
+(7 rows)
+
+select * from s where u not in (select l.n from l join s on l.n=s.n);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(2 rows)
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.u=s.u);
+               QUERY PLAN
+-----------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop
+           ->  Seq Scan on s s_1
+           ->  Index Scan using l_u on l
+                 Index Cond: (u = s_1.u)
+(7 rows)
+
+select * from s where u not in (select l.n from l join s on l.u=s.u);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.n = s.n);
+                  QUERY PLAN
+----------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop
+           ->  Seq Scan on s s_1
+           ->  Index Only Scan using l_n on l
+                 Index Cond: (n = s_1.n)
+(7 rows)
+
+select * from s where u not in (select l.n from l join s on l.n = s.n);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(2 rows)
+
+explain (costs false) select * from s where u not in (select l.n from l right join s on l.n = s.n);
+                  QUERY PLAN
+----------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           ->  Seq Scan on s s_1
+           ->  Index Only Scan using l_n on l
+                 Index Cond: (n = s_1.n)
+(7 rows)
+
+select * from s where u not in (select l.n from l right join s on l.n = s.n);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n);
+                  QUERY PLAN
+----------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop
+           ->  Nested Loop
+                 Join Filter: (s_1.n = s1.n)
+                 ->  Seq Scan on s1
+                 ->  Materialize
+                       ->  Seq Scan on s s_1
+           ->  Index Only Scan using l_n on l
+                 Index Cond: (n = s_1.n)
+(11 rows)
+
+select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n);
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+       3 |         |       3 | foo
+(2 rows)
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n);
+                        QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on s
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Nested Loop Left Join
+           Join Filter: (l.n = s1.n)
+           ->  Seq Scan on s1
+           ->  Materialize
+                 ->  Nested Loop
+                       ->  Seq Scan on s s_1
+                       ->  Index Only Scan using l_n on l
+                             Index Cond: (n = s_1.n)
+(11 rows)
+
+select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n);
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+--test both sides have strict predicate or inner join
+explain (costs false) select * from s where n not in (select n from l where n > 0) and n > 0;
+                 QUERY PLAN
+---------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+         Filter: (n > 0)
+   ->  Index Only Scan using l_n on l
+         Index Cond: ((n = s.n) AND (n > 0))
+(5 rows)
+
+select * from s where n not in (select n from l where n > 0) and n > 0;
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s where n not in (select n from l where u > 0) and n > 0;
+                    QUERY PLAN
+--------------------------------------------------
+ Nested Loop Anti Join
+   ->  Seq Scan on s
+         Filter: (n > 0)
+   ->  Bitmap Heap Scan on l
+         Recheck Cond: ((s.n = n) OR (n IS NULL))
+         Filter: (u > 0)
+         ->  BitmapOr
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n = s.n)
+               ->  Bitmap Index Scan on l_n
+                     Index Cond: (n IS NULL)
+(11 rows)
+
+select * from s where n not in (select n from l where u > 0) and n > 0;
+ u | n | nn | p
+---+---+----+---
+(0 rows)
+
+explain (costs false) select * from s where n not in (select n from l where n > 0) and u > 0;
+                         QUERY PLAN
+-------------------------------------------------------------
+ Nested Loop Anti Join
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+           Filter: (n > 0)
+   ->  Seq Scan on s
+         Filter: (((n IS NOT NULL) OR (NOT $0)) AND (u > 0))
+   ->  Index Only Scan using l_n on l
+         Index Cond: ((n = s.n) AND (n > 0))
+(8 rows)
+
+select * from s where n not in (select n from l where n > 0) and u > 0;
+    u    |    n    |   nn    |   p
+---------+---------+---------+--------
+ 1000002 | 1000002 | 1000002 | foofoo
+(1 row)
+
+explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n
froml right join s on l.n=s.n join s s1 on l.n=s1.n); 
+                             QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+   Join Filter: (s.u = s1.u)
+   ->  Seq Scan on s1
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on s
+                     Filter: (NOT (hashed SubPlan 1))
+                     SubPlan 1
+                       ->  Nested Loop
+                             ->  Nested Loop
+                                   Join Filter: (s_1.n = s1_1.n)
+                                   ->  Seq Scan on s s_1
+                                   ->  Materialize
+                                         ->  Seq Scan on s s1_1
+                             ->  Index Only Scan using l_n on l l_1
+                                   Index Cond: (n = s_1.n)
+               ->  Index Scan using l_n on l
+                     Index Cond: (n = s.n)
+(18 rows)
+
+select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l right join s on
l.n=s.njoin s s1 on l.n=s1.n); 
+ u | n | nn | p | u | n | nn | p | u | n | n1 | nn | p
+---+---+----+---+---+---+----+---+---+---+----+----+---
+(0 rows)
+
+explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n
froml join s on l.n=s.n right join s s1 on l.n=s1.n); 
+                                   QUERY PLAN
+--------------------------------------------------------------------------------
+ Nested Loop
+   Join Filter: (s.u = s1.u)
+   ->  Seq Scan on s1
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on s
+                     Filter: (NOT (hashed SubPlan 1))
+                     SubPlan 1
+                       ->  Nested Loop Left Join
+                             Join Filter: (l_1.n = s1_1.n)
+                             ->  Seq Scan on s s1_1
+                             ->  Materialize
+                                   ->  Nested Loop
+                                         ->  Seq Scan on s s_1
+                                         ->  Index Only Scan using l_n on l l_1
+                                               Index Cond: (n = s_1.n)
+               ->  Index Scan using l_n on l
+                     Index Cond: (n = s.n)
+(18 rows)
+
+select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n
rightjoin s s1 on l.n=s1.n); 
+ u | n | nn | p | u | n | nn | p | u | n | n1 | nn | p
+---+---+----+---+---+---+----+---+---+---+----+----+---
+(0 rows)
+
+explain (costs false) select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n
froml join s on l.n=s.n right join s s1 on l.n=s1.n); 
+                          QUERY PLAN
+--------------------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (s.u = s1.u)
+   Filter: (NOT (hashed SubPlan 1))
+   ->  Seq Scan on s1
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on s
+               ->  Index Scan using l_n on l
+                     Index Cond: (n = s.n)
+   SubPlan 1
+     ->  Nested Loop Left Join
+           Join Filter: (l_1.n = s1_1.n)
+           ->  Seq Scan on s s1_1
+           ->  Materialize
+                 ->  Nested Loop
+                       ->  Seq Scan on s s_1
+                       ->  Index Only Scan using l_n on l l_1
+                             Index Cond: (n = s_1.n)
+(18 rows)
+
+select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n
rightjoin s s1 on l.n=s1.n); 
+ u | n | nn | p | u | n | nn | p | u | n | n1 | nn | p
+---+---+----+---+---+---+----+---+---+---+----+----+---
+(0 rows)
+
+--JIRA-7279 CTE with NOT IN
+create table public.testing
+(
+a integer,
+b integer,
+c integer
+);
+explain (costs false) with
+selected(a,b,c) as (values(1,2,3)),
+updated(d,e,f) as (values(4,5,6))
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+                    QUERY PLAN
+---------------------------------------------------
+ Insert on testing
+   ->  Result
+         One-Time Filter: (NOT (hashed SubPlan 1))
+         SubPlan 1
+           ->  Result
+(5 rows)
+
+with
+selected(a,b,c) as (values(1,2,3)),
+updated(d,e,f) as (values(4,5,6))
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+select * from public.testing;
+ a | b | c
+---+---+---
+ 1 | 2 | 3
+(1 row)
+
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+                                     QUERY PLAN
+-------------------------------------------------------------------------------------
+ Insert on testing
+   ->  Nested Loop Anti Join
+         Join Filter: (((s.u = l.u) AND (s.n = l.n) AND (s.nn = l.nn)) IS NOT FALSE)
+         ->  Seq Scan on s
+         ->  Materialize
+               ->  Seq Scan on l
+(6 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+(2 rows)
+
+-- expect to get Hash Anti Join
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where a not in (select d from updated);
+                     QUERY PLAN
+-----------------------------------------------------
+ Insert on testing
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Nested Loop Anti Join
+         ->  Seq Scan on s
+               Filter: ((u IS NOT NULL) OR (NOT $0))
+         ->  Index Only Scan using l_u on l
+               Index Cond: (u = s.u)
+(8 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where a not in (select d from updated);
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+(3 rows)
+
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select e from updated);
+                       QUERY PLAN
+--------------------------------------------------------
+ Insert on testing
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Nested Loop Anti Join
+         ->  Seq Scan on s
+               Filter: ((n IS NOT NULL) OR (NOT $0))
+         ->  Bitmap Heap Scan on l
+               Recheck Cond: ((s.n = n) OR (n IS NULL))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n = s.n)
+                     ->  Bitmap Index Scan on l_n
+                           Index Cond: (n IS NULL)
+(13 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select e from updated);
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+(3 rows)
+
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select d from updated);
+                     QUERY PLAN
+-----------------------------------------------------
+ Insert on testing
+   InitPlan 1 (returns $0)
+     ->  Seq Scan on l l_1
+   ->  Nested Loop Anti Join
+         ->  Seq Scan on s
+               Filter: ((n IS NOT NULL) OR (NOT $0))
+         ->  Index Only Scan using l_u on l
+               Index Cond: (u = s.n)
+(8 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select d from updated);
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+(4 rows)
+
+-- two levels of NOT IN with CTE, 2nd NOT IN
+-- subquery access CTE two levels above
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated
+      where d not in (select a from selected));
+                                               QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Insert on testing
+   CTE selected
+     ->  Seq Scan on s
+   ->  Nested Loop Anti Join
+         Join Filter: (((selected.a = l.u) AND (selected.b = l.n) AND (selected.c = l.nn)) IS NOT FALSE)
+         ->  CTE Scan on selected
+         ->  Materialize
+               ->  Seq Scan on l
+                     Filter: (NOT (hashed SubPlan 3))
+                     SubPlan 3
+                       ->  CTE Scan on selected selected_1
+(11 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated
+      where d not in (select a from selected));
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+       1 |       1 |       1
+       2 |       2 |       2
+ 1000002 | 1000002 | 1000002
+       3 |         |       3
+(8 rows)
+
+-- With clause inside a query block
+explain select count(distinct t.a) from
+(with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+select * from selected where (a,b,c) not in
+(select d,e,f from updated
+ where d not in (select a from selected))) as t;
+                                               QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Aggregate  (cost=693.75..693.76 rows=1 width=8)
+   ->  Nested Loop Anti Join  (cost=1.13..693.71 rows=3 width=12)
+         Join Filter: (((selected.a = l.u) AND (selected.b = l.n) AND (selected.c = l.nn)) IS NOT FALSE)
+         CTE selected
+           ->  Seq Scan on s  (cost=0.00..1.04 rows=4 width=12)
+         ->  CTE Scan on selected  (cost=0.00..0.08 rows=4 width=12)
+         ->  Materialize  (cost=0.09..230.09 rows=5000 width=12)
+               ->  Seq Scan on l  (cost=0.09..180.09 rows=5000 width=12)
+                     Filter: (NOT (hashed SubPlan 3))
+                     SubPlan 3
+                       ->  CTE Scan on selected selected_1  (cost=0.00..0.08 rows=4 width=4)
+(11 rows)
+
+select count(distinct t.a) from
+(with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+select * from selected where (a,b,c) not in
+(select d,e,f from updated
+ where d not in (select a from selected))) as t;
+ count
+-------
+     4
+(1 row)
+
+-- With clause in subquery, can't flatten subquery to anti join
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(with updated (d,e,f) as (select u, n, nn from l)
+select d,e,f from updated);
+            QUERY PLAN
+-----------------------------------
+ Insert on testing
+   ->  Seq Scan on s
+         Filter: (NOT (SubPlan 1))
+         SubPlan 1
+           ->  Materialize
+                 ->  Seq Scan on l
+(6 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(with updated (d,e,f) as (select u, n, nn from l)
+select d,e,f from updated);
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+       1 |       1 |       1
+       2 |       2 |       2
+ 1000002 | 1000002 | 1000002
+       3 |         |       3
+ 1000002 | 1000002 | 1000002
+(9 rows)
+
+-- With clause in subquery, subsubquery access CTE in subquery
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(
+with updated(d,e,f) as (select u, n, nn from l)
+select d,e,f from updated where d not in (select d from updated)
+);
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Insert on testing
+   ->  Seq Scan on s
+         Filter: (NOT (SubPlan 3))
+         SubPlan 3
+           ->  Materialize
+                 CTE updated
+                   ->  Seq Scan on l
+                 InitPlan 2 (returns $1)
+                   ->  CTE Scan on updated
+                 ->  Hash Anti Join
+                       Hash Cond: (updated_1.d = updated_2.d)
+                       ->  CTE Scan on updated updated_1
+                             Filter: ((d IS NOT NULL) OR (NOT $1))
+                       ->  Hash
+                             ->  CTE Scan on updated updated_2
+(15 rows)
+
+with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(
+with updated(d,e,f) as (select u, n, nn from l)
+select d,e,f from updated where d not in (select d from updated)
+);
+select * from public.testing;
+    a    |    b    |    c
+---------+---------+---------
+       1 |       2 |       3
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+ 1000002 | 1000002 | 1000002
+       1 |       1 |       1
+       2 |       2 |       2
+ 1000002 | 1000002 | 1000002
+       3 |         |       3
+ 1000002 | 1000002 | 1000002
+       1 |       1 |       1
+       2 |       2 |       2
+ 1000002 | 1000002 | 1000002
+       3 |         |       3
+(13 rows)
+
+-- Recursive CTE
+CREATE TABLE employees (
+  id serial,
+  name varchar(255),
+  manager_id int
+);
+INSERT INTO employees VALUES (1, 'Mark', null);
+INSERT INTO employees VALUES (2, 'John', 1);
+INSERT INTO employees VALUES (3, 'Dan', 2);
+INSERT INTO employees VALUES (4, 'Clark', 1);
+INSERT INTO employees VALUES (5, 'Linda', 2);
+INSERT INTO employees VALUES (6, 'Willy', 2);
+INSERT INTO employees VALUES (7, 'Barack', 2);
+INSERT INTO employees VALUES (8, 'Elen', 2);
+INSERT INTO employees VALUES (9, 'Kate', 3);
+INSERT INTO employees VALUES (10, 'Terry', 4);
+WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON mtree.id = e.manager_id
+)
+SELECT *
+FROM managertree;
+ id |  name  | manager_id
+----+--------+------------
+  2 | John   |          1
+  3 | Dan    |          2
+  5 | Linda  |          2
+  6 | Willy  |          2
+  7 | Barack |          2
+  8 | Elen   |          2
+  9 | Kate   |          3
+(7 rows)
+
+-- NOT IN subquery access Recursive CTE
+EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON mtree.id = e.manager_id
+)
+SELECT *
+FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree);
+                          QUERY PLAN
+---------------------------------------------------------------
+ CTE Scan on managertree mt
+   Filter: (NOT (hashed SubPlan 2))
+   CTE managertree
+     ->  Recursive Union
+           ->  Seq Scan on employees
+                 Filter: (id = 2)
+           ->  Hash Join
+                 Hash Cond: (e.manager_id = mtree.id)
+                 ->  Seq Scan on employees e
+                 ->  Hash
+                       ->  WorkTable Scan on managertree mtree
+   SubPlan 2
+     ->  CTE Scan on managertree
+(13 rows)
+
+WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON mtree.id = e.manager_id
+)
+SELECT *
+FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree);
+ id | name | manager_id
+----+------+------------
+  2 | John |          1
+(1 row)
+
+-- NOT IN under UNION ALL inside Recursive CTE
+EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON
+  (mtree.id = e.manager_id AND
+  mtree.manager_id NOT IN (SELECT manager_id FROM employees)
+  )
+)
+SELECT *
+FROM managertree;
+                          QUERY PLAN
+---------------------------------------------------------------
+ CTE Scan on managertree
+   CTE managertree
+     ->  Recursive Union
+           ->  Seq Scan on employees employees_1
+                 Filter: (id = 2)
+           ->  Hash Join
+                 Hash Cond: (e.manager_id = mtree.id)
+                 ->  Seq Scan on employees e
+                 ->  Hash
+                       ->  WorkTable Scan on managertree mtree
+                             Filter: (NOT (hashed SubPlan 1))
+                             SubPlan 1
+                               ->  Seq Scan on employees
+(13 rows)
+
+WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON
+  (mtree.id = e.manager_id AND
+  mtree.manager_id NOT IN (SELECT manager_id FROM employees)
+  )
+)
+SELECT *
+FROM managertree;
+ id | name | manager_id
+----+------+------------
+  2 | John |          1
+(1 row)
+
+--Manfred-7613 CTE NOT IN with Union All
+create table cocotero as (
+    select * from(
+    values(1,2,3)) as data(a,b,c)
+);
+explain (costs off) with selected as (
+    select *
+    from cocotero
+),
+updated as (
+    update cocotero
+    set a = 3
+    from selected
+    where cocotero.a = selected.a
+    returning selected.a,selected.b,selected.c
+),
+inserted as (
+    insert into cocotero
+    select *
+    from selected
+    where a not in (select a from updated)
+    returning *
+)
+select 'updated' as action, count(*) as lines from updated
+union all
+select 'inserted' as action, count(*) as lines from inserted;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Append
+   CTE selected
+     ->  Seq Scan on cocotero
+   CTE updated
+     ->  Update on cocotero cocotero_1
+           ->  Merge Join
+                 Merge Cond: (cocotero_1.a = selected.a)
+                 ->  Sort
+                       Sort Key: cocotero_1.a
+                       ->  Seq Scan on cocotero cocotero_1
+                 ->  Materialize
+                       ->  Sort
+                             Sort Key: selected.a
+                             ->  CTE Scan on selected
+   CTE inserted
+     ->  Insert on cocotero cocotero_2
+           InitPlan 3 (returns $3)
+             ->  CTE Scan on updated updated_1
+           ->  Nested Loop Anti Join
+                 Join Filter: ((selected_1.a = updated_2.a) OR (updated_2.a IS NULL))
+                 ->  CTE Scan on selected selected_1
+                       Filter: ((a IS NOT NULL) OR (NOT $3))
+                 ->  CTE Scan on updated updated_2
+   ->  Aggregate
+         ->  CTE Scan on updated
+   ->  Aggregate
+         ->  CTE Scan on inserted
+(27 rows)
+
+with selected as (
+    select *
+    from cocotero
+),
+updated as (
+    update cocotero
+    set a = 3
+    from selected
+    where cocotero.a = selected.a
+    returning selected.a,selected.b,selected.c
+),
+inserted as (
+    insert into cocotero
+    select *
+    from selected
+    where a not in (select a from updated)
+    returning *
+)
+select 'updated' as action, count(*) as lines from updated
+union all
+select 'inserted' as action, count(*) as lines from inserted;
+  action  | lines
+----------+-------
+ updated  |     1
+ inserted |     0
+(2 rows)
+
+--test enable_not_in_transform
+explain (costs off) select count(*) from s where s.u not in (select l.u from l);
+                 QUERY PLAN
+--------------------------------------------
+ Aggregate
+   ->  Nested Loop Anti Join
+         ->  Seq Scan on s
+         ->  Index Only Scan using l_u on l
+               Index Cond: (u = s.u)
+(5 rows)
+
+set enable_not_in_transform = off;
+explain (costs off) select count(*) from s where s.u not in (select l.u from l);
+            QUERY PLAN
+-----------------------------------
+ Aggregate
+   ->  Seq Scan on s
+         Filter: (NOT (SubPlan 1))
+         SubPlan 1
+           ->  Materialize
+                 ->  Seq Scan on l
+(6 rows)
+
+-- clean up
+reset work_mem;
+reset enable_not_in_transform;
+drop table s;
+drop table s1;
+drop table l;
+drop table empty;
+drop table public.testing;
+drop table employees;
+drop table cocotero;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 715842b..6e3aee8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -83,6 +83,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
+ enable_not_in_transform        | on
  enable_parallel_append         | on
  enable_parallel_hash           | on
  enable_partition_pruning       | on
@@ -91,7 +92,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(19 rows)
+(20 rows)

 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 893d8d0..4553a9f 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -831,3 +831,818 @@ select * from (with x as (select 2 as y) select * from x) ss;
 explain (verbose, costs off)
 with x as (select * from subselect_tbl)
 select * from x for update;
+
+-- test NON IN to ANTI JOIN conversion
+CREATE TABLE s (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+insert into s (u, n, nn, p)
+    select
+    generate_series(1,3) as u,
+    generate_series(1,3) as n,
+    generate_series(1,3) as nn,
+    'foo' as p;
+insert into s values(1000002, 1000002, 1000002, 'foofoo');
+UPDATE s set n = NULL WHERE n = 3;
+analyze s;
+
+CREATE TABLE l (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+insert into l (u, n, nn, p)
+    select
+    generate_series(1,10000 ) as u,
+    generate_series(1,10000 ) as n,
+    generate_series(1,10000 ) as nn,
+    'bar' as p;
+UPDATE l set n = NULL WHERE n = 7;
+
+CREATE UNIQUE INDEX l_u ON l (u);
+CREATE INDEX l_n ON l (n);
+CREATE INDEX l_nn ON l (nn);
+analyze l;
+
+CREATE TABLE s1 (u INTEGER NOT NULL, n INTEGER NULL, n1 INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+insert into s1 (u, n, n1, nn, p)
+    select
+    generate_series(1,3) as u,
+    generate_series(1,3) as n,
+    generate_series(1,3) as n1,
+    generate_series(1,3) as nn,
+    'foo' as p;
+insert into s1 values(1000003, 1000003, 1000003, 1000003, 'foofoo');
+insert into s1 values(1003, 1003, 1003, 1003, 'foofoo');
+UPDATE s1 set n = NULL WHERE n = 3;
+UPDATE s1 set n1 = NULL WHERE n = 2;
+UPDATE s1 set n1 = NULL WHERE n1 = 3;
+analyze s1;
+
+CREATE TABLE empty (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
+analyze empty;
+
+-- set work_mem to 64KB so that NOT IN to ANTI JOIN optimization will kick in
+set work_mem = 64;
+
+-- correctness test 1: inner empty, return every thing from outer including NULL
+explain (costs false) select * from s where n not in (select n from empty);
+
+select * from s where n not in (select n from empty);
+
+-- correctness test 2: inner has NULL, return empty result
+explain (costs false) select * from s where n not in (select n from l);
+
+select * from s where n not in (select n from l);
+
+-- correctness test 3: inner non-null, result has no NULL
+explain (costs false) select * from s where n not in (select u from l);
+
+select * from s where n not in (select u from l);
+
+-- correctness test 4: inner has predicate
+explain (costs false) select * from s where n not in (select n from l where u > 7);
+
+select * from s where n not in (select n from l where u > 7);
+
+-- correctness test 5: multi-expression, (2, 2, null, 2, foo) should be in the result
+explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u >= 3);
+
+select * from s1 where (n,n1) not in (select u,nn from l where u >= 3);
+
+-- correctness test 6: multi-expression, (3, null, null, 3, foo) should not be in the result
+explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u > 0);
+
+select * from s1 where (n,n1) not in (select u,nn from l where u > 0);
+
+-- correctness test 6: multi-expression, (3, null, null, 3, foo) should be in the result
+explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u < 0);
+
+select * from s1 where (n,n1) not in (select u,nn from l where u < 0);
+
+-- test using hashed subplan when inner fits in work_mem
+explain (costs false) select * from l where n not in (select n from s);
+
+select * from l where n not in (select n from s);
+
+-- test single expression
+explain (costs false) select * from s where n not in (select n from l);
+
+select * from s where n not in (select n from l);
+
+explain (costs false) select * from s where u not in (select u from l);
+
+select * from s where u not in (select u from l);
+
+explain (costs false) select * from s where 3*n not in (select n from l);
+
+select * from s where 3*n not in (select n from l);
+
+explain (costs false) select * from s where n not in (select 3*n from l);
+
+select * from s where n not in (select 3*n from l);
+
+-- test single expression with predicates
+explain (costs false) select * from s where n not in (select n from l where u > 0);
+
+select * from s where n not in (select n from l where u > 0);
+
+explain (costs false) select * from s where n not in (select n from l where u > 100);
+
+select * from s where n not in (select n from l where u > 100);
+
+-- test multi expression
+explain (costs false) select * from s where (n,u) not in (select n,u from l);
+
+select * from s where (n,u) not in (select n,u from l);
+
+explain (costs false) select * from s where (u, nn) not in (select u, nn from l);
+
+select * from s where (u, nn) not in (select u, nn from l);
+
+explain (costs false) select * from s where (n,u) not in (select u,n from l);
+
+select * from s where (n,u) not in (select u,n from l);
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l);
+
+select * from s where (n,u,nn) not in (select u,n,nn from l);
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000);
+
+select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000);
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0);
+
+select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0);
+
+explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1);
+
+select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1);
+
+-- test multi-table
+explain (costs false) select count(*) from s, l where s.n not in (select n from l);
+
+select count(*) from s, l where s.n not in (select n from l);
+
+explain (costs false) select count(*) from s, l where s.nn not in (select nn from l);
+
+select count(*) from s, l where s.nn not in (select nn from l);
+
+-- test null padded results from outer join
+explain (costs false) select * from s where n not in (select s.nn from l left join s on l.nn = s.nn);
+
+select * from s where n not in (select s.nn from l left join s on l.nn = s.nn);
+
+explain (costs false) select * from s where n not in (select s.nn from s right join l on s.nn = l.nn);
+
+select * from s where n not in (select s.nn from s right join l on s.nn = l.nn);
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s);
+
+select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s);
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s);
+
+select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s);
+
+explain (costs false) select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join
son l.nn = s.nn); 
+
+select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join s on l.nn = s.nn);
+
+explain (costs false) select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join
son l.nn = s.nn); 
+
+select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join s on l.nn = s.nn);
+
+explain (costs false) select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn
froml); 
+
+select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in
(selectnn from l); 
+
+select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select
nnfrom l); 
+
+select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn
froml); 
+
+select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn
froml); 
+
+select * from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn
froml); 
+
+select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn
froml); 
+
+select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn from l);
+
+explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on
l.nn=s1.nn);
+
+select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on l.nn=s1.nn);
+
+explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on
l.nn=s1.nn);
+
+select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on l.nn=s1.nn);
+
+explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn);
+
+select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn);
+
+explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l right join s on l.nn = s.nn);
+
+select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn);
+
+--test reduce outer joins from outer query
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l);
+
+select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and
s.u>0;
+
+select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and s.u>0;
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in
(selectnn from l); 
+
+select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in
(selectnn from l); 
+
+select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in (select nn from l);
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in
(selectnn from l); 
+
+select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select nn from l);
+
+--test reduce outer joins from subquery
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn);
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn);
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9);
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9);
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9);
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9);
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n =
s1.n);
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n);
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on
l.n= s1.n); 
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on l.n = s1.n);
+
+explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on
l.n= s1.n); 
+
+select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n);
+
+--test reduce outer join on outer and sub-query
+explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in
(selectl.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); 
+
+select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select l.nn from l right
joins on l.nn = s.nn join s1 on l.n = s1.n); 
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in
(selectl.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n); 
+
+select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select l.nn from l
rightjoin s on l.nn = s.nn left join s1 on l.n = s1.n); 
+
+-- test union all
+explain (costs false) select * from s as t where not exists
+(select 1 from (select n as y from l union all
+                select u as y from s union all
+                select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null;
+
+select * from s as t where not exists
+(select 1 from (select n as y from l union all
+                select u as y from s union all
+                select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null;
+
+explain (costs false) select * from s where n not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+
+select * from s where n not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+
+explain (costs false) select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select nn from l);
+
+select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select nn from l);
+
+explain (costs false) select count(*) from
+(select n as x from s union all select n as x from l) t where t.x not in
+(select nn from empty);
+
+select count(*) from
+(select n as x from s union all select n as x from l) t where t.x not in
+(select nn from empty);
+
+explain (costs false) select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+
+select count(*) from
+(select n as x from s union all select u as x from l) t where t.x not in
+(select n as y from l union all
+ select u as y from s union all
+ select nn as y from s);
+
+-- test multi-levels of NOT IN
+explain (costs false) select * from s where n not in (select n from s where n not in (select n from l));
+
+select * from s where n not in (select n from s where n not in (select n from l));
+
+explain (costs false) select * from s where n not in (select n from s where n not in (select u from l));
+
+select * from s where n not in (select n from s where n not in (select u from l));
+
+explain (costs false) select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+
+select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+
+explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in
(selectnn from s1); 
+
+select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from s1);
+
+explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in
(selectnn from l); 
+
+select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from l);
+
+explain (costs false) select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n))
+and nn not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+
+select count(*) from s where u not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n))
+and nn not in
+(select n from s1 where not exists
+ (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n));
+
+--test COALESCE
+explain (costs false) select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l);
+
+select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l);
+
+explain (costs false) select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l);
+
+select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l);
+
+explain (costs false) select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l);
+
+select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l);
+
+explain (costs false) select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l);
+
+select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l);
+
+explain (costs false) select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l);
+
+select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l);
+
+explain (costs false) select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn,
COALESCE(n,u) from l); 
+
+select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn, COALESCE(n, u) from
l);
+
+-- test miscellaneous outer nullable cases
+
+explain (costs false) select * from s where (n,n) not in (select n,n from l);
+
+select * from s where (n,n) not in (select n,n from l);
+
+explain (costs false) select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l);
+
+select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l);
+
+explain (costs false) select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn
froml where u < 0); 
+
+select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l where u < 0);
+
+explain (costs false) select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by
n;
+
+select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by n;
+
+--test outer has strict predicate or inner join
+explain (costs false) select * from s where n not in (select n from l) and n > 0;
+
+select * from s where n not in (select n from l) and n > 0;
+
+explain (costs false) select * from s where n not in (select n from l) and u > 0;
+
+select * from s where n not in (select n from l) and u > 0;
+
+explain (costs false) select * from s where n not in (select n from l) and n is not null;
+
+select * from s where n not in (select n from l) and n is not null;
+
+explain (costs false) select * from s join l on s.n = l.n where s.n not in (select n from l);
+
+select * from s join l on s.n = l.n where s.n not in (select n from l);
+
+explain (costs false) select count(*) from s right join l on s.n = l.n where s.n not in (select n from l);
+
+select count(*) from s right join l on s.n = l.n where s.n not in (select n from l);
+
+explain (costs false) select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select
nfrom l); 
+
+select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select n from l);
+
+explain (costs false) select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select
nfrom l); 
+
+select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select n from l);
+
+--test inner has strict predicate or inner join
+explain (costs false) select * from s where u not in (select n from l where n > 0);
+
+select * from s where u not in (select n from l where n > 0);
+
+explain (costs false) select * from s where u not in (select n from l where u > 0);
+
+select * from s where u not in (select n from l where u > 0);
+
+explain (costs false) select * from s where u not in (select n from l where n is not null);
+
+select * from s where u not in (select n from l where n is not null);
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n);
+
+select * from s where u not in (select l.n from l join s on l.n=s.n);
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.u=s.u);
+
+select * from s where u not in (select l.n from l join s on l.u=s.u);
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.n = s.n);
+
+select * from s where u not in (select l.n from l join s on l.n = s.n);
+
+explain (costs false) select * from s where u not in (select l.n from l right join s on l.n = s.n);
+
+select * from s where u not in (select l.n from l right join s on l.n = s.n);
+
+explain (costs false) select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n);
+
+select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n);
+
+explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n);
+
+select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n);
+
+--test both sides have strict predicate or inner join
+explain (costs false) select * from s where n not in (select n from l where n > 0) and n > 0;
+
+select * from s where n not in (select n from l where n > 0) and n > 0;
+
+explain (costs false) select * from s where n not in (select n from l where u > 0) and n > 0;
+
+select * from s where n not in (select n from l where u > 0) and n > 0;
+
+explain (costs false) select * from s where n not in (select n from l where n > 0) and u > 0;
+
+select * from s where n not in (select n from l where n > 0) and u > 0;
+
+explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n
froml right join s on l.n=s.n join s s1 on l.n=s1.n); 
+
+select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l right join s on
l.n=s.njoin s s1 on l.n=s1.n); 
+
+explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n
froml join s on l.n=s.n right join s s1 on l.n=s1.n); 
+
+select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n
rightjoin s s1 on l.n=s1.n); 
+
+explain (costs false) select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n
froml join s on l.n=s.n right join s s1 on l.n=s1.n); 
+
+select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n
rightjoin s s1 on l.n=s1.n); 
+
+--JIRA-7279 CTE with NOT IN
+create table public.testing
+(
+a integer,
+b integer,
+c integer
+);
+
+explain (costs false) with
+selected(a,b,c) as (values(1,2,3)),
+updated(d,e,f) as (values(4,5,6))
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+
+with
+selected(a,b,c) as (values(1,2,3)),
+updated(d,e,f) as (values(4,5,6))
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+
+select * from public.testing;
+
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated);
+
+select * from public.testing;
+
+-- expect to get Hash Anti Join
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where a not in (select d from updated);
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where a not in (select d from updated);
+
+select * from public.testing;
+
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select e from updated);
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select e from updated);
+
+select * from public.testing;
+
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select d from updated);
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where b not in (select d from updated);
+
+select * from public.testing;
+
+-- two levels of NOT IN with CTE, 2nd NOT IN
+-- subquery access CTE two levels above
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated
+      where d not in (select a from selected));
+
+with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+insert into public.testing
+select * from selected
+where (a,b,c) not in (select d,e,f from updated
+      where d not in (select a from selected));
+
+select * from public.testing;
+
+-- With clause inside a query block
+explain select count(distinct t.a) from
+(with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+select * from selected where (a,b,c) not in
+(select d,e,f from updated
+ where d not in (select a from selected))) as t;
+
+select count(distinct t.a) from
+(with
+selected(a,b,c) as (select u, n, nn from s),
+updated(d,e,f) as (select u, n, nn from l)
+select * from selected where (a,b,c) not in
+(select d,e,f from updated
+ where d not in (select a from selected))) as t;
+
+-- With clause in subquery, can't flatten subquery to anti join
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(with updated (d,e,f) as (select u, n, nn from l)
+select d,e,f from updated);
+
+with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(with updated (d,e,f) as (select u, n, nn from l)
+select d,e,f from updated);
+
+select * from public.testing;
+
+-- With clause in subquery, subsubquery access CTE in subquery
+explain (costs false) with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(
+with updated(d,e,f) as (select u, n, nn from l)
+select d,e,f from updated where d not in (select d from updated)
+);
+
+with
+selected(a,b,c) as (select u, n, nn from s)
+insert into public.testing
+select * from selected where (a,b,c) not in
+(
+with updated(d,e,f) as (select u, n, nn from l)
+select d,e,f from updated where d not in (select d from updated)
+);
+
+select * from public.testing;
+
+-- Recursive CTE
+CREATE TABLE employees (
+  id serial,
+  name varchar(255),
+  manager_id int
+);
+
+INSERT INTO employees VALUES (1, 'Mark', null);
+INSERT INTO employees VALUES (2, 'John', 1);
+INSERT INTO employees VALUES (3, 'Dan', 2);
+INSERT INTO employees VALUES (4, 'Clark', 1);
+INSERT INTO employees VALUES (5, 'Linda', 2);
+INSERT INTO employees VALUES (6, 'Willy', 2);
+INSERT INTO employees VALUES (7, 'Barack', 2);
+INSERT INTO employees VALUES (8, 'Elen', 2);
+INSERT INTO employees VALUES (9, 'Kate', 3);
+INSERT INTO employees VALUES (10, 'Terry', 4);
+
+WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON mtree.id = e.manager_id
+)
+SELECT *
+FROM managertree;
+
+-- NOT IN subquery access Recursive CTE
+EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON mtree.id = e.manager_id
+)
+SELECT *
+FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree);
+
+WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON mtree.id = e.manager_id
+)
+SELECT *
+FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree);
+
+-- NOT IN under UNION ALL inside Recursive CTE
+EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON
+  (mtree.id = e.manager_id AND
+  mtree.manager_id NOT IN (SELECT manager_id FROM employees)
+  )
+)
+SELECT *
+FROM managertree;
+
+WITH RECURSIVE managertree AS (
+  SELECT id, name, manager_id
+  FROM employees
+  WHERE id = 2
+  UNION ALL
+  SELECT e.id, e.name, e.manager_id
+  FROM employees e
+  INNER JOIN managertree mtree ON
+  (mtree.id = e.manager_id AND
+  mtree.manager_id NOT IN (SELECT manager_id FROM employees)
+  )
+)
+SELECT *
+FROM managertree;
+
+--Manfred-7613 CTE NOT IN with Union All
+create table cocotero as (
+    select * from(
+    values(1,2,3)) as data(a,b,c)
+);
+
+explain (costs off) with selected as (
+    select *
+    from cocotero
+),
+updated as (
+    update cocotero
+    set a = 3
+    from selected
+    where cocotero.a = selected.a
+    returning selected.a,selected.b,selected.c
+),
+inserted as (
+    insert into cocotero
+    select *
+    from selected
+    where a not in (select a from updated)
+    returning *
+)
+select 'updated' as action, count(*) as lines from updated
+union all
+select 'inserted' as action, count(*) as lines from inserted;
+
+with selected as (
+    select *
+    from cocotero
+),
+updated as (
+    update cocotero
+    set a = 3
+    from selected
+    where cocotero.a = selected.a
+    returning selected.a,selected.b,selected.c
+),
+inserted as (
+    insert into cocotero
+    select *
+    from selected
+    where a not in (select a from updated)
+    returning *
+)
+select 'updated' as action, count(*) as lines from updated
+union all
+select 'inserted' as action, count(*) as lines from inserted;
+
+--test enable_not_in_transform
+explain (costs off) select count(*) from s where s.u not in (select l.u from l);
+
+set enable_not_in_transform = off;
+
+explain (costs off) select count(*) from s where s.u not in (select l.u from l);
+
+-- clean up
+reset work_mem;
+reset enable_not_in_transform;
+drop table s;
+drop table s1;
+drop table l;
+drop table empty;
+drop table public.testing;
+drop table employees;
+drop table cocotero;

Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
Hi Tom,

Thanks for the feedback.
    
    * I find it entirely unacceptable to stick some planner temporary
    fields into struct SubLink.  If you need that storage you'll have
    to find some other place to put it.  But in point of fact I don't
    think you need it; it doesn't look to me to be critical to generate
    the subquery's plan any earlier than make_subplan would have done it.
    Moreover, you should really strive to *not* do that, because it's
    likely to get in the way of other future optimizations.  As the
    existing comment in make_subplan already suggests, we might want to
    delay subplan planning even further than that in future.

  The reason for calling make_subplan this early is that we want to
Call subplan_is_hashable(plan), to decide whether to proceed with the proposed
transformation. We try to stick with the fast hashed subplan when possible to avoid
potential performance degradation from the transformation which may restrict the
planner to choose Nested Loop Anti Join in order to handle Null properly,
the following is an example from subselect.out:
explain (costs false) select * from s where n not in (select u from l);
                  QUERY PLAN
-----------------------------------------------
 Nested Loop Anti Join
   InitPlan 1 (returns $0)
     ->  Seq Scan on l l_1
   ->  Seq Scan on s
         Filter: ((n IS NOT NULL) OR (NOT $0))
   ->  Index Only Scan using l_u on l
         Index Cond: (u = s.n)

However, if the subplan is not hashable, the above Nested Loop Anti Join is
actually faster.
    
    * I'm also not too happy with the (undocumented) rearrangement of
    reduce_outer_joins.  There's a specific sequence of processing that
    that's involved in, as documented at the top of prepjointree.c, and
    I doubt that you can just randomly call it from other places and expect
    good results.  In particular, since JOIN alias var flattening won't have
    happened yet when this code is being run from pull_up_sublinks, it's
    unlikely that reduce_outer_joins will reliably get the same answers it
    would get normally.  I also wonder whether it's safe to make the
    parsetree changes it makes earlier than normal, and whether it will be
    problematic to run it twice on the same tree, and whether its rather
    indirect connection to distribute_qual_to_rels is going to misbehave.

  The rearrangement of  reduce_outer_joins was to make the null test function
is_node_nonnullable() more accurate. Later we added strict predicates logic in
is_node_nonnullable(), so I think we can get rid of the rearrangement of
reduce_outer_joins now without losing accuracy.
    
    * The proposed test additions seem to about triple the runtime of
    subselect.sql.  This seems excessive.  I also wonder why it's necessary
    for this test to build its own large tables; couldn't it re-use ones
    that already exist in the regression database?

  I added a lot of test cases. But yes, I can reuse the existing large table if
there is one that doesn't fit in 64KB work_mem.
    
    * Not really sure that we need a new planner GUC for this, but if we
    do, it needs to be documented.

  The new GUC is just in case if anything goes wrong, the user can
easily turn it off.

Regards,
Zheng    
    


Re: NOT IN subquery optimization

От
Tom Lane
Дата:
"Li, Zheng" <zhelli@amazon.com> writes:
>     * I find it entirely unacceptable to stick some planner temporary
>     fields into struct SubLink.  If you need that storage you'll have
>     to find some other place to put it.  But in point of fact I don't
>     think you need it; it doesn't look to me to be critical to generate
>     the subquery's plan any earlier than make_subplan would have done it.
>     Moreover, you should really strive to *not* do that, because it's
>     likely to get in the way of other future optimizations.  As the
>     existing comment in make_subplan already suggests, we might want to
>     delay subplan planning even further than that in future.

>   The reason for calling make_subplan this early is that we want to
> Call subplan_is_hashable(plan), to decide whether to proceed with the proposed
> transformation.

Well, you're going to have to find another way, because this one won't do.

If you really need to get whacked over the head with a counterexample for
this approach, consider what happens if some part of the planner decides
to pass the SubLink through copyObject, expression_tree_mutator, etc
in between where you've done the planning and where make_subplan looks
at it.  Since you haven't taught copyfuncs.c about these fields, they'll
semi-accidentally wind up as NULL afterwards, meaning you lost the
information anyway.  (In fact, I wouldn't be surprised if that's happening
already in some cases; you couldn't really tell, since make_subplan will
just repeat the work.)  On the other hand, you can't have copyfuncs.c
copying such fields either --- we don't have copyfuncs support for
PlannerInfo, and if we did, the case would end up as infinite recursion.
Nor would it be particularly cool to try to fake things out by copying the
pointers as scalars; that will lead to dangling pointers later on.

BTW, so far as I can see, the only reason you're bothering with the whole
thing is to compare the size of the subquery output with work_mem, because
that's all that subplan_is_hashable does.  I wonder whether that
consideration is even still necessary in the wake of 1f39bce02.  If it is,
I wonder whether there isn't a cheaper way to figure it out.  (Note
similar comment in make_subplan.)

Also ...

> We try to stick with the fast hashed subplan when possible to avoid
> potential performance degradation from the transformation which may
> restrict the planner to choose Nested Loop Anti Join in order to handle
> Null properly,

But can't you detect that case directly?  It seems like you'd need to
figure out the NULL situation anyway to know whether the transformation
to antijoin is valid in the first place.

            regards, tom lane



Re: NOT IN subquery optimization

От
"Li, Zheng"
Дата:
    >BTW, so far as I can see, the only reason you're bothering with the whole
    thing is to compare the size of the subquery output with work_mem, because
    that's all that subplan_is_hashable does.  I wonder whether that
    consideration is even still necessary in the wake of 1f39bce02.  If it is,
    I wonder whether there isn't a cheaper way to figure it out.  (Note
    similar comment in make_subplan.)

    The comment in make_subplan says there is no cheaper way to figure out:
    /* At present, however, we can only check hashability after
     * we've made the subplan :-(.  (Determining whether it'll fit in work_mem
     * is the really hard part.)
     */

    I don't see why commit 1f39bce02 is related to this problem. Can you expand on this?
        
    >But can't you detect that case directly?  It seems like you'd need to
    figure out the NULL situation anyway to know whether the transformation
    to antijoin is valid in the first place.
    
    Yes, we do need to figure out the NULL situation, and there is always valid transformation
    to antijoin, it's just in the NULL case we need to stuff additional clause to the anti join
    condition, and in these cases the transformation actually outperforms Subplan (non-hashed),
    but underperforms the hashed Subplan. The unmodified anti hash join has similar performance
    compared to hashed Subplan.


Re: NOT IN subquery optimization

От
Andrey Lepikhov
Дата:
You should do small rebase (conflict with 911e7020770) and pgindent of 
the patch to repair problems with long lines and backspaces.

I am reviewing your patch in small steps. Questions:
1. In the find_innerjoined_rels() routine you stop descending on 
JOIN_FULL node type. I think it is wrong because if var has NOT NULL 
constraint, full join can't change it to NULL.
2. The convert_NOT_IN_to_join() routine is ok, but its name is 
misleading. May be you can use something like make_NOT_IN_to_join_quals()?
3. pull_up_sublinks_qual_recurse(). Comment:
"Return pullout predicate (x is NOT NULL)..."
may be change to
"Return pullout predicate (x is NOT NULL or NOT EXISTS...)"?
4. is_node_nonnullable():
I think one more case of non-nullable var may be foreign key constraint.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: NOT IN subquery optimization

От
David Steele
Дата:
On 3/26/20 4:58 PM, Li, Zheng wrote:
>      >BTW, so far as I can see, the only reason you're bothering with the whole
>      thing is to compare the size of the subquery output with work_mem, because
>      that's all that subplan_is_hashable does.  I wonder whether that
>      consideration is even still necessary in the wake of 1f39bce02.  If it is,
>      I wonder whether there isn't a cheaper way to figure it out.  (Note
>      similar comment in make_subplan.)
> 
>      The comment in make_subplan says there is no cheaper way to figure out:
>      /* At present, however, we can only check hashability after
>       * we've made the subplan :-(.  (Determining whether it'll fit in work_mem
>       * is the really hard part.)
>       */
> 
>      I don't see why commit 1f39bce02 is related to this problem. Can you expand on this?
>          
>      >But can't you detect that case directly?  It seems like you'd need to
>      figure out the NULL situation anyway to know whether the transformation
>      to antijoin is valid in the first place.
>      
>      Yes, we do need to figure out the NULL situation, and there is always valid transformation
>      to antijoin, it's just in the NULL case we need to stuff additional clause to the anti join
>      condition, and in these cases the transformation actually outperforms Subplan (non-hashed),
>      but underperforms the hashed Subplan. The unmodified anti hash join has similar performance
>      compared to hashed Subplan.

There seem to be enough questions about this implementation that I think 
it makes sense to mark this patch Returned with Feedback.

Feel free to resubmit it to a future CF when there is more of a 
consensus on the implementation.

Regards,
-- 
-David
david@pgmasters.net