Re: Convert NOT IN sublinks to anti-joins when safe
| От | David Geier |
|---|---|
| Тема | Re: Convert NOT IN sublinks to anti-joins when safe |
| Дата | |
| Msg-id | 0dd7bed2-2a1d-4a12-bddf-e00013744eab@gmail.com обсуждение исходный текст |
| Ответ на | Re: Convert NOT IN sublinks to anti-joins when safe (Richard Guo <guofenglinux@gmail.com>) |
| Ответы |
Re: Convert NOT IN sublinks to anti-joins when safe
|
| Список | pgsql-hackers |
On 05.02.2026 07:09, Richard Guo wrote: > On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav.pg@gmail.com> wrote: >> If the sub-select can yield NULLs, the rewrite can be fixed by adding an >> OR t2.c1 IS NULL clause, such as: >> >> SELECT t1.c1 FROM t1 WHERE >> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL) > > I'm not sure if this rewrite results in a better plan. The OR clause > would force a nested loop join, which could be much slower than a > hashed-subplan plan. That's why I had shared a variant that doesn't have the OR but a instead uses a second NOT EXISTS: SELECT t1.c1 FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL) >> If the outer expression can yield NULLs, the rewrite can be fixed by >> adding a t1.c1 IS NOT NULL clause, such as: >> >> SELECT t1.c1 FROM T1 WHERE >> t1.c1 IS NOT NULL AND >> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) > > This rewrite doesn't seem correct to me. If t2 is empty, you would > incorrectly lose the NULL rows from t1 in the final result. Yes, that rewrite was only for the case where the outer expression can yield NULLs but the sub-query cannot. The very last rewrite combines both cases. The rewritten query then looks like: SELECT t1.c1 FROM T1 WHERE t1.c1 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL) >> What's our today's take on doing more involved transformations inside >> the planner to support such cases? It would greatly open up the scope of >> the optimization. > > As mentioned in my initial email, the goal of this patch is not to > handle every possible case, but rather only to handle the basic form > where both sides of NOT IN are provably non-nullable. This keeps the > code complexity to a minimum, and I believe this would cover the most > common use cases in real world. Seems reasonable to start with the non-NULL variant, though there are certainly cases where there's no PK / unique index on the relevant columns. -- David Geier
В списке pgsql-hackers по дате отправления: