Обсуждение: Planner : anti-join on left joins

Поиск
Список
Период
Сортировка

Planner : anti-join on left joins

От
Nicolas Adenis-Lamarre
Дата:
Hi,

while writting a book in 2025, i read an played with the postgresql code.
now my book is finished and i think that maybe a next step is to try to patch postgresql for some optimisations. I'm a beginner about postgresql code (but i patched a lot of opensource programs while working on a linux distro).

this first email aim is to discuss about the fact that this kind of code (just a draft here) have a chance to be commited or not (once comments added, reident, cleaning, ...). Details about each implementation could be discussed later.

There are some optimisations at the planner level that are not mandatory when you know how it works, but all the year, i get query to optimize because people doesn't know the pg internals or write not well written queries. So, the patches i would like to suggest are more "non mandatory optimisations".

like:
- detect anti join on "a left join b where x is null" where x is a non null var b (b being a rte)
this is the object of the attached patched.
it is not finished, but working for a demonstation (this is a quick and dirty patch just to try if i were able to do it).
it shows me that it has drawbacks : for example : it requires to know the details on some tables sooner on the planner, (and thus, sometimes, before we detect that we could just remove a table => so we build some tables for nothing except optimisations)

- remove unrequirered distinct, group by (select distinct id_unique from people;)
- remove double order (select * from (select * from a order by x) order by y) (where * doesn't containt functions based on row nums)
- detect anti join on "not in(...)"
- have a way to view the rewritten query ? (like explain)

and so on.

Вложения

Re: Planner : anti-join on left joins

От
Tom Lane
Дата:
Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com> writes:
> - detect anti join on "a left join b where x is null" where x is a non null
> var b (b being a rte)
> this is the object of the attached patched.

This is a perfectly reasonable thing to do, especially now that we've
built out some infrastructure that would help.  It doesn't look like
your patch is using that though.  Take a look at commits 904f6a593
and e2debb643.

BTW, it is not a good look for even a draft patch to not bother
updating adjacent comments that it falsifies, such as this in
reduce_outer_joins_pass2:

         * See if we can reduce JOIN_LEFT to JOIN_ANTI.  This is the case if
         * the join's own quals are strict for any var that was forced null by
         * higher qual levels.  NOTE: there are other ways that we could
         * detect an anti-join, in particular if we were to check whether Vars
         * coming from the RHS must be non-null because of table constraints.
         * That seems complicated and expensive though (in particular, one
         * would have to be wary of lower outer joins). For the moment this
         * seems sufficient.

In the long run, the comments are as important as the code, if not
even more so.  Keeping them accurate is not optional.

> - remove unrequirered distinct, group by (select distinct id_unique from
> people;)

Perhaps.  Not sure it's worth the trouble.

> - remove double order (select * from (select * from a order by x) order by
> y) (where * doesn't containt functions based on row nums)

I'd be inclined to think this is a bad idea.  If someone wrote that
they probably had a reason to want a double sort.

            regards, tom lane



Re: Planner : anti-join on left joins

От
David Rowley
Дата:
On Wed, 31 Dec 2025 at 22:59, Nicolas Adenis-Lamarre
<nicolas.adenis.lamarre@gmail.com> wrote:
> I'm a beginner about postgresql code (but i patched a lot of opensource programs while working on a linux distro).

Welcome.

> There are some optimisations at the planner level that are not mandatory when you know how it works, but all the
year,i get query to optimize because people doesn't know the pg internals or write not well written queries. So, the
patchesi would like to suggest are more "non mandatory optimisations". 
>
> like:
> - detect anti join on "a left join b where x is null" where x is a non null var b (b being a rte)
> this is the object of the attached patched.
> it is not finished, but working for a demonstation (this is a quick and dirty patch just to try if i were able to do
it).
> it shows me that it has drawbacks : for example : it requires to know the details on some tables sooner on the
planner,(and thus, sometimes, before we detect that we could just remove a table => so we build some tables for nothing
exceptoptimisations) 

Please look at find_relation_notnullatts(). You may be able to check
the forced_null_vars against that Bitmapset, with care to offset by
FirstLowInvalidHeapAttributeNumber.

> - remove unrequirered distinct, group by (select distinct id_unique from people;)

There's been work in that area before. Please search the archives for
UniqueKeys ([1]).

> - remove double order (select * from (select * from a order by x) order by y) (where * doesn't containt functions
basedon row nums) 

I think you risk breaking quite a few things there. There'd be quite a
large number of reasons to not do this and it seems quite difficult to
think of all of them upfront, which you'd need to do.

> - detect anti join on "not in(...)"

A few people have worked in that area before (some of it in [2], but I
think there was a more recent effort too). It might be worth reviewing
those discussions. Keep in mind that since then, the planner has more
infrastructure to know if Vars or Exprs can be NULL.

> - have a way to view the rewritten query ? (like explain)

I don't know what that means. It's not like there's a way to express
all the optimisations that were applied back into SQL.

David

[1] https://www.postgresql.org/search/?m=1&q=UniqueKeys&l=1&d=-1&s=d
[2] https://www.postgresql.org/message-id/flat/3793.1565689764%40linux-edt6#bf4b983d5744bca153c288904c038020



Re: Planner : anti-join on left joins

От
Tender Wang
Дата:


Tom Lane <tgl@sss.pgh.pa.us> 于2026年1月1日周四 07:37写道:
Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com> writes:
> - detect anti join on "a left join b where x is null" where x is a non null
> var b (b being a rte)
> this is the object of the attached patched.

This is a perfectly reasonable thing to do, especially now that we've
built out some infrastructure that would help.  It doesn't look like
your patch is using that though.  Take a look at commits 904f6a593
and e2debb643.

Yes, after commits 904f6a593 and e2debb643, we have some infrastructure to use.
I provided a patch to implement this reduction using these infrastructure codes.
Please check the attached patch.

BTW, it is not a good look for even a draft patch to not bother
updating adjacent comments that it falsifies, such as this in
reduce_outer_joins_pass2:

         * See if we can reduce JOIN_LEFT to JOIN_ANTI.  This is the case if
         * the join's own quals are strict for any var that was forced null by
         * higher qual levels.  NOTE: there are other ways that we could
         * detect an anti-join, in particular if we were to check whether Vars
         * coming from the RHS must be non-null because of table constraints.
         * That seems complicated and expensive though (in particular, one
         * would have to be wary of lower outer joins). For the moment this
         * seems sufficient.

In the long run, the comments are as important as the code, if not
even more so.  Keeping them accurate is not optional.

I updated the comments in the attached patch as well.
And I test the regression in the src, all tests pass.
Any thoughts? 

--
Thanks,
Tender Wang
Вложения

Re: Planner : anti-join on left joins

От
Nicolas Adenis-Lamarre
Дата:
Thanks a lot for your answers.

>>In the long run, the comments are as important as the code
Reading postgresql comments as a book is something i really enjoy. it increases the learning curve by two orders of magnitude.

@Tender Wang
thanks a lot. about, the patch, i've some points:

- 1
in my original patch, i added the following test to avoid computing things like find_nonnullable_vars
it is a minor improvement, i don't know if that should be kept.
 + if(forced_null_vars != NIL)

- 2
i added locally some regressions tests for such detection, but i needed to add cols to tenk1 cause tenk1 has no "not null" columns nor primary keys.
i'm too new to tell if that's the correct way to do. however, i think that such regression tests must be added as it was already the case for the existing anti join detection (and it were on the tenk1 table).

- 3
your patch seems to not work on 2 of my tests (the 2nd one is not working with my patch too)
* when rhl is a subquery

select p.firstname
from people p
left join (select 5 aa, id_husband, divorce_place from mariages left join people on mariages.id_wife = people.id) m on m.id_husband = p.id
where divorce_place is not null -- divorce_place was set not null for my tests

* when b.z is a constant from a subquery (but while the previous one didn't, in your case, i guess it is normal)
select p.firstname
from people p
left join (select 5 aa, id_husband, divorce_place from mariages left join people on mariages.id_wife = people.id) m on m.id_husband = p.id
where aa is not null

I will review the commits and suggestions you told me to continue to learn,
I will reanalyze your patch more carefully to see why my tests are not ok.
but i'm new, i need more time.



Le jeu. 1 janv. 2026 à 07:24, Tender Wang <tndrwang@gmail.com> a écrit :


Tom Lane <tgl@sss.pgh.pa.us> 于2026年1月1日周四 07:37写道:
Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com> writes:
> - detect anti join on "a left join b where x is null" where x is a non null
> var b (b being a rte)
> this is the object of the attached patched.

This is a perfectly reasonable thing to do, especially now that we've
built out some infrastructure that would help.  It doesn't look like
your patch is using that though.  Take a look at commits 904f6a593
and e2debb643.

Yes, after commits 904f6a593 and e2debb643, we have some infrastructure to use.
I provided a patch to implement this reduction using these infrastructure codes.
Please check the attached patch.

BTW, it is not a good look for even a draft patch to not bother
updating adjacent comments that it falsifies, such as this in
reduce_outer_joins_pass2:

         * See if we can reduce JOIN_LEFT to JOIN_ANTI.  This is the case if
         * the join's own quals are strict for any var that was forced null by
         * higher qual levels.  NOTE: there are other ways that we could
         * detect an anti-join, in particular if we were to check whether Vars
         * coming from the RHS must be non-null because of table constraints.
         * That seems complicated and expensive though (in particular, one
         * would have to be wary of lower outer joins). For the moment this
         * seems sufficient.

In the long run, the comments are as important as the code, if not
even more so.  Keeping them accurate is not optional.

I updated the comments in the attached patch as well.
And I test the regression in the src, all tests pass.
Any thoughts? 

--
Thanks,
Tender Wang

Re: Planner : anti-join on left joins

От
Pavel Stehule
Дата:
Hi

čt 1. 1. 2026 v 10:34 odesílatel Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com> napsal:
Thanks a lot for your answers.

>>In the long run, the comments are as important as the code
Reading postgresql comments as a book is something i really enjoy. it increases the learning curve by two orders of magnitude.

@Tender Wang
thanks a lot. about, the patch, i've some points:

- 1
in my original patch, i added the following test to avoid computing things like find_nonnullable_vars
it is a minor improvement, i don't know if that should be kept.
 + if(forced_null_vars != NIL)

- 2
i added locally some regressions tests for such detection, but i needed to add cols to tenk1 cause tenk1 has no "not null" columns nor primary keys.
i'm too new to tell if that's the correct way to do. however, i think that such regression tests must be added as it was already the case for the existing anti join detection (and it were on the tenk1 table).

- 3
your patch seems to not work on 2 of my tests (the 2nd one is not working with my patch too)
* when rhl is a subquery

select p.firstname
from people p
left join (select 5 aa, id_husband, divorce_place from mariages left join people on mariages.id_wife = people.id) m on m.id_husband = p.id
where divorce_place is not null -- divorce_place was set not null for my tests

* when b.z is a constant from a subquery (but while the previous one didn't, in your case, i guess it is normal)
select p.firstname
from people p
left join (select 5 aa, id_husband, divorce_place from mariages left join people on mariages.id_wife = people.id) m on m.id_husband = p.id
where aa is not null

I will review the commits and suggestions you told me to continue to learn,
I will reanalyze your patch more carefully to see why my tests are not ok.
but i'm new, i need more time.




Regards

Pavel

 

Le jeu. 1 janv. 2026 à 07:24, Tender Wang <tndrwang@gmail.com> a écrit :


Tom Lane <tgl@sss.pgh.pa.us> 于2026年1月1日周四 07:37写道:
Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com> writes:
> - detect anti join on "a left join b where x is null" where x is a non null
> var b (b being a rte)
> this is the object of the attached patched.

This is a perfectly reasonable thing to do, especially now that we've
built out some infrastructure that would help.  It doesn't look like
your patch is using that though.  Take a look at commits 904f6a593
and e2debb643.

Yes, after commits 904f6a593 and e2debb643, we have some infrastructure to use.
I provided a patch to implement this reduction using these infrastructure codes.
Please check the attached patch.

BTW, it is not a good look for even a draft patch to not bother
updating adjacent comments that it falsifies, such as this in
reduce_outer_joins_pass2:

         * See if we can reduce JOIN_LEFT to JOIN_ANTI.  This is the case if
         * the join's own quals are strict for any var that was forced null by
         * higher qual levels.  NOTE: there are other ways that we could
         * detect an anti-join, in particular if we were to check whether Vars
         * coming from the RHS must be non-null because of table constraints.
         * That seems complicated and expensive though (in particular, one
         * would have to be wary of lower outer joins). For the moment this
         * seems sufficient.

In the long run, the comments are as important as the code, if not
even more so.  Keeping them accurate is not optional.

I updated the comments in the attached patch as well.
And I test the regression in the src, all tests pass.
Any thoughts? 

--
Thanks,
Tender Wang