Обсуждение: Convert ALL SubLinks to ANY SubLinks

Поиск
Список
Период
Сортировка

Convert ALL SubLinks to ANY SubLinks

От
Richard Guo
Дата:
I was reading a paper on null-aware anti-joins (section 6 in [1]) and
noticed that they use <> ALL to represent NOT IN semantics.  We
currently execute ALL SubLinks using a nested-loop SubPlan.  This
made me realize that it could be beneficial to convert ALL SubLinks to
ANY SubLinks.

According to De Morgan's laws:

  foo op ALL (sub-SELECT) => NOT (foo negator_op ANY (sub-SELECT))
  NOT (foo op ALL (sub-SELECT)) => foo negator_op ANY (sub-SELECT)

This conversion makes it possible for the executor to evaluate the
sublink using a hashed SubPlan, which replaces a nested loop with a
more efficient hash probe.  Even better, because our planner already
knows how to pull up ANY SubLinks (and pending the other patch, NOT
ANY SubLinks), these transformed clauses have the chance to be
optimized directly into semijoins or anti-joins.

In the worst-case scenario where the transformed ANY sublink cannot be
pulled up and cannot be hashed, execution falls back to a nested-loop
SubPlan.  Performance in this scenario is effectively identical to the
legacy ALL SubPlan, so there should be no regressions.  Because the
operator is negated, the ANY subplan will short-circuit on the exact
same inner tuple that the ALL subplan would have short-circuited on.
The only added overhead is a single boolean NOT inversion per outer
tuple, which is, IMO, computationally negligible compared to the cost
of the nested-loop execution itself.

Attached is a draft patch illustrating the idea.  Any thoughts or
interest in this transformation?

[1] http://www.vldb.org/pvldb/vol2/vldb09-423.pdf

- Richard

Вложения

Re: Convert ALL SubLinks to ANY SubLinks

От
Andrei Lepikhov
Дата:
On 26/2/26 08:11, Richard Guo wrote:
> Attached is a draft patch illustrating the idea.  Any thoughts or
> interest in this transformation?
Thanks for sharing your thoughts.
I want to correct your statement on the 'no regression' phrase. In 
practice, users often report issues after each new sublink 
transformation goes live.

This happens because the transformation changes the 'join problem' 
order. Before, the subplan's join list was handled independently, but 
now its relations are mixed with those from higher levels. If the join 
collapse limit is exceeded, this can sometimes cause much worse 
performance than in earlier Postgres versions.

A recent example of this issue, where a query increased from 9 joins in 
PG14 to 18 joins in PG19, can be found in pgsql-performance [1].

[1] 
https://www.postgresql.org/message-id/flat/ECA94B5A-E5B4-45F7-B1B4-9BF754083164%40gmx.net

-- 
regards, Andrei Lepikhov,
pgEdge



Re: Convert ALL SubLinks to ANY SubLinks

От
Richard Guo
Дата:
On Thu, Feb 26, 2026 at 8:37 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> I want to correct your statement on the 'no regression' phrase. In
> practice, users often report issues after each new sublink
> transformation goes live.
>
> This happens because the transformation changes the 'join problem'
> order. Before, the subplan's join list was handled independently, but
> now its relations are mixed with those from higher levels. If the join
> collapse limit is exceeded, this can sometimes cause much worse
> performance than in earlier Postgres versions.

I'm not convinced this is a regression.  In scenarios where the join
tree becomes too large, wouldn't the standard solution be for the user
to tune join_collapse_limit (and maybe also geqo_threshold)?

If pulling up a subquery results in a suboptimal join order due to the
number of relations, the root cause lies in the limitations of the
join search algorithm itself.  It doesn't seem reasonable to penalize
the general subquery flattening mechanism for that limitation, and it
makes even less sense to fault this specific transformation, which
simply converts ALL_SUBLINK to ANY_SUBLINK so it can benefit from the
same hashed SubPlan and flattening mechanics that other subqueries
already enjoy.

- Richard



Re: Convert ALL SubLinks to ANY SubLinks

От
wenhui qiu
Дата:
HI 
> If pulling up a subquery results in a suboptimal join order due to the
> number of relations, the root cause lies in the limitations of the
> join search algorithm itself.  It doesn't seem reasonable to penalize
> the general subquery flattening mechanism for that limitation, and it
> makes even less sense to fault this specific transformation, which
> simply converts ALL_SUBLINK to ANY_SUBLINK so it can benefit from the
> same hashed SubPlan and flattening mechanics that other subqueries
>already enjoy.
Agree +1  


Thanks 


On Fri, Feb 27, 2026 at 9:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Feb 26, 2026 at 8:37 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> I want to correct your statement on the 'no regression' phrase. In
> practice, users often report issues after each new sublink
> transformation goes live.
>
> This happens because the transformation changes the 'join problem'
> order. Before, the subplan's join list was handled independently, but
> now its relations are mixed with those from higher levels. If the join
> collapse limit is exceeded, this can sometimes cause much worse
> performance than in earlier Postgres versions.

I'm not convinced this is a regression.  In scenarios where the join
tree becomes too large, wouldn't the standard solution be for the user
to tune join_collapse_limit (and maybe also geqo_threshold)?

If pulling up a subquery results in a suboptimal join order due to the
number of relations, the root cause lies in the limitations of the
join search algorithm itself.  It doesn't seem reasonable to penalize
the general subquery flattening mechanism for that limitation, and it
makes even less sense to fault this specific transformation, which
simply converts ALL_SUBLINK to ANY_SUBLINK so it can benefit from the
same hashed SubPlan and flattening mechanics that other subqueries
already enjoy.

- Richard


Re: Convert ALL SubLinks to ANY SubLinks

От
Andrei Lepikhov
Дата:
On 27/2/26 02:42, Richard Guo wrote:
> On Thu, Feb 26, 2026 at 8:37 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> I want to correct your statement on the 'no regression' phrase. In
>> practice, users often report issues after each new sublink
>> transformation goes live.
>>
>> This happens because the transformation changes the 'join problem'
>> order. Before, the subplan's join list was handled independently, but
>> now its relations are mixed with those from higher levels. If the join
>> collapse limit is exceeded, this can sometimes cause much worse
>> performance than in earlier Postgres versions.
> 
> I'm not convinced this is a regression.  In scenarios where the join
> tree becomes too large, wouldn't the standard solution be for the user
> to tune join_collapse_limit (and maybe also geqo_threshold)?

Depends on the exact definition of regression. In my mind, if a new 
Postgres version produces a worse plan by costs and execution times, and 
the list of possible solutions includes solely buying a more performant 
server (which is exactly what the join_collapse_limit increment means in 
a highly tuned production system) and rewriting the query, this is 
definitely a regression.

Also, do not forget the implicitly added LATERAL restriction. It is 
quite a simple puzzle to find a corner case that limits the search scope 
to less productive query plans.

I want to say I'm not against pull-up techniques as they are. I just 
point out that, according to (my personal) real-world statistics, 
correlated subquery pull-ups cause the most problematic user complaints, 
because we have almost no tools to help without changing the query 
generation driver.
It happens because of a Postgres planner limitation: any parse-tree 
transformation is 'blind' and irreversible. This issue happens more 
frequently when people try to use Postgres for kinda analytics tasks 
like 'monthly accountant report'. So, it may be beneficial to tangle 
extending pull-up techniques with work on how to ease corner cases.

-- 
regards, Andrei Lepikhov,
pgEdge