Обсуждение: Clean up remove_rel_from_query() after self-join elimination commit
While working on reusing remove_rel_from_query() for inner-join
removal, I noticed that the function is in pretty rough shape. The
self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only. This resulted in several issues:
1. Comments throughout remove_rel_from_query() still assumed only
left-join removal, making the code misleading. For example:
* This is relevant in case the outer join we're deleting is nested inside
* other outer joins:
2. This was called even for left-join removal, with subst=-1. This is
pointless and confusing.
ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
replace_relid_callback);
3. phinfo->ph_lateral was adjusted for left-join removal, which is
also confusing ...
phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
/*
* ph_lateral might contain rels mentioned in ph_eval_at after the
* replacement, remove them.
*/
phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
phinfo->ph_eval_at);
... since you can find this Assert just above:
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
4. The comment about attr_needed reconstruction was in
remove_rel_from_query(), but the actual rebuild is performed by
the callers.
5. The EquivalenceClass processing in remove_rel_from_query():
/*
* Likewise remove references from EquivalenceClasses.
*/
foreach(l, root->eq_classes)
{
EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
if (bms_is_member(relid, ec->ec_relids) ||
(sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
remove_rel_from_eclass(ec, sjinfo, relid, subst);
}
The condition always evaluates to true for self-join elimination
(i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
But this is redundant because the caller remove_self_join_rel()
already handles ECs via update_eclasses().
6. In remove_self_join_rel(), I notice this code:
/* At last, replace varno in root targetlist and HAVING clause */
ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
toKeep->relid, 0, replace_relid_callback);
ChangeVarNodesExtended((Node *) root->processed_groupClause,
toRemove->relid, toKeep->relid, 0,
replace_relid_callback);
The comment mentions "HAVING clause", but neither processed_tlist nor
processed_groupClause has anything to do with the HAVING clause.
Furthermore, processed_groupClause contains SortGroupClause nodes,
which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
on it is pointless.
The attached patch is an attempt to fix all these issues. It also
aims to leave the code better structured for adding new types of join
removal (such as inner-join removal) in the future.
Thoughts?
- Richard
Вложения
Richard Guo <guofenglinux@gmail.com> 于2026年4月6日周一 16:11写道:
>
> While working on reusing remove_rel_from_query() for inner-join
> removal, I noticed that the function is in pretty rough shape. The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.
Yes, I noticed this a few days ago when I tried to remove a redundant
filter added during SJE.
>This resulted in several issues:
> 1. Comments throughout remove_rel_from_query() still assumed only
> left-join removal, making the code misleading. For example:
>
> * This is relevant in case the outer join we're deleting is nested inside
> * other outer joins:
>
The current logic of remove_rel_from_query() is not easy to read. For example,
It distinguishes between left-join removal and SJE based on whether
sjinfo is NULL.
> 2. This was called even for left-join removal, with subst=-1. This is
> pointless and confusing.
>
> ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
> replace_relid_callback);
>
Yes, it is.
> 3. phinfo->ph_lateral was adjusted for left-join removal, which is
> also confusing ...
>
> phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
>
> /*
> * ph_lateral might contain rels mentioned in ph_eval_at after the
> * replacement, remove them.
> */
> phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
> phinfo->ph_eval_at);
>
> ... since you can find this Assert just above:
>
> Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
>
> 4. The comment about attr_needed reconstruction was in
> remove_rel_from_query(), but the actual rebuild is performed by
> the callers.
>
> 5. The EquivalenceClass processing in remove_rel_from_query():
>
> /*
> * Likewise remove references from EquivalenceClasses.
> */
> foreach(l, root->eq_classes)
> {
> EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
>
> if (bms_is_member(relid, ec->ec_relids) ||
> (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
> remove_rel_from_eclass(ec, sjinfo, relid, subst);
> }
>
> The condition always evaluates to true for self-join elimination
> (i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
> But this is redundant because the caller remove_self_join_rel()
> already handles ECs via update_eclasses().
>
> 6. In remove_self_join_rel(), I notice this code:
>
> /* At last, replace varno in root targetlist and HAVING clause */
> ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
> toKeep->relid, 0, replace_relid_callback);
> ChangeVarNodesExtended((Node *) root->processed_groupClause,
> toRemove->relid, toKeep->relid, 0,
> replace_relid_callback);
>
> The comment mentions "HAVING clause", but neither processed_tlist nor
> processed_groupClause has anything to do with the HAVING clause.
> Furthermore, processed_groupClause contains SortGroupClause nodes,
> which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
> on it is pointless.
I didn't find as many as you listed here. I agree that we should do
something for
current logic.
>
> The attached patch is an attempt to fix all these issues. It also
> aims to leave the code better structured for adding new types of join
> removal (such as inner-join removal) in the future.
I looked through the attached patch. LGTM.
--
Thanks,
Tender Wang
On 06/04/2026 10:11, Richard Guo wrote: > Thoughts? Thanks for your efforts! The main goal of the SJE feature was to find common ground within the community - to come up with a solution that everyone could get behind on whether to allow optimisations that address redundant queries and reduce query tree complexity in early planning stages. So, some code roughness was ok. I looked through your code, though maybe not as deeply as this part of the planner deserves. Overall, it looks good, and I didn’t spot any obvious problems, but I do have a few comments. We added some ‘redundant’ checks with future optimisations in mind, so others can rely on these functions to remove tails from query structures or to error if something was left behind. You explicitly write: ‘Each specific caller remains responsible for updating any remaining data structures required by its unique removal logic’ that differs from the initial idea. At the same time, by the end of SJE development, I wasn’t so sure we could invent a universal approach to guarantee the cleanup of the query tree - mostly because of the inherent volatility of the PlannerInfo structure and because we had not agreed to make the parse tree a read-only structure. So, I’m fine with the changes in this patch. -- regards, Andrei Lepikhov, pgEdge
HI Richard
> While working on reusing remove_rel_from_query() for inner-join
> removal, I noticed that the function is in pretty rough shape. The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.
> removal, I noticed that the function is in pretty rough shape. The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.
Thanks for working on this.
I see there are already asserts here for the mode split and for ojrelid validity. What I had in mind was slightly narrower: making the joinrelids contract explicit as well.As far as I can tell, the existing asserts would still allow an outer-join call with joinrelids == NULL, even though joinrelids is later used in the outer-join-specific PHV handling.
I wonder if something like
```c
Assert(!is_outer_join || joinrelids != NULL);
Assert(!is_self_join || joinrelids == NULL);
I see there are already asserts here for the mode split and for ojrelid validity. What I had in mind was slightly narrower: making the joinrelids contract explicit as well.As far as I can tell, the existing asserts would still allow an outer-join call with joinrelids == NULL, even though joinrelids is later used in the outer-join-specific PHV handling.
I wonder if something like
```c
Assert(!is_outer_join || joinrelids != NULL);
Assert(!is_self_join || joinrelids == NULL);
Thanks
On Tue, Apr 7, 2026 at 4:22 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 06/04/2026 10:11, Richard Guo wrote:
> Thoughts?
Thanks for your efforts!
The main goal of the SJE feature was to find common ground within the
community - to come up with a solution that everyone could get behind on
whether to allow optimisations that address redundant queries and reduce
query tree complexity in early planning stages. So, some code roughness
was ok.
I looked through your code, though maybe not as deeply as this part of
the planner deserves. Overall, it looks good, and I didn’t spot any
obvious problems, but I do have a few comments. We added some
‘redundant’ checks with future optimisations in mind, so others can rely
on these functions to remove tails from query structures or to error if
something was left behind.
You explicitly write:
‘Each specific caller remains responsible for updating any remaining
data structures required by its unique removal logic’
that differs from the initial idea. At the same time, by the end of SJE
development, I wasn’t so sure we could invent a universal approach to
guarantee the cleanup of the query tree - mostly because of the inherent
volatility of the PlannerInfo structure and because we had not agreed to
make the parse tree a read-only structure.
So, I’m fine with the changes in this patch.
--
regards, Andrei Lepikhov,
pgEdge
On Tue, Apr 7, 2026 at 6:57 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote: > Assert(!is_outer_join || joinrelids != NULL); Worth asserting. If a caller sets sjinfo but passes NULL for joinrelids, this would silently over-delete PHVs. > Assert(!is_self_join || joinrelids == NULL); I prefer to not add this one. It's not defending any invariant. - Richard
Hi Richard
LGTM with the added assertion. Thanks again for all the heavy lifting you're doing on the Postgres optimizer
> I prefer to not add this one. It's not defending any invariant.
> Assert(!is_outer_join || joinrelids != NULL);
> Worth asserting. If a caller sets sjinfo but passes NULL for
> joinrelids, this would silently over-delete PHVs.
> Worth asserting. If a caller sets sjinfo but passes NULL for
> joinrelids, this would silently over-delete PHVs.
> Assert(!is_self_join || joinrelids == NULL);
LGTM with the added assertion. Thanks again for all the heavy lifting you're doing on the Postgres optimizer
> I prefer to not add this one. It's not defending any invariant.
Thank you for your explanation ,
Thanks
On Sat, Apr 18, 2026 at 6:17 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Apr 7, 2026 at 6:57 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> Assert(!is_outer_join || joinrelids != NULL);
Worth asserting. If a caller sets sjinfo but passes NULL for
joinrelids, this would silently over-delete PHVs.
> Assert(!is_self_join || joinrelids == NULL);
I prefer to not add this one. It's not defending any invariant.
- Richard
On Mon, Apr 20, 2026 at 9:49 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote: > > Assert(!is_outer_join || joinrelids != NULL); > > > Worth asserting. If a caller sets sjinfo but passes NULL for > > joinrelids, this would silently over-delete PHVs. > > Assert(!is_self_join || joinrelids == NULL); > > LGTM with the added assertion. Thanks. Committed. - Richard