Обсуждение: Re: Remove no-op PlaceHolderVars

Поиск
Список
Период
Сортировка

Re: Remove no-op PlaceHolderVars

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> Attached is a WIP patch that marks PHVs that need to be kept because
> they are serving to isolate subexpressions, and removes all other PHVs
> in remove_nulling_relids_mutator if their phnullingrels bits become
> empty.  One problem with it is that a PHV's contained expression may
> not have been fully preprocessed.

Yeah.  I've been mulling over how we could do this, and the real
problem is that the expression containing the PHV *has* been fully
preprocessed by the time we get to outer join strength reduction
(cf. file header comment in prepjointree.c).  Simply dropping the PHV
can break various invariants that expression preprocessing is supposed
to establish, such as "no RelabelType directly above another" or "no
AND clause directly above another".  I haven't thought of a reliable
way to fix that short of re-running eval_const_expressions afterwards,
which seems like a mighty expensive answer.  We could try to make
remove_nulling_relids_mutator preserve all these invariants, but
keeping it in sync with what eval_const_expressions does seems like
a maintenance nightmare.

> The other problem with this is that it breaks 17 test cases.

I've not looked into that, but yeah, it would need some tedious
analysis to verify whether the changes are OK.

> Before delving into these two problems, I'd like to know whether this
> optimization is worthwhile, and whether I'm going in the right
> direction.

I believe this is an area worth working on.  I've been wondering
whether it'd be better to handle the expression-identity problems by
introducing an "expression wrapper" node type that is distinct from
PHV and has the sole purpose of demarcating a subexpression we don't
want to be folded into its parent.  I think that doesn't really move
the football in terms of fixing the problems mentioned above, but
perhaps it's conceptually cleaner than adding a bool field to PHV.

Another thought is that grouping sets provide one of the big reasons
why we need an "expression wrapper" or equivalent functionality.
So maybe we should try to move forward on your other patch to change
the representation of those before we spend too much effort here.

            regards, tom lane



Re: Remove no-op PlaceHolderVars

От
Richard Guo
Дата:
On Tue, Sep 3, 2024 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  I've been mulling over how we could do this, and the real
> problem is that the expression containing the PHV *has* been fully
> preprocessed by the time we get to outer join strength reduction
> (cf. file header comment in prepjointree.c).  Simply dropping the PHV
> can break various invariants that expression preprocessing is supposed
> to establish, such as "no RelabelType directly above another" or "no
> AND clause directly above another".

Yeah, this is what the problem is.

> I haven't thought of a reliable
> way to fix that short of re-running eval_const_expressions afterwards,
> which seems like a mighty expensive answer.

This seems likely to result in a lot of duplicate work.

> We could try to make
> remove_nulling_relids_mutator preserve all these invariants, but
> keeping it in sync with what eval_const_expressions does seems like
> a maintenance nightmare.

Yeah, and it seems that we also need to know the EXPRKIND code for the
expression containing the to-be-dropped PHV in remove_nulling_relids
to know how we should preserve all these invariants, which seems to
require a lot of code changes.

Looking at the routines run in preprocess_expression, most of them
recurse into PlaceHolderVars and preprocess the contained expressions.
The two exceptions are canonicalize_qual and make_ands_implicit.  I
wonder if we can modify them to also recurse into PlaceHolderVars.
Will this resolve our problem here?

> I believe this is an area worth working on.  I've been wondering
> whether it'd be better to handle the expression-identity problems by
> introducing an "expression wrapper" node type that is distinct from
> PHV and has the sole purpose of demarcating a subexpression we don't
> want to be folded into its parent.  I think that doesn't really move
> the football in terms of fixing the problems mentioned above, but
> perhaps it's conceptually cleaner than adding a bool field to PHV.
>
> Another thought is that grouping sets provide one of the big reasons
> why we need an "expression wrapper" or equivalent functionality.
> So maybe we should try to move forward on your other patch to change
> the representation of those before we spend too much effort here.

Hmm, in the case of grouping sets, we have a similar situation where
we do not want a subexpression that is part of grouping items to be
folded into its parent, because otherwise we may fail to match these
subexpressions to lower target items.

For grouping sets, this problem is much easier to resolve, because
we've already replaced such subexpressions with Vars referencing the
GROUP RTE.  As a result, during expression preprocessing, these
subexpressions will not be folded into their parents.

An ensuing effect of this approach is that a HAVING clause may contain
expressions that are not fully preprocessed if they are part of
grouping items.  This is not an issue as long as the clause remains in
HAVING, because these expressions will be matched to lower target
items in setrefs.c.  However, if the clause is moved or copied into
WHERE, we need to re-preprocess these expressions.  This should not be
too expensive, as we only need to re-preprocess the HAVING clauses
that are moved to WHERE, not the entire tree.  The v13 patch in that
thread implements this approach.

Thanks
Richard



Re: Remove no-op PlaceHolderVars

От
Richard Guo
Дата:
On Tue, Sep 3, 2024 at 5:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Tue, Sep 3, 2024 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah.  I've been mulling over how we could do this, and the real
> > problem is that the expression containing the PHV *has* been fully
> > preprocessed by the time we get to outer join strength reduction
> > (cf. file header comment in prepjointree.c).  Simply dropping the PHV
> > can break various invariants that expression preprocessing is supposed
> > to establish, such as "no RelabelType directly above another" or "no
> > AND clause directly above another".

> Yeah, this is what the problem is.

While fixing some performance issues caused by PHVs recently, it
struck me again that we should be removing "no-op" PHVs whenever
possible, because PHVs can be optimization barriers in several cases.

I still do not have a good idea for ensuring that removing the PHV
wrapper does not break the expression tree invariants.  But maybe we
can use a conservative approach: we only strip the PHV if the
contained expression is known to be safe (for example, a simple Var).

I've also realized that we cannot remove no-op PHVs in every case.
For example, deconstruct_distribute_oj_quals() relies on
remove_nulling_relids() to temporarily strip out the nullingrels bits
that are later restored as we crawl up the join stack.  If the PHV
were removed, we would be unable to restore its nullingrels bits for
next level up.  To handle this, I added a new parameter to
remove_nulling_relids() so the caller can decide whether to allow
removal or not.

Attached is a draft patch for the code changes.  It currently causes
plan changes in 28 regression test queries, which is a surprisingly
high number.  We'll have to go through these tests one by one, but
before doing that, I would like to hear others' thoughts on this
patch.

- Richard

Вложения

Re: Remove no-op PlaceHolderVars

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> While fixing some performance issues caused by PHVs recently, it
> struck me again that we should be removing "no-op" PHVs whenever
> possible, because PHVs can be optimization barriers in several cases.

My immediate reaction is "how sure are you that they're no-ops"?
I recall that there are places where we intentionally insert PHVs
to preserve the separate identity of the contained expression
(so that, for example, it can be matched to a subquery output
later).

> I still do not have a good idea for ensuring that removing the PHV
> wrapper does not break the expression tree invariants.  But maybe we
> can use a conservative approach: we only strip the PHV if the
> contained expression is known to be safe (for example, a simple Var).

Do we generate a PHV at all in that case?  Seems like we could
deal with that by adding to the Var's varnullingrels instead of
making a wrapper node.

            regards, tom lane



Re: Remove no-op PlaceHolderVars

От
Richard Guo
Дата:
On Fri, Jan 16, 2026 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My immediate reaction is "how sure are you that they're no-ops"?
> I recall that there are places where we intentionally insert PHVs
> to preserve the separate identity of the contained expression
> (so that, for example, it can be matched to a subquery output
> later).

The new phpreserved flag is used for that purpose, as explained in the
commit message and the code comments.

> Do we generate a PHV at all in that case?  Seems like we could
> deal with that by adding to the Var's varnullingrels instead of
> making a wrapper node.

The Var can be a reference to something outside the subquery being
pulled up.  If it is a reference to the non-nullable side, we'll have
to wrap it in a PHV to ensure that it is forced to null when the outer
join should do so.

- Richard