Обсуждение: Fwd: pg18 bug? SELECT query doesn't work

Поиск
Список
Период
Сортировка

Fwd: pg18 bug? SELECT query doesn't work

От
Richard Guo
Дата:
(resending, as the previous one has been held for moderation)

---------- Forwarded message ---------
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, Jan 7, 2026 at 11:18 AM
Subject: Re: pg18 bug? SELECT query doesn't work
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Eric Ridge <eebbrr@gmail.com>, pgsql-general
<pgsql-general@postgresql.org>, Pg Hackers
<pgsql-hackers@lists.postgresql.org>


On Wed, Jan 7, 2026 at 3:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Eric Ridge <eebbrr@gmail.com> writes:
> > # SELECT * FROM (SELECT upper(unnest(ARRAY['cat', 'dog'])) as animal FROM generate_series(1, 10) GROUP BY 1) x
WHEREanimal ilike 'c%'; 
> > ERROR:  set-valued function called in context that cannot accept a set
> > LINE 1: SELECT * FROM (SELECT upper(unnest(ARRAY['cat', 'dog'])) as ...

> I agree that this is a bug.  "git bisect" says it broke at
>
> 247dea89f7616fdf06b7272b74abafc29e8e5860 is the first bad commit
> commit 247dea89f7616fdf06b7272b74abafc29e8e5860 (HEAD)
> Author: Richard Guo <rguo@postgresql.org>
> Date:   Tue Sep 10 12:35:34 2024 +0900
>
>     Introduce an RTE for the grouping step
>
> I've not probed further than that, but my guess is that now we check
> for set-returning tlist items while the tlist still has grouping Vars,
> thus missing the fact that there's a SRF represented by one of those
> Vars.  This prompts us to flatten a subquery we shouldn't have
> flattened (because that ends by introducing a SRF into the outer
> WHERE).

Thanks for the report and the diagnosis.

The first part of your diagnosis is correct.  This issue is caused by
a failure to notice the SRF in the target list, as the item is hidden
under a grouped Var.  This doesn't lead to incorrect subquery
flattening though, since such a subquery must involve grouping, and
is_simple_subquery() would refuse to flatten it.  Instead, it
incorrectly indicates that the subquery's restriction clause is safe
to push down, which mistakenly introduces SRFs into the subquery's
WHERE quals.

I think the problem is that when we check whether a subquery's
restriction clauses are safe to push down, we are still working with a
'raw' parse tree that hasn't been preprocessed.  We might be able to
fix this specific issue by manually flattening the grouped Vars in the
subquery's tlist before performing the safety check:

 check_output_expressions(Query *subquery, pushdown_safety_info *safetyInfo)
 {
    ListCell   *lc;
+   List       *flattened_targetList = subquery->targetList;

-   foreach(lc, subquery->targetList)
+   if (subquery->hasGroupRTE)
+   {
+       flattened_targetList = (List *)
+           flatten_group_exprs(NULL, subquery, (Node *) subquery->targetList);
+   }
+
+   foreach(lc, flattened_targetList)
    {
        TargetEntry *tle = (TargetEntry *) lfirst(lc);

(I wonder whether this same issue exists for join alias Vars.)

- Richard



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> On Wed, Jan 7, 2026 at 3:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've not probed further than that, but my guess is that now we check
>> for set-returning tlist items while the tlist still has grouping Vars,
>> thus missing the fact that there's a SRF represented by one of those
>> Vars.  This prompts us to flatten a subquery we shouldn't have
>> flattened (because that ends by introducing a SRF into the outer
>> WHERE).

> The first part of your diagnosis is correct.  This issue is caused by
> a failure to notice the SRF in the target list, as the item is hidden
> under a grouped Var.  This doesn't lead to incorrect subquery
> flattening though, since such a subquery must involve grouping, and
> is_simple_subquery() would refuse to flatten it.  Instead, it
> incorrectly indicates that the subquery's restriction clause is safe
> to push down, which mistakenly introduces SRFs into the subquery's
> WHERE quals.

Got it.

> (I wonder whether this same issue exists for join alias Vars.)

Seems highly likely that we'd have noticed if it did.  I think
flatten_join_alias_vars happens early enough that no interesting
decisions have been made yet.  I wonder whether we could fix the
current problem by doing grouping-Var expansion earlier?

I'm also wondering (don't recall the details of your patch)
whether you are repeating eval_const_expressions after
grouping-Var expansion.  If not, there are going to be bugs
there, like failure to handle named args in function calls.
That could be another reason to make this happen earlier.

            regards, tom lane



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Richard Guo
Дата:
On Wed, Jan 7, 2026 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > (I wonder whether this same issue exists for join alias Vars.)

> Seems highly likely that we'd have noticed if it did.  I think
> flatten_join_alias_vars happens early enough that no interesting
> decisions have been made yet.  I wonder whether we could fix the
> current problem by doing grouping-Var expansion earlier?

Hmm, I don't think so.  The decision on whether to push down a
subquery's restriction clauses is made even before we invoke
subquery_planner() on that subquery.  At that stage, join alias Vars
have not yet been flattened, meaning the underlying expressions are
still hidden.  What I was wondering is whether this could cause
subquery_is_pushdown_safe() to make the wrong decision.

For the same reason, it seems that doing grouping-Var expansion
earlier wouldn't help with this bug, unless we move that expansion
ahead of the subquery_planner() call entirely.

In addition, it seems to me that it would cause problems if we move
the expansion of grouped Vars to before we've done with expression
preprocessing on targetlist and havingQual.  For example, consider
this query:

    select not a from t group by rollup(not a) having not not a;

If we do grouping-Var expansion before the havingQual is preprocessed,
the HAVING clause "not not a" would be reduced to "a" and thus fail to
be matched to lower tlist.

> I'm also wondering (don't recall the details of your patch)
> whether you are repeating eval_const_expressions after
> grouping-Var expansion.  If not, there are going to be bugs
> there, like failure to handle named args in function calls.
> That could be another reason to make this happen earlier.

Currently we're not repeating eval_const_expressions after the
grouping-Var expansion, but I fail to wrap my head around why that
would be a problem.  I ran a simple test with named args in function
calls:

create table t (i int);

CREATE OR REPLACE FUNCTION add_three(
    a int DEFAULT 0,
    b int DEFAULT 0,
    c int DEFAULT 0
)
RETURNS int AS $$
    SELECT a + b + c;
$$ LANGUAGE sql;

explain (verbose, costs off)
select add_three(i, c => 10) from t group by 1 having add_three(i, c
=> 10) > 100;
                QUERY PLAN
------------------------------------------
 HashAggregate
   Output: (((i + 0) + 10))
   Group Key: ((t.i + 0) + 10)
   ->  Seq Scan on public.t
         Output: ((i + 0) + 10)
         Filter: (((t.i + 0) + 10) > 100)
(6 rows)

... and the named args are expanded as expected.

- Richard



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> On Wed, Jan 7, 2026 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm also wondering (don't recall the details of your patch)
>> whether you are repeating eval_const_expressions after
>> grouping-Var expansion.  If not, there are going to be bugs
>> there, like failure to handle named args in function calls.
>> That could be another reason to make this happen earlier.

> Currently we're not repeating eval_const_expressions after the
> grouping-Var expansion, but I fail to wrap my head around why that
> would be a problem.

What I was mainly concerned about was whether the replacement
expressions ever got passed through eval_const_expressions().
I see now that they do, at planner.c:1069.

It's still not really desirable that this is done separately;
for example, I think it breaks the assumption that we will have
AND/OR flatness everywhere.  But I think that only leads to
possible inefficiencies not wrong answers.  And I do take your
point about needing to preserve the separate identities of
these subexpressions.  So let's let that go for now.

The main problem, as you say, is that allpaths.c is coming to
conclusions about the contents of output expressions of the
subquery without having done any of this.  The only really
simple answer I can see is to make a copy of the subquery's
tlist and apply these transformations to it before we run
the subquery_is_pushdown_safe logic.  That's ... ugly.

Perhaps another idea could be to shove the responsibility for this
down into subquery_planner (or make it call a callback at the right
point), and handle transferring of parent restriction clauses into
HAVING only after we've finished preprocessing the subquery's tlist.
That's an uncomfortably big change to be making in a released branch,
but it might still be a better way than duplicating preprocessing.

            regards, tom lane



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Richard Guo
Дата:
On Thu, Jan 8, 2026 at 2:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The main problem, as you say, is that allpaths.c is coming to
> conclusions about the contents of output expressions of the
> subquery without having done any of this.  The only really
> simple answer I can see is to make a copy of the subquery's
> tlist and apply these transformations to it before we run
> the subquery_is_pushdown_safe logic.  That's ... ugly.

It seems that allpaths.c checks the subquery's output only for two
specific cases: to determine if it contains SRFs or volatile
functions.  Because of this, it seems that we don't need to apply the
full set of transformations to it.  We only need to account for join
alias Vars and grouping Vars, as these can hide underlying
expressions.

The underlying expression of a join alias Var can only be a Var
(potentially coerced) from one of the join's input rels, or a COALESCE
expression containing the two input Vars.  Therefore, it should not be
able to contain SRFs or volatile functions, and thus we do not need to
expand it beforehand.  (This also answers my previous question about
whether the current bug exists for join alias Vars.)

Therefore, it seems to me that we only need to expand the grouping
Vars beforehand when checking the subquery's output, as in the changes
I proposed earlier.  It's still ugly, but less so I think.

> Perhaps another idea could be to shove the responsibility for this
> down into subquery_planner (or make it call a callback at the right
> point), and handle transferring of parent restriction clauses into
> HAVING only after we've finished preprocessing the subquery's tlist.
> That's an uncomfortably big change to be making in a released branch,
> but it might still be a better way than duplicating preprocessing.

Agreed.  I think this is the theoretically correct way to handle the
push-down of a subquery's restriction clauses.  However, it seems like
a non-trivial project, and it seems to require changing the signature
of subquery_planner(), as we'd need to pass the subquery's RTE and
RelOptInfo into it.  So this looks too risky for stable branches.  But
maybe we can do that in dev branch.

- Richard



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> The underlying expression of a join alias Var can only be a Var
> (potentially coerced) from one of the join's input rels, or a COALESCE
> expression containing the two input Vars.  Therefore, it should not be
> able to contain SRFs or volatile functions, and thus we do not need to
> expand it beforehand.

[ itch... ]  That statement is false in general, because subquery
pullup within the subquery can replace a sub-subquery's output Vars
with expressions.  It might be okay for this purpose, as I think we'd
not pull up if the sub-subquery's output expressions are volatile or
SRFs.  These assumptions had better be well commented though.

The larger point here is that this behavior is all recursive,
and we can happily end with an expression that's been pulled up
several levels; we'd better make sure the right checks happen.
So I'm a little bit distressed that planner.c's invocations of
flatten_group_exprs are not at all analogous to its usage of
flatten_join_alias_vars.  The latter pattern has a couple of
decades of usage to lend credence to the assumption that it's
correct.  flatten_group_exprs, um, not so much.  It may be
fine, given the fact that grouping Vars can appear within
much less of the query than join aliases.  But in view of the
present bug, I'm feeling nervous.

            regards, tom lane



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Richard Guo
Дата:
On Thu, Jan 8, 2026 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ itch... ]  That statement is false in general, because subquery
> pullup within the subquery can replace a sub-subquery's output Vars
> with expressions.  It might be okay for this purpose, as I think we'd
> not pull up if the sub-subquery's output expressions are volatile or
> SRFs.  These assumptions had better be well commented though.

Ah, I see.  Once the sub-subqueries are flattened, the join alias
entries in the subquery can become arbitrary expressions rather than
simple Vars.  I suspect we have only avoided issues with join aliases
in subquery_is_pushdown_safe() for all these years by sheer luck:
since we don't pull up subqueries that output set-returning or
volatile functions, those 'arbitrary expressions' are unlikely to
include them.

How about we add a comment to check_output_expressions() along the
below lines?

/*
 * We need to expand grouping Vars to their underlying expressions (the
 * grouping clauses) because the grouping expressions themselves might be
 * volatile or set-returning.  However, we do not need to recurse deeper
 * into the arguments of those expressions.  If an argument references a
 * lower-level subquery output, we can rely on the fact that subqueries
 * containing volatile or set-returning functions in their targetlists are
 * never pulled up.
 *
 * We do not need to expand join alias Vars.  The underlying expression of
 * a join alias Var does not itself introduce volatility or set-returning
 * behavior.  As with grouping Vars, we rely on the pull-up restrictions to
 * guarantee that any referenced inputs from lower levels are free of such
 * functions.
 */

> The larger point here is that this behavior is all recursive,
> and we can happily end with an expression that's been pulled up
> several levels; we'd better make sure the right checks happen.
> So I'm a little bit distressed that planner.c's invocations of
> flatten_group_exprs are not at all analogous to its usage of
> flatten_join_alias_vars.  The latter pattern has a couple of
> decades of usage to lend credence to the assumption that it's
> correct.  flatten_group_exprs, um, not so much.  It may be
> fine, given the fact that grouping Vars can appear within
> much less of the query than join aliases.  But in view of the
> present bug, I'm feeling nervous.

I checked the invocations of flatten_join_alias_vars and
flatten_group_exprs in the planner to understand why they are not
being used analogously.

1. planner.c:1041

Here, we call flatten_join_alias_vars on the subquery because the
subquery may contain join aliases from the outer query level; since
these won't be expanded during the subquery's own planning, we must
expand them now.  A query illustrating this scenario is:

select * from tenk1 t1 full join tenk1 t2 using (unique1)
  join lateral (select unique1 offset 0) on true;

(BTW, the test cases added in da3df9987 for this logic are no longer
valid.  These queries still function correctly even if this code is
removed.  I think this is something we should fix.)

However, I don't think an analogous call to flatten_group_exprs is
necessary here.  Subqueries should not contain grouping-Vars from the
outer query, since FROM clause is processed before the GROUP BY step.
While it is true that a subquery could reference the output of another
subquery that happens to be a grouping-Var, that would be handled when
expanding grouping-Vars for that specific subquery.

2. prepjointree.c:1473

Here, we call flatten_join_alias_vars on the subquery's targetlist
during subquery flattening because once the the subquery's subqueries
are flattened, join alias entries may become arbitrary expressions
rather than simple Vars.

Again, I don't see a need for an analogous flatten_group_exprs call
here.  Any subquery containing grouping-Vars must involve grouping,
and we don't flatten such subqueries to begin with.

3. planner.c:1309

Here, flatten_join_alias_vars is called within preprocess_expression
for various expressions.  The analogous call to flatten_group_exprs
occurs at planner.c:1118.  I believe we only need to call
flatten_group_exprs on the targetList and havingQual, as these are the
only places where grouping-Vars can appear.

Based on the above, I suspect whether we should expect the invocations
of flatten_group_exprs to be analogous to those of flatten_join_alias_vars.

- Richard



Re: Fwd: pg18 bug? SELECT query doesn't work

От
Richard Guo
Дата:
On Thu, Jan 8, 2026 at 10:30 PM Richard Guo <guofenglinux@gmail.com> wrote:
> How about we add a comment to check_output_expressions() along the
> below lines?

I've worked on the comment a bit more in the attached patch, which
also includes my previously proposed code changes and some test cases.

I think this patch is suitable for fixing the current bug in the back
branches.  We can use a separate patch for the more ambitious goal of
moving the push-down of subquery's restriction clauses into
subquery_planner().

Any thoughts?

- Richard

Вложения