Обсуждение: BUG #19106: Potential regression with CTE materialization planning in Postgres 18
BUG #19106: Potential regression with CTE materialization planning in Postgres 18
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 19106 Logged by: Kamil Monicz Email address: kamil@monicz.dev PostgreSQL version: 18.0 Operating system: NixOS unstable ffcdcf99d65c61956d882df249a9be53e59 Description: After upgrading from Postgres 17 to 18, one of my queries started raising an error: "unexpected outer reference in CTE query" The problematic query is: https://github.com/openstreetmap-ng/openstreetmap-ng/blob/eb805d8766fb4b359b96eb6b50acc8c2a835a165/app/services/element_spatial_service.py#L82-L215 Specifically, the `WITH member_geoms ...` part inside the `LEFT JOIN LATERAL`. I was able to resolve the issue by forcing the LATERAL CTEs as NOT MATERIALIZED: --- app/services/element_spatial_service.py +++ app/services/element_spatial_service.py @@ -155,3 +155,3 @@ rels_computed AS ( LEFT JOIN LATERAL ( - WITH member_geoms AS ( + WITH member_geoms AS NOT MATERIALIZED ( SELECT ST_Collect(geom_val) AS geom @@ -179,3 +179,3 @@ rels_computed AS ( ), - noded_geoms AS ( + noded_geoms AS NOT MATERIALIZED ( SELECT ST_UnaryUnion(ST_Collect( @@ -186,3 +186,3 @@ rels_computed AS ( ), - polygon_geoms AS ( + polygon_geoms AS NOT MATERIALIZED ( SELECT ST_UnaryUnion(ST_Collect( This seems like a regression because in cases where a CTE has an outer reference, it simply shouldn't be materialized (I don't really know the Postgres internals). I never expected these CTEs to be materialized. I simply use them for improved readability.
PG Bug reporting form <noreply@postgresql.org> writes: > After upgrading from Postgres 17 to 18, one of my queries started raising an > error: > "unexpected outer reference in CTE query" I agree that sounds like a bug ... > The problematic query is: > https://github.com/openstreetmap-ng/openstreetmap-ng/blob/eb805d8766fb4b359b96eb6b50acc8c2a835a165/app/services/element_spatial_service.py#L82-L215 ... but I am not going to spend time trying to reproduce it given this amount of detail. There's too much missing context, like what data you were running the query on. I could spend all day, not see the failure, and be left no wiser than before as to whether it's already fixed or I just didn't duplicate your context closely enough. Please see if you can reduce the problem case to a self-contained SQL script. regards, tom lane
Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18
От
"Kamil Monicz"
Дата:
On Sun, Nov 9, 2025, at 16:53, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: > > After upgrading from Postgres 17 to 18, one of my queries started raising an > > error: > > "unexpected outer reference in CTE query" > > I agree that sounds like a bug ... > > > The problematic query is: > > https://github.com/openstreetmap-ng/openstreetmap-ng/blob/eb805d8766fb4b359b96eb6b50acc8c2a835a165/app/services/element_spatial_service.py#L82-L215 > > ... but I am not going to spend time trying to reproduce it given > this amount of detail. There's too much missing context, like what > data you were running the query on. I could spend all day, not > see the failure, and be left no wiser than before as to whether > it's already fixed or I just didn't duplicate your context closely > enough. Please see if you can reduce the problem case to a > self-contained SQL script. > > regards, tom lane > Hello Tom, It's my first time here (and realistically on a proper mailing list), so please excuse me. Here's the small, self-containedreproduction: ``` EXPLAIN SELECT * FROM ( SELECT ARRAY[1, 2] AS arr ) r CROSS JOIN LATERAL ( WITH a AS ( SELECT CASE WHEN id = 1 THEN ST_GeomFromText('LINESTRING(0 0,1 0,1 1)') ELSE ST_GeomFromText('POINT(0 0)') END AS geom FROM unnest(r.arr) AS id ), b AS ( SELECT ST_Polygonize( (SELECT ST_UnaryUnion(ST_Collect(geom)) FROM a) ) AS st_polygonize ) SELECT (SELECT st_polygonize FROM b), (SELECT st_polygonize FROM b) ) s; ``` ``` ERROR: unexpected outer reference in CTE query SQL state: XX000 ``` It depends on PostGIS being installed and loaded. In my case, it's version 3.6.0. I tried to make it work without it, butI couldn't figure it out. -Kamil Monicz
"Kamil Monicz" <kamil@monicz.dev> writes:
> It's my first time here (and realistically on a proper mailing list), so please excuse me. Here's the small,
self-containedreproduction:
Thanks. After a bit of fooling around I was able to convert this
to something without any PostGIS dependency:
EXPLAIN
SELECT *
FROM (
SELECT ARRAY[1, 2] AS arr
) r
CROSS JOIN LATERAL (
WITH a AS (
SELECT id FROM unnest(r.arr) AS id
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT
(SELECT agg FROM b)
) s;
This worked up until commit b0cc0a71e, and since then it hits an
assertion failure in check_agglevels_and_constraints(); or if you
don't have asserts enabled then the planner gets confused, because
the max() aggregate function is given the wrong agglevelsup.
I need to think through what is the correct behavior for cross-CTE
references like these. Sadly, this is too late for this week's
releases ...
regards, tom lane
[ cc'ing Peter and Vik for possible input on SQL-standard question ]
I wrote:
> This worked up until commit b0cc0a71e, and since then it hits an
> assertion failure in check_agglevels_and_constraints(); or if you
> don't have asserts enabled then the planner gets confused, because
> the max() aggregate function is given the wrong agglevelsup.
> I need to think through what is the correct behavior for cross-CTE
> references like these. Sadly, this is too late for this week's
> releases ...
Here is a draft fix for this. What it basically decides is that
commit b0cc0a71e was in error to suppose that an outer CTE reference
should work like an outer Var reference. In this even-more-simplified
test case:
WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT agg FROM b;
it's pretty obvious that the reference to "a" should not cause us to
act as though the max() aggregate belongs to the outer "WITH ...
SELECT agg FROM b" query level. However, if it doesn't belong there,
where does it belong?
The draft patch takes the approach of forcing agglevelsup to zero
anytime we find a CTE reference within an aggregate's arguments
(unless it is to a CTE defined inside those arguments). That
means such an aggregate always has its syntactic level and cannot
be treated as an outer aggregate even if it otherwise references
(only) outer Vars.
I don't have a huge amount of confidence in this being the final
answer; it seems like it might still be too simplistic. But I'm
not sure what to do instead. It's at least less likely to break
cases that worked before b0cc0a71e, since "it's level zero" is
pretty much the default answer anyway.
I looked at the SQL standard for possible guidance and found none:
they disallow subqueries altogether within aggregate arguments,
so they need not consider such cases. I am curious though whether
Peter or Vik know if the committee ever considered relaxing that
restriction, and if so whether they stopped to think about this
particular point.
regards, tom lane
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 3254c83cc6c..5de07a2aa9a 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -795,9 +795,16 @@ check_agg_arguments_walker(Node *node,
if (IsA(node, RangeTblEntry))
{
/*
- * CTE references act similarly to Vars of the CTE's level. Without
- * this we might conclude that the Agg can be evaluated above the CTE,
- * leading to trouble.
+ * If we find a nonlocal CTE reference, force the Agg to be given
+ * level zero, i.e. make its semantic level match its syntactic level.
+ * Without this we might conclude that the Agg can be evaluated above
+ * the CTE (if it contains only Vars from above the CTE), leading to
+ * trouble. On the other hand, we shouldn't allow a CTE reference by
+ * itself to promote an Agg to be an outer Agg; that fails for
+ * cross-references to sibling CTEs. The best solution seems to be to
+ * force the Agg to level zero. (The SQL spec provides no guidance
+ * here; they disallow subqueries within aggregates, and therefore
+ * don't reach the question of what to do with CTE references.)
*/
RangeTblEntry *rte = (RangeTblEntry *) node;
@@ -810,9 +817,8 @@ check_agg_arguments_walker(Node *node,
/* ignore local CTEs of subqueries */
if (ctelevelsup >= 0)
{
- if (context->min_varlevel < 0 ||
- context->min_varlevel > ctelevelsup)
- context->min_varlevel = ctelevelsup;
+ /* nonlocal CTE, so force Agg to level zero */
+ context->min_varlevel = 0;
}
}
return false; /* allow range_table_walker to continue */
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 86fdb85c6c5..bb45cb83c62 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -2331,6 +2331,34 @@ from int4_tbl i4;
-2147483647 | 1
(5 rows)
+--
+-- test for bug #19106: interaction of WITH with aggregates
+--
+-- the fix for #19055 was too aggressive and broke this case
+explain (verbose, costs off)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+ QUERY PLAN
+--------------------------------------------
+ Aggregate
+ Output: max((InitPlan expr_1).col1)
+ InitPlan expr_1
+ -> Aggregate
+ Output: sum("*VALUES*".column1)
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".column1
+ -> Result
+(8 rows)
+
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+ agg
+-----
+ 3
+(1 row)
+
--
-- test for nested-recursive-WITH bug
--
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index d88d5abb91a..e199662edd3 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -1111,6 +1111,19 @@ select f1, (with cte1(x,y) as (select 1,2)
select count((select i4.f1 from cte1))) as ss
from int4_tbl i4;
+--
+-- test for bug #19106: interaction of WITH with aggregates
+--
+-- the fix for #19055 was too aggressive and broke this case
+explain (verbose, costs off)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+
--
-- test for nested-recursive-WITH bug
--
On 10/11/2025 22:05, Tom Lane wrote: > [ cc'ing Peter and Vik for possible input on SQL-standard question ] Thanks! > WITH a AS ( > SELECT id FROM (VALUES (1), (2)) AS v(id) > ), > b AS ( > SELECT max((SELECT sum(id) FROM a)) AS agg > ) > SELECT agg FROM b; snip > I looked at the SQL standard for possible guidance and found none: > they disallow subqueries altogether within aggregate arguments, > so they need not consider such cases. I am curious though whether > Peter or Vik know if the committee ever considered relaxing that > restriction, and if so whether they stopped to think about this > particular point. I am not seeing that restriction in the standard. For this test case, we have MAX which has the lineage: <aggregate function> <general set function> <set function type> <computational operation> MAX Its argument, (SELECT SUM(id) FROM a), has this lineage: <value expression> <common value expression> <numeric value expression> <term> <factor> <numeric primary> <value expression primary> <non-parenthesized value expression primary> <scalar subquery> <subquery> Since there are no outer column references, the subquery should be independent. And if we inline it: WITH b (agg) AS ( SELECT MAX(( SELECT SUM(id) FROM (VALUES (1), (2)) AS v (id) )) -- FROM nothing ) TABLE b then the query works as expected. MATERIALIZEDing either or both CTEs has no effect, which I find strange. -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes:
> On 10/11/2025 22:05, Tom Lane wrote:
>> I looked at the SQL standard for possible guidance and found none:
>> they disallow subqueries altogether within aggregate arguments,
>> so they need not consider such cases.
> I am not seeing that restriction in the standard.
Maybe I'm misunderstanding what I read, but in SQL:2021
6.9 <set function specification> SR1 says
If <aggregate function> specifies a <general set function>, then
the <value expression> simply contained in the <general set
function> shall not contain a <set function specification>
or a <query expression>.
The predecessor text in SQL99 says
4) The <value expression> simply contained in <set function
specification> shall not contain a <set function specification>
or a <subquery>.
I don't think replacing <subquery> with <query expression> moved the
goalposts at all, but maybe I'm missing something.
> ... MATERIALIZEDing either or both CTEs
> has no effect, which I find strange.
The fundamental problem is that the parser is mis-assigning
agglevelsup; given that, the planner is very likely to get
confused no matter what other details there are.
regards, tom lane
On 11/11/2025 16:24, Tom Lane wrote: > Vik Fearing <vik@postgresfriends.org> writes: >> On 10/11/2025 22:05, Tom Lane wrote: >>> I looked at the SQL standard for possible guidance and found none: >>> they disallow subqueries altogether within aggregate arguments, >>> so they need not consider such cases. >> I am not seeing that restriction in the standard. > Maybe I'm misunderstanding what I read, but in SQL:2021 > 6.9 <set function specification> SR1 says > > If <aggregate function> specifies a <general set function>, then > the <value expression> simply contained in the <general set > function> shall not contain a <set function specification> > or a <query expression>. > > The predecessor text in SQL99 says > > 4) The <value expression> simply contained in <set function > specification> shall not contain a <set function specification> > or a <subquery>. > > I don't think replacing <subquery> with <query expression> moved the > goalposts at all, but maybe I'm missing something. I don't think you are. I was missing that you can't get to <aggregate function> without going through <set function specification> (or a window) so I did not see that rule. I had a rummage through the archives but couldn't easily find the paper introducing aggregates so I can't see what the justification for that rule was. This language was not in 1989 but is in 1992. It may just be a case of "this is what we've implemented so this is what we are specifying." >> ... MATERIALIZEDing either or both CTEs >> has no effect, which I find strange. > The fundamental problem is that the parser is mis-assigning > agglevelsup; given that, the planner is very likely to get > confused no matter what other details there are. Thank you for the explanation. -- Vik Fearing
I wrote:
> Here is a draft fix for this. What it basically decides is that
> commit b0cc0a71e was in error to suppose that an outer CTE reference
> should work like an outer Var reference. In this even-more-simplified
> test case:
> WITH a AS (
> SELECT id FROM (VALUES (1), (2)) AS v(id)
> ),
> b AS (
> SELECT max((SELECT sum(id) FROM a)) AS agg
> )
> SELECT agg FROM b;
I wanted to post a little more analysis of what's happening here,
mostly for the archives' sake. Since b0cc0a71e,
check_agg_arguments_walker mistakenly decides that the max()
aggregate ought to belong to the query level where the "a" CTE
is, that is the outer "WITH ... SELECT agg FROM b". In an
assert-enabled build, check_agglevels_and_constraints promptly
crashes at
switch (pstate->p_expr_kind)
{
case EXPR_KIND_NONE:
Assert(false); /* can't happen */
break;
because it's looking at the ParseState for that outer query
level, where we are not examining any particular subexpression.
In a non-assert build, we more-or-less-accidentally get through
check_agglevels_and_constraints unscathed, but then the parser
fails with
ERROR: column "b.agg" must appear in the GROUP BY clause or be used in an aggregate function
LINE 7: SELECT agg FROM b;
^
because we've marked the outer query level with p_hasAggs = true.
(This seems like sufficient proof that we're assigning the aggregate
to the wrong level, if you were doubting that conclusion.)
The submitted problem query is shaped a little differently, though.
It's more nearly
WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT (SELECT agg FROM b);
and that extra level of sub-select avoids the aforesaid error message,
since the sub-select where the "agg" reference is doesn't get marked
p_hasAggs. (I wonder whether the OP introduced that extra sub-select
in trying to work around this bug.)
Rather remarkably, this formulation actually gets the expected answer
"3", although if you look at the generated plan it's fairly wacko:
postgres=# explain (verbose, costs off) WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT (SELECT agg FROM b);
QUERY PLAN
-----------------------------------------------
Aggregate
Output: (SubPlan expr_1)
InitPlan expr_2
-> Aggregate
Output: sum("*VALUES*".column1)
-> Values Scan on "*VALUES*"
Output: "*VALUES*".column1
-> Result
SubPlan expr_1
-> Result
Output: max((InitPlan expr_2).col1)
(11 rows)
I don't quite understand how that works, since the max() call is
not within the upper Aggregate's expression trees: why isn't it
triggering "Aggref found in non-Agg plan node"? It's probably not
worth figuring out though; this is all garbage-in-garbage-out behavior
so far as the planner is concerned. Variants of this, such as
inserting MATERIALIZED for "b", lead to planner failures like
ERROR: unexpected outer reference in CTE query
which is similar to the original report.
Anyway, the point is that we are assigning the wrong semantic level
to the max() aggregate, and all the rest of this behavior is just
follow-on effects of that.
regards, tom lane
Vik Fearing <vik@postgresfriends.org> writes:
> I had a rummage through the archives but couldn't easily find the paper
> introducing aggregates so I can't see what the justification for that
> rule was. This language was not in 1989 but is in 1992. It may just be a
> case of "this is what we've implemented so this is what we are specifying."
Thanks for doing that research. It's not at all surprising if back
in the early 90's nobody had tried to make it work for sub-selects
(or at least had not succeeded), so they just wrote the spec to not
require it.
After sleeping on it I feel that my proposal of "force the aggregate
to have agglevelsup = 0" is a reasonably sane solution. I'd
originally sought some minimal adjustment to the
make-CTE-refs-work-like-Vars approach of b0cc0a71e, like moving the
aggregate down one level if we detect that the CTE reference is to a
sibling CTE. However, that wouldn't be sufficient to deal with
nested WITHs, for example
WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
WITH b1 AS ( SELECT max((SELECT sum(id) FROM a)) AS agg )
SELECT * FROM b1
)
SELECT agg FROM b;
"Move down one level" would assign the max() to the
"WITH b1 ... SELECT * FROM b1" level, so we still have the bug.
"Move down through all levels of WITH" might do the trick,
or it might not. In any case it's feeling arbitrary and rather
far away from what we do for Vars. So I feel like the analogy
to Vars was fatally flawed to start with.
However, the real reason why I'm feeling good about the simplistic
solution is: if we invent some more-complex rule, whose life are
we making better? I think the submitted problem query is an example
of a common SQL programming idiom, which is to use a series of WITH
CTEs to chop up a complex computation into small independent black
boxes. It's the exact opposite of independence if the positioning of
a CTE reference in a later CTE affects the semantics of aggregates in
that CTE. Now, I fear we can't solve the problem that b0cc0a71e
set out to solve without having some impact of that sort, but we
should keep it as minimal and surprise-free as possible. Forcing
affected aggregates to have their semantic level equal to their
syntactic level seems about as surprise-free as we can get.
Especially since that would have been the result we produced before
b0cc0a71e in all non-contrived cases. The bug report that induced
us to do b0cc0a71e was a pretty contrived case if you ask me:
why would you combine an outer Var reference with a select from
a CTE? And if you did, why would you expect the surrounding
aggregate to be considered non-local?
regards, tom lane
On Wed, Nov 12, 2025 at 5:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thanks for doing that research. It's not at all surprising if back
> in the early 90's nobody had tried to make it work for sub-selects
> (or at least had not succeeded), so they just wrote the spec to not
> require it.
I played with this patch, but I couldn't quite wrap my head around the
expected behavior of using subqueries as arguments to aggregate
functions. The outputs of the following queries are confusing to me.
create table t (a int);
insert into t values (1), (2);
Query 1:
select (select sum((select a from t t1 limit 1))) from t t2;
sum
-----
1
1
(2 rows)
As I understand it, a query of the form:
SELECT <scalar_expression> FROM table;
... produces one output row for each row in the table, with the value
of <scalar_expression> evaluated for that row. Thus, the output of
Query 1 makes sense to me.
Query 2:
select (select sum((select a from t t1 where a = t2.a or true limit
1))) from t t2;
sum
-----
2
(1 row)
I don't quite understand the output of Query 2. The subquery is now
correlated with the outer table t2, but I believe it's still in the
same form as Query 1, so I would expect it to also produce one output
row per table row. Moreover, IIUC, the "or true" clause should make
the two queries semantically equivalent.
Query 3:
with t as (select a from (values (1), (2)) as v(a))
select (select sum((select a from t t1 where a = t2.a or true limit
1))) from t t2;
sum
-----
1
1
(2 rows)
Query 3 replaces the physical table with a CTE that produces the same
logical table content, so I would expect the query's output to remain
unchanged. So the differing outputs of Query 2 and Query 3 are also
confusing to me.
- Richard
Richard Guo <guofenglinux@gmail.com> writes:
> I played with this patch, but I couldn't quite wrap my head around the
> expected behavior of using subqueries as arguments to aggregate
> functions. The outputs of the following queries are confusing to me.
> ...
> Query 2:
> select (select sum((select a from t t1 where a = t2.a or true limit
> 1))) from t t2;
> sum
> -----
> 2
> (1 row)
> I don't quite understand the output of Query 2. The subquery is now
> correlated with the outer table t2, but I believe it's still in the
> same form as Query 1, so I would expect it to also produce one output
> row per table row.
I believe the critical point about Q2 is that the presence of the
reference to t2.a causes the sum() aggregate to be assigned to the
outer query level. Now, that outer query is an aggregation query
so it will produce only one row, aggregated over both rows of t2
(for each of which, the bottom sub-select produces the value "1").
I've never totally understood the rationale for the SQL standard
to assign aggregates to outer query levels, but it's definitely
their fault not ours.
> Query 3:
> with t as (select a from (values (1), (2)) as v(a))
> select (select sum((select a from t t1 where a = t2.a or true limit
> 1))) from t t2;
> sum
> -----
> 1
> 1
> (2 rows)
Actually, as of HEAD we produce:
regression=# with t as (select a from (values (1), (2)) as v(a))
select (select sum((select a from t t1 where a = t2.a or true limit
1))) from t t2;
sum
-----
2
(1 row)
and it's the same if you remove the t2.a reference:
regression=# with t as (select a from (values (1), (2)) as v(a))
select (select sum((select a from t t1 where true limit
1))) from t t2;
sum
-----
2
(1 row)
In both cases we assign the aggregate to the outer-level SELECT.
I assume you were testing with my patch, which forces the sum()
to be level zero, that is belonging to the intermediate sub-select.
> Query 3 replaces the physical table with a CTE that produces the same
> logical table content, so I would expect the query's output to remain
> unchanged. So the differing outputs of Query 2 and Query 3 are also
> confusing to me.
The sticky point here is that a CTE reference isn't quite as absolute
as a physical-table reference: the CTE name only has meaning within
a portion of the query. So the problem that b0cc0a71e tried to solve
is "what do we do if the SQL-standard rules about semantic level of
an aggregate would result in putting the aggregate outside of the
scope of a CTE it references?"
I suppose another answer would be to throw up our hands and give
an error if that happens, rather than trying to fix the level
assigned to the aggregate.
regards, tom lane
On Thu, Nov 13, 2025 at 4:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The sticky point here is that a CTE reference isn't quite as absolute > as a physical-table reference: the CTE name only has meaning within > a portion of the query. So the problem that b0cc0a71e tried to solve > is "what do we do if the SQL-standard rules about semantic level of > an aggregate would result in putting the aggregate outside of the > scope of a CTE it references?" So, IIUC, the confusion arises in cases where an aggregation is to be assigned to the outer side of its syntactic level. With the current patch, if the aggregation does not reference any CTEs, it would be evaluated at the outer query level. If the aggregation references any CTEs, it'd be evaluated at its syntactic query level. However, I still find this behavior somewhat confusing. For example, one might expect that an inlined CTE should be semantically equivalent to a subquery, yet the following two queries can produce different results. create table t (a int); insert into t values (1), (2); with ss as not materialized (select * from t) select (select sum((select a from ss where a = t.a limit 1))) from t; sum ----- 1 2 (2 rows) select (select sum((select a from (select * from t) ss where a = t.a limit 1))) from t; sum ----- 3 (1 row) I don't have much experience reading the SQL spec, but from the discussions, it seems that the spec does not provide guidance on this case. So the current behavior may be acceptable. I think it might be helpful to explicitly document this behavior somewhere. - Richard
Richard Guo <guofenglinux@gmail.com> writes:
> However, I still find this behavior somewhat confusing. For example,
> one might expect that an inlined CTE should be semantically equivalent
> to a subquery, yet the following two queries can produce different
> results.
> create table t (a int);
> insert into t values (1), (2);
> with ss as not materialized (select * from t)
> select (select sum((select a from ss where a = t.a limit 1))) from t;
> sum
> -----
> 1
> 2
> (2 rows)
> select (select sum((select a from (select * from t) ss where a = t.a
> limit 1))) from t;
> sum
> -----
> 3
> (1 row)
That's with my v1 patch? I agree it's pretty bogus.
The more I think about this, the more I like the other idea of just
throwing an error rather than trying to fix up cases like bug #19055.
I don't think we have much evidence that anyone is trying to do that
in the real world (otherwise reports would have surfaced years ago).
And this discussion is making it clear that fixing it up is harder
than it sounds.
Also: if we throw an error, and someone shows up with a real-world
example that triggers the error, we'd then have some more evidence
about what would be a plausible interpretation.
So now I'm thinking about something more like the attached. It
causes the #19055 query to throw an error, and restores our current
test query to working (I added said query to the test so we can't
break it again).
Producing a decently localized error turned out to be harder than
I expected, because RangeTableEntrys don't have location fields so
it's impossible to put an error cursor on the troublesome reference.
I ended up putting the cursor on the aggregate function itself
and giving the name of the CTE in errdetail.
regards, tom lane
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 3254c83cc6c..b8340557b34 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -38,6 +38,8 @@ typedef struct
ParseState *pstate;
int min_varlevel;
int min_agglevel;
+ int min_ctelevel;
+ RangeTblEntry *min_cte;
int sublevels_up;
} check_agg_arguments_context;
@@ -58,7 +60,8 @@ typedef struct
static int check_agg_arguments(ParseState *pstate,
List *directargs,
List *args,
- Expr *filter);
+ Expr *filter,
+ int agglocation);
static bool check_agg_arguments_walker(Node *node,
check_agg_arguments_context *context);
static Node *substitute_grouped_columns(Node *node, ParseState *pstate, Query *qry,
@@ -339,7 +342,8 @@ check_agglevels_and_constraints(ParseState *pstate, Node *expr)
min_varlevel = check_agg_arguments(pstate,
directargs,
args,
- filter);
+ filter,
+ location);
*p_levelsup = min_varlevel;
@@ -641,7 +645,8 @@ static int
check_agg_arguments(ParseState *pstate,
List *directargs,
List *args,
- Expr *filter)
+ Expr *filter,
+ int agglocation)
{
int agglevel;
check_agg_arguments_context context;
@@ -649,6 +654,8 @@ check_agg_arguments(ParseState *pstate,
context.pstate = pstate;
context.min_varlevel = -1; /* signifies nothing found yet */
context.min_agglevel = -1;
+ context.min_ctelevel = -1;
+ context.min_cte = NULL;
context.sublevels_up = 0;
(void) check_agg_arguments_walker((Node *) args, &context);
@@ -686,6 +693,20 @@ check_agg_arguments(ParseState *pstate,
parser_errposition(pstate, aggloc)));
}
+ /*
+ * If there's a non-local CTE that's below the aggregate's semantic level,
+ * complain. It's not quite clear what we should do to fix up such a case
+ * (treating the CTE reference like a Var seems wrong), and it's also
+ * unclear whether there is a real-world use for such cases.
+ */
+ if (context.min_ctelevel >= 0 && context.min_ctelevel < agglevel)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("outer-level aggregate cannot use a nested CTE"),
+ errdetail("CTE \"%s\" is below the aggregate's semantic level.",
+ context.min_cte->eref->aliasname),
+ parser_errposition(pstate, agglocation)));
+
/*
* Now check for vars/aggs in the direct arguments, and throw error if
* needed. Note that we allow a Var of the agg's semantic level, but not
@@ -699,6 +720,7 @@ check_agg_arguments(ParseState *pstate,
{
context.min_varlevel = -1;
context.min_agglevel = -1;
+ context.min_ctelevel = -1;
(void) check_agg_arguments_walker((Node *) directargs, &context);
if (context.min_varlevel >= 0 && context.min_varlevel < agglevel)
ereport(ERROR,
@@ -714,6 +736,13 @@ check_agg_arguments(ParseState *pstate,
parser_errposition(pstate,
locate_agg_of_level((Node *) directargs,
context.min_agglevel))));
+ if (context.min_ctelevel >= 0 && context.min_ctelevel < agglevel)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("outer-level aggregate cannot use a nested CTE"),
+ errdetail("CTE \"%s\" is below the aggregate's semantic level.",
+ context.min_cte->eref->aliasname),
+ parser_errposition(pstate, agglocation)));
}
return agglevel;
}
@@ -794,11 +823,6 @@ check_agg_arguments_walker(Node *node,
if (IsA(node, RangeTblEntry))
{
- /*
- * CTE references act similarly to Vars of the CTE's level. Without
- * this we might conclude that the Agg can be evaluated above the CTE,
- * leading to trouble.
- */
RangeTblEntry *rte = (RangeTblEntry *) node;
if (rte->rtekind == RTE_CTE)
@@ -810,9 +834,12 @@ check_agg_arguments_walker(Node *node,
/* ignore local CTEs of subqueries */
if (ctelevelsup >= 0)
{
- if (context->min_varlevel < 0 ||
- context->min_varlevel > ctelevelsup)
- context->min_varlevel = ctelevelsup;
+ if (context->min_ctelevel < 0 ||
+ context->min_ctelevel > ctelevelsup)
+ {
+ context->min_ctelevel = ctelevelsup;
+ context->min_cte = rte;
+ }
}
}
return false; /* allow range_table_walker to continue */
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 86fdb85c6c5..f4caedf272f 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -2300,36 +2300,44 @@ from int4_tbl;
--
-- test for bug #19055: interaction of WITH with aggregates
--
--- The reference to cte1 must determine the aggregate's level,
--- even though it contains no Vars referencing cte1
-explain (verbose, costs off)
+-- For now, we just throw an error if there's a use of a CTE below the
+-- semantic level that the SQL standard assigns to the aggregate.
+-- It's not entirely clear what we could do instead that doesn't risk
+-- breaking more things than it fixes.
select f1, (with cte1(x,y) as (select 1,2)
select count((select i4.f1 from cte1))) as ss
from int4_tbl i4;
- QUERY PLAN
--------------------------------------------------
- Seq Scan on public.int4_tbl i4
- Output: i4.f1, (SubPlan expr_1)
- SubPlan expr_1
+ERROR: outer-level aggregate cannot use a nested CTE
+LINE 2: select count((select i4.f1 from cte1))) as ss
+ ^
+DETAIL: CTE "cte1" is below the aggregate's semantic level.
+--
+-- test for bug #19106: interaction of WITH with aggregates
+--
+-- the initial fix for #19055 was too aggressive and broke this case
+explain (verbose, costs off)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+ QUERY PLAN
+--------------------------------------------
+ Aggregate
+ Output: max((InitPlan expr_1).col1)
+ InitPlan expr_1
-> Aggregate
- Output: count((InitPlan expr_2).col1)
- InitPlan expr_2
- -> Result
- Output: i4.f1
- -> Result
-(9 rows)
+ Output: sum("*VALUES*".column1)
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".column1
+ -> Result
+(8 rows)
-select f1, (with cte1(x,y) as (select 1,2)
- select count((select i4.f1 from cte1))) as ss
-from int4_tbl i4;
- f1 | ss
--------------+----
- 0 | 1
- 123456 | 1
- -123456 | 1
- 2147483647 | 1
- -2147483647 | 1
-(5 rows)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+ agg
+-----
+ 3
+(1 row)
--
-- test for nested-recursive-WITH bug
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index d88d5abb91a..cd25a5e7154 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -1100,16 +1100,26 @@ from int4_tbl;
--
-- test for bug #19055: interaction of WITH with aggregates
--
--- The reference to cte1 must determine the aggregate's level,
--- even though it contains no Vars referencing cte1
-explain (verbose, costs off)
+-- For now, we just throw an error if there's a use of a CTE below the
+-- semantic level that the SQL standard assigns to the aggregate.
+-- It's not entirely clear what we could do instead that doesn't risk
+-- breaking more things than it fixes.
select f1, (with cte1(x,y) as (select 1,2)
select count((select i4.f1 from cte1))) as ss
from int4_tbl i4;
-select f1, (with cte1(x,y) as (select 1,2)
- select count((select i4.f1 from cte1))) as ss
-from int4_tbl i4;
+--
+-- test for bug #19106: interaction of WITH with aggregates
+--
+-- the initial fix for #19055 was too aggressive and broke this case
+explain (verbose, costs off)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
--
-- test for nested-recursive-WITH bug
I wrote:
> The more I think about this, the more I like the other idea of just
> throwing an error rather than trying to fix up cases like bug #19055.
> I don't think we have much evidence that anyone is trying to do that
> in the real world (otherwise reports would have surfaced years ago).
> And this discussion is making it clear that fixing it up is harder
> than it sounds.
Hearing no further comments, I've pushed the v2 patch that does it
that way, restoring the previous behavior in cases that would not
have failed before #19055.
regards, tom lane