Обсуждение: pg_get_viewdef() produces non-round-trippable SQL for views with USING join on mismatched integer types

Поиск
Список
Период
Сортировка
PostgreSQL version: 18.1 (also verified on 18.2)
OS: Linux

Description:
============
When a view uses a USING join on columns with different integer types (integer vs
bigint) and the SELECT clause contains an explicit narrowing cast, pg_get_viewdef()
produces SQL that PostgreSQL itself rejects. This makes pg_dump produce dumps that
fail on restore for any such view.

The SELECT clause gets col::integer, while the GROUP BY gets (col::bigint) — they
refer to the same underlying column but with different casts, so the GROUP BY check
fails to recognise the SELECT expression as covered.

Minimal reproducer:
===================

  CREATE TABLE t1 (year integer, val numeric);
  CREATE TABLE t2 (year bigint, label text);
  INSERT INTO t1 VALUES (2025, 100), (2025, 200), (2026, 300);
  INSERT INTO t2 VALUES (2025, 'A'), (2026, 'B');

  -- View creation succeeds and queries work fine:
  CREATE VIEW v AS
    SELECT year::integer AS year, t2.label, sum(val) AS total
    FROM t1
    LEFT JOIN t2 USING (year)
    GROUP BY year, t2.label;

  SELECT * FROM v;  -- returns correct results

  -- pg_get_viewdef output:
  SELECT pg_get_viewdef('v'::regclass, true);

Output of pg_get_viewdef:
=========================

   SELECT t1.year::integer AS year,
      t2.label,
      sum(t1.val) AS total
     FROM t1
       LEFT JOIN t2 USING (year)
    GROUP BY (t1.year::bigint), t2.label;

Note: SELECT has t1.year::integer, GROUP BY has (t1.year::bigint).

Attempting to re-execute the pg_get_viewdef output fails:
=========================================================

  CREATE VIEW v2 AS
   SELECT t1.year::integer AS year,
      t2.label,
      sum(t1.val) AS total
     FROM t1
       LEFT JOIN t2 USING (year)
    GROUP BY (t1.year::bigint), t2.label;

  ERROR:  column "t1.year" must appear in the GROUP BY clause or be used
          in an aggregate function
  LINE 2:  SELECT t1.year::integer AS year,
                  ^

Impact:
=======
pg_dump uses pg_get_viewdef() to serialise view definitions. Any database
containing such a view (USING join between integer and bigint columns, with
an explicit cast in SELECT) will produce a dump that fails during restore with
the above error.

Without the explicit cast in SELECT (i.e. just SELECT year, ...) pg_get_viewdef
emits t1.year::bigint in both SELECT and GROUP BY, and the round-trip works.
The bug is triggered specifically by the narrowing cast (bigint -> integer) in
the SELECT list combined with a USING join.


Kind regards,

Swirl

Swirl Smog Dowry <swirl-smog-dowry@duck.com> writes:
> When a view uses a USING join on columns with different integer
> types (integer vs bigint) and the SELECT clause contains an explicit
> narrowing cast, pg_get_viewdef() produces SQL that PostgreSQL itself
> rejects. This makes pg_dump produce dumps that fail on restore for
> any such view.

Hmm, yeah.  This used to work as-expected, too.  "git bisect" finds
that it broke at

247dea89f7616fdf06b7272b74abafc29e8e5860 is the first bad commit
commit 247dea89f7616fdf06b7272b74abafc29e8e5860
Author: Richard Guo <rguo@postgresql.org>
Date:   Tue Sep 10 12:35:34 2024 +0900

    Introduce an RTE for the grouping step

Looking at the parse tree for the problem query, I see

          {RANGETBLENTRY 
          :alias <> 
          :eref 
             {ALIAS 
             :aliasname *GROUP* 
             :colnames ("?column?" "label")
             }
          :rtekind 9 
          :groupexprs (
             {FUNCEXPR 
             :funcid 481 
             :funcresulttype 20 
             :funcretset false 
             :funcvariadic false 
             :funcformat 2 
             :funccollid 0 
             :inputcollid 0 
             :args (
                {VAR 
                :varno 1 
                :varattno 1 
                :vartype 23 
                :vartypmod -1 
                :varcollid 0 
                :varnullingrels (b)
                :varlevelsup 0 
                :varreturningtype 0 
                :varnosyn 1 
                :varattnosyn 1 
                :location -1
                }
             )
             :location -1
             }
             {VAR 
             :varno 2 
             :varattno 2 
             :vartype 25 
             :vartypmod -1 
             :varcollid 100 
             :varnullingrels (b 3)
             :varlevelsup 0 
             :varreturningtype 0 
             :varnosyn 2 
             :varattnosyn 2 
             :location 32
             }
          )
          :lateral false 
          :inFromCl false 
          :securityQuals <>
          }

The first groupexpr is the same as the joinaliasvars entry for that
column in the JOIN RTE.  This surprises me: I'd expect to see a
reference to the join output column there, ie Var 3/1, because I'm
pretty sure that's what parsing of "GROUP BY year" would have produced
initially.  If it were like that, I think ruleutils would produce the
desired output.  So I'd tentatively classify this as "join alias Vars
are being flattened too soon".  Richard, any thoughts?

            regards, tom lane



I wrote:
> The first groupexpr is the same as the joinaliasvars entry for that
> column in the JOIN RTE.  This surprises me: I'd expect to see a
> reference to the join output column there, ie Var 3/1, because I'm
> pretty sure that's what parsing of "GROUP BY year" would have produced
> initially.  If it were like that, I think ruleutils would produce the
> desired output.  So I'd tentatively classify this as "join alias Vars
> are being flattened too soon".  Richard, any thoughts?

The problem is obvious after looking at parseCheckAggregates: the
RTE_GROUP RTE is manufactured using the groupClauses list after we
have flattened that, which we are only doing for comparison purposes;
it shouldn't affect what goes into the parse tree.  I experimented
with just changing the order of operations, and that seems to fix it.
The lack of any effect on check-world shows we need more test cases
here ...

            regards, tom lane

diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 25ee0f87d93..d0187ea84a0 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1213,8 +1213,8 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
     }
 
     /*
-     * Build a list of the acceptable GROUP BY expressions for use by
-     * substitute_grouped_columns().
+     * Build a list of the acceptable GROUP BY expressions to save in the
+     * RTE_GROUP RTE, and for use by substitute_grouped_columns().
      *
      * We get the TLE, not just the expr, because GROUPING wants to know the
      * sortgroupref.
@@ -1231,6 +1231,23 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
         groupClauses = lappend(groupClauses, expr);
     }
 
+    /*
+     * If there are any acceptable GROUP BY expressions, build an RTE and
+     * nsitem for the result of the grouping step.  (It's important to do this
+     * before flattening join alias vars in groupClauses, because the RTE
+     * should preserve any alias vars that were in the input.)
+     */
+    if (groupClauses)
+    {
+        pstate->p_grouping_nsitem =
+            addRangeTableEntryForGroup(pstate, groupClauses);
+
+        /* Set qry->rtable again in case it was previously NIL */
+        qry->rtable = pstate->p_rtable;
+        /* Mark the Query as having RTE_GROUP RTE */
+        qry->hasGroupRTE = true;
+    }
+
     /*
      * If there are join alias vars involved, we have to flatten them to the
      * underlying vars, so that aliased and unaliased vars will be correctly
@@ -1266,21 +1283,6 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
         }
     }
 
-    /*
-     * If there are any acceptable GROUP BY expressions, build an RTE and
-     * nsitem for the result of the grouping step.
-     */
-    if (groupClauses)
-    {
-        pstate->p_grouping_nsitem =
-            addRangeTableEntryForGroup(pstate, groupClauses);
-
-        /* Set qry->rtable again in case it was previously NIL */
-        qry->rtable = pstate->p_rtable;
-        /* Mark the Query as having RTE_GROUP RTE */
-        qry->hasGroupRTE = true;
-    }
-
     /*
      * Replace grouped variables in the targetlist and HAVING clause with Vars
      * that reference the RTE_GROUP RTE.  Emit an error message if we find any

On Fri, Feb 27, 2026 at 3:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The problem is obvious after looking at parseCheckAggregates: the
> RTE_GROUP RTE is manufactured using the groupClauses list after we
> have flattened that, which we are only doing for comparison purposes;
> it shouldn't affect what goes into the parse tree.  I experimented
> with just changing the order of operations, and that seems to fix it.

Right.  We should keep the unmodified GROUP BY expressions in the
parse tree, and then rely on the planner to flatten the join alias
vars within them.

+1 to the fix.

> The lack of any effect on check-world shows we need more test cases
> here ...

How about this new test case in the attached (parse_agg.c untouched)?

- Richard

Вложения
On Fri, Feb 27, 2026 at 12:23 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Fri, Feb 27, 2026 at 3:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The problem is obvious after looking at parseCheckAggregates: the
> > RTE_GROUP RTE is manufactured using the groupClauses list after we
> > have flattened that, which we are only doing for comparison purposes;
> > it shouldn't affect what goes into the parse tree.  I experimented
> > with just changing the order of operations, and that seems to fix it.

> Right.  We should keep the unmodified GROUP BY expressions in the
> parse tree, and then rely on the planner to flatten the join alias
> vars within them.
>
> +1 to the fix.

I am on the fence about whether this fix is safe to back-patch to v18.
I cautiously think it is safe, as it does not change the parsetree's
external representation and does not require an initdb.

However, this fix will only apply to newly created views.  Users who
have existing views affected by this bug will have to recreate them
after upgrading to get the corrected pg_get_viewdef output.  (really
kicking myself for this.)

- Richard



Richard Guo <guofenglinux@gmail.com> writes:
> I am on the fence about whether this fix is safe to back-patch to v18.

I don't think we have a lot of choice.  The cases where it makes a
difference are pretty broken.  Fortunately, I think these cases
are rare.  JOIN USING combining two different-type columns has got
to be an edge-case usage, and I think it likely doesn't matter much
in other cases.

> However, this fix will only apply to newly created views.  Users who
> have existing views affected by this bug will have to recreate them
> after upgrading to get the corrected pg_get_viewdef output.

Yeah :-(.  What's really annoying is that probably people will not
notice until they try to upgrade to v19, and by then recreating
the view correctly might be difficult.  But I'm not seeing a way
to smooth their path.

            regards, tom lane



I wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
>> I am on the fence about whether this fix is safe to back-patch to v18.

> I don't think we have a lot of choice.  The cases where it makes a
> difference are pretty broken.  Fortunately, I think these cases
> are rare.  JOIN USING combining two different-type columns has got
> to be an edge-case usage, and I think it likely doesn't matter much
> in other cases.

I spent a bit of effort on determining which cases actually cause
wrong output, and AFAICT it's very narrow: you need "SELECT ...
t1 LEFT JOIN t2 USING (x) GROUP BY x" where t1.x and t2.x are
different data types and t1.x is the side requiring coercion.
With no coercion, or if the join side to be coerced is nullable, we
show the flattened alias Var but that doesn't actually break anything.

So I went ahead and pushed this, using your test case.

            regards, tom lane



On Sat, Feb 28, 2026 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I went ahead and pushed this, using your test case.

Thank you for handling this, Tom.

- Richard