Обсуждение: JumbleQuery ma treat different GROUP BY expr as the same
hi.
drop table if exists t;
create table t(a text, b text, c int);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(costs off, verbose) select count(*) from t group by a;
explain(costs off, verbose) select count(*) from t group by b;
explain(costs off, verbose) select count(*) from t group by c;
SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
calls | rows | query
-------+------+------------------------------------------------------------------------------
0 | 0 | SELECT calls, rows, query FROM pg_stat_statements
ORDER BY query COLLATE "C"
1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
2 | 0 | explain(costs off, verbose) select count(*) from t group by a
2 | 0 | explain(costs off, verbose) select count(*) from t group by a;
1 | 0 | explain(costs off, verbose) select count(*) from t group by c
1 | 0 | explain(costs off, verbose) select count(*) from t group by c;
(6 rows)
transformSelectStmt->transformGroupClause->transformGroupClauseExpr->addTargetToGroupList
will produce the same SortGroupClause node for "group by a" and "group by b".
JumbleQuery will jumble Query->groupClause, but RangeTblEntry->groupexprs in
Query->rtable is marked with query_jumble_ignore and therefore excluded from
jumbling.
So "group by a" and "group by" merged into the same entry in
pg_stat_statements,
Is this what we expected?
--
jian
https://www.enterprisedb.com/
jian he <jian.universality@gmail.com> writes:
> explain(costs off, verbose) select count(*) from t group by a;
> explain(costs off, verbose) select count(*) from t group by b;
> explain(costs off, verbose) select count(*) from t group by c;
> JumbleQuery will jumble Query->groupClause, but RangeTblEntry->groupexprs in
> Query->rtable is marked with query_jumble_ignore and therefore excluded from
> jumbling.
> So "group by a" and "group by" merged into the same entry in
> pg_stat_statements,
> Is this what we expected?
It is not what happened before we invented RTE_GROUP. I tried your
experiment in v14 and got:
regression=# SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
calls | rows | query
-------+------+---------------------------------------------------------------
1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
1 | 0 | explain(costs off, verbose) select count(*) from t group by a
1 | 0 | explain(costs off, verbose) select count(*) from t group by b
1 | 0 | explain(costs off, verbose) select count(*) from t group by c
(4 rows)
So I'm inclined to think this was an unintentional change of behavior.
regards, tom lane
On Sat, Jan 10, 2026 at 11:46:27AM -0500, Tom Lane wrote: > It is not what happened before we invented RTE_GROUP. I tried your > experiment in v14 and got: > > regression=# SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"; > calls | rows | query > -------+------+--------------------------------------------------------------- > 1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t > 1 | 0 | explain(costs off, verbose) select count(*) from t group by a > 1 | 0 | explain(costs off, verbose) select count(*) from t group by b > 1 | 0 | explain(costs off, verbose) select count(*) from t group by c > (4 rows) > > So I'm inclined to think this was an unintentional change of behavior. The difference of behavior is between v17 and v18, as an effect of 247dea89f761. I know that we should not break query ID computations in stable branches, but v18 is very recent and we still have many years to support it.. So I would like to suggest that we make an exception and backpatch a fix to v18. This behavior is not cool for users. This issue also points to a gap in the regression of pg_stat_statements, where we have never bothered testing patterns of GROUP BY with the same table and different attributes. Jian, would you like to write a patch? -- Michael
Вложения
On Sun, Jan 11, 2026 at 6:13 PM Michael Paquier <michael@paquier.xyz> wrote: > > This issue also points to a gap in the regression of > pg_stat_statements, where we have never bothered testing patterns of > GROUP BY with the same table and different attributes. Jian, would > you like to write a patch? > -- > Michael Sure, I will give it a try over the next few days.
Michael Paquier <michael@paquier.xyz> writes:
> On Sat, Jan 10, 2026 at 11:46:27AM -0500, Tom Lane wrote:
>> So I'm inclined to think this was an unintentional change of behavior.
> The difference of behavior is between v17 and v18, as an effect of
> 247dea89f761. I know that we should not break query ID computations
> in stable branches, but v18 is very recent and we still have many
> years to support it.. So I would like to suggest that we make an
> exception and backpatch a fix to v18. This behavior is not cool for
> users.
I agree we should fix it in v18, but we really need to push to have
the fix in 18.2 rather than delay longer. (IME, the .2 release is
about when our more risk-averse users start to think about upgrading.)
So there's a time limit on getting this done.
regards, tom lane
hi. while working on it, I guess I found another bug, below JumbleQuery will return the same result: SELECT FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; SELECT a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; SELECT a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; so I think TargetEntry.resjunk should not be marked as query_jumble_ignore. addRangeTableEntryForGroup will make RangeTblEntry(RTE_GROUP) have a newly copied original groupexprs. Query->targetList also has the original groupexprs. but after parseCheckAggregates->substitute_grouped_columns. Query->targetList Var node will point to the offset of the RTE_GROUP, not the RTE_RELATION. see src/backend/parser/parse_agg.c line 1543. After parseCheckAggregates, JumbleQuery(Query->targetList) will produce the same result as long as the grouping columns have the same list of data types. JumbleQuery(Query->groupClause) will also produce the same result as long as the grouping columns have the same list of data types. Since only the RangeTblEntry(RTE_GROUP) have the original grouping expressions, we can not mark the RangeTblEntry->groupexprs as query_jumble_ignore. looking at, transformUpdateTargetList->transformTargetList so i think it's OK to remove query_jumble_ignore from TargetEntry.resjunk for INSERT/UPDATE/DELETE. -- jian https://www.enterprisedb.com/
Вложения
On Mon, Jan 12, 2026 at 04:20:44PM +0800, jian he wrote: > While working on it, I guess I found another bug, below JumbleQuery will return > the same result: > > SELECT FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; > SELECT a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; > SELECT a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; > > so I think TargetEntry.resjunk should not be marked as query_jumble_ignore. Not sure how to feel about this one, as a primary node. 3db72ebcbe20 has put a query_jumble_ignore to TargetEntry.resjunk which was simply a consistent move with the pre-v15 branches because these columns have always been ignored. I have never heard complaints about that in the field with PGSS, TBH. The original choice comes from this thread, back in 2012 when this was still integrated into PGSS: https://www.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn%2BO1VX3%2BOGrc4Bscn4%3DA%40mail.gmail.com Anyway, let's not mix apples and oranges for now. The GROUP BY issue is a bug worth fixing on its own. What you are pointing out with resjunk is the original behavior we have been relying on. If we finish by changing it, this should not and cannot be backpatched. I have expanded a bit the tests, with a couple of extra patterns, giving the attached. The behavior is the same as the pre-v17 branches. -- Michael
Вложения
On Tue, Jan 13, 2026 at 04:09:46PM +0900, Michael Paquier wrote: > I have expanded a bit the tests, with a couple of extra patterns, > giving the attached. The behavior is the same as the pre-v17 > branches. Done this part for the GROUP BY regression now, as of e217dc7484e5, down to v18. -- Michael
Вложения
On Tue, Jan 13, 2026 at 3:10 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Jan 12, 2026 at 04:20:44PM +0800, jian he wrote: > > While working on it, I guess I found another bug, below JumbleQuery will return > > the same result: > > > > SELECT FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; > > SELECT a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; > > SELECT a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b; > > > > so I think TargetEntry.resjunk should not be marked as query_jumble_ignore. > > Not sure how to feel about this one, as a primary node. 3db72ebcbe20 > has put a query_jumble_ignore to TargetEntry.resjunk which was simply > a consistent move with the pre-v15 branches because these columns have > always been ignored. I have never heard complaints about that in the > field with PGSS, TBH. The original choice comes from this thread, > back in 2012 when this was still integrated into PGSS: > https://www.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn%2BO1VX3%2BOGrc4Bscn4%3DA%40mail.gmail.com > if not remove the query_jumble_ignore from TargetEntry.resjunk the below query would have the same QueryID. SELECT COUNT(*), a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) GROUP BY a, b; SELECT COUNT(*), a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) GROUP BY a, b; It affects queries that include an ORDER BY or GROUP BY clause, so a patch is attached. -- jian https://www.enterprisedb.com/