Обсуждение: JumbleQuery ma treat different GROUP BY expr as the same

Поиск
Список
Период
Сортировка

JumbleQuery ma treat different GROUP BY expr as the same

От
jian he
Дата:
hi.

drop table if exists t;
create table t(a text, b text, c int);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(costs off, verbose) select count(*) from t group by a;
explain(costs off, verbose) select count(*) from t group by b;
explain(costs off, verbose) select count(*) from t group by c;

SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 calls | rows |                                    query
-------+------+------------------------------------------------------------------------------
     0 |    0 | SELECT calls, rows, query FROM pg_stat_statements
ORDER BY query COLLATE "C"
     1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
     2 |    0 | explain(costs off, verbose) select count(*) from t group by a
     2 |    0 | explain(costs off, verbose) select count(*) from t group by a;
     1 |    0 | explain(costs off, verbose) select count(*) from t group by c
     1 |    0 | explain(costs off, verbose) select count(*) from t group by c;
(6 rows)

transformSelectStmt->transformGroupClause->transformGroupClauseExpr->addTargetToGroupList
will produce the same SortGroupClause node for "group by a" and "group by b".

JumbleQuery will jumble Query->groupClause, but RangeTblEntry->groupexprs in
Query->rtable is marked with query_jumble_ignore and therefore excluded from
jumbling.

So  "group by a" and "group by" merged into the same entry in
pg_stat_statements,
Is this what we expected?



--
jian
https://www.enterprisedb.com/



Re: JumbleQuery ma treat different GROUP BY expr as the same

От
Tom Lane
Дата:
jian he <jian.universality@gmail.com> writes:
> explain(costs off, verbose) select count(*) from t group by a;
> explain(costs off, verbose) select count(*) from t group by b;
> explain(costs off, verbose) select count(*) from t group by c;

> JumbleQuery will jumble Query->groupClause, but RangeTblEntry->groupexprs in
> Query->rtable is marked with query_jumble_ignore and therefore excluded from
> jumbling.

> So  "group by a" and "group by" merged into the same entry in
> pg_stat_statements,
> Is this what we expected?

It is not what happened before we invented RTE_GROUP.  I tried your
experiment in v14 and got:

regression=# SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 calls | rows |                             query
-------+------+---------------------------------------------------------------
     1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
     1 |    0 | explain(costs off, verbose) select count(*) from t group by a
     1 |    0 | explain(costs off, verbose) select count(*) from t group by b
     1 |    0 | explain(costs off, verbose) select count(*) from t group by c
(4 rows)

So I'm inclined to think this was an unintentional change of behavior.

            regards, tom lane



Re: JumbleQuery ma treat different GROUP BY expr as the same

От
Michael Paquier
Дата:
On Sat, Jan 10, 2026 at 11:46:27AM -0500, Tom Lane wrote:
> It is not what happened before we invented RTE_GROUP.  I tried your
> experiment in v14 and got:
>
> regression=# SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
>  calls | rows |                             query
> -------+------+---------------------------------------------------------------
>      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
>      1 |    0 | explain(costs off, verbose) select count(*) from t group by a
>      1 |    0 | explain(costs off, verbose) select count(*) from t group by b
>      1 |    0 | explain(costs off, verbose) select count(*) from t group by c
> (4 rows)
>
> So I'm inclined to think this was an unintentional change of behavior.

The difference of behavior is between v17 and v18, as an effect of
247dea89f761.  I know that we should not break query ID computations
in stable branches, but v18 is very recent and we still have many
years to support it..  So I would like to suggest that we make an
exception and backpatch a fix to v18.  This behavior is not cool for
users.

This issue also points to a gap in the regression of
pg_stat_statements, where we have never bothered testing patterns of
GROUP BY with the same table and different attributes.  Jian, would
you like to write a patch?
--
Michael

Вложения

Re: JumbleQuery ma treat different GROUP BY expr as the same

От
jian he
Дата:
On Sun, Jan 11, 2026 at 6:13 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> This issue also points to a gap in the regression of
> pg_stat_statements, where we have never bothered testing patterns of
> GROUP BY with the same table and different attributes.  Jian, would
> you like to write a patch?
> --
> Michael

Sure, I will give it a try over the next few days.



Re: JumbleQuery ma treat different GROUP BY expr as the same

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> On Sat, Jan 10, 2026 at 11:46:27AM -0500, Tom Lane wrote:
>> So I'm inclined to think this was an unintentional change of behavior.

> The difference of behavior is between v17 and v18, as an effect of
> 247dea89f761.  I know that we should not break query ID computations
> in stable branches, but v18 is very recent and we still have many
> years to support it..  So I would like to suggest that we make an
> exception and backpatch a fix to v18.  This behavior is not cool for
> users.

I agree we should fix it in v18, but we really need to push to have
the fix in 18.2 rather than delay longer.  (IME, the .2 release is
about when our more risk-averse users start to think about upgrading.)
So there's a time limit on getting this done.

            regards, tom lane



Re: JumbleQuery ma treat different GROUP BY expr as the same

От
jian he
Дата:
hi.

while working on it, I guess I found another bug,  below JumbleQuery will return
the same result:

SELECT FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
SELECT a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
SELECT a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;

so I think TargetEntry.resjunk should not be marked as query_jumble_ignore.

addRangeTableEntryForGroup
will make RangeTblEntry(RTE_GROUP) have a newly copied original groupexprs.
Query->targetList also has the original groupexprs.

but after parseCheckAggregates->substitute_grouped_columns.
Query->targetList Var node will point to the offset of the RTE_GROUP,
not the RTE_RELATION.
see src/backend/parser/parse_agg.c line 1543.

After parseCheckAggregates, JumbleQuery(Query->targetList) will produce the same
result as long as the grouping columns have the same list of data types.
JumbleQuery(Query->groupClause) will also produce the same result
as long as the grouping columns have the same list of data types.

Since only the RangeTblEntry(RTE_GROUP) have the original grouping expressions,
we can not mark the RangeTblEntry->groupexprs as query_jumble_ignore.

looking at, transformUpdateTargetList->transformTargetList
so i think it's OK to remove query_jumble_ignore from TargetEntry.resjunk for
INSERT/UPDATE/DELETE.


--
jian
https://www.enterprisedb.com/

Вложения

Re: JumbleQuery ma treat different GROUP BY expr as the same

От
Michael Paquier
Дата:
On Mon, Jan 12, 2026 at 04:20:44PM +0800, jian he wrote:
> While working on it, I guess I found another bug,  below JumbleQuery will return
> the same result:
>
> SELECT FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
> SELECT a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
> SELECT a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
>
> so I think TargetEntry.resjunk should not be marked as query_jumble_ignore.

Not sure how to feel about this one, as a primary node.  3db72ebcbe20
has put a query_jumble_ignore to TargetEntry.resjunk which was simply
a consistent move with the pre-v15 branches because these columns have
always been ignored.  I have never heard complaints about that in the
field with PGSS, TBH.  The original choice comes from this thread,
back in 2012 when this was still integrated into PGSS:
https://www.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn%2BO1VX3%2BOGrc4Bscn4%3DA%40mail.gmail.com

Anyway, let's not mix apples and oranges for now.  The GROUP BY issue
is a bug worth fixing on its own.  What you are pointing out with
resjunk is the original behavior we have been relying on.  If we
finish by changing it, this should not and cannot be backpatched.

I have expanded a bit the tests, with a couple of extra patterns,
giving the attached.  The behavior is the same as the pre-v17
branches.
--
Michael

Вложения

Re: JumbleQuery ma treat different GROUP BY expr as the same

От
Michael Paquier
Дата:
On Tue, Jan 13, 2026 at 04:09:46PM +0900, Michael Paquier wrote:
> I have expanded a bit the tests, with a couple of extra patterns,
> giving the attached.  The behavior is the same as the pre-v17
> branches.

Done this part for the GROUP BY regression now, as of e217dc7484e5,
down to v18.
--
Michael

Вложения

Re: JumbleQuery ma treat different GROUP BY expr as the same

От
jian he
Дата:
On Tue, Jan 13, 2026 at 3:10 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jan 12, 2026 at 04:20:44PM +0800, jian he wrote:
> > While working on it, I guess I found another bug,  below JumbleQuery will return
> > the same result:
> >
> > SELECT FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
> > SELECT a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
> > SELECT a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) ORDER BY a, b;
> >
> > so I think TargetEntry.resjunk should not be marked as query_jumble_ignore.
>
> Not sure how to feel about this one, as a primary node.  3db72ebcbe20
> has put a query_jumble_ignore to TargetEntry.resjunk which was simply
> a consistent move with the pre-v15 branches because these columns have
> always been ignored.  I have never heard complaints about that in the
> field with PGSS, TBH.  The original choice comes from this thread,
> back in 2012 when this was still integrated into PGSS:
> https://www.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn%2BO1VX3%2BOGrc4Bscn4%3DA%40mail.gmail.com
>

if not remove the query_jumble_ignore from TargetEntry.resjunk
the below query would have the same QueryID.

SELECT COUNT(*), a FROM (VALUES (1::INT, 2::INT)) AS t(a, b) GROUP BY a, b;
SELECT COUNT(*), a, b FROM (VALUES (1::INT, 2::INT)) AS t(a, b) GROUP BY a, b;

It affects queries that include an ORDER BY or GROUP BY clause, so a patch is
attached.



--
jian
https://www.enterprisedb.com/

Вложения