Multiple grouping set specs referencing duplicate alias

Поиск
Список
Период
Сортировка
От David Kimura
Тема Multiple grouping set specs referencing duplicate alias
Дата
Msg-id CAHnPFjSdFx_TtNpQturPMkRSJMYaD5rGP2=8iFH9V24-OjHGiQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Multiple grouping set specs referencing duplicate alias  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

 ai1 | ai2
-----+-----
   1 |   1
     |   1
   2 |   2
     |   2
   3 |   3
     |   3
(6 rows)

I had expected:

 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
     |   1
     |   2
     |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

                   QUERY PLAN
------------------------------------------------
 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
         Sort Key: i
         ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=#  SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Missing update of all_hasnulls in BRIN opclasses
Следующее
От: Anton Voloshin
Дата:
Сообщение: patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider