Re: behavior of GROUP BY with VOLATILE expressions

From: Paul George
Subject: Re: behavior of GROUP BY with VOLATILE expressions
Msg-id: CALA8mJq9sJKw3p=gKf40D-8M43VJzBHN0mus1zo55ASfGn02tw@mail.gmail.com
In reply to: Re: behavior of GROUP BY with VOLATILE expressions  ("David G. Johnston" <david.g.johnston@gmail.com>)
List: pgsql-hackers

David:

>Only now just grasping that you are trying to group something that is definitionally random.  That just doesn't make sense to me.

Oh, sorry for the confusion. Yeah, totally. I didn't mean to draw specific attention to GROUP BY -- as you've pointed out elsewhere this issue also exists with ORDER BY.

To clean this up a bit, I'm specifically comparing how volatile function calls and volatile scalar subqueries are evaluated differently. Here (covered in the prior links you've provided),

postgres=# select random(), random() order by random();
      random       |      random      
-------------------+-------------------
 0.956989895473876 | 0.956989895473876
(1 row)

and, here,

postgres=# select (select random()), (select random()) order by (select random());
       random       |       random      
--------------------+--------------------
 0.2872914386383745 | 0.8976525075618966
(1 row)
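
For anyone who wants the intent to be explicit rather than dependent on how the SELECT list is matched against the ORDER BY item, here is a minimal sketch. It is my own illustration, not something taken from the linked discussions; the aliases s, r, r1 and r2 are made up for the example, and it assumes the planner does not flatten a FROM subquery whose target list contains a volatile function (which is my understanding of current behavior).

```sql
-- One value, named once in a FROM subquery and referenced twice:
-- both output columns refer to the same r.
select r, r
from (select random() as r) s
order by r;

-- Two separately named values: r1 and r2 come from two distinct
-- calls to random(), and the sort key is unambiguously r1.
select r1, r2
from (select random() as r1, random() as r2) s
order by r1;
```

Written this way, whether a value is shared or independent is stated in the query itself, so the result does not hinge on whether the volatile expression is a bare function call or a scalar subquery.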

Regarding documentation, I think those changes would be useful. There's this suggestion

"An expression or subexpression in the SELECT list that matches an ORDER BY
or GROUP BY item is taken to represent the same value that was sorted or
grouped by, even when the (sub)expression is volatile".

and this one,

"A side-effect of this feature is that ORDER BY expressions containing
volatile functions will execute the volatile function only once for the
entire row; thus any column expressions using the same function will reuse
the same function result."

But I don't think either covers the additional, albeit nuanced, case of volatile scalar subqueries.

-Paul-

On Fri, Jul 19, 2024 at 2:28 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Fri, Jul 19, 2024 at 2:21 PM Paul George <p.a.george19@gmail.com> wrote:
>> Great, thanks for the links and useful past discussions! I figured I wasn't the first to stumble across this, and it's interesting to see the issue arise with ORDER BY [VOLATILE FUNC] as well.
>>
>> My question was not so much about changing behavior as it was about understanding what is desired, especially in light of the fact that subqueries behave differently. From my reading of the links you provided, it seems that even the notion of "desired" here is itself dubious and that there is a case for reevaluating RANDOM() everywhere and a case for not doing that. Given this murkiness, is it fair then to say that drawing parallels between how GROUP BY subquery is handled is moot?
>
> Only now just grasping that you are trying to group something that is definitionally random.  That just doesn't make sense to me.  Grouping is for categorical data (loosely defined, something like Invoice# arguably counts as a category if you are looking at invoice details.)
>
> I'll stick with: this whole area, implementation-wise, is going to remain status-quo.  If you've got ideas for documenting it better hopefully a patch goes in at some point.  Mostly that can be done black-box style - inputs and outputs, not code reading.
>
> David J.
