Обсуждение: Strange behavior of the random() function

Поиск
Список
Период
Сортировка

Strange behavior of the random() function

От
r.zharkov@postgrespro.ru
Дата:
Hello,
Can anybody explain me that strange behavior?

  select random() as "rnd", random() as "rnd1", random() as "rnd2"
  from generate_series( 1, 5 );
  ---
  0.948556384071708    0.769186236895621    0.381390900816768
  0.684488877654076    0.594888080842793    0.214444123208523
  0.668408528901637    0.452859566546977    0.924952184315771

All of the values are different.

  select random() as "rnd", random() as "rnd1", random() as "rnd2"
  from generate_series( 1, 3 )
  order by random();
  ---
  0.166236794553697    0.166236794553697    0.166236794553697
  0.629051928408444    0.629051928408444    0.629051928408444
  0.698161978274584    0.698161978274584    0.698161978274584

All values in any row are the same.

  select random() as "rnd", random() as "rnd1", random() as "rnd2"
  from generate_series( 1, 3 )
  order by random() || 'test';
  ---
  0.569052047561854    0.531697370111942    0.338135569822043
  0.880468992516398    0.172520748339593    0.0296505615115166
  0.862012444529682    0.119546371512115    0.21626165881753

All of the values are different again.

Is it a bug?


Re: Strange behavior of the random() function

От
Tom Lane
Дата:
r.zharkov@postgrespro.ru writes:
> Can anybody explain me that strange behavior?

It's a squishiness in the SQL language, if you ask me.  Consider this
simplified query:

select random() from generate_series(1, 3) order by random();

Would you expect the output of this query to appear ordered, or not?
There's an argument to be made that the two instances of random() ought
to be executed independently for each row, so that the output would
(probably) seem unordered.  But practical usage generally expects that
we unify the two textually-identical expressions, so that the behavior
is the same as

select random() from generate_series(1, 3) order by 1;

>   select random() as "rnd", random() as "rnd1", random() as "rnd2"
>   from generate_series( 1, 3 )
>   order by random();
> All values in any row are the same.

Here, we've unified *all* the appearances of the ORDER BY expression in
the SELECT-list.  Maybe that's wrong, but it's hard to make a principled
argument why we shouldn't do it.  If only one of them should be unified
with the ORDER BY expression, then which one?

>   select random() as "rnd", random() as "rnd1", random() as "rnd2"
>   from generate_series( 1, 3 )
>   order by random() || 'test';

Here, the ORDER BY expression is not a match to any SELECT-list
expression, so they don't get unified with it.

If you'd like more control over this sort of thing, I'd recommend
using a sub-select, ie

select x,y,z from
 (select random() as x, random() as y, random() as z
  from generate_series(1,3)) ss
order by x;

which makes it clear what your intention is.

There's some interesting related behaviors: compare the outputs of

select random() as x, random() as y, random() as z
  from generate_series(1,3) order by 1;
select random() as x, random() as y, random() as z
  from generate_series(1,3) order by 1,2;
select random() as x, random() as y, random() as z
  from generate_series(1,3) order by 1,2,3;

I could buy the idea that there's a bug involved in that; but again
the question is exactly which textually identical expressions should
get unified and why.

            regards, tom lane


Re: Strange behavior of the random() function

От
Олег Самойлов
Дата:
> 26 сент. 2018 г., в 6:35, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> r.zharkov@postgrespro.ru writes:
>> Can anybody explain me that strange behavior?
>
> It's a squishiness in the SQL language, if you ask me.  Consider this
> simplified query:
>
> select random() from generate_series(1, 3) order by random();
>
> Would you expect the output of this query to appear ordered, or not?
> There's an argument to be made that the two instances of random() ought
> to be executed independently for each row, so that the output would
> (probably) seem unordered.  But practical usage generally expects that
> we unify the two textually-identical expressions, so that the behavior
> is the same as
>
> select random() from generate_series(1, 3) order by 1;
>
>>  select random() as "rnd", random() as "rnd1", random() as "rnd2"
>>  from generate_series( 1, 3 )
>>  order by random();
>> All values in any row are the same.
>
> Here, we've unified *all* the appearances of the ORDER BY expression in
> the SELECT-list.  Maybe that's wrong, but it's hard to make a principled
> argument why we shouldn't do it.

Because random() is volatile function, but «unified» in such way can be only stable expressions. That the «volatile»
functionbehaves like  the «stable» function this is obviously is a bug.