Обсуждение: select random order by random
Dear sirs,
I was very surprised when I executed such SQL query (under PostgreSQL 8.2):
select random() from generate_series(1, 10) order by random();
I thought I would receive ten random numbers in random order. But I received
ten random numbers sorted numerically:
random
-------------------
0.102324520237744
0.17704638838768
0.533014383167028
0.60182224214077
0.644065519794822
0.750732169486582
0.821376844774932
0.88221683120355
0.889879426918924
0.924697323236614
(10 rows)
I don't understand - why the result is like that? It seems like in each row
both random()s were giving the same result. Why is it like that? What caused
it?
--
Piotr Sobolewski
http://www.piotrsobolewski.w.pl
piotr_sobolewski wrote: > Dear sirs, > > I was very surprised when I executed such SQL query (under PostgreSQL 8.2): > select random() from generate_series(1, 10) order by random(); > > I thought I would receive ten random numbers in random order. But I received > ten random numbers sorted numerically: > random > ------------------- > 0.102324520237744 > 0.17704638838768 > 0.533014383167028 > 0.60182224214077 > 0.644065519794822 > 0.750732169486582 > 0.821376844774932 > 0.88221683120355 > 0.889879426918924 > 0.924697323236614 > (10 rows) > > I don't understand - why the result is like that? It seems like in each row > both random()s were giving the same result. Why is it like that? What caused > it? > Your query specifically requested that the result be ordered by the column "random" in the result set (the default ordering direction being ASC). Your query is semantically identical to: SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo ASC; I should think that you would get a better result if you dropped the ORDER BY clause. brian
piotr_sobolewski@o2.pl ("=?UTF-8?Q?piotr=5Fsobolewski?=") writes:
> I was very surprised when I executed such SQL query (under PostgreSQL 8.2):
> select random() from generate_series(1, 10) order by random();
>
> I thought I would receive ten random numbers in random order. But I received
> ten random numbers sorted numerically:
> random
> -------------------
> 0.102324520237744
> 0.17704638838768
> 0.533014383167028
> 0.60182224214077
> 0.644065519794822
> 0.750732169486582
> 0.821376844774932
> 0.88221683120355
> 0.889879426918924
> 0.924697323236614
> (10 rows)
>
> I don't understand - why the result is like that? It seems like in each row
> both random()s were giving the same result. Why is it like that? What caused
> it?
At first, I thought this was unsurprising, but it's pretty easy to
show that there's more going on than meets the eye... It is a bit
more clear that something interesting is going on if you add extra
columns, and name them all.
For instance, consider:
test@[local]:5433=# select random() as r1, random() as r2, random() as r3 from generate_series(1,10) order by r3;
r1 | r2 | r3
-------------------+-------------------+-------------------
0.246548388153315 | 0.700139089021832 | 0.119033687748015
0.627153669018298 | 0.813135434407741 | 0.197322080843151
0.306632998399436 | 0.545771937351674 | 0.25644090725109
0.345610864460468 | 0.474996185861528 | 0.350776285864413
0.580351672600955 | 0.673816084861755 | 0.443187412340194
0.73298008274287 | 0.756699057295918 | 0.594754341989756
0.932091740425676 | 0.383943342603743 | 0.749452064745128
0.955010122619569 | 0.972370331641287 | 0.893978256732225
0.675367069896311 | 0.800306641962379 | 0.922712546307594
0.778622157406062 | 0.51328693702817 | 0.978598471730947
(10 rows)
You can see that it's ordering by the third column.
If I replicate your query, with extra columns, AND NAMES, I get the following:
test@[local]:5433=# select random() as r1, random() as r2, random() as r3 from generate_series(1,10) order by random();
r1 | r2 | r3
--------------------+--------------------+-------------------
0.0288224648684263 | 0.904462072532624 | 0.27792159980163
0.144174488261342 | 0.406729203648865 | 0.452183415647596
0.263208176475018 | 0.752340068109334 | 0.927179601509124
0.443778183776885 | 0.197728976141661 | 0.556072968058288
0.613984462339431 | 0.0589730669744313 | 0.472951539326459
0.641100264620036 | 0.152739099226892 | 0.528443300165236
0.700987075921148 | 0.160180815029889 | 0.752044326625764
0.778274529613554 | 0.579829142428935 | 0.078228241764009
0.849023841321468 | 0.570575307123363 | 0.742937533650547
0.870425369590521 | 0.837595224380493 | 0.986238476354629
(10 rows)
It is indeed somewhat curious that the query parser chose to interpret
that the "order by random()" was referring to column #1.
--
(format nil "~S@~S" "cbbrowne" "linuxfinances.info")
http://cbbrowne.com/info/lisp.html
"...I'm not one of those who think Bill Gates is the devil. I simply
suspect that if Microsoft ever met up with the devil, it wouldn't need
an interpreter." -- Nicholas Petreley, InfoWorld, Sept 16, 1996
Chris Browne wrote: > If I replicate your query, with extra columns, AND NAMES, I get the following: > > test@[local]:5433=# select random() as r1, random() as r2, random() as r3 from generate_series(1,10) order by random(); > r1 | r2 | r3 > --------------------+--------------------+------------------- > 0.0288224648684263 | 0.904462072532624 | 0.27792159980163 > 0.144174488261342 | 0.406729203648865 | 0.452183415647596 > ... > It is indeed somewhat curious that the query parser chose to interpret > that the "order by random()" was referring to column #1. And even more curiously, IMHO, even specifying column names isn't enough. Note that this: li=# select * from (select (random()*10)::int as a, (random()*10)::int as b from generate_series(1,10) order by a) as x orderby b; a | b ---+---- 0 | 8 1 | 10 3 | 4 4 | 8 5 | 1 5 | 9 6 | 4 6 | 5 8 | 4 9 | 0 (10 rows) is sorted by "a" even though the outermost "order by" clause explicitly said to order by "b". Seems like it's a known odd behavior ... http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php http://archives.postgresql.org/pgsql-general/2006-11/msg01539.php http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php