Re: ORDER BY random() LIMIT 1 slowness

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: ORDER BY random() LIMIT 1 slowness
Дата
Msg-id 4144.1040103853@sss.pgh.pa.us
обсуждение исходный текст
Ответ на ORDER BY random() LIMIT 1 slowness  ("Gavin M. Roy" <gmr@justsportsusa.com>)
Ответы Re: ORDER BY random() LIMIT 1 slowness  (Greg Stark <gsstark@mit.edu>)
Список pgsql-general
"Gavin M. Roy" <gmr@justsportsusa.com> writes:
> SELECT * FROM poetry ORDER BY random() LIMIT 1;
> [ is slow for 35000 rows ]

Yeah.  Basically this query is implemented as
  (a) select all 35000 rows of "poetry";
  (b) compute a random() value for each row;
  (c) sort by the random() values;
  (d) take the first row, discard the rest.

The good news: this gives you a pretty-durn-random selection.  The bad
news: you didn't really care about choosing a random ordering of the
other 34999 rows, but it computed one anyway.

This problem's been discussed before, but I've not seen any really
ideal answer.  SQL is not designed to offer unpredictable results ;-)

If you have a reasonably-uniformly-distributed index key on the rows,
you can try something like

    select * from poetry where key > (scale * random() + min)
    order by key limit 1;

(where scale and min are chosen to make the result cover the range of
index keys).  Given an index on "key" this should pick a result quickly
via an indexscan+limit plan.  However, if the distribution of keys isn't
real uniform then you won't get real random results --- keys just above
larger gaps will be selected with unfairly high probability.  You also
have to know the min and max keys accurately.

I recall some folks have suggested generating an actual random column
(ie, something initialized with "default random()" and suitably indexed)
but this seems like a lot of overhead ... you'd have to be mighty
concerned with the exactness of your random selection to put up with
that, I think.

There are some other suggestions in the archives, IIRC.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Lamar Owen
Дата:
Сообщение: Re: Upgrading 6.5.1
Следующее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: How to access deleted but not vacuum'ed tuples?