Re: selecting random row values in postgres

Поиск
Список
Период
Сортировка
От Geoff Tolley
Тема Re: selecting random row values in postgres
Дата
Msg-id 45DFB208.1010201@polimetrix.com
обсуждение исходный текст
Ответ на Re: selecting random row values in postgres  (Tommy Gildseth <tommy@gildseth.com>)
Список pgsql-sql
Tommy Gildseth wrote:
> Sumeet wrote:
>> Thanks Buddy, really appreciate ur help on this
>>
>> ....problem solved...
>>
>> Is there any way this query can be optimized...i'm running it on a 
>> huge table with joins
> 
> ORDER BY rand() is rather slow on large datasets, since the db has to 
> actually generate a random value for each row in the table, before being 
> able use it to sort by. Preferable ways to do this include f.ex:
> SELECT max(id) FROM table;
> SELECT * FROM table WHERE id > [rand value between 0 and max(id)] ORDER 
> BY id LIMIT 1;

I'd have thought that in most cases the slowness would be due more to the 
sort than the cost of generating the random numbers. Your pair of queries 
neatly avoid doing the sort, but it does raise the question, does it matter 
if not all rows have the same chance of being picked? As an extreme example 
in the above, if there are id's 1-100 and id = 100000 in the table, the 
last one will almost always be picked.

I wonder if the random aggregate that Josh Berkus wrote could be adapted to 
return a record type rather than a random value of a single column? Its big 
advantage is that it requires no sorting and all entries are equiprobable. 
Unfortunately I'm a bit short of time at this end of a Friday to do much 
but give you the URL:

http://www.powerpostgresql.com/Random_Aggregate

HTH,
Geoff


В списке pgsql-sql по дате отправления:

Предыдущее
От: Tommy Gildseth
Дата:
Сообщение: Re: selecting random row values in postgres
Следующее
От: chrisj
Дата:
Сообщение: Re: can someone explain confusing array indexing nomenclature