Обсуждение: Getting a random row

Поиск
Список
Период
Сортировка

Getting a random row

От
Shaul Dar
Дата:
Hi,

I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering). Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?

Thanks,

-- Shaul

Re: Getting a random row

От
Grzegorz Jaśkiewicz
Дата:


On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote:
Hi,

I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering).
what do yo umean ? you can restrict range of integer column (or any other type) with constraints, for instance CHECK foo( a between 1 and 100);
 
 
Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?
not true - it has sequences, and pseudo type serial. Rtfm!.
 


--
GJ

Re: Getting a random row

От
Thomas Kellerer
Дата:
Shaul Dar, 13.10.2009 17:17:
> Also PG does not have a concept of an auto-increment pseudo-column
> like Oracle's "rownum". Any suggestions?

Yes it does (at least 8.4)

SELECT row_number() over(), the_other_columns...
FROM your_table

So you could do something like:

SELECT *
FROM (
  SELECT row_number() over() as rownum,
         the_other_columns...
  FROM your_table
) t
WHERE t.rownum = a_random_integer_value_lower_than_rowcount;

Thomas



Re: Getting a random row

От
Shaul Dar
Дата:
Sorry, I guess I wasn't clear.
I have an existing table in my DB, and it doesn't have a column with serial values (actually it did originally, but due to later deletions of about 2/3 of the rows the column now has "holes"). I realize I could add a new serial column, but prefer not to change table + the new column would also become nonconsecutive after further deletions. The nice thing about Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is always valid.

Suggestions?

-- Shaul

2009/10/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>


On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote:
Hi,

I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering).
what do yo umean ? you can restrict range of integer column (or any other type) with constraints, for instance CHECK foo( a between 1 and 100);
 
Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?
not true - it has sequences, and pseudo type serial. Rtfm!.
 


--
GJ

Re: Getting a random row

От
Michael Glaesemann
Дата:
On Oct 13, 2009, at 11:19 , Grzegorz Jaśkiewicz wrote:

> On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote:
>
>
>> Also PG does not have a concept of an auto-increment pseudo-column
>> like
>> Oracle's "rownum". Any suggestions?
>>
> not true - it has sequences, and pseudo type serial. Rtfm!.

AIUI, rownum applies numbering to output rows in a SELECT statement,
rather than some actual column of the table, which is likely what the
OP is getting at.

http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

Michael Glaesemann
grzm seespotcode net




Re: Getting a random row

От
Grzegorz Jaśkiewicz
Дата:


2009/10/13 Shaul Dar <shauldar@gmail.com>
Sorry, I guess I wasn't clear.
I have an existing table in my DB, and it doesn't have a column with serial values (actually it did originally, but due to later deletions of about 2/3 of the rows the column now has "holes"). I realize I could add a new serial column, but prefer not to change table + the new column would also become nonconsecutive after further deletions. The nice thing about Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is always valid.
change the default of that column to use sequence.
For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET DEFAULT ..

for example of how it looks, just create table foo(a serial), and check its definition with \d+ foo 



--
GJ

Re: Getting a random row

От
Shaul Dar
Дата:
Michael,

You are right. I didn't remember the semantics, and Oracle's rownum would not have been helpful. But the new row_number() in 8.4 would probably do the trick (though I use 8.3.7 :-( )

-- Shaul

2009/10/13 Michael Glaesemann <grzm@seespotcode.net>

On Oct 13, 2009, at 11:19 , Grzegorz Jaśkiewicz wrote:

On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote:


Also PG does not have a concept of an auto-increment pseudo-column like
Oracle's "rownum". Any suggestions?

not true - it has sequences, and pseudo type serial. Rtfm!.

AIUI, rownum applies numbering to output rows in a SELECT statement, rather than some actual column of the table, which is likely what the OP is getting at.

http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

Michael Glaesemann
grzm seespotcode net




Re: Getting a random row

От
Robert Haas
Дата:
2009/10/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>
>
> 2009/10/13 Shaul Dar <shauldar@gmail.com>
>>
>> Sorry, I guess I wasn't clear.
>> I have an existing table in my DB, and it doesn't have a column with
>> serial values (actually it did originally, but due to later deletions of
>> about 2/3 of the rows the column now has "holes"). I realize I could add a
>> new serial column, but prefer not to change table + the new column would
>> also become nonconsecutive after further deletions. The nice thing about
>> Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is
>> always valid.
>
> change the default of that column to use sequence.
> For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET
> DEFAULT ..
>
> for example of how it looks, just create table foo(a serial), and check its
> definition with \d+ foo

This is not really what he's trying to do.  Oracle's rownum has
completely different semantics than this.

But, on 8.4, a window function should do it.

...Robert

Re: Getting a random row

От
bricklen
Дата:
2009/10/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>
>
> 2009/10/13 Shaul Dar <shauldar@gmail.com>
>>
>> Sorry, I guess I wasn't clear.
>> I have an existing table in my DB, and it doesn't have a column with
>> serial values (actually it did originally, but due to later deletions of
>> about 2/3 of the rows the column now has "holes"). I realize I could add a
>> new serial column, but prefer not to change table + the new column would
>> also become nonconsecutive after further deletions. The nice thing about
>> Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is
>> always valid.
>
> change the default of that column to use sequence.
> For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET
> DEFAULT ..
>
> for example of how it looks, just create table foo(a serial), and check its
> definition with \d+ foo
>
>
>
> --
> GJ
>


You could emulate rownum (aka rank) using a TEMPORARY sequence applied
to your result set.

http://www.postgresql.org/docs/8.3/interactive/sql-createsequence.html

Not sure if this is what you're after though?

Re: Getting a random row

От
Scott Marlowe
Дата:
On Tue, Oct 13, 2009 at 9:17 AM, Shaul Dar <shauldar@gmail.com> wrote:
> Hi,
>
> I am running performance simulation against a DB. I want to randomly pull
> different records from a large table. However the table has no columns that
> hold sequential integer values (1..MAX), i.e. the columns all have "holes"
> (due to earlier filtering). Also PG does not have a concept of an
> auto-increment pseudo-column like Oracle's "rownum". Any suggestions?

If what you're trying to do is emulate a real world app which randomly
grabs rows, then you want to setup something ahead of time that has a
pseudo random order and not rely on using anything like order by
random() limit 1 or anything like that.  Easiest way is to do
something like:

select id into randomizer from maintable order by random();

then use a cursor to fetch from the table to get "random" rows from
the real table.

Re: Getting a random row

От
Thom Brown
Дата:
2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>:
>
> If what you're trying to do is emulate a real world app which randomly
> grabs rows, then you want to setup something ahead of time that has a
> pseudo random order and not rely on using anything like order by
> random() limit 1 or anything like that.  Easiest way is to do
> something like:
>
> select id into randomizer from maintable order by random();
>
> then use a cursor to fetch from the table to get "random" rows from
> the real table.
>
>

Why not just do something like:

SELECT thisfield, thatfield
FROM my_table
WHERE thisfield IS NOT NULL
ORDER BY RANDOM()
LIMIT 1;

Thom

Re: Getting a random row

От
Pavel Stehule
Дата:
2009/10/14 Thom Brown <thombrown@gmail.com>:
> 2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>:
>>
>> If what you're trying to do is emulate a real world app which randomly
>> grabs rows, then you want to setup something ahead of time that has a
>> pseudo random order and not rely on using anything like order by
>> random() limit 1 or anything like that.  Easiest way is to do
>> something like:
>>
>> select id into randomizer from maintable order by random();
>>
>> then use a cursor to fetch from the table to get "random" rows from
>> the real table.
>>
>>
>
> Why not just do something like:
>
> SELECT thisfield, thatfield
> FROM my_table
> WHERE thisfield IS NOT NULL
> ORDER BY RANDOM()
> LIMIT 1;
>

this works well on small tables. On large tables this query is extremely slow.

regards
Pavel

> Thom
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Getting a random row

От
Scott Marlowe
Дата:
On Wed, Oct 14, 2009 at 1:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/10/14 Thom Brown <thombrown@gmail.com>:
>> 2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>:
>>>
>>> If what you're trying to do is emulate a real world app which randomly
>>> grabs rows, then you want to setup something ahead of time that has a
>>> pseudo random order and not rely on using anything like order by
>>> random() limit 1 or anything like that.  Easiest way is to do
>>> something like:
>>>
>>> select id into randomizer from maintable order by random();
>>>
>>> then use a cursor to fetch from the table to get "random" rows from
>>> the real table.
>>>
>>>
>>
>> Why not just do something like:
>>
>> SELECT thisfield, thatfield
>> FROM my_table
>> WHERE thisfield IS NOT NULL
>> ORDER BY RANDOM()
>> LIMIT 1;
>>
>
> this works well on small tables. On large tables this query is extremely slow.

Exactly.  If you're running that query over and over your "performance
test" is on how well pgsql can run that very query. :)  Anything else
you do is likely to be noise by comparison.

Re: Getting a random row

От
Віталій Тимчишин
Дата:


2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>
On Wed, Oct 14, 2009 at 1:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/10/14 Thom Brown <thombrown@gmail.com>:
>> 2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>:
>> Why not just do something like:
>>
>> SELECT thisfield, thatfield
>> FROM my_table
>> WHERE thisfield IS NOT NULL
>> ORDER BY RANDOM()
>> LIMIT 1;
>>
>
> this works well on small tables. On large tables this query is extremely slow.

Exactly.  If you're running that query over and over your "performance
test" is on how well pgsql can run that very query. :)  Anything else
you do is likely to be noise by comparison.


What I am using often to get a set of random rows is
SELECT thisfield, thatfield
FROM my_table
WHERE random() < rowsneeded::float8/(select count * from my_table);
Of course it does not give exact number of rows, but close enough for me.
As of taking one row I'd try:
select * from (
SELECT thisfield, thatfield
FROM my_table
WHERE random() < 100.0/(select count * from my_table))
a order by random() limit 1

I'd say probability of returning no rows is quite low and query can be extended even more by returning first row from table in this rare case.