Обсуждение: Getting a random row
Hi,
I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering). Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?
Thanks,
-- Shaul
I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering). Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?
Thanks,
-- Shaul
On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote:
Hi,
I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering).
what do yo umean ? you can restrict range of integer column (or any other type) with constraints, for instance CHECK foo( a between 1 and 100);
Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?
not true - it has sequences, and pseudo type serial. Rtfm!.
--
GJ
Shaul Dar, 13.10.2009 17:17: > Also PG does not have a concept of an auto-increment pseudo-column > like Oracle's "rownum". Any suggestions? Yes it does (at least 8.4) SELECT row_number() over(), the_other_columns... FROM your_table So you could do something like: SELECT * FROM ( SELECT row_number() over() as rownum, the_other_columns... FROM your_table ) t WHERE t.rownum = a_random_integer_value_lower_than_rowcount; Thomas
Sorry, I guess I wasn't clear.
I have an existing table in my DB, and it doesn't have a column with serial values (actually it did originally, but due to later deletions of about 2/3 of the rows the column now has "holes"). I realize I could add a new serial column, but prefer not to change table + the new column would also become nonconsecutive after further deletions. The nice thing about Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is always valid.
Suggestions?
-- Shaul
I have an existing table in my DB, and it doesn't have a column with serial values (actually it did originally, but due to later deletions of about 2/3 of the rows the column now has "holes"). I realize I could add a new serial column, but prefer not to change table + the new column would also become nonconsecutive after further deletions. The nice thing about Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is always valid.
Suggestions?
-- Shaul
2009/10/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>
On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote:Hi,
I am running performance simulation against a DB. I want to randomly pull different records from a large table. However the table has no columns that hold sequential integer values (1..MAX), i.e. the columns all have "holes" (due to earlier filtering).what do yo umean ? you can restrict range of integer column (or any other type) with constraints, for instance CHECK foo( a between 1 and 100);
Also PG does not have a concept of an auto-increment pseudo-column like Oracle's "rownum". Any suggestions?not true - it has sequences, and pseudo type serial. Rtfm!.
--
GJ
On Oct 13, 2009, at 11:19 , Grzegorz Jaśkiewicz wrote: > On Tue, Oct 13, 2009 at 4:17 PM, Shaul Dar <shauldar@gmail.com> wrote: > > >> Also PG does not have a concept of an auto-increment pseudo-column >> like >> Oracle's "rownum". Any suggestions? >> > not true - it has sequences, and pseudo type serial. Rtfm!. AIUI, rownum applies numbering to output rows in a SELECT statement, rather than some actual column of the table, which is likely what the OP is getting at. http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html Michael Glaesemann grzm seespotcode net
2009/10/13 Shaul Dar <shauldar@gmail.com>
Sorry, I guess I wasn't clear.
I have an existing table in my DB, and it doesn't have a column with serial values (actually it did originally, but due to later deletions of about 2/3 of the rows the column now has "holes"). I realize I could add a new serial column, but prefer not to change table + the new column would also become nonconsecutive after further deletions. The nice thing about Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is always valid.
change the default of that column to use sequence.
For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET DEFAULT ..
for example of how it looks, just create table foo(a serial), and check its definition with \d+ foo
For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET DEFAULT ..
for example of how it looks, just create table foo(a serial), and check its definition with \d+ foo
--
GJ
Michael,
You are right. I didn't remember the semantics, and Oracle's rownum would not have been helpful. But the new row_number() in 8.4 would probably do the trick (though I use 8.3.7 :-( )
-- Shaul
You are right. I didn't remember the semantics, and Oracle's rownum would not have been helpful. But the new row_number() in 8.4 would probably do the trick (though I use 8.3.7 :-( )
-- Shaul
2009/10/13 Michael Glaesemann <grzm@seespotcode.net>
On Oct 13, 2009, at 11:19 , Grzegorz Jaśkiewicz wrote:Also PG does not have a concept of an auto-increment pseudo-column likenot true - it has sequences, and pseudo type serial. Rtfm!.
Oracle's "rownum". Any suggestions?
AIUI, rownum applies numbering to output rows in a SELECT statement, rather than some actual column of the table, which is likely what the OP is getting at.
http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html
Michael Glaesemann
grzm seespotcode net
2009/10/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>: > > > 2009/10/13 Shaul Dar <shauldar@gmail.com> >> >> Sorry, I guess I wasn't clear. >> I have an existing table in my DB, and it doesn't have a column with >> serial values (actually it did originally, but due to later deletions of >> about 2/3 of the rows the column now has "holes"). I realize I could add a >> new serial column, but prefer not to change table + the new column would >> also become nonconsecutive after further deletions. The nice thing about >> Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is >> always valid. > > change the default of that column to use sequence. > For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET > DEFAULT .. > > for example of how it looks, just create table foo(a serial), and check its > definition with \d+ foo This is not really what he's trying to do. Oracle's rownum has completely different semantics than this. But, on 8.4, a window function should do it. ...Robert
2009/10/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>: > > > 2009/10/13 Shaul Dar <shauldar@gmail.com> >> >> Sorry, I guess I wasn't clear. >> I have an existing table in my DB, and it doesn't have a column with >> serial values (actually it did originally, but due to later deletions of >> about 2/3 of the rows the column now has "holes"). I realize I could add a >> new serial column, but prefer not to change table + the new column would >> also become nonconsecutive after further deletions. The nice thing about >> Oracle's "rownum" is that it' a pseudo-column", not a real one, and AFAIK is >> always valid. > > change the default of that column to use sequence. > For instance, lookup CREATE SEQUENCE in manual, and ALTER TABLE .. SET > DEFAULT .. > > for example of how it looks, just create table foo(a serial), and check its > definition with \d+ foo > > > > -- > GJ > You could emulate rownum (aka rank) using a TEMPORARY sequence applied to your result set. http://www.postgresql.org/docs/8.3/interactive/sql-createsequence.html Not sure if this is what you're after though?
On Tue, Oct 13, 2009 at 9:17 AM, Shaul Dar <shauldar@gmail.com> wrote: > Hi, > > I am running performance simulation against a DB. I want to randomly pull > different records from a large table. However the table has no columns that > hold sequential integer values (1..MAX), i.e. the columns all have "holes" > (due to earlier filtering). Also PG does not have a concept of an > auto-increment pseudo-column like Oracle's "rownum". Any suggestions? If what you're trying to do is emulate a real world app which randomly grabs rows, then you want to setup something ahead of time that has a pseudo random order and not rely on using anything like order by random() limit 1 or anything like that. Easiest way is to do something like: select id into randomizer from maintable order by random(); then use a cursor to fetch from the table to get "random" rows from the real table.
2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>: > > If what you're trying to do is emulate a real world app which randomly > grabs rows, then you want to setup something ahead of time that has a > pseudo random order and not rely on using anything like order by > random() limit 1 or anything like that. Easiest way is to do > something like: > > select id into randomizer from maintable order by random(); > > then use a cursor to fetch from the table to get "random" rows from > the real table. > > Why not just do something like: SELECT thisfield, thatfield FROM my_table WHERE thisfield IS NOT NULL ORDER BY RANDOM() LIMIT 1; Thom
2009/10/14 Thom Brown <thombrown@gmail.com>: > 2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>: >> >> If what you're trying to do is emulate a real world app which randomly >> grabs rows, then you want to setup something ahead of time that has a >> pseudo random order and not rely on using anything like order by >> random() limit 1 or anything like that. Easiest way is to do >> something like: >> >> select id into randomizer from maintable order by random(); >> >> then use a cursor to fetch from the table to get "random" rows from >> the real table. >> >> > > Why not just do something like: > > SELECT thisfield, thatfield > FROM my_table > WHERE thisfield IS NOT NULL > ORDER BY RANDOM() > LIMIT 1; > this works well on small tables. On large tables this query is extremely slow. regards Pavel > Thom > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
On Wed, Oct 14, 2009 at 1:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2009/10/14 Thom Brown <thombrown@gmail.com>: >> 2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>: >>> >>> If what you're trying to do is emulate a real world app which randomly >>> grabs rows, then you want to setup something ahead of time that has a >>> pseudo random order and not rely on using anything like order by >>> random() limit 1 or anything like that. Easiest way is to do >>> something like: >>> >>> select id into randomizer from maintable order by random(); >>> >>> then use a cursor to fetch from the table to get "random" rows from >>> the real table. >>> >>> >> >> Why not just do something like: >> >> SELECT thisfield, thatfield >> FROM my_table >> WHERE thisfield IS NOT NULL >> ORDER BY RANDOM() >> LIMIT 1; >> > > this works well on small tables. On large tables this query is extremely slow. Exactly. If you're running that query over and over your "performance test" is on how well pgsql can run that very query. :) Anything else you do is likely to be noise by comparison.
2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>
On Wed, Oct 14, 2009 at 1:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:Exactly. If you're running that query over and over your "performance
> 2009/10/14 Thom Brown <thombrown@gmail.com>:
>> 2009/10/14 Scott Marlowe <scott.marlowe@gmail.com>:
>> Why not just do something like:
>>
>> SELECT thisfield, thatfield
>> FROM my_table
>> WHERE thisfield IS NOT NULL
>> ORDER BY RANDOM()
>> LIMIT 1;
>>
>
> this works well on small tables. On large tables this query is extremely slow.
test" is on how well pgsql can run that very query. :) Anything else
you do is likely to be noise by comparison.
What I am using often to get a set of random rows is
SELECT thisfield, thatfield
FROM my_table
WHERE random() < rowsneeded::float8/(select count * from my_table);
Of course it does not give exact number of rows, but close enough for me.
As of taking one row I'd try:
select * from (
SELECT thisfield, thatfield
FROM my_table
WHERE random() < 100.0/(select count * from my_table))
a order by random() limit 1
I'd say probability of returning no rows is quite low and query can be extended even more by returning first row from table in this rare case.