Discussion: random rows
How can I return N random rows from my select stmt? Like: e.g. my select returns a set of 1000 rows, and I want to randomly pick 100 of the 1000.

thanks.

Jie LIANG
St. Bernard Software
10350 Science Center Drive
Suite 100, San Diego, CA 92121
Office: (858)320-4873
jliang@ipinc.com
www.stbernard.com
www.ipinc.com
thanks.

Jie LIANG <jliang@ipinc.com>

On Thu, 26 Apr 2001, Ing. Roberto Andrade Fonseca wrote:

> On Thu, 26 Apr 2001, Jie Liang wrote:
>
> > How can I return N random rows from my select stmt?
> > Like: e.g. my select returns a set of 1000 rows, and I want to
> > randomly pick 100 of the 1000.
>
> I've done something like:
>
> prereg_iw=# select random(), nombre from asistente order by 1 limit 10;
>        random         |    nombre
> ----------------------+---------------
>  0.000214523170243261 | ALFONSO
>  0.000478655100091666 | OSCAR
>  0.000892118085591177 | JOSE LUIS
>  0.000972398091560415 | IGNACIO
>  0.00109919300354048  | NORBERTO
>  0.00180558208460248  | SALVADOR
>  0.00196880800741204  | ANDRES NOE
>  0.00197171233686233  | SARA MICHELLE
>  0.00226354412839913  | DAVID
>  0.00233715307076329  | RODOLFO
> (10 rows)
>
> and it works!
>
> Regards,
>
> Roberto Andrade Fonseca
> randrade@abl.com.mx
Jie Liang <jliang@ipinc.com> wrote:

> e.g. my select returns a set of 1000 rows, and I want to randomly pick
> 100 of the 1000.

Have you tried

    SELECT * FROM selectee ORDER BY random() LIMIT 100;

?

HTH,
Ray
--
<GRub[B]eR> JHM, you really are a nerd :))
<GRub[B]eR> but a good one :)
<GRub[B]eR> A sort of programming Furby
Seen on #cistron
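For readers without a PostgreSQL instance handy, the same ORDER BY random() LIMIT trick can be tried with Python's built-in sqlite3 module (SQLite also provides a random() SQL function). This is only a sketch; the table and column names below are invented for the demo:

```python
import sqlite3

# In-memory table standing in for "selectee"; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE selectee (id INTEGER)")
conn.executemany("INSERT INTO selectee VALUES (?)",
                 [(i,) for i in range(1000)])

# Same idea as the query above: order by a random key, then cap the count.
rows = conn.execute(
    "SELECT id FROM selectee ORDER BY random() LIMIT 100"
).fetchall()
print(len(rows))  # 100
```

Every row gets an independent random sort key, so each run returns a different, uniformly chosen 100-row subset.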
On Thu, 26 Apr 2001, Jie Liang wrote:

> How can I return N random rows from my select stmt?
> Like: e.g. my select returns a set of 1000 rows, and I want to
> randomly pick 100 of the 1000.

Interesting problem. You might get much better responses than this, but here are two ideas that might be workable:

* Use a WHERE clause that checks random() > .88. This should give you, on average, about 120 rows out of 1000, and you can add LIMIT 100 to ensure that you get only 100 -- but you're still biased toward the start of the list. (Or remove the LIMIT 100 and use > .9, but then there's no guarantee you'll get exactly 100; you'll get more or fewer than that.)

* Have a plpgsql routine that gets 100 random records and copies them into a temporary table (since plpgsql can't return a recordset), then query against that table.

Or, when all else fails:

* Do it in your front end (Python/Perl/PHP/Pwhatever).

If you get better ideas that aren't cc'd to the list, please forward them.

HTH,
--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington
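The difference between the two SQL approaches in this thread can be simulated in plain Python (the function names and the 0.9 threshold here are mine, purely for illustration): a WHERE random() > x filter keeps each row independently, so the result size fluctuates, while ordering by a random key and taking the first N always yields exactly N rows:

```python
import random

def where_random_sample(rows, threshold=0.9):
    # Like WHERE random() > threshold: each row is kept independently,
    # so the result size varies from run to run (mean ~10% of the input).
    return [r for r in rows if random.random() > threshold]

def order_by_random_limit(rows, n=100):
    # Like ORDER BY random() LIMIT n: shuffle via a random sort key,
    # then take the first n -- always exactly n rows, no positional bias.
    return sorted(rows, key=lambda _: random.random())[:n]

rows = list(range(1000))
print([len(where_random_sample(rows)) for _ in range(5)])  # sizes hover near 100
print(len(order_by_random_limit(rows)))                    # always 100
```

Running the first line a few times shows counts scattered around 100, which is exactly the "more or fewer than that" caveat above.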
On Thu, 26 Apr 2001, Jie Liang wrote:

> > I've done something like:
> >
> > prereg_iw=# select random(), nombre from asistente order by 1 limit 10;
> >        random         |    nombre
> > ----------------------+---------------
> >  0.000214523170243261 | ALFONSO
> >  0.000478655100091666 | OSCAR
> >  0.000892118085591177 | JOSE LUIS
> >  0.000972398091560415 | IGNACIO
> >  0.00109919300354048  | NORBERTO
> >  0.00180558208460248  | SALVADOR
> >  0.00196880800741204  | ANDRES NOE
> >  0.00197171233686233  | SARA MICHELLE
> >  0.00226354412839913  | DAVID
> >  0.00233715307076329  | RODOLFO
> > (10 rows)

How totally obvious in retrospect, and how much better a solution than the ones I offered. D'oh!

Thanks for posting it to the list.
--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington
Hi,

I am not a "real" sysadmin or dba, but "got stuck" doing it and am trying to learn via a fairly difficult problem that my group must deal with: a LARGE volume of data. I have been working from scratch on this for about 3 weeks and have run lots of tests.

I am running postgres 7.1 on Solaris 5.7 with 1GB RAM, 2 300MHz processors, and a 6GB partition. The application is to COPY a LARGE amount of data (an average of 15k rows of 3 ints every 15 minutes or so, from 170 machines by day/250 at night) into a db and then do a query after the fact. The COPYs are done via the libpq PQputline() subroutine. The after-the-fact query will postprocess the data, reduce its amount and granularity, and load it into a new table. Ultimately, I will have 1 db with a table of about 250M rows and several other dbs each with 10's of millions. Multiple GBs of data.

Here are the options I run with postmaster:

    postmaster -D /evsx/aus16/coers -o "-S 32768" -i -B 8192 -N 2

Here are my IPC params:

    set shmsys:shminfo_shmmax=524288000
    set shmsys:shminfo_shmmin=16
    set shmsys:shminfo_shmmni=200
    set shmsys:shminfo_shmseg=200
    set semsys:seminfo_semmni=500
    set semsys:seminfo_semmns=500
    set semsys:seminfo_semmsl=500
    set semsys:seminfo_semume=100

Here are my questions:

1) NUMBER OF CONNECTIONS: I use -N 2 because that seems to optimize performance. If I allow more connections, the server bogs down, ultimately to a near-standstill if I allow too many. I assume this is because the competing connections are all trying to COPY to the same database, they block all but one, and all the semaphore chasing slows everything down. The weird thing is that the CPU, iowait, and swap figures in top do not rise very much. What is slowing things down in this case? Currently, I have each client try for a connection and, if it fails, wait rand()%4+1 seconds and then try again.
This actually works pretty well, but it seems to me that the server should be handling this, and more efficiently. Also, am I correct in assuming there is no way to keep multiple COPYs to the same db & table from blocking? When I tried to setnonblocking, data got dropped.

2) BOTTLENECK: I was running 2 queries on the 65M rows of data I had collected after I had finished loading. I had not indexed the tables. Based on the top reading below, what is the bottleneck that is slowing the query down? The same phenomenon occurs when COPYing data into the table.

    last pid: 15973;  load averages: 0.98, 0.92, 0.78
    41 processes: 38 sleeping, 1 running, 2 on cpu
    CPU states: 48.5% idle, 43.8% user, 5.2% kernel, 2.6% iowait, 0.0% swap
    Memory: 1024M real, 17M free, 109M swap in use, 2781M swap free

      PID USERNAME THR PRI NICE  SIZE  RES STATE    TIME    CPU COMMAND
    15919 postgres   1  40    0   71M  69M cpu0    30:58 22.12% postgres
    15966 postgres   1  30    0   71M  69M run     12:02 23.63% postgres

3) CONFIGURATION/SETTINGS: Are my IPC params and postmaster options set right for my application? My thinking is that I need lots of shared memory to reduce disk access. Am I missing something? Are there any other configurable kernel params that I need to know about? What exactly will the sort mem (-o "-S") buy me?

Thanks for your patience!
--
John Coers
Intrinsity, Inc.
coers@intrinsity.com
Austin, Texas
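The client-side retry scheme described above (on a failed connection, wait rand()%4+1 seconds and try again) can be sketched in Python. This is a toy illustration, not libpq code; `connect_with_backoff` and `flaky_connect` are hypothetical names invented here:

```python
import random
import time

def connect_with_backoff(connect, max_tries=10, sleep=time.sleep):
    """Retry a connection attempt, sleeping rand()%4+1 seconds between
    tries -- mirroring the client-side scheme described above."""
    for attempt in range(max_tries):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_tries - 1:
                raise
            sleep(random.randrange(4) + 1)  # 1..4 seconds, like rand()%4+1

# Toy stand-in for a connection that is refused twice, then accepted.
attempts = {"n": 0}
def flaky_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("sorry, too many clients already")
    return "connected"

print(connect_with_backoff(flaky_connect, sleep=lambda s: None))  # connected
```

The random delay spreads out the retries so the clients don't all hammer the server at the same instant; the `sleep` parameter exists only to make the sketch testable without real waiting.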
On Thu, 26 Apr 2001, Jie Liang wrote:

> How can I return N random rows from my select stmt?
> Like: e.g. my select returns a set of 1000 rows, and I want to
> randomly pick 100 of the 1000.

I've done something like:

    prereg_iw=# select random(), nombre from asistente order by 1 limit 10;
           random         |    nombre
    ----------------------+---------------
     0.000214523170243261 | ALFONSO
     0.000478655100091666 | OSCAR
     0.000892118085591177 | JOSE LUIS
     0.000972398091560415 | IGNACIO
     0.00109919300354048  | NORBERTO
     0.00180558208460248  | SALVADOR
     0.00196880800741204  | ANDRES NOE
     0.00197171233686233  | SARA MICHELLE
     0.00226354412839913  | DAVID
     0.00233715307076329  | RODOLFO
    (10 rows)

and it works!

Regards,

Roberto Andrade Fonseca
randrade@abl.com.mx