On Wednesday 07 April 2004 10:03, Ken Geis wrote:
> Richard Huxton wrote:
> > On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> >>I am trying to find an efficient way to draw a random sample from a
> >>complex query. I also want it to be easy to use within my application.
> >>
> >>So I've defined a view that encapsulates the query. The id in the
> >>"driving" table is exposed, and I run a query like:
> >>
> >>select * from stats_record_view
> >> where id in (select id from driver_stats
> >> order by random()
> >> limit 30000);
> >
> > How about a join?
> >
> > SELECT s.*
> > FROM
> > stats_record_view s
> > JOIN
> > (SELECT id FROM driver_stats ORDER BY random() LIMIT 30000) AS r
> > ON s.id = r.id;
>
> Yes, I tried this too after I sent the first mail, and this was somewhat
> better. I ended up adding a random column to the driving table, putting
> an index on it, and exposing that column in the view. Now I can say
>
> SELECT * FROM stats_record_view WHERE random < 0.093;
>
> For my application, it's OK if the same sample is picked time after time
> and it may change if data is added.
Fair enough - that'll certainly do it.
> > Also worth checking the various list archives - this has come up in the
> > past, but some time ago.
>
> There are some messages in the archives about how to get a random
> sample. I know how to do that, and that's not why I posted my message.
> Are you saying that the planner behavior I spoke of is in the
> archives? I wouldn't know what to search on to find that thread. Does
> anyone think that the planner issue has merit to address? Can someone
> help me figure out what code I would look at?
I was assuming after getting a random subset they'd see the same problem you
are. If not, probably worth looking at. In which case, an EXPLAIN ANALYZE of
your original query would be good.
--
Richard Huxton
Archonet Ltd