Re: multiple sampling from tables and saving output

From: Tom Lane
Subject: Re: multiple sampling from tables and saving output
Msg-id: 27273.1107791686@sss.pgh.pa.us
In reply to: multiple sampling from tables and saving output (David Orme <d.orme@imperial.ac.uk>)
List: pgsql-novice

David Orme <d.orme@imperial.ac.uk> writes:
> The process I need to do is a loop of  1000 repetitions of the
> following:

> 1) select a random subset of the data from a table
> 2) save various summaries of the randomly selected data

> I can think of various external ways of doing this - my current plan is
> to use a shell script to resend the same set of instructions repeated
> times using 'psql -f instruction_set.sql'  - but I was wondering if
> there was a canonical way of doing this within pgsql.

If you want a sample of, say, 1% of the rows in a table, you can do

    select * from mytable where random() < 0.01;

and get a genuinely unbiased sample.  Keep in mind though that you can't
get an exact sample size this way --- it'll be close to 1% but probably
not spot on.
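
As a rough sketch of how the 1000 repetitions might be run inside the database rather than from a shell loop: the query below pairs every repetition number with every row and applies the same random() filter. The column name val and the output table name sample_summaries are assumptions made for illustration; neither appears in the original question.

```sql
-- Assumptions (hypothetical, not from the original message):
--   mytable has a numeric column named val
--   sample_summaries is a new table to hold the results
-- random() is volatile, so it is re-evaluated for every (rep, row) pair;
-- each repetition therefore gets its own independent ~1% sample.
CREATE TABLE sample_summaries AS
SELECT g.rep,
       count(*)   AS n_sampled,   -- varies around 1% of the table size
       avg(t.val) AS mean_val,
       min(t.val) AS min_val,
       max(t.val) AS max_val
FROM generate_series(1, 1000) AS g(rep)
CROSS JOIN mytable AS t
WHERE random() < 0.01
GROUP BY g.rep
ORDER BY g.rep;

-- If an exact sample size matters more than speed, one common alternative
-- (a different technique from the filter above) is to shuffle and cut.
-- This returns exactly 100 rows when the table has at least 100,
-- at the cost of sorting the whole table:
SELECT * FROM mytable ORDER BY random() LIMIT 100;
```

Two caveats on the first statement: it does the work of reading the table once per repetition, so it is not cheap on a big table, and a repetition that happens to select no rows at all will simply be missing from sample_summaries.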

            regards, tom lane
