Re: TABLESAMPLE patch

From: Petr Jelinek
Subject: Re: TABLESAMPLE patch
Date:
Msg-id: 55283630.7090201@2ndquadrant.com
In response to: Re: TABLESAMPLE patch (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On 10/04/15 22:16, Tomas Vondra wrote:
>
>
> On 04/10/15 21:57, Petr Jelinek wrote:
>> On 10/04/15 21:26, Peter Eisentraut wrote:
>>
>> But this was not really my point. BERNOULLI just does not work well
>> with a row limit by definition: it applies a probability to each
>> individual row. You can get the probability from a percentage very
>> easily (just divide by 100), but to get it for a specific target
>> number of rows you have to know the total number of source rows, and
>> that's not something we can do very accurately. So you won't get 500
>> rows but approximately 500 rows.
>
> It's actually even trickier. Even if you happen to know the exact number
> of rows in the table, you can't just convert that into a percentage
> and use it for BERNOULLI sampling. It may give you a different number
> of result rows, because each row is sampled independently.
>
> That is why we have Vitter's algorithm for reservoir sampling, which
> works very differently from BERNOULLI.
>
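
To illustrate the point quoted above, here is a minimal Python sketch of
per-row Bernoulli sampling. It is not the patch's code; the function name
bernoulli_sample, the table size of 100,000 rows and the seeds are all
assumptions made up for the illustration. Even with the row count known
exactly, asking for 0.5 percent only gives roughly 500 rows.

```python
import random


def bernoulli_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100.

    Hypothetical helper for illustration only, not a PostgreSQL API.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


# Assumed table of exactly 100,000 rows; 500 target rows means 0.5 percent.
rows = range(100_000)
sizes = [len(bernoulli_sample(rows, 0.5, seed=s)) for s in range(20)]

# The sample sizes scatter around 500 instead of hitting it exactly.
print(min(sizes), max(sizes))
```

The result size follows a binomial distribution, so a spread of a few dozen
rows around the target is expected at this table size.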

Hmm, this actually gives me an idea: perhaps we could expose Vitter's
reservoir sampling as another sampling method for people who want
"give me 500 rows from the table, fast"? We already have it implemented;
it's just a matter of adding the glue.
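
For readers unfamiliar with the technique, here is a rough Python sketch of
reservoir sampling. It uses the simple textbook variant (Algorithm R), which
is not necessarily the variant PostgreSQL implements, and the name
reservoir_sample and the 100,000-row input are assumptions for the
illustration. Unlike BERNOULLI, it returns exactly k rows without knowing the
total row count in advance.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly from an iterable of unknown length.

    Simple Algorithm R; hypothetical helper, not a PostgreSQL API.
    If the input has fewer than k rows, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # randint is inclusive on both ends, so j is uniform on 0..i.
            j = rng.randint(0, i)
            if j < k:
                # Row i replaces a random slot with probability k / (i + 1).
                reservoir[j] = row
    return reservoir


# The iterator hides the length, as a table scan would.
sample = reservoir_sample(iter(range(100_000)), 500, seed=1)
print(len(sample))  # 500
```

Note that this sketch still visits every row once. Optimized variants reduce
the per-row work by computing how many rows to skip between replacements.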


-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


