Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

From Josh Berkus
Subject Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Msg-id 55C3C0CD.6090805@agliodbs.com
In reply to Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
List pgsql-hackers
On 08/06/2015 01:14 PM, Josh Berkus wrote:
> On 08/06/2015 01:10 PM, Simon Riggs wrote:
>> Given a user-stated probability P of accessing a block and N total
>> blocks, there are a few ways to implement block sampling.
>>
>> 1. Test P for each block individually. This gives a range of possible
>> results, with 0 blocks being a possible outcome, though decreasing in
>> probability as P increases for fixed N. This is the same way BERNOULLI
>> works; we just do it for blocks rather than rows.
>>
>> 2. We calculate P*N at start of scan and deliver this number of blocks
>> by random selection from N available blocks.
>>
>> At present we do (1), exactly as documented. (2) is slightly harder
>> since we'd need to track which blocks have been selected already so we
>> can use a random selection with no replacement algorithm. On a table
>> with uneven distribution of rows this would still return a variable
>> sample size, so it didn't seem worth changing.
> 
> Aha, thanks!
> 
> So, seems like this is just a doc issue? That is, we just need to
> document that using SYSTEM on very small sample sizes may return
> unexpected numbers of results ... and maybe also how the algorithm
> actually works.

Following up on this ... where is TABLESAMPLE documented other than in
the SELECT command?  Doc search on the website is having issues right
now.  I'm happy to write a doc patch.
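
For anyone following along, here is a rough Python sketch of the two
block-selection strategies quoted above. It is only an illustration, not
the PostgreSQL implementation: the function names, the use of Python's
random module, and the rounding rule in approach (2) are all my own
assumptions. The last helper computes the chance that approach (1)
selects no blocks at all, which is (1 - P)^N for independent per-block
tests.

```python
import random


def sample_blocks_bernoulli(n_blocks, p, rng):
    """Approach (1): test each block independently with probability p.

    The number of blocks returned varies from run to run and can be zero.
    """
    return [b for b in range(n_blocks) if rng.random() < p]


def sample_blocks_fixed(n_blocks, p, rng):
    """Approach (2): choose a fixed count of distinct blocks.

    The count is round(p * n_blocks); this rounding rule is an assumption
    made for the sketch. random.sample draws without replacement.
    """
    k = round(p * n_blocks)
    return sorted(rng.sample(range(n_blocks), k))


def prob_zero_blocks(n_blocks, p):
    """Probability that approach (1) selects no block: (1 - p) ** n_blocks."""
    return (1.0 - p) ** n_blocks


if __name__ == "__main__":
    rng = random.Random()
    # A 10-block table sampled with P = 0.01 (i.e. 1 percent).
    print(sample_blocks_bernoulli(10, 0.01, rng))
    print(sample_blocks_fixed(10, 0.01, rng))
    print(prob_zero_blocks(10, 0.01))
```

With N = 10 and P = 0.01, prob_zero_blocks comes out at roughly 0.90, so
an empty result from approach (1) is the common case on a table that
small. Note that approach (2) with this rounding rule would also pick
zero blocks there, since 0.01 * 10 rounds down to 0.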


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


