Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

Поиск
Список
Период
Сортировка
От Josh Berkus
Тема Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Дата
Msg-id 55C3B222.1060301@agliodbs.com
обсуждение исходный текст
Ответы Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
Version: 9.5alpha2

Issue: when requesting small samples, SYSTEM often returns zero rows,
and sometimes returns unexpected numbers of rows.

Example:

create table thous ( id int, val text );
insert into thous select i, i::text || '-val'  from
generate_series(1,100000) as gs(i);
analyze;

This is a 100,000 row table, so a 0.01% sample should be 10 rows.  Since
10 rows is far less than what's on one data page, the documentation
suggests that I should get 1 data page worth of rows.   However:

postgres=# select * from thous tablesample system ( 0.01 );id | val
----+-----
(0 rows)

Time: 0.636 ms

... this query consistently returns 0 rows, in 20 runs.

Ok,let's try a million-row table, which is the example we have in the docs.

postgres=# select * from mil tablesample system ( 0.01 );id | val
----+-----
(0 rows)

Hmmm?

On testing, the query against the million-row table returns 0 rows
around 50% of the time.

This table has around 185 rows per page.  As the sample size goes up,
the number times I get zero rows goes down, but those results seem to
still include data pages with zero rows.  For example, here's a series
of results from a 0.04 sample against the million-row table.

370
370
370
555
555
185
0
925

Since this is a synthetic table I just generated, it contains almost
exactly 185 rows per data page for every data page.  So on a 0.04%
sample, the variation between 370 rows and 555 rows (whether we have 2
or 3 data pages) is expected, since 0.04% of 5406 data pages is 2.16 pages.

The results of 0, 185 and 925 are not.  It really seems like SYSTEM is
treating 0.04% as a maximum, but taking a random number of data pages
somewhere around that maximum, using math which can choose numbers of
pages far outside of the % requested by the user, and which includes 0.

Speaking from a user perspective, SYSTEM seems broken to me.  I can't
imagine using it for anything with a that degree of variation in the
number of results returned, especially if it's possible to return zero
rows from a populated table.

BERNOULLI works as expected.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Race conditions in shm_mq.c
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows