Re: tablesample performance

From: Tom Lane
Subject: Re: tablesample performance
Msg-id: 31741.1476821180@sss.pgh.pa.us
In response to: Re: tablesample performance  (Simon Riggs <simon@2ndquadrant.com>)
Responses: Re: tablesample performance  (Simon Riggs <simon@2ndquadrant.com>)
List: pgsql-general

Simon Riggs <simon@2ndquadrant.com> writes:
> On 18 October 2016 at 19:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If you don't want to have an implicit bias towards earlier blocks,
>> I don't think that either standard tablesample method is really what
>> you want.
>>
>> The contrib/tsm_system_rows tablesample method is a lot closer, in
>> that it will start at a randomly chosen block, but if you just do
>> "tablesample system_rows(1)" then you will always get the first row
>> in whichever block it lands on, so it's still not exactly unbiased.

> Is there a reason why we can't fix the behaviours of the three methods
> mentioned above by making them all start at a random block and a
> random item between min and max?

The standard tablesample methods are constrained by other requirements,
such as repeatability.  I am not sure that loading this one on top of
that is a good idea.  The bias I referred to above is *not* the fault
of the sample methods, rather it's the fault of using "LIMIT 1".
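
To make the "LIMIT 1" point concrete, here is a toy Python model. It is not
PostgreSQL's code: it assumes a sample method that walks the rows in physical
order and keeps each one independently with probability p, and it treats
"LIMIT 1" as stopping at the first kept row. The function names and default
parameters are made up for illustration.

```python
import random


def bernoulli_limit_1(n_rows, p, rng):
    """Scan rows in physical order, keep each with probability p,
    and stop at the first kept row (the effect of LIMIT 1)."""
    for row in range(n_rows):
        if rng.random() < p:
            return row
    return None  # the sample happened to be empty


def first_half_fraction(n_rows=1000, p=0.01, trials=20000, seed=0):
    """Fraction of non-empty trials whose single returned row
    lies in the first half of the (hypothetical) table."""
    rng = random.Random(seed)
    hits = [bernoulli_limit_1(n_rows, p, rng) for _ in range(trials)]
    hits = [h for h in hits if h is not None]
    return sum(h < n_rows // 2 for h in hits) / len(hits)


if __name__ == "__main__":
    print(first_half_fraction())
```

Every row is equally likely to be in the sample, yet in this model the row
that survives "LIMIT 1" falls in the first half of the table far more often
than the 50% an unbiased pick would give. The skew comes from truncating the
scan, not from the sampling step.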

It does seem like maybe it'd be nice for tsm_system_rows to start at a
randomly chosen entry in the first block it visits, rather than always
dumping that entire block.  Then "tablesample system_rows(1)" would
actually give you a pretty random row, and I think we aren't giving up
any useful properties it has now.
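
A rough sketch of the difference, again as a toy Python model rather than the
tsm_system_rows source: it assumes a table of equally full blocks (real tables
vary), picks the starting block at random, and then either always takes the
first row of that block or takes a random entry within it. All names and
parameters here are hypothetical.

```python
import random
from collections import Counter


def system_rows_1(n_blocks, rows_per_block, rng, random_offset):
    """Return (block, offset) of the single row picked.

    random_offset=False models the behaviour described above: the first
    row of a randomly chosen block. random_offset=True models the
    suggested change: a random entry within that block.
    """
    block = rng.randrange(n_blocks)
    offset = rng.randrange(rows_per_block) if random_offset else 0
    return block, offset


def offset_histogram(random_offset, n_blocks=100, rows_per_block=10,
                     trials=10000, seed=0):
    """Count how often each within-block offset is returned."""
    rng = random.Random(seed)
    return Counter(
        system_rows_1(n_blocks, rows_per_block, rng, random_offset)[1]
        for _ in range(trials)
    )


if __name__ == "__main__":
    print(sorted(offset_histogram(random_offset=False)))
    print(sorted(offset_histogram(random_offset=True)))
```

With the fixed start only offset 0 is ever returned, so rows that are not
first in their block can never be picked. With a random starting entry every
offset shows up, which is the "pretty random row" effect under this model's
assumptions.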

            regards, tom lane

