Re: ANALYZE sampling is too good

From	Jeff Janes
Subject	Re: ANALYZE sampling is too good
Date	11 December 2013, 05:11:50
Msg-id	CAMkU=1weFZ-k=z2Utu=kTHe7R5eqR45ujWdNVGC+UHU7n+RZNw@mail.gmail.com
In reply to	Re: ANALYZE sampling is too good (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

On Tuesday, December 10, 2013, Simon Riggs wrote:

> On 11 December 2013 00:28, Greg Stark <stark@mit.edu> wrote:
> > On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> Block sampling, with parameter to specify sample size. +1
> >
> > Simon this is very frustrating. Can you define "block sampling"?
>
> Blocks selected using Vitter's algorithm, using a parameterised
> fraction of the total.

OK, thanks for defining that.

We only need Vitter's algorithm when we don't know in advance how many items we are sampling from (as with tuples, unless we want to rely on the previous estimate for the current round of analysis). But for blocks, we do know how many there are, so there are simpler ways to pick them.
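
A minimal Python sketch of that distinction, with hypothetical function names and no claim to mirror PostgreSQL's C code. `reservoir_sample` is the basic reservoir method (often called Algorithm R), which works on a stream whose length is not known up front; Vitter's Algorithm Z is a faster refinement of the same idea and is not reproduced here. `sample_block_numbers` assumes the total block count is known and uses selection sampling (Knuth's Algorithm S), which needs no reservoir and yields block numbers in ascending order.

```python
import random


def reservoir_sample(items, k, rng=random):
    """Uniform sample of k items from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


def sample_block_numbers(total_blocks, k, rng=random):
    """Uniform sample of k block numbers when total_blocks is known."""
    chosen = []
    needed = min(k, total_blocks)
    for blkno in range(total_blocks):
        remaining = total_blocks - blkno
        # Take this block with probability needed / remaining.
        if rng.random() * remaining < needed:
            chosen.append(blkno)
            needed -= 1
            if needed == 0:
                break
    return chosen


rng = random.Random(0)
print(sample_block_numbers(1000, 30, rng))       # 30 ascending block numbers
print(reservoir_sample(iter(range(100)), 10, rng))  # 10 items from a stream
```

In Python specifically, `random.sample(range(total_blocks), k)` is simpler still, though it does not return the block numbers in physical order.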

> When we select a block we should read all rows on that block, to help
> identify the extent of clustering within the data.

But we have no mechanism to store such information (or to use it if it were stored), nor even ways to prevent the resulting skew in the sample from seriously messing up the estimates which we do have ways of storing and using.
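
A toy Python simulation of that skew, under deliberately extreme assumptions that are not from the thread: a hypothetical table of 1000 blocks with 100 rows each, where every row in a block holds the same value (perfect clustering). Both samples contain 3000 rows, but one picks rows individually and the other reads 30 whole blocks.

```python
import random

ROWS_PER_BLOCK = 100   # assumed for illustration
TOTAL_BLOCKS = 1000    # assumed for illustration


def value_at(blkno, offset):
    # Perfectly clustered toy column: all rows in a block share one value.
    return blkno


def distinct_in_row_sample(sample_rows, rng):
    """Distinct values seen when rows are sampled individually."""
    total_rows = TOTAL_BLOCKS * ROWS_PER_BLOCK
    picked = rng.sample(range(total_rows), sample_rows)
    return len({value_at(*divmod(r, ROWS_PER_BLOCK)) for r in picked})


def distinct_in_block_sample(sample_blocks, rng):
    """Distinct values seen when every row of each sampled block is read."""
    picked = rng.sample(range(TOTAL_BLOCKS), sample_blocks)
    return len({value_at(b, off)
                for b in picked
                for off in range(ROWS_PER_BLOCK)})


rng = random.Random(42)
row_level = distinct_in_row_sample(3000, rng)
block_level = distinct_in_block_sample(30, rng)
print("distinct values, row-level sample:  ", row_level)
print("distinct values, block-level sample:", block_level)
```

By construction the block-level sample sees exactly 30 distinct values, while the row-level sample of the same size sees most of the 1000. An estimator that treats both as uniform row samples would reach very different conclusions about the number of distinct values.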

Cheers,

Jeff
