Re: ANALYZE sampling is too good

From	Amit Kapila
Subject	Re: ANALYZE sampling is too good
Date	6 December 2013, 08:49:40
Msg-id	CAA4eK1K1R011==4-xuYe9WYFqWQiT=Hayp-Aa4J=gc0Xy9=2xA@mail.gmail.com
In reply to	Re: ANALYZE sampling is too good (Peter Geoghegan <pg@heroku.com>)
List	pgsql-hackers

On Fri, Dec 6, 2013 at 7:22 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> There are fairly well researched algorithms for block-based sampling
>> which estimate for the skew introduced by looking at consecutive rows in
>> a block.  In general, a minimum sample size of 5% is required, and the
>> error is no worse than our current system.  However, the idea was shot
>> down at the time, partly because I think other hackers didn't get the math.
>
> I think that this certainly warrants revisiting. The benefits would be
> considerable.
>
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

Is only a fresh ANALYZE costly, or are subsequent ones equally costly?

Doing it in some background operation might not be a bad idea, but doing it
during backend query execution (seq scan) might add overhead to query response
time, especially if part or most of the table's data is in RAM. In that case
the overhead of the actual read might not be very high, but the computation
for ANALYZE (such as the sort) will make it costly.
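
To make the concern above concrete, here is a toy sketch in Python. It is not
PostgreSQL code, and every name in it (scan_with_sample, summarize, the sample
size of 300) is a hypothetical choice for illustration. It assumes the scan
keeps a uniform reservoir sample as rows go by, then sorts that sample to
derive statistics. The per-row sampling work is small, while the sort is extra
CPU that the query would have to pay on top of the scan.

```python
import random


def scan_with_sample(rows, sample_size, rng):
    """Scan rows once, keeping a uniform reservoir sample (Algorithm R)."""
    sample = []
    scanned = 0
    for row in rows:
        scanned += 1
        if len(sample) < sample_size:
            # Fill the reservoir with the first sample_size rows.
            sample.append(row)
        else:
            # Replace a random slot with probability sample_size / scanned.
            j = rng.randrange(scanned)
            if j < sample_size:
                sample[j] = row
    return scanned, sample


def summarize(sample):
    """Sort the sample and derive simple statistics (the CPU-heavy step)."""
    ordered = sorted(sample)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
        "n_distinct": len(set(ordered)),
    }


# Hypothetical "table": 100,000 integer rows already in memory.
rng = random.Random(0)
scanned, sample = scan_with_sample(range(100_000), 300, rng)
stats = summarize(sample)
```

When the rows are already in RAM, as in this sketch, the scan itself does no
I/O, so whatever time summarize() takes is pure added latency for the query
that happened to carry the sampling.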

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
