Re: ANALYZE sampling is too good

From: Amit Kapila
Subject: Re: ANALYZE sampling is too good
Msg-id: CAA4eK1K1R011==4-xuYe9WYFqWQiT=Hayp-Aa4J=gc0Xy9=2xA@mail.gmail.com
In reply to: Re: ANALYZE sampling is too good (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Fri, Dec 6, 2013 at 7:22 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> There are fairly well researched algorithms for block-based sampling
>> which estimate for the skew introduced by looking at consecutive rows in
>> a block.  In general, a minimum sample size of 5% is required, and the
>> error is no worse than our current system.  However, the idea was shot
>> down at the time, partly because I think other hackers didn't get the math.
>
> I think that this certainly warrants revisiting. The benefits would be
> considerable.
>
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

Is only a fresh ANALYZE costly, or are subsequent ones equally costly?

Doing it in some background operation might not be a bad idea, but doing it
during backend query execution (a seq scan) might add to query response time.
This is especially true if part or most of the table's data is in RAM: the
overhead of the actual reads might then be fairly low, but the calculations
ANALYZE performs (such as sorting) would still make it costly.
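To make the cost split concrete, here is a minimal sketch, assuming a hypothetical `PiggybackSampler` helper that a sequential scan calls once per row. It uses the textbook reservoir sampling "Algorithm R", not PostgreSQL's actual ANALYZE code. Collecting the sample costs O(1) per row during the scan. The sort needed to build statistics such as histogram bounds, O(k log k) for a sample of size k, happens afterwards, and that sort is the part that would add to query response time.

```python
import random
from typing import Iterable, List, Optional


class PiggybackSampler:
    """Hypothetical helper: keeps a uniform random sample of the rows that
    another operation (here a toy sequential scan) happens to read."""

    def __init__(self, target: int, seed: Optional[int] = None) -> None:
        self.target = target          # sample size k
        self.seen = 0                 # rows observed so far
        self.reservoir: List[int] = []
        self.rng = random.Random(seed)

    def observe(self, value: int) -> None:
        # Algorithm R: constant work per row, so the scan itself slows down only slightly.
        self.seen += 1
        if len(self.reservoir) < self.target:
            self.reservoir.append(value)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.target:                # keep row with probability k/seen
                self.reservoir[j] = value

    def histogram_bounds(self, nbuckets: int) -> List[int]:
        # The expensive step: sort the whole sample, then pick equi-depth bounds.
        ordered = sorted(self.reservoir)
        if not ordered:
            return []
        n = len(ordered)
        return [ordered[i * (n - 1) // nbuckets] for i in range(nbuckets + 1)]


def seq_scan(rows: Iterable[int], sampler: PiggybackSampler) -> int:
    """Toy 'query': sums the rows and feeds each one to the sampler."""
    total = 0
    for r in rows:
        total += r
        sampler.observe(r)
    return total
```

For example, `seq_scan(range(10000), PiggybackSampler(target=300, seed=42))` returns the query result while leaving 300 sampled rows behind. Calling `histogram_bounds(10)` afterwards pays the sorting cost once. Deferring that call to a background process, rather than running it inside the query, is the trade-off discussed above.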


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


