Re: ANALYZE sampling is too good

Поиск

Список

Период

Сортировка

От	Simon Riggs
Тема	Re: ANALYZE sampling is too good
Дата	10 декабря 2013 г. 22:23:50
Msg-id	CA+U5nMJP0r-qkMaE_JNeEq4akgbZtWaLwkabYY_YoxY8w1u=6Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: ANALYZE sampling is too good (Andres Freund <andres@2ndquadrant.com>)
Ответы	Re: ANALYZE sampling is too good Re: ANALYZE sampling is too good
Список	pgsql-hackers

Дерево обсуждения

On 6 December 2013 09:21, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
>> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
>> other full-table scans? That doesn't really help Greg, because his
>> complaint is mostly that a fresh ANALYZE is too expensive, but it
>> could be an interesting, albeit risky approach.
>
> What I've been thinking of is
>
> a) making it piggy back on scans vacuum is doing instead of doing
> separate ones all the time (if possible, analyze needs to be more
> frequent). Currently with quite some likelihood the cache will be gone
> again when revisiting.

> b) make analyze incremental. In lots of bigger tables most of the table
> is static - and we actually *do* know that, thanks to the vm. So keep a
> rawer form of what ends in the catalogs around somewhere, chunked by the
> region of the table the statistic is from. Everytime a part of the table
> changes, re-sample only that part. Then recompute the aggregate.

Piggy-backing sounds like a bad thing. If I run a query, I don't want
to be given some extra task thanks! Especially if we might need to run
data type code I'm not authroised to run, or to sample data I may not
be authorised to see. The only way that could work is to kick off an
autovacuum worker to run the ANALYZE as a separate process and then
use synchronous scan behaviour to derive benefit indirectly.

However, these things presume that we need to continue scanning most
of the blocks of the table, which I don't think needs to be the case.
There is a better way.

Back in 2005/6, I advocated a block sampling method, as described by
Chaudri et al (ref?)
That has two advantages

* We don't need to visit all of the blocks, reducing I/O

* It would also give better analysis of clustered data (its all in
that paper...)

The recent patch on TABLESAMPLE contained a rewrite of the guts of
ANALYZE into a generic sample scan, which would then be used as the
basis for a TABLESAMPLE BERNOULLI. A TABLESAMPLE SYSTEM could use a
block sample, as it does in some other DBMS (DB2, SQLServer).

While I was unimpressed with the TABLESAMPLE patch, all it needs is
some committer-love, so Greg...

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Andres Freund
Дата: 10 декабря 2013 г., 22:16:29
Сообщение: Re: Errors on missing pg_subtrans/ files with 9.3

Следующее

От: Andres Freund
Дата: 10 декабря 2013 г., 22:28:48
Сообщение: Re: ANALYZE sampling is too good

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: ANALYZE sampling is too good

Предыдущее

Следующее