Re: ANALYZE sampling is too good

Поиск

Список

Период

Сортировка

От	Josh Berkus
Тема	Re: ANALYZE sampling is too good
Дата	10 декабря 2013 г. 05:22:26
Msg-id	52A67AD6.6050302@agliodbs.com обсуждение исходный текст
Ответ на	ANALYZE sampling is too good (Greg Stark <stark@mit.edu>)
Список	pgsql-hackers

Дерево обсуждения

On 12/09/2013 02:37 PM, Robert Haas wrote:
> I've never seen an n_distinct value of more than 5 digits, regardless
> of reality.  Typically I've seen 20-50k, even if the real number is
> much higher.  But the n_distinct value is only for non-MCVs, so if we
> estimate the selectivity of column = 'rarevalue' to be
> (1-nullfrac-mcvfrac)/n_distinct, then making mcvfrac bigger reduces
> the estimate, and making the MCV list longer naturally makes mcvfrac
> bigger.  I'm not sure how important the
> less-frequent-than-the-least-common-MCV part is, but I'm very sure
> that raising the statistics target helps to solve the problem of
> overestimating the prevalence of uncommon values in a very big table.

I did an analysis of our ndistinct algorithm several years ago ( ~~
8.1), and to sum up:

1. we take far too small of a sample to estimate ndistinct well for
tables larger than 100,000 rows.

2. the estimation algo we have chosen is one which tends to be wrong in
the downwards direction, rather strongly so.  That is, if we could
potentially have an ndistinct of 1000 to 100,000 based on the sample,
our algo estimates 1500 to 3000.

3. Other algos exist.  The tend to be wrong in other directions.

4. Nobody has done an analysis of whether it's worse, on average, to
estimate low vs. high for ndistinct.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Mark Kirkwood
Дата: 10 декабря 2013 г., 05:18:05
Сообщение: Re: ANALYZE sampling is too good

Следующее

От: Mark Kirkwood
Дата: 10 декабря 2013 г., 05:32:11
Сообщение: Re: ANALYZE sampling is too good

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: ANALYZE sampling is too good

Предыдущее

Следующее