Re: Improving N-Distinct estimation by ANALYZE

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Improving N-Distinct estimation by ANALYZE
Дата
Msg-id 87oe2r44sc.fsf@stark.xeocode.com
обсуждение исходный текст
Ответ на Re: Improving N-Distinct estimation by ANALYZE  (Simon Riggs <simon@2ndquadrant.com>)
Ответы Re: Improving N-Distinct estimation by ANALYZE  (Josh Berkus <josh@agliodbs.com>)
Re: Improving N-Distinct estimation by ANALYZE  (Simon Riggs <simon@2ndquadrant.com>)
Список pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:

> The approach I suggested uses the existing technique for selecting
> random blocks, then either an exhaustive check on all of the rows in a
> block or the existing random row approach, depending upon available
> memory. We need to check all of the rows in a reasonable sample of
> blocks otherwise we might miss clusters of rows in large tables - which
> is the source of the problems identified.
> 
> The other reason was to increase the sample size, which is a win in any
> form of statistics.

Only if your sample is random and independent. The existing mechanism tries
fairly hard to ensure that every record has an equal chance of being selected.
If you read the entire block and not appropriate samples then you'll introduce
systematic sampling errors. For example, if you read an entire block you'll be
biasing towards smaller records.

I think it would be useful to have a knob to increase the sample size
separately from the knob for the amount of data retained in the statistics
tables. Though I think you'll be disappointed and find you have to read an
unreasonably large sample out of the table before you get more useful distinct
estimates.

Certainly it's worth testing this in a low impact way like just keeping the
existing sample method and dialing up the sample sizes before you try anything
that would sacrifice the statistical validity of the more solid estimates.

-- 
greg



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Heads up: upcoming back-branch re-releases
Следующее
От: Jeremy Drake
Дата:
Сообщение: Re: catalog corruption bug