Re: MCV lists for highly skewed distributions

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: MCV lists for highly skewed distributions
Дата
Msg-id CA+TgmoZOHwXjyrw4Fi-Lrdz5ChB6GN77xg4adJQvgg70_9kjAA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: MCV lists for highly skewed distributions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: MCV lists for highly skewed distributions
Список pgsql-hackers
On Thu, Feb 1, 2018 at 12:21 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> For a highly skewed distribution, it is possible for there to be
> hardly any values (maybe only one) that appears more than 1.25 times
> the average frequency, and so lots of otherwise perfectly good common
> values get discarded. For such a distribution, I don't think that the
> average frequency is particularly interesting, and it shouldn't be
> used to filter the MCV list.

Although I don't think I have enough math background to analyze this
as rigorously as you, I agree with you on this point: the average
frequency doesn't seem very interesting.

One point which I want to emphasize is that the length of the MCV list
bounds the estimated frequency of non-MCVs in two ways: no non-MCV is
ever thought to be more frequent than the least-common MCVs, and
however many non-MCVs we think we have (probably fewer than we
actually have) have to fit into whatever percentage of the table is
consumed by MCVs.  This would be less important if we had reliable
n_distinct estimates, but we don't.  So, even throwing things into the
MCV list that are no more common than the average item can improve
planning in some cases.

> Testing it with the example from [1], it does indeed appear to solve
> the too-many-MCVs problem in that particular case (actually generating
> no MCVs).
>
> Testing it with the highly skewed example at the start of this thread,
> it typically generates a couple more MCVs, producing somewhat better
> estimates, but to get really good estimates for who=17, you need to
> crank up the stats target. It does at least appear to no longer be the
> case that cranking up the stats target has a weak effect.

Hmm, nice result.  I agree with you that it would be nice if we can
come up with a good general-purpose algorithm for this, rather than
making it pluggable.  I don't know whether we can succeed in doing
that but we should try hard, because it's always better when things
just work automatically...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: JIT compiling with LLVM v9.0
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] why not parallel seq scan for slow functions