Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id a0aa7043-26cd-5473-a9d4-ef3bd035f132@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On 1/10/19 4:20 PM, Dean Rasheed wrote:
> ...
>
> So perhaps what we should do for multivariate stats is simply use the
> relative standard error approach (i.e., reuse the patch in [2] with a
> 20% RSE cutoff). That had a lot of testing at the time, against a wide
> range of data distributions, and proved to be very good, not to
> mention being very simple.
> 
> That approach would encompass both groups more and less common than
> the base frequency, because it relies entirely on the group appearing
> enough times in the sample to infer that any errors on the resulting
> estimates will be reasonably well controlled. It wouldn't actually
> look at the base frequency at all in deciding which items to keep.
> 

I've been looking at this approach today, and I'm a bit puzzled. That
patch essentially uses SRE to compute mincount like this:

    mincount = n*(N-n) / (N-n+0.04*n*(N-1))

and then includes all items more common than this threshold. How could
that handle items significantly less common than the base frequency?

Or did you mean to use the SRE, but in some different way?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: port of INSTALL file generation to XSLT
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Early WIP/PoC for inlining CTEs