Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id CAEZATCURhZv0kMsy9PoWvx+4nuvmfGi42f9kte7ZNyWOJRKKjw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Список pgsql-hackers
On Mon, 25 Mar 2019 at 23:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Attached is an updated patch, fixing all the issues pointed out so far.
> Unless there are some objections, I plan to commit the 0001 part by the
> end of this CF. Part 0002 is a matter for PG13, as previously agreed.
>

Yes, I think that's reasonable. It looks to be in pretty good shape. I
have reviewed most of the actual code, but note that I haven't
reviewed the docs changes and I didn't spend much time reading code
comments. It might benefit from a quick once-over comment tidy up.

I just looked through the latest set of changes and I have a couple of
additional review comments:

In the comment about WIDTH_THRESHOLD, s/pg_statistic/pg_statistic_ext/.

In statext_mcv_build(), I'm not convinced by the logic around keeping
the whole MCV list if it fits. Suppose there were a small number of
very common values, and then a bunch of uniformly distributed less
common values. The sample might consist of all the common values, plus
one or two instances of some of the uncommon ones, leading to a list
that would fit, but it would not be appropriate to keep the uncommon
values on the basis of having seen them only one or two times. The
fact that the list of items seen fits doesn't by itself mean that
they're all common enough to justify being kept. In the per-column
stats case, there are a bunch of other checks that have to pass, which
are intended to test not just that the list fits, but that it believes
that those are all the items in the table. For MV stats, you don't
have that, and so I think it would be best to just remove that test
(the "if (ngroups > nitems)" test) and *always* call
get_mincount_for_mcv_list() to determine how many MCV items to keep.
Otherwise there is a risk of keeping too many MCV items, with the ones
at the tail end of the list producing poor estimates.

Regards,
Dean


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Yugo Nagata
Дата:
Сообщение: Re: Improvement of installation document
Следующее
От: Surafel Temesgen
Дата:
Сообщение: Re: FETCH FIRST clause PERCENT option