extended statistics - functional dependencies vs. MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема extended statistics - functional dependencies vs. MCV lists
Дата
Msg-id da3bb397-26df-8a0e-4ff4-0cfacb0ababf@enterprisedb.com
обсуждение исходный текст
Список pgsql-hackers
Hi,

over in the pgsql-general channel, Michael Lewis reported [1] a bit
strange behavior switching between good/bad estimates with extended
statistics.

The crux of the issue is that with statistics containing both MCV and
functional dependencies, we prefer applying the MCV. And functional
dependencies are used only for the remaining clauses on columns not
covered by the MCV list.

This works perfectly fine when the clauses match a MCV item (or even
multiple of them). But if there's no matching MCV item, this may be
problematic - statext_mcv_clauselist_selectivity tries to be smart, but
when the MCV represents only a small fraction of the data set the
results may not be far from just a product of selectivities (as if the
clauses were independent).

So I'm wondering about two things:

1) Does it actually make sense to define extended statistics with both
MCV and functional dependencies? ISTM the MCV part will always filter
all the clauses, before we even try to apply the dependencies.

2) Could we consider the functional dependencies when estimating the
part not covered by the MCV list. Of course, this could only help with
equality clauses (as supported by functional dependencies).


regards


[1]
https://www.postgresql.org/message-id/CAMcsB%3Dy%3D3G_%2Bs_zFYPu2-O6OMWOvOkb3t1MU%3D907yk5RC_RaYw%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Autovacuum worker doesn't immediately exit on postmaster death
Следующее
От: David Rowley
Дата:
Сообщение: Re: Hybrid Hash/Nested Loop joins and caching results from subplans