Re: strange plan with bitmap heap scan and multiple partial indexes

From Tomas Vondra
Subject Re: strange plan with bitmap heap scan and multiple partial indexes
Date
Msg-id 55A18897.5090008@2ndquadrant.com
In response to Re: strange plan with bitmap heap scan and multiple partial indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: strange plan with bitmap heap scan and multiple partial indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 07/11/2015 06:32 PM, Tom Lane wrote:
...
>
> Presumably, this is happening because the numbers of rows actually
> satisfying the index predicates are so small that it's a matter of
> luck whether any of them are included in ANALYZE's sample.
>
> Given this bad data for the index sizes, it's not totally surprising
> that choose_bitmap_and() does something wacko. I'm not sure whether
> we should try to make it smarter, or write this off as "garbage in,
> garbage out".

I think we should make it smarter, if possible - while this example is 
somewhat artificial, partial indexes are often used exactly like this, 
i.e. to index only a very small subset of the data. A good example may 
be an index on "active invoices", i.e. invoices that have not yet been 
sorted out. There may be a lot of invoices in the table, but only a 
very small fraction of them will be active (and thus in the index).
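
A minimal sketch of that pattern (hypothetical table and column names, 
not the schema from the report that started this thread):

    CREATE TABLE invoices (
        id          bigserial PRIMARY KEY,
        customer_id bigint  NOT NULL,
        amount      numeric NOT NULL,
        settled     boolean NOT NULL DEFAULT false
    );

    -- index only the small "active" fraction of the table
    CREATE INDEX invoices_active_idx ON invoices (customer_id)
        WHERE NOT settled;

With an index like that, whether ANALYZE's sample contains any rows 
satisfying "NOT settled" at all is exactly the matter of luck described 
above.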

So I don't think this is an artificial problem, and we should not write 
it off as "garbage in".

> Another idea is to not trust any individual ANALYZE's estimate of
> the index rowcount so completely. (I'd thought that the
> moving-average logic would get applied to that, but it doesn't seem
> to be kicking in for some reason.)
>
> We could probably make this smarter if we were willing to apply the
> predicate-proof machinery in more situations; in this example, once
> we know that idx001 is applicable, we really should disregard idx002
> and idx003 altogether because their predicates are implied by
> idx001's. I've always been hesitant to do that because the cost of
> checking seemed likely to greatly outweigh the benefits. But since
> Tomas is nosing around in this territory already, maybe he'd like to
>  investigate that further.

I think there are two possible approaches in general - we may improve 
the statistics somehow, or we may start doing predicate proofing.

I doubt approaching this at the statistics level alone is sufficient, 
because even with statistics target 10k (i.e. the most detailed one), 
the sample still has a fixed size. So there will always be a 
combination of a sufficiently large data set and a sufficiently 
selective partial index that causes trouble for the sampling.
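
To illustrate with rough numbers (picked for the sake of the argument, 
not taken from the original report): with statistics target 10k, 
ANALYZE samples 300 * 10000 = 3,000,000 rows. If a partial index covers 
only 100 rows of a 1-billion-row table, the expected number of matching 
rows in the sample is about 3,000,000 * 100 / 1,000,000,000 = 0.3, so 
most ANALYZE runs will not see a single matching row, whatever the 
target.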

Moreover, I can't really think of a way to fix this at the statistics 
level. Maybe there's a clever trick guarding against this particular 
issue, but my personal experience is that whenever I used such a smart 
hack, it eventually caused strange issues elsewhere.

So I think predicate proofing is the better approach, but of course the 
planning cost may be an issue. Maybe we can make it cheaper with some 
clever tricks? For example, given two predicates A and B, it seems that 
if A => B, then selectivity(A) <= selectivity(B). Could we use this to 
skip some of the expensive checks? We should have the selectivities at 
hand anyway, no?
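
As a sketch of what I mean (a hypothetical table t(val int) and 
predicates, not the actual definitions of idx001/idx002/idx003 from 
the test case):

    CREATE INDEX idx001 ON t (val) WHERE val < 100;
    CREATE INDEX idx002 ON t (val) WHERE val < 1000;
    CREATE INDEX idx003 ON t (val) WHERE val < 10000;

    -- all three indexes are applicable, but (val < 100) implies the
    -- other two predicates, so idx002 and idx003 add nothing here
    SELECT * FROM t WHERE val < 50;

Under that rule, whenever selectivity(A) > selectivity(B) we already 
know A cannot imply B, so the expensive proof could be skipped - as a 
heuristic at least, since the estimates are only approximate.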

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


