Re: strange plan with bitmap heap scan and multiple partial indexes

From Tomas Vondra
Subject Re: strange plan with bitmap heap scan and multiple partial indexes
Date
Msg-id 55A18897.5090008@2ndquadrant.com
In response to Re: strange plan with bitmap heap scan and multiple partial indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: strange plan with bitmap heap scan and multiple partial indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 07/11/2015 06:32 PM, Tom Lane wrote:
...
>
> Presumably, this is happening because the numbers of rows actually
> satisfying the index predicates are so small that it's a matter of
> luck whether any of them are included in ANALYZE's sample.
>
> Given this bad data for the index sizes, it's not totally surprising
> that choose_bitmap_and() does something wacko. I'm not sure whether
> we should try to make it smarter, or write this off as "garbage in,
> garbage out".

I think we should make it smarter, if possible - while this example is 
somewhat artificial, partial indexes are often used exactly like this, 
i.e. to index only a very small subset of the data. A good example may 
be an index on "active invoices", i.e. invoices that have not yet been 
sorted out. There may be a lot of invoices in the table, but only a 
very small fraction of them will be active (and thus in the index).
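
A minimal sketch of that pattern (hypothetical table and column names, 
not the schema from the report that started this thread):

    CREATE TABLE invoices (
        id          bigserial PRIMARY KEY,
        customer_id bigint  NOT NULL,
        amount      numeric NOT NULL,
        settled     boolean NOT NULL DEFAULT false
    );

    -- index only the small "active" fraction of the table
    CREATE INDEX invoices_active_idx ON invoices (customer_id)
        WHERE NOT settled;

With an index like that, whether ANALYZE's sample contains any rows 
satisfying "NOT settled" at all is exactly the matter of luck described 
above.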

So I don't think this is an artificial problem, and we should not write 
it off as "garbage in".

> Another idea is to not trust any individual ANALYZE's estimate of
> the index rowcount so completely. (I'd thought that the
> moving-average logic would get applied to that, but it doesn't seem
> to be kicking in for some reason.)
>
> We could probably make this smarter if we were willing to apply the
> predicate-proof machinery in more situations; in this example, once
> we know that idx001 is applicable, we really should disregard idx002
> and idx003 altogether because their predicates are implied by
> idx001's. I've always been hesitant to do that because the cost of
> checking seemed likely to greatly outweigh the benefits. But since
> Tomas is nosing around in this territory already, maybe he'd like to
>  investigate that further.

I think there are two possible approaches in general - we may improve 
the statistics somehow, or we may start doing predicate proofing.

I doubt approaching this at the statistics level alone is sufficient, 
because even with statistics target 10k (i.e. the most detailed one), 
the sample still has a fixed size. So there will always be a 
combination of a sufficiently large data set and a sufficiently 
selective partial index that causes trouble for the sampling.
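
To illustrate with rough numbers (picked for the sake of the argument, 
not taken from the original report): with statistics target 10k, 
ANALYZE samples 300 * 10000 = 3,000,000 rows. If a partial index covers 
only 100 rows of a 1-billion-row table, the expected number of matching 
rows in the sample is about 3,000,000 * 100 / 1,000,000,000 = 0.3, so 
most ANALYZE runs will not see a single matching row, whatever the 
target.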

Moreover, I can't really think of a way to fix this at the statistics 
level. Maybe there's a clever trick guarding against this particular 
issue, but my personal experience is that whenever I used such a smart 
hack, it eventually caused strange issues elsewhere.

So I think predicate proofing is the better approach, but of course the 
planning cost may be an issue. Maybe we can make it cheaper with some 
clever tricks? For example, given two predicates A and B, it seems that 
if A => B, then selectivity(A) <= selectivity(B). Could we use this to 
skip some of the expensive checks? We should have the selectivities at 
hand anyway, no?
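
As a sketch of what I mean (a hypothetical table t(val int) and 
predicates, not the actual definitions of idx001/idx002/idx003 from 
the test case):

    CREATE INDEX idx001 ON t (val) WHERE val < 100;
    CREATE INDEX idx002 ON t (val) WHERE val < 1000;
    CREATE INDEX idx003 ON t (val) WHERE val < 10000;

    -- all three indexes are applicable, but (val < 100) implies the
    -- other two predicates, so idx002 and idx003 add nothing here
    SELECT * FROM t WHERE val < 50;

Under that rule, whenever selectivity(A) > selectivity(B) we already 
know A cannot imply B, so the expensive proof could be skipped - as a 
heuristic at least, since the estimates are only approximate.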

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


