Re: strange plan with bitmap heap scan and multiple partial indexes
From: Tomas Vondra
Subject: Re: strange plan with bitmap heap scan and multiple partial indexes
Date:
Msg-id: 55A18897.5090008@2ndquadrant.com
In reply to: Re: strange plan with bitmap heap scan and multiple partial indexes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: strange plan with bitmap heap scan and multiple partial indexes (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 07/11/2015 06:32 PM, Tom Lane wrote:
...
> Presumably, this is happening because the numbers of rows actually
> satisfying the index predicates are so small that it's a matter of
> luck whether any of them are included in ANALYZE's sample.
>
> Given this bad data for the index sizes, it's not totally surprising
> that choose_bitmap_and() does something wacko. I'm not sure whether
> we should try to make it smarter, or write this off as "garbage in,
> garbage out".

I think we should make it smarter, if possible - while this example is
somewhat artificial, partial indexes are often used exactly like this,
i.e. to index only a very small subset of the data.

A good example is an index on "active invoices", i.e. invoices that
have not yet been sorted out. There may be a lot of invoices in the
table, but only a very small fraction of them will be active (and thus
in the index). A SQL sketch of this pattern is appended at the end of
this mail.

So I don't think this is an artificial problem, and we should not
write it off as "garbage in".

> Another idea is to not trust any individual ANALYZE's estimate of
> the index rowcount so completely. (I'd thought that the
> moving-average logic would get applied to that, but it doesn't seem
> to be kicking in for some reason.)
>
> We could probably make this smarter if we were willing to apply the
> predicate-proof machinery in more situations; in this example, once
> we know that idx001 is applicable, we really should disregard idx002
> and idx003 altogether because their predicates are implied by
> idx001's. I've always been hesitant to do that because the cost of
> checking seemed likely to greatly outweigh the benefits. But since
> Tomas is nosing around in this territory already, maybe he'd like to
> investigate that further.

I think there are two possible approaches in general - we may improve
the statistics somehow, or we may start doing the predicate proofing.

I doubt approaching this at the statistics level alone is sufficient,
because even with statistics target 10k (i.e. the most detailed one)
the sample is still fixed-size. So there will always exist some
combination of a sufficiently large data set and a sufficiently
selective partial index that causes trouble with the sampling.

Moreover, I can't really think of a way to fix this at the statistics
level. Maybe there's a clever trick guarding against this particular
issue, but my personal experience is that whenever I used such a smart
hack, it eventually caused strange issues elsewhere.

So I think predicate proofing is the better approach, though of course
the planning cost may be an issue. But maybe we can make it cheaper
with some clever tricks? For example, given two predicates A and B, if
A => B, then selectivity(A) <= selectivity(B). Could we use this to
skip some of the expensive proofs? We should have the selectivities at
hand anyway, no? A second sketch at the end of this mail illustrates
such nested predicates.

regards

--
Tomas Vondra                          http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
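To make the invoice example concrete, here is a minimal SQL sketch; the
table, column, and index names are illustrative, not taken from the
thread:

    -- Hypothetical schema: many settled invoices, very few active ones.
    CREATE TABLE invoices (
        id       bigserial PRIMARY KEY,
        customer integer NOT NULL,
        amount   numeric NOT NULL,
        active   boolean NOT NULL DEFAULT false
    );

    -- Roughly 100 active rows out of 10 million.
    INSERT INTO invoices (customer, amount, active)
    SELECT i % 1000, i % 100, (i % 100000 = 0)
    FROM generate_series(1, 10000000) AS i;

    -- Partial index covering only the tiny "active" subset.
    CREATE INDEX invoices_active_idx ON invoices (customer) WHERE active;

    -- A fixed-size ANALYZE sample may contain no active rows at all,
    -- so the planner's estimate of the index size is largely a matter
    -- of luck; rerunning ANALYZE can swing this value wildly.
    ANALYZE invoices;
    SELECT reltuples FROM pg_class WHERE relname = 'invoices_active_idx';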
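And a sketch of the nested-predicate situation; the actual definitions
of idx001/idx002/idx003 in the thread may differ, this only assumes
predicates where each one implies the next:

    -- Hypothetical nesting: (a < 100) => (a < 1000) => (a < 10000).
    CREATE TABLE t (a integer, b text);

    CREATE INDEX idx001 ON t (a) WHERE a < 100;
    CREATE INDEX idx002 ON t (a) WHERE a < 1000;
    CREATE INDEX idx003 ON t (a) WHERE a < 10000;

    -- For a query such as
    --   SELECT * FROM t WHERE a < 50;
    -- all three indexes are applicable, but idx001's predicate implies
    -- the other two, so ANDing idx002/idx003 into a bitmap adds cost
    -- without filtering out any additional rows. Implication also
    -- bounds the selectivities: since (a < 100) => (a < 1000), any
    -- consistent estimate has
    --   selectivity(a < 100) <= selectivity(a < 1000),
    -- which could serve as a cheap pre-filter before running the full
    -- predicate proof.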