Re: Use of additional index columns in rows filtering

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Use of additional index columns in rows filtering
Дата
Msg-id bf678c8f-2094-a52b-357e-7cc7c7813b41@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers

On 8/3/23 21:21, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Not sure. I'm a bit confused about what exactly is so risky on the plan
>> produced with the patch.
> 
> It's all about the worst case. In the scenarios that I'm concerned
> about, we can be quite sure that the saving from not using a BitmapOr
> will be fairly low -- the cost of not having to repeat the same index
> page accesses across several similar index scans is, at best, some
> small multiple of the would-be number of index scans that the BitmapOr
> plan gets. We can be certain that the possible benefits are fixed and
> low. This is always true; presumably the would-be BitmapOr plan can
> never have all that many index scans. And we always know how many
> index scans a BitmapOr plan would use up front.
> 

When you say "index page accesses" do you mean accesses to index pages,
or accesses to heap pages from the index scan?

Because my patch is all about reducing the heap pages, which are usually
the expensive part of the index scan. But you're right the "index scan"
with index filter may access more index pages, because it has fewer
"access predicates".

I don't quite see that with the tenk1 query we've been discussing (the
extra buffers were due to non-allvisible heap pages), but I guess that's
possible.

> On the other hand, the possible downsides have no obvious limit. So
> even if we're almost certain to win on average, we only have to be
> unlucky once to lose all we gained before that point. As a general
> rule, we want the index AM to have all the context required to
> terminate its scan at the earliest possible opportunity. This is
> enormously important in the worst case.
> 

Yeah, I agree there's some asymmetry in the risk/benefit. It's not
unlike e.g. seqscan vs. index scan, where the index scan can't really
save more than what the seqscan costs, but it can get (almost)
arbitrarily expensive.

> It's easier for me to make this argument because I know that we don't
> really need to make any trade-off here. But even if that wasn't the
> case, I'd probably arrive at the same general conclusion.
> 
> Importantly, it isn't possible to make a similar argument that works
> in the opposite direction -- IMV that's the difference between this
> flavor of riskiness, and the inevitable riskiness that comes with any
> planner change. In other words, your patch isn't going to win by an
> unpredictably high amount. Not in the specific scenarios that I'm
> focussed on here, with a BitmapOr + multiple index scans getting
> displaced.
> 

True. It probably can't beat BitmapOr plan if it means moving access
predicate into index filter (or even worse a table filter).

> The certainty about the upside is just as important as the uncertainty
> about the downside. The huge asymmetry matters, and is fairly
> atypical. If, somehow, there was less certainty about the possible
> upside, then my argument wouldn't really work.
> 


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Use of additional index columns in rows filtering
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Using defines for protocol characters