Re: More aggressive vacuuming of temporary tables

From: Peter Geoghegan
Subject: Re: More aggressive vacuuming of temporary tables
Date:
Msg-id: CAH2-WzmcEiPvrwS-Bi4niE3bCyQX4yt+BSi3zVSv2EUHo-0PiQ@mail.gmail.com
In reply to: Re: More aggressive vacuuming of temporary tables (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Sep 9, 2020 at 11:02 AM Andres Freund <andres@anarazel.de> wrote:
> Obviously lookup in such a more complicated structure isn't free. Nor is
> building it. So we'd need some heuristics about when to do so. It'd
> probably be OK to occasionally look at the age of the oldest in-progress
> statement, to infer the time for old snapshots based on that. And then
> we could have a GUC that says when it's worth doing more complicated
> lookups.
>
> I don't have a handle on how to deal with the ctid chaining for
> intermediate row versions.

I wonder if it makes sense to add the feature to nbtree first. This
side-steps the practical problem of having to figure out ctid
chaining. You can prove the idea in a relatively low risk way, while
still creating significant value for users.

We can reason about versioning within indexes without having
authoritative information about it close at hand. The trick is to
delay everything until it looks like we'll have to split the page. I
can see very substantial improvements to index bloat from non-HOT
updates with the deduplication-deletion patch I've been working on:

https://postgr.es/m/CAH2-WznjQQwSPSxBiZ6wQyBiKqFmfdjOdeqp28otqw551T7jcQ@mail.gmail.com

Importantly, the patch tends to bound the number of distinct index
entries *per logical row* in the presence of non-HOT updates (at least
for indexes that are not logically modified by the update). ISTM that
this is what really matters. The patch doesn't just reduce index bloat
-- it effectively caps the number of heap pages any given primary key
point lookup query will have to visit. So in practice we eagerly kill precisely
the garbage index tuples that would have caused us the most trouble
without the new mechanism in place.
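To make the "delay everything until a page split looks inevitable" idea concrete, here is a toy C sketch (invented for illustration, not PostgreSQL code -- all names and structures are made up). The point is only the control flow: the expensive pass that checks which index entries point to dead heap tuples runs solely at the moment a split would otherwise be forced:

```c
#include <stdbool.h>

/* Toy model: an index "page" holding entries that point at heap
 * tuples, some of which may already be dead. */
#define PAGE_CAPACITY 8

typedef struct IndexEntry
{
    int  key;
    bool heap_tuple_dead;   /* would require a heap lookup in reality */
} IndexEntry;

typedef struct Page
{
    IndexEntry entries[PAGE_CAPACITY];
    int        nentries;
} Page;

/* Remove entries whose heap tuples are dead; return number removed. */
static int
delete_dead_entries(Page *page)
{
    int kept = 0;

    for (int i = 0; i < page->nentries; i++)
    {
        if (!page->entries[i].heap_tuple_dead)
            page->entries[kept++] = page->entries[i];
    }

    int removed = page->nentries - kept;

    page->nentries = kept;
    return removed;
}

/*
 * Insert an entry.  Only when the page is full -- i.e. a split is
 * otherwise inevitable -- do we pay for the dead-entry scan.
 * Returns true if the insert fit, false if we would have to split.
 */
static bool
page_insert(Page *page, IndexEntry entry)
{
    if (page->nentries == PAGE_CAPACITY)
    {
        /* The deletion pass was deferred until this moment. */
        if (delete_dead_entries(page) == 0)
            return false;       /* nothing reclaimable: split for real */
    }
    page->entries[page->nentries++] = entry;
    return true;
}
```

Because the scan runs only when a split is the alternative, even a fairly expensive deletion pass amortizes well: pages that never fill up never pay for it.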

Avoiding a page split is a big win, so we can probably justify
relatively expensive lookup calls to make this work. This seems
especially likely to be true because we can probably ratchet up to
that only when we see that it will win. If a unique index has 10
entries for one value (10 versions), which is inevitable once HOT
pruning fails, then how many of those 10 do we really need? The answer
is perhaps just 2 or 3, maybe 99%+ of the time. I strongly suspect that
preventing page splits is more valuable than opportunistic heap
pruning (unless the pruning itself prevents page splits, which it may
or may not do). Logically unnecessary page splits have obvious lasting
consequences, are usually highly correlated, and affect performance in
ways that are non-linear and likely very harmful in the real world.
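The "how many of those 10 versions do we really need?" question can be illustrated with a small sketch (again invented for illustration -- real visibility checks are far more involved). A version is needed if it is the live one, or if some still-open snapshot could see it, i.e. the snapshot was taken in the window between the creating and the superseding transaction:

```c
#include <stdbool.h>

#define INVALID_XID 0

/* Toy version-chain entry: created by xmin, superseded by xmax
 * (INVALID_XID while still live). */
typedef struct Version
{
    unsigned xmin;
    unsigned xmax;
} Version;

/* Keep a version if it is the current one, or if some open snapshot
 * (represented here simply by the xid it was taken at) falls inside
 * the window [xmin, xmax) in which this version was visible. */
static bool
version_needed(Version v, const unsigned *snapshots, int nsnapshots)
{
    if (v.xmax == INVALID_XID)
        return true;            /* current version: always keep */

    for (int i = 0; i < nsnapshots; i++)
    {
        if (snapshots[i] >= v.xmin && snapshots[i] < v.xmax)
            return true;        /* visible to this snapshot */
    }
    return false;
}

static int
count_needed_versions(const Version *chain, int nversions,
                      const unsigned *snapshots, int nsnapshots)
{
    int needed = 0;

    for (int i = 0; i < nversions; i++)
        if (version_needed(chain[i], snapshots, nsnapshots))
            needed++;
    return needed;
}
```

With a chain of 10 versions and only a couple of open snapshots, this count comes out to just the live version plus one version per snapshot window -- the "2 or 3 out of 10" intuition above.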

--
Peter Geoghegan


