Re: Deleting older versions in unique indexes to avoid page splits

From: Anastasia Lubennikova
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date:
Msg-id: a1d3c9d3-f070-0bd0-8de2-49184e21b35c@postgrespro.ru
In response to: Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

On 08.10.2020 02:48, Peter Geoghegan wrote:
> On Tue, Jun 30, 2020 at 5:03 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> Attached is a POC patch that teaches nbtree to delete old duplicate
>> versions from unique indexes. The optimization targets non-HOT
>> duplicate version bloat. Although the patch is rather rough, it
>> nevertheless manages to more or less eliminate a whole class of index
>> bloat: Unique index bloat from non-HOT updates in workloads where no
>> transaction lasts for more than a few seconds.
>
> I'm slightly surprised that this thread didn't generate more interest
> back in June. After all, maintaining the pristine initial state of
> (say) a primary key index even after many high throughput non-HOT
> updates (i.e. avoiding "logically unnecessary" page splits entirely)
> is quite appealing. It arguably goes some way towards addressing long
> held criticisms of our approach to MVCC. Especially if it can be
> generalized to all b-tree indexes -- the Uber blog post mentioned
> tables that have several indexes, which presumably means that there
> can be no HOT updates (the author of the blog post didn't seem to be
> aware of HOT at all).

The idea seems very promising, especially when extended to handle non-unique indexes too.

> I've been trying to generalize my approach to work with all indexes. I
> think that I can find a strategy that is largely effective at
> preventing version churn page splits that take place with workloads
> that have many non-HOT updates, without any serious downsides for
> workloads that do not benefit. I want to get feedback on that now,
> since I expect that it will be controversial. Teaching indexes about
> how tuples are versioned or chaining tuples seems like a non-starter,
> so the trick will be to come up with something that behaves in
> approximately the same way as that in cases where it helps.
>
> The approach I have in mind is to pass down a hint from the executor
> to btinsert(), which lets nbtree know that the index tuple insert is
> in fact associated with a non-HOT update. This hint is only given when
> the update did not logically modify the columns that the index covers.

That's exactly what I wanted to discuss after reading the first letter. If we could make (non-)HOT updates index-specific, I think it could improve performance a lot.
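
Just to check that I understand the hint correctly: per index, the executor would compare the old and new values of the covered columns and pass the hint down only when none of them changed. A standalone sketch of that decision (hypothetical names and grossly simplified types, nothing like the real executor code):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 4

/* One heap tuple, reduced to an array of column values for the sketch. */
typedef struct { const char *cols[NCOLS]; } Tuple;

/* Which heap columns a given index covers. */
typedef struct { int natts; int attnums[NCOLS]; } IndexDesc;

/* true => executor would pass the "unchanged" hint to btinsert() */
static bool
index_unchanged_by_update(const IndexDesc *idx,
                          const Tuple *oldtup, const Tuple *newtup)
{
    for (int i = 0; i < idx->natts; i++)
    {
        int att = idx->attnums[i];

        if (strcmp(oldtup->cols[att], newtup->cols[att]) != 0)
            return false;       /* covered column changed: no hint */
    }
    return true;                /* non-HOT update, but index key unchanged */
}

int
main(void)
{
    Tuple oldtup = {{ "42", "alice", "2020-10-08", "active" }};
    Tuple newtup = {{ "42", "alice", "2020-10-09", "active" }};  /* col 2 changed */
    IndexDesc pkey = { 1, { 0 } };          /* unique index on col 0 */
    IndexDesc date_idx = { 1, { 2 } };      /* index on col 2 */

    printf("pkey hint: %d\n", index_unchanged_by_update(&pkey, &oldtup, &newtup));
    printf("date_idx hint: %d\n", index_unchanged_by_update(&date_idx, &oldtup, &newtup));
    return 0;
}

So in this toy example only the primary key insert would carry the hint, while the index on the updated date column would not.
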
> Here is the maybe-controversial part: The algorithm initially assumes
> that all indexes more or less have the same properties as unique
> indexes from a versioning point of view, even though that's clearly
> not true. That is, it initially assumes that the only reason why there
> can be duplicates on any leaf page it looks at is because some
> previous transaction also did a non-HOT update that added a new,
> unchanged index tuple. The new algorithm only runs when the hint is
> passed down from the executor and when the only alternative is to
> split the page (or have a deduplication pass), so clearly there is
> some justification for this assumption -- it really is highly unlikely
> that this update (which is on the verge of splitting the page) just so
> happened to be the first such update that affected the page.
>
> To be blunt: It may be controversial that we're accessing multiple
> heap pages while holding an exclusive lock on a leaf page, in the
> hopes that we can avoid a page split, but without any certainty that
> it'll work out.
>
> Sometimes (maybe even most of the time), this assumption turns out to
> be mostly correct, and we benefit in the obvious way -- no
> "unnecessary" page splits for affected non-unique indexes. Other times
> it won't work out, usually because the duplicates are in fact logical
> duplicates from distinct logical rows.

I think this optimization can affect low-cardinality indexes negatively, but it is hard to estimate the impact without tests. Maybe it won't be a big deal, given that we attempt to eliminate old copies only when a page is about to split, and that low-cardinality b-trees are already not very useful. Besides, we can always make this optional, so that users can tune it to their workload.
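
If I understand the approach correctly, the shape of the deletion pass is roughly the following (again just a standalone sketch with hypothetical names, to anchor the discussion; the real thing would work with nbtree page items and real heap visibility checks):

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    int  heap_tid;      /* which heap row this index tuple points at */
    bool dead_to_all;   /* stand-in for the heap visibility lookup */
} LeafItem;

/*
 * Called only when (a) the executor hint was given and (b) the only
 * alternative is a page split (or a deduplication pass).  Returns the
 * number of old versions we could remove; 0 means we must split.
 */
static int
try_delete_old_versions(LeafItem *items, int nitems, bool executor_hint)
{
    int ndeletable = 0;

    if (!executor_hint)
        return 0;               /* don't speculate without the hint */

    for (int i = 0; i < nitems; i++)
    {
        /*
         * The controversial part: each check may visit heap pages while
         * we hold an exclusive lock on the leaf page, with no certainty
         * that anything is deletable.  If the duplicates are distinct
         * logical rows (a low-cardinality index), we find nothing.
         */
        if (items[i].dead_to_all)
            ndeletable++;
    }
    return ndeletable;
}

int
main(void)
{
    /* three versions of one row, two already dead to all snapshots */
    LeafItem page[] = { {101, true}, {102, true}, {103, false} };

    printf("deletable: %d\n", try_delete_old_versions(page, 3, true));
    return 0;
}

In the low-cardinality case nearly every dead_to_all check comes back false, so we pay for the heap visits and still split the page -- that is the cost I am worried about.
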


I also wonder how this new feature will interact with physical replication. A replica may have quite a different performance profile; for example, there can be long-running queries, which today prevent vacuum from removing recently-dead rows. How will we handle the same situation with this optimized deletion?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
