Why not try for a HOT update, even when PageIsFull()?

From: Peter Geoghegan
Subject: Why not try for a HOT update, even when PageIsFull()?
Date:
Msg-id: CAH2-WznU1L3+DMPr1F7o2eJBT7=3bAJoY6ZkWABAxNt+-afyTA@mail.gmail.com
Replies: Re: Why not try for a HOT update, even when PageIsFull()?  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List: pgsql-hackers
Commit 2fd8685e7f simplified the checking of modified attributes that
takes place within heap_update(). This included a micro-optimization
that affects pages marked PageIsFull(): when the target page is marked
with PD_PAGE_FULL (which must have been set by a previous
heap_update() call), don't even try to use HOT -- assume that there is
no chance of a HOT update, in order to save a few cycles on
determining HOT safety.
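To make the shape of that fast path concrete, here is a minimal
standalone toy model -- not the actual heap_update() code; ToyPage,
toy_consider_hot, pd_page_full, and free_bytes are all invented
stand-ins (for the page, the check, the PD_PAGE_FULL hint, and what
PageGetHeapFreeSpace() would report, respectively):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Toy model of the fast path in question -- simplified stand-ins,
 * not the actual PostgreSQL code.
 */
typedef struct ToyPage
{
    bool    pd_page_full;   /* models the PD_PAGE_FULL hint */
    size_t  free_bytes;     /* models PageGetHeapFreeSpace() */
} ToyPage;

static bool
toy_consider_hot(const ToyPage *page, size_t newtup_len)
{
    /*
     * The micro-optimization: a set "page full" hint makes us give up
     * on HOT immediately, without checking whether the new tuple
     * would actually fit.
     */
    if (page->pd_page_full)
        return false;

    return newtup_len <= page->free_bytes;
}

int
main(void)
{
    /* Hint was set by an earlier, larger update; room exists now */
    ToyPage page = {.pd_page_full = true, .free_bytes = 128};

    printf("HOT considered: %s\n",
           toy_consider_hot(&page, 64) ? "yes" : "no"); /* "no" */
    return 0;
}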

I doubt that this micro-optimization actually pays for itself, though.
Plus heap_update() is very complicated; do we really need to keep this
special case? The only benefit is that we occasionally avoid work that
we do ~99% of the time anyway.

Attached patch removes the micro-optimization entirely. This isn't
just refactoring work -- at least to me. I'm also concerned that this
test unnecessarily prevents HOT updates with certain workloads.

There is a reasonable chance that the last updater couldn't fit a new
version on the same heap page (and so marked the page PD_PAGE_FULL) at
a point where the page didn't quite have enough free space for *their*
new tuple, while *almost* having enough space. And so it's worth being
open to the possibility that our own heap_update() call has a smaller
tuple than the first updater, perhaps only by chance (or perhaps
because the original updater couldn't use HOT specifically because
their new tuple was unusually large). Obviously, not all tables
naturally have uniformly sized rows. And even when they do, we should
still expect TOAST compression to create variation in the size of
physical heap tuples (some toastable attributes have more entropy than
others, making them less effective targets for compression, etc).
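To put invented numbers on that scenario (a hedged sketch; every size
below is made up for illustration): say the page has 96 bytes free,
the first updater needed 140 bytes and so set the hint, and our new
tuple needs only 80:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

int
main(void)
{
    size_t  free_bytes = 96;        /* free space left on the page */
    size_t  first_newtup = 140;     /* first updater's large tuple */
    size_t  second_newtup = 80;     /* our smaller tuple */
    bool    page_full_hint = false;

    /* First updater: tuple doesn't fit, so it sets the hint */
    if (first_newtup > free_bytes)
        page_full_hint = true;

    /* Our update: it would fit, but the short-circuit never asks */
    bool    would_fit = (second_newtup <= free_bytes);

    printf("our tuple fits: %s; HOT attempted: %s\n",
           would_fit ? "yes" : "no",
           (!page_full_hint && would_fit) ? "yes" : "no");
    return 0;
}

This prints "our tuple fits: yes; HOT attempted: no" -- the stale hint
alone blocks a HOT update that would have succeeded.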

It's not just variability in the size of heap tuples. Comments
describing the micro-optimization claim that there is no chance of
cleanup happening concurrently, so that can't be a factor. But that's
really not true anymore. While it is still true that heap_update()
holds a pin on the original page, blocking concurrent pruning (e.g.,
while it waits for a tuple heavyweight lock), that in itself doesn't
mean that nobody else can free up space when heap_update() drops its
buffer lock -- things have changed. Commit 8523492d4e taught VACUUM to
set LP_DEAD line pointers to LP_UNUSED while holding only an exclusive
lock (not a super-exclusive/cleanup lock) on the target heap
page/buffer. That's enough to allow that concurrent processing by
VACUUM to go ahead (though not pruning). And so PageGetHeapFreeSpace()
can go from reporting that the page has 0 bytes of free space to
reporting more than enough space, due only to concurrent activity by
VACUUM (holding a pin won't prevent that anymore). This is not
especially unlikely with a small table.
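Here's a deliberately crude standalone model of that jump -- not the
real PageGetHeapFreeSpace() or the item pointer representation from
bufpage.h; TOY_MAX_ITEMS merely stands in for MaxHeapTuplesPerPage. It
mimics the rule that a page whose line pointer array is at its limit,
with no LP_UNUSED slot to recycle, reports no usable free space:

#include <stddef.h>
#include <stdio.h>

/* Toy states, standing in for LP_NORMAL/LP_DEAD/LP_UNUSED */
enum toy_lp
{
    TOY_LP_NORMAL,
    TOY_LP_DEAD,
    TOY_LP_UNUSED
};

#define TOY_MAX_ITEMS 4         /* stands in for MaxHeapTuplesPerPage */

/*
 * Crude version of the PageGetHeapFreeSpace() rule: when the line
 * pointer array is already at its limit and no slot is LP_UNUSED,
 * report zero usable space no matter how many raw bytes remain.
 */
static size_t
toy_heap_free_space(const enum toy_lp *items, size_t raw_free)
{
    for (int i = 0; i < TOY_MAX_ITEMS; i++)
        if (items[i] == TOY_LP_UNUSED)
            return raw_free;
    return 0;
}

int
main(void)
{
    enum toy_lp items[TOY_MAX_ITEMS] =
    {TOY_LP_NORMAL, TOY_LP_DEAD, TOY_LP_DEAD, TOY_LP_NORMAL};

    printf("before: %zu bytes usable\n", toy_heap_free_space(items, 200));

    /*
     * What commit 8523492d4e lets VACUUM do under a plain exclusive
     * lock (no cleanup lock): set LP_DEAD items to LP_UNUSED.  Our
     * pin on the buffer does not block this.
     */
    for (int i = 0; i < TOY_MAX_ITEMS; i++)
        if (items[i] == TOY_LP_DEAD)
            items[i] = TOY_LP_UNUSED;

    printf("after:  %zu bytes usable\n", toy_heap_free_space(items, 200));
    return 0;
}

Converting the LP_DEAD slots takes the toy page from 0 usable bytes to
all 200 -- exactly the kind of swing a retried HOT check could notice.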

I think that it's possible that Tom would have found it easier to
debug an issue that led to a PANIC inside heap_update() earlier this
year (see commit 34f581c39e), had this stale assumption not been baked
into the code. That bug was judged to be an old bug in heap_update(),
but we only started to see PANICs once the aforementioned enhancement
to VACUUM went in.

-- 
Peter Geoghegan

Attachments
