Re: VM corruption on standby
From | Yura Sokolov |
---|---|
Subject | Re: VM corruption on standby |
Date | |
Msg-id | 2e220a95-5646-4ac1-ae13-762f960ea3f7@postgrespro.ru |
In reply to | Re: VM corruption on standby (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On 19.08.2025 16:09, Andres Freund wrote:
> Hi,
>
> On 2025-08-19 15:56:05 +0300, Yura Sokolov wrote:
>> On 09.08.2025 22:54, Kirill Reshke wrote:
>>> On Thu, 7 Aug 2025 at 21:36, Aleksander Alekseev
>>> <aleksander@tigerdata.com> wrote:
>>>
>>>> Perhaps there was a good
>>>> reason to update the VM *before* creating WAL records I'm unaware of.
>>>
>>> Looks like 503c730 intentionally does it this way; however, I have not
>>> yet fully understood the reasoning behind it.
>>
>> I repeat: there was no intention, neither in the commit message nor in
>> the discussion about it.
>>
>> There was an intention to move visibilitymap_clear under the heap page
>> lock and into the critical section, but there was not a word about logging.
>>
>> I believe it was just an unfortunate oversight that the change is made
>> before logging.
>
> The normal pattern *is* to modify the buffer while holding an exclusive lock,
> in a critical section, before WAL logging. Look at
> src/backend/access/transam/README:
>
>> The general schema for executing a WAL-logged action is
>>
>> 1. Pin and exclusive-lock the shared buffer(s) containing the data page(s)
>> to be modified.
>>
>> 2. START_CRIT_SECTION() (Any error during the next three steps must cause a
>> PANIC because the shared buffers will contain unlogged changes, which we
>> have to ensure don't get to disk. Obviously, you should check conditions
>> such as whether there's enough free space on the page before you start the
>> critical section.)
>>
>> 3. Apply the required changes to the shared buffer(s).
>>
>> 4. Mark the shared buffer(s) as dirty with MarkBufferDirty(). (This must
>> happen before the WAL record is inserted; see notes in SyncOneBuffer().)
>> Note that marking a buffer dirty with MarkBufferDirty() should only
>> happen iff you write a WAL record; see Writing Hints below.
>>
>> 5. If the relation requires WAL-logging, build a WAL record using
>> XLogBeginInsert and XLogRegister* functions, and insert it. (See
>> "Constructing a WAL record" below). Then update the page's LSN using the
>> returned XLOG location. For instance,
>>
>> XLogBeginInsert();
>> XLogRegisterBuffer(...)
>> XLogRegisterData(...)
>> recptr = XLogInsert(rmgr_id, info);
>>
>> PageSetLSN(dp, recptr);
>>
>> 6. END_CRIT_SECTION()
>>
>> 7. Unlock and unpin the buffer(s).

There is one quite important step in these instructions:

> Then update the page's LSN using the returned XLOG location.

This step is violated for the visibilitymap_clear call. I may be mistaken
that this is the source of the bug, but it is certainly a source of other
kinds of issues.

--
regards
Yura Sokolov aka funny-falcon