Re: VM corruption on standby

Поиск
Список
Период
Сортировка
От Yura Sokolov
Тема Re: VM corruption on standby
Дата
Msg-id 2e220a95-5646-4ac1-ae13-762f960ea3f7@postgrespro.ru
обсуждение исходный текст
Ответ на Re: VM corruption on standby  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
19.08.2025 16:09, Andres Freund пишет:
> Hi,
> 
> On 2025-08-19 15:56:05 +0300, Yura Sokolov wrote:
>> 09.08.2025 22:54, Kirill Reshke пишет:
>>> On Thu, 7 Aug 2025 at 21:36, Aleksander Alekseev
>>> <aleksander@tigerdata.com> wrote:
>>>
>>>> Perhaps there was a good
>>>> reason to update the VM *before* creating WAL records I'm unaware of.
>>>
>>> Looks like 503c730 intentionally does it this way; however, I have not
>>> yet fully understood the reasoning behind it.
>>
>> I repeat: there was no intention. Neither in commit message, nor in
>> discussion about.
>>
>> There was intention to move visibilitymap_clear under heap page lock and
>> into critical section, but there were no any word about logging.
>>
>> I believe, it was just an unfortunate oversight that the change is made
>> before logging.
> 
> The normal pattern *is* to modify the buffer while holding an exclusive lock,
> in a critical section, before WAL logging. Look at
> src/backend/access/transam/README:
> 
>> The general schema for executing a WAL-logged action is
>>
>> 1. Pin and exclusive-lock the shared buffer(s) containing the data page(s)
>> to be modified.
>>
>> 2. START_CRIT_SECTION()  (Any error during the next three steps must cause a
>> PANIC because the shared buffers will contain unlogged changes, which we
>> have to ensure don't get to disk.  Obviously, you should check conditions
>> such as whether there's enough free space on the page before you start the
>> critical section.)
>>
>> 3. Apply the required changes to the shared buffer(s).
>>
>> 4. Mark the shared buffer(s) as dirty with MarkBufferDirty().  (This must
>> happen before the WAL record is inserted; see notes in SyncOneBuffer().)
>> Note that marking a buffer dirty with MarkBufferDirty() should only
>> happen iff you write a WAL record; see Writing Hints below.
>>
>> 5. If the relation requires WAL-logging, build a WAL record using
>> XLogBeginInsert and XLogRegister* functions, and insert it.  (See
>> "Constructing a WAL record" below).  Then update the page's LSN using the
>> returned XLOG location.  For instance,
>>
>>         XLogBeginInsert();
>>         XLogRegisterBuffer(...)
>>         XLogRegisterData(...)
>>         recptr = XLogInsert(rmgr_id, info);
>>
>>         PageSetLSN(dp, recptr);
>>
>> 6. END_CRIT_SECTION()
>>
>> 7. Unlock and unpin the buffer(s).

There is quite important step in this instruction:

> Then update the page's LSN using the returned XLOG location.

This step is violated for the call of visibilitymap_clear.

Though, probably I'm mistaken this is source of the bug. But it is really
source of other kinds of issues.

-- 
regards
Yura Sokolov aka funny-falcon



В списке pgsql-hackers по дате отправления: