Re: VM corruption on standby

From Kirill Reshke
Subject Re: VM corruption on standby
Date
Msg-id CALdSSPhGQ1xx10c2NaZgce8qmi+SuKFp6T1uWG_aZvPpvoJRkQ@mail.gmail.com
In reply to Re: VM corruption on standby (Kirill Reshke <reshkekirill@gmail.com>)
Responses Re: VM corruption on standby
List pgsql-hackers
On Tue, 19 Aug 2025 at 14:14, Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> This thread is a candidate for [0]
>
>
> [0]https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
>

Let me summarize this thread to make it easier to follow what is going on:

Timeline:
1) Andrey Borodin sends a patch (on 6 Aug) claiming there is
corruption in VM bits.
2) We investigate and find that the problem is not in how PostgreSQL
modifies buffers or logs changes, but in LWLockReleaseAll in
proc_exit(1) after kill -9 of the postmaster.
3) We have reached the conclusion that there is no corruption, and
that injection points are not a valid way to reproduce it, because
of WaitLatch and friends.

4) But we now suspect there is another corruption, possible with ANY
critical section, in the following scenario:

I wrote:

> Maybe I'm very wrong about this, but I'm currently suspecting there is
> corruption involving CHECKPOINT, a process in a CRIT section and kill -9.
> 1) Some process p1 locks some buffer (call it buf1), enters a CRIT
> section, calls MarkBufferDirty and hangs inside XLogInsert on a CondVar
> (GetXLogBuffer -> AdvanceXLInsertBuffer).
> 2) CHECKPOINT (p2) starts and tries to flush dirty buffers, waiting
> for the lock on buf1.
> 3) The postmaster is killed with kill -9.
> 4) The postmaster-death signal is delivered to p1; it wakes up in
> WaitLatch/WaitEventSetWaitBlock, checks whether the postmaster is
> alive, and exits, releasing all locks.
> 5) p2 acquires the lock on buf1 and flushes it to disk.
> 6) The postmaster-death signal is delivered to p2; p2 exits.

5) We create an open item for pg18 and propose either reverting
bc22dc0e0ddc2dcb6043a732415019cc6b6bf683 or fixing it quickly.
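
To make the ordering in item 4 easier to follow, here is a toy model of
the suspected sequence. It is only a sketch: it calls no PostgreSQL
code, it is not a reproducer, and every name in it (Buffer, Cluster,
run_scenario, unlogged_changes, the release_locks_on_exit flag) is
hypothetical and exists only for illustration. It assumes the suspicion
above is right, i.e. that p1 exits and drops its buffer lock after
dirtying the page but before its WAL record is inserted.

```python
# Toy model of the suspected sequence. NOT a reproducer; all names are
# hypothetical and nothing here touches PostgreSQL itself.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Buffer:
    locked_by: Optional[str] = None
    dirty: bool = False


@dataclass
class Cluster:
    buf1: Buffer = field(default_factory=Buffer)
    wal: List[str] = field(default_factory=list)   # WAL records actually inserted
    disk: List[str] = field(default_factory=list)  # page changes flushed to disk


def run_scenario(release_locks_on_exit: bool) -> Cluster:
    c = Cluster()
    # 1) p1 locks buf1, enters the critical section, marks the buffer dirty,
    #    then blocks inside XLogInsert before its WAL record exists.
    c.buf1.locked_by = "p1"
    c.buf1.dirty = True
    # 2) The checkpointer (p2) wants to flush buf1 and waits for its lock.
    # 3) The postmaster is killed with kill -9.
    # 4) p1 notices postmaster death while waiting and exits.
    if release_locks_on_exit:
        c.buf1.locked_by = None
    # 5) p2 can flush the page only if it manages to take the lock.
    if c.buf1.locked_by is None and c.buf1.dirty:
        c.buf1.locked_by = "p2"
        c.disk.append("change-to-buf1")
        c.buf1.dirty = False
        c.buf1.locked_by = None
    # 6) p2 learns of postmaster death and exits.
    return c


def unlogged_changes(c: Cluster) -> List[str]:
    """Changes that reached disk without a matching WAL record."""
    return [change for change in c.disk if change not in c.wal]
```

In this model, run_scenario(True) leaves a change on disk that has no
WAL record, which is the suspected corruption. With run_scenario(False)
the checkpointer never gets the lock, so nothing unlogged is flushed.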

Please note that the patches in this thread are NOT a reproducer of the
corruption. As of today we have NO valid repro of the corruption.

-- 
Best regards,
Kirill Reshke


