Re: VM corruption on standby
From | Kirill Reshke |
---|---|
Subject | Re: VM corruption on standby |
Date | |
Msg-id | CALdSSPgDAyqt=ORyLMWMpotb9V4Jk1Am+he39mNtBA8+a8TQDw@mail.gmail.com |
In reply to | Re: VM corruption on standby (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
Hi! Thank you for paying attention to this.

On Tue, 19 Aug 2025 at 10:32, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Tue, Aug 19, 2025 at 4:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > But I'm of the opinion that proc_exit
> > is the wrong thing to use after seeing postmaster death, critical
> > section or no. We should assume that system integrity is already
> > compromised, and get out as fast as we can with as few side-effects
> > as possible. It'll be up to the next generation of postmaster to
> > try to clean up.
>
> Then wouldn't backends blocked in LWLockAcquire(x) hang forever, after
> someone who holds x calls _exit()?
>
> I don't know if there are other ways that LWLockReleaseAll() can lead
> to persistent corruption that won't be corrected by crash recovery,
> but this one is probably new since the following commit, explaining
> the failure to reproduce on v17:
>
> commit bc22dc0e0ddc2dcb6043a732415019cc6b6bf683
> Author: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Wed Apr 2 12:44:24 2025 +0300
>
>     Get rid of WALBufMappingLock
>
> Any idea involving deferring the handling of PM death from here
> doesn't seem right: you'd keep waiting for the CV, but the backend
> that would wake you might have exited.

OK.

> Hmm, I wonder if there could be a solution in between where we don't
> release the locks on PM exit, but we still wake the waiters so they
> can observe a new dead state in the lock word (or perhaps a shared
> postmaster_is_dead flag), and exit themselves.

Since yesterday I have been thinking about adding a new state bit for
LWLockWaitState. Something like LW_WS_OWNER_DEAD, which would be set by
the lwlock owner after observing PM death and then checked by contenders
in LWLockAcquire.
So something like:

*lock holder in proc_exit(1)*

```
for all my lwlock do:
    waiter->lwWaiting = LW_WS_OWNER_DEAD;
    PGSemaphoreUnlock(waiter->sem);
```

*lock contender in LWLockAttemptLock*

```
old_state = pg_atomic_read_u32(&lock->state);

/* loop until we've determined whether we could acquire the lock or not */
while (true)
{
    if (old_state & (1 << LW_WS_OWNER_DEAD))
        _exit(2); /* or maybe proc_exit(1) */

    ....

    if (pg_atomic_compare_exchange_u32(&lock->state, &old_state, desired_state))
        ...
    /* retry */
}
```

I am not sure this idea is workable though.

--
Best regards,
Kirill Reshke
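
For illustration only, here is a minimal standalone C11 sketch of the contender-side check. It assumes the dead marker is a bit in the lock word itself (the variant Thomas mentions), rather than a value stored in each waiter's lwWaiting field. Everything prefixed `Sketch`/`sketch_`/`SKETCH_` is hypothetical, and the bit layout is invented for this example. It is not PostgreSQL's actual lock word or API, and waking queued waiters is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for this sketch only (not PostgreSQL's). */
#define SKETCH_FLAG_OWNER_DEAD ((uint32_t) 1 << 31)
#define SKETCH_VAL_EXCLUSIVE   ((uint32_t) 1 << 24)

typedef enum
{
    SKETCH_ACQUIRED,    /* lock taken by this attempt */
    SKETCH_MUST_WAIT,   /* lock is held, owner looks alive */
    SKETCH_OWNER_DEAD   /* owner saw postmaster death; caller should exit */
} SketchResult;

typedef struct
{
    _Atomic uint32_t state;
} SketchLock;

static void
sketch_lock_init(SketchLock *lock)
{
    atomic_init(&lock->state, 0);
}

/*
 * Holder side: publish the dead flag WITHOUT releasing the lock, so no
 * contender can get in and look at possibly inconsistent shared state.
 * A real implementation would also have to wake every queued waiter.
 */
static void
sketch_mark_owner_dead(SketchLock *lock)
{
    /* This sketch assumes the caller currently holds the lock. */
    assert(atomic_load(&lock->state) & SKETCH_VAL_EXCLUSIVE);
    atomic_fetch_or(&lock->state, SKETCH_FLAG_OWNER_DEAD);
}

/* Contender side: one attempt to take the lock in exclusive mode. */
static SketchResult
sketch_attempt_exclusive(SketchLock *lock)
{
    uint32_t old_state = atomic_load(&lock->state);

    for (;;)
    {
        /* Check the dead flag first: it overrides everything else. */
        if (old_state & SKETCH_FLAG_OWNER_DEAD)
            return SKETCH_OWNER_DEAD;

        if (old_state & SKETCH_VAL_EXCLUSIVE)
            return SKETCH_MUST_WAIT;

        if (atomic_compare_exchange_weak(&lock->state, &old_state,
                                         old_state | SKETCH_VAL_EXCLUSIVE))
            return SKETCH_ACQUIRED;

        /* CAS failed: old_state was refreshed with the current value; retry. */
    }
}
```

In this sketch a contender that gets `SKETCH_OWNER_DEAD` back would call `_exit(2)` (or `proc_exit(1)`, which is the open question in this thread). A waiter that is already asleep would have to rerun the same check after being woken.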