Re: VM corruption on standby
From | Kirill Reshke |
---|---|
Subject | Re: VM corruption on standby |
Date | |
Msg-id | CALdSSPhoHNmkVQhUGs_w3qu3sPVJS3M9HgYeMvQWDs2j8Go+Ug@mail.gmail.com |
In reply to | Re: VM corruption on standby (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Tue, 19 Aug 2025 at 20:24, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-08-20 03:19:38 +1200, Thomas Munro wrote:
> > On Wed, Aug 20, 2025 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2025-08-20 02:54:09 +1200, Thomas Munro wrote:
> > > > > On linux - the primary OS with OOM killer troubles - I'm pretty sure'll lwlock
> > > > > waiters would get killed due to the postmaster death signal we've configured
> > > > > (c.f. PostmasterDeathSignalInit()).
> > > >
> > > > No, that has a handler that just sets a global variable. That was
> > > > done because recovery used to try to read() from the postmaster pipe
> > > > after replaying every record. Also we currently have some places that
> > > > don't want to be summarily killed (off the top of my head, syncrep
> > > > wants to send a special error message, and the logger wants to survive
> > > > longer than everyone else to catch as much output as possible, things
> > > > I've been thinking about in the context of threads).
> > >
> > > That makes no sense. We should just _exit(). If postmaster has been killed,
> > > trying to stay up longer just makes everything more fragile. Waiting for the
> > > logger is *exactly* what we should *not* do - what if the logger also crashed?
> > > There's no postmaster around to start it.
> >
> > Nobody is waiting for the logger.
>
> Error messages that we might be printing will wait for logger if the pipe is
> full, no?

I checked the critical sections for elog usage and did not really find any critical section that calls elog(elevel) with elevel < ERROR. But surely there are cases where we might print messages inside critical sections, or such sections may exist in back branches or future branches. Anyway, the case where we hang indefinitely waiting for an already-dead logger is probably rare. The real problem is that, even without the logger involved, current HEAD fails to stop the system when the postmaster dies, and there is no simple fix.
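To make the quoted disagreement concrete, here is a minimal C sketch contrasting the two handler styles: one that only records the event in a global flag, and one that calls _exit() at once. It is not PostgreSQL's PostmasterDeathSignalInit(); the names parent_died, install_death_handler, arm_parent_death_signal and child_status_after_signal are hypothetical. It assumes a POSIX system; on Linux a parent-death signal can be requested with prctl(PR_SET_PDEATHSIG, ...), while the demo simply uses raise() to simulate delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical "parent is gone" flag; code must poll it to react. */
static volatile sig_atomic_t parent_died = 0;

/* Style 1: only record the event and return to whatever we were doing. */
static void death_handler_set_flag(int signo)
{
    (void) signo;
    parent_died = 1;
}

/* Style 2: terminate immediately; _exit() is async-signal-safe. */
static void death_handler_exit(int signo)
{
    (void) signo;
    _exit(1);
}

/* Install one of the two handlers for signo. Returns 0 on success. */
int install_death_handler(int signo, int exit_immediately)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = exit_immediately ? death_handler_exit : death_handler_set_flag;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Ask the kernel to send signo when our parent exits (Linux only here). */
int arm_parent_death_signal(int signo)
{
#ifdef __linux__
    return prctl(PR_SET_PDEATHSIG, (unsigned long) signo);
#else
    (void) signo;
    return -1;                  /* not supported in this sketch */
#endif
}

/* Demo: fork a child, install a handler, deliver signo, and return the
 * child's exit status: 1 if the handler called _exit(), 0 if the child
 * kept running and merely saw the flag, -1 on error. */
int child_status_after_signal(int signo, int exit_immediately)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        parent_died = 0;
        if (install_death_handler(signo, exit_immediately) != 0)
            _exit(2);
        raise(signo);
        /* Reached only when the handler just set the flag. */
        _exit(parent_died ? 0 : 3);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With the flag-only handler the process carries on and must check parent_died itself, so a process asleep in a wait loop that never looks at the flag stays stuck. The _exit() variant cannot get stuck that way, at the cost of skipping last-gasp work such as the syncrep error message mentioned in the quoted text.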
It would help a lot if LWLocks were implemented on top of latch waits; then there would be no problem. But we are where we are, so I can see the sense in trying to notify the other processes (through shared memory) that we are going to die soon and that they need to do the same, and then calling _exit(1).

--
Best regards,
Kirill Reshke
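The shared-memory notification idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed patch: SharedState, cluster_dying, announce_death, acquire_or_die and demo_waiter_exit_status are hypothetical names, the "lock" is a simplified compare-and-swap word rather than a real LWLock, and an anonymous MAP_SHARED mapping stands in for PostgreSQL's shared memory segment. The process that notices postmaster death would call announce_death() and then _exit(1); waiters re-check the flag between attempts instead of sleeping indefinitely.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical shared state visible to all processes. */
typedef struct SharedState
{
    atomic_int lock_holder;     /* 0 = free, 1 = held (simplified lock) */
    atomic_int cluster_dying;   /* set once someone notices parent death */
} SharedState;

/* Allocate the state in memory that stays shared across fork(). */
SharedState *shared_state_create(void)
{
    void *p = mmap(NULL, sizeof(SharedState), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    SharedState *s;

    if (p == MAP_FAILED)
        return NULL;
    s = p;
    atomic_init(&s->lock_holder, 0);
    atomic_init(&s->cluster_dying, 0);
    return s;
}

/* Publish "we are going down"; the caller would then _exit(1). */
void announce_death(SharedState *s)
{
    atomic_store(&s->cluster_dying, 1);
}

/* Acquire the simplified lock, or _exit(1) once death is announced.
 * Returns 1 when the lock was acquired. */
int acquire_or_die(SharedState *s)
{
    const struct timespec pause = {0, 1000000};    /* 1 ms */

    for (;;)
    {
        int expected = 0;

        if (atomic_compare_exchange_strong(&s->lock_holder, &expected, 1))
            return 1;
        if (atomic_load(&s->cluster_dying))
            _exit(1);
        nanosleep(&pause, NULL);
    }
}

/* Demo: while the lock is held, fork a waiter, announce death, and return
 * the waiter's exit status (1 = it noticed the flag), or -1 on error. */
int demo_waiter_exit_status(SharedState *s)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        acquire_or_die(s);      /* lock is held, so this waits or exits */
        _exit(0);               /* reached only if the lock was free */
    }
    announce_death(s);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The 1 ms polling interval only keeps the sketch short. Real LWLock waiters sleep on a semaphore, so a practical version would presumably also have to wake them after setting the flag (for instance by posting their semaphores), which is roughly what a latch-based wait would provide for free.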