Re: VM corruption on standby

From	Kirill Reshke
Subject	Re: VM corruption on standby
Date	19 August 20:20:57
Msg-id	CALdSSPhoHNmkVQhUGs_w3qu3sPVJS3M9HgYeMvQWDs2j8Go+Ug@mail.gmail.com
In reply to	Re: VM corruption on standby (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers

On Tue, 19 Aug 2025 at 20:24, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-08-20 03:19:38 +1200, Thomas Munro wrote:
> > On Wed, Aug 20, 2025 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2025-08-20 02:54:09 +1200, Thomas Munro wrote:
> > > > > > On Linux - the primary OS with OOM killer troubles - I'm pretty sure lwlock
> > > > > waiters would get killed due to the postmaster death signal we've configured
> > > > > (c.f. PostmasterDeathSignalInit()).
> > > >
> > > > No, that has a handler that just sets a global variable.  That was
> > > > done because recovery used to try to read() from the postmaster pipe
> > > > after replaying every record.  Also we currently have some places that
> > > > don't want to be summarily killed (off the top of my head, syncrep
> > > > wants to send a special error message, and the logger wants to survive
> > > > longer than everyone else to catch as much output as possible, things
> > > > I've been thinking about in the context of threads).
> > >
> > > That makes no sense. We should just _exit(). If postmaster has been killed,
> > > trying to stay up longer just makes everything more fragile. Waiting for the
> > > logger is *exactly* what we should *not* do - what if the logger also crashed?
> > > There's no postmaster around to start it.
> >
> > Nobody is waiting for the logger.
>
> Error messages that we might be printing will wait for logger if the pipe is
> full, no?

I checked the critical sections for elog usage and did not find any
that calls elog(elevel) with elevel < ERROR. But there may well be
cases where we print messages inside critical sections, and such
sections may exist in back branches or appear in future branches.
Anyway, the case where we hang indefinitely waiting for an (already
dead) logger is probably a rare one.
The problem is that even without a logger, on current HEAD we fail to
stop the system when the postmaster dies, and there is no simple fix.
If the LWLock implementation waited on a latch, there would be no
problem.
But we are where we are, so I can see the sense in trying to notify
other processes (through shared memory) that we are going to die soon
and that they need to do the same, and then calling _exit(1).
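
To make the last idea concrete, here is a minimal sketch of the
"set a flag in shared memory, then _exit(1)" pattern. It is not
PostgreSQL code: ShutdownFlag and every function name below are
hypothetical, and it uses plain C11 atomics rather than the tree's own
atomics layer. It also assumes that a waiting process gets a chance to
poll the flag, which is exactly the part that is hard for LWLock
waiters today.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical flag that would live in a shared memory segment mapped
 * by every process. Nothing here is an existing PostgreSQL structure.
 */
typedef struct ShutdownFlag
{
    atomic_bool pm_died;        /* true once any process saw PM death */
} ShutdownFlag;

/* Initialize the flag once, before other processes attach. */
static inline void
shutdown_flag_init(ShutdownFlag *flag)
{
    atomic_init(&flag->pm_died, false);
}

/* Publish "postmaster is gone" to every process sharing the flag. */
static inline void
shutdown_flag_set(ShutdownFlag *flag)
{
    atomic_store(&flag->pm_died, true);
}

/* Check whether some process has already reported postmaster death. */
static inline bool
shutdown_flag_is_set(ShutdownFlag *flag)
{
    return atomic_load(&flag->pm_died);
}

/*
 * Called by the process that first notices postmaster death:
 * tell the others, then leave immediately without any cleanup.
 */
static inline void
report_pm_death_and_exit(ShutdownFlag *flag)
{
    shutdown_flag_set(flag);
    _exit(1);
}

/*
 * Called by a waiter each time it wakes up (assumption: the wait
 * primitive lets it wake up at all). Exits if a peer reported death.
 */
static inline void
exit_if_pm_death_reported(ShutdownFlag *flag)
{
    if (shutdown_flag_is_set(flag))
        _exit(1);
}
```

A process that detects postmaster death would call
report_pm_death_and_exit(); a process stuck in a wait loop would call
exit_if_pm_death_reported() on every wakeup. _exit() is used rather
than exit() so that no atexit callbacks or stdio flushing run in a
system that is already half dead.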

--
Best regards,
Kirill Reshke
