Re: VM corruption on standby
| From | Thomas Munro |
|---|---|
| Subject | Re: VM corruption on standby |
| Date | |
| Msg-id | CA+hUKGLqaXJJpsxBBNAe4Xk1Sn8yKRxOAQtnVgNQOoLvtdobxA@mail.gmail.com |
| In response to | Re: VM corruption on standby (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: VM corruption on standby, Re: VM corruption on standby |
| List | pgsql-hackers |
On Wed, Aug 20, 2025 at 7:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm inclined to think that we do want to prohibit WaitEventSetWait
> inside a critical section --- it just seems like a bad idea all
> around, even without considering this specific failure mode.

FWIW aio/README.md describes a case where we'd need to wait for an IO, which might involve a CV to wait for an IO worker to do something, in order to start writing WAL, which is in a CS. Starting IO involves calling pgaio_io_acquire(), and if it can't find a handle it calls pgaio_io_wait_for_free(). That's all hypothetical for now as v18 is only doing reads, but it's an important architectural principle.

That makes me suspect this new edict can't be the final policy, even if v18 uses it to solve the immediate problem. For v19 I think we should probably attack the original sin and make this work.

Several mechanisms for unwedging LWLockAcquire() have been mentioned:

(1) the existing LWLockReleaseAll(), which clearly makes bogus assumptions about system state and cannot stay,

(2) some new thing that would sem_post() all the waiters, having set flags that cause LWLockAcquire() to exit (ie a sort of multiplexing, making our semaphore-based locks inch towards latch-nature),

(3) moving LWLock over to latches, so the wait would already be multiplexed with PM death detection,

(4) having the parent death signal handler exit directly (unfortunately Linux and FreeBSD only*),

(5) in multi-threaded prototype work, the whole process exits anyway, taking all backend threads with it**, which is a strong motivation to make multi-process mode act as much like that as possible, eg something that exits a lot more eagerly and hopefully preemptively than today.

* That's an IRIX invention picked up by Linux and FreeBSD, a sort of reverse SIGCHLD, and I've tried to recreate it for pure POSIX systems before.
(1) Maybe it's enough for any backend that learns of postmaster death to signal everyone else, since they can't all be blocked in sem_wait() unless there is already a deadlock.

(2) I once tried making the postmaster death pipe O_ASYNC so that the "owner" gets a signal when it becomes readable, but it turned out to require a separate postmaster pipe for every backend (not just a dup'd descriptor); perhaps this would be plausible if we already had a bidirectional postmaster control socket protocol and chose to give every backend process its own socket pair in MP mode, something I've been looking into for various other reasons.

** I've been studying the unusual logger case in this context and contemplated running it as a separate process even in MT mode, as its stated aim didn't sound crazy to me and I was humbly attempting to preserve that characteristic in MT mode. Another way to achieve MP/MT consistency is to decide that the MP design already isn't robust enough on a full pipe, and just nuke the logger like everything else. Reading Andres's earlier comments, I briefly wondered about a compromise where log senders would make a best effort to send nonblockingly when they know the postmaster is gone, but that's neither as reliable as whoever wrote that had in mind (and in their defence, the logger is basically independent of shared memory state, so whether it should be exiting ASAP or draining final log statements is at least debatable; barring bugs, it's only going to block progress if your system is really hosed), nor entirely free of "limping along" syndrome, as Andres argues quite compellingly, so I cancelled that thought.