Re: VM corruption on standby
| From | Thomas Munro |
|---|---|
| Subject | Re: VM corruption on standby |
| Date | |
| Msg-id | CA+hUKGLqaXJJpsxBBNAe4Xk1Sn8yKRxOAQtnVgNQOoLvtdobxA@mail.gmail.com |
| In response to | Re: VM corruption on standby (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: VM corruption on standby, Re: VM corruption on standby |
| List | pgsql-hackers |
On Wed, Aug 20, 2025 at 7:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm inclined to think that we do want to prohibit WaitEventSetWait
> inside a critical section --- it just seems like a bad idea all
> around, even without considering this specific failure mode.

FWIW aio/README.md describes a case where we'd need to wait for an IO, which might involve a CV to wait for an IO worker to do something, in order to start writing WAL, which is in a CS. Starting IO involves calling pgaio_io_acquire(), and if it can't find a handle it calls pgaio_io_wait_for_free(). That's all hypothetical for now as v18 is only doing reads, but it's an important architectural principle.

That makes me suspect this new edict can't be the final policy, even if v18 uses it to solve the immediate problem. For v19 I think we should probably attack the original sin and make this work.

Several mechanisms for unwedging LWLockAcquire() have been mentioned:

(1) the existing LWLockReleaseAll(), which clearly makes bogus assumptions about system state and cannot stay,

(2) some new thing that would sem_post() all the waiters, having set flags that cause LWLockAcquire() to exit (ie a sort of multiplexing, making our semaphore-based locks inch towards latch-nature),

(3) moving LWLock over to latches, so the wait would already be multiplexed with PM death detection,

(4) having the parent death signal handler exit directly (unfortunately Linux and FreeBSD only*),

(5) in multi-threaded prototype work, the whole process exits anyway, taking all backend threads with it**, which is a strong motivation to make multi-process mode act as much like that as possible, eg something that exits a lot more eagerly and hopefully preemptively than today.

* That's an IRIX invention picked up by Linux and FreeBSD, a sort of reverse SIGCHLD, and I've tried to recreate it for pure POSIX systems before.
(1) Maybe it's enough for any backend that learns of postmaster death to signal everyone else, since they can't all be blocked in sem_wait() unless there is already a deadlock.

(2) I once tried making the postmaster death pipe O_ASYNC so that the "owner" gets a signal when it becomes readable, but it turned out to require a separate postmaster pipe for every backend (not just a dup'd descriptor); perhaps this would be plausible if we already had a bidirectional postmaster control socket protocol and chose to give every backend process its own socket pair in MP mode, something I've been looking into for various other reasons.

** I've been studying the unusual logger case in this context and contemplated running it as a separate process even in MT mode, as its stated aim didn't sound crazy to me and I was humbly attempting to preserve that characteristic in MT mode. Another way to achieve MP/MT consistency is to decide that the MP design already isn't robust enough on a full pipe, and just nuke the logger like everything else. Reading Andres's earlier comments, I briefly wondered about a compromise where log senders would make a best effort to send nonblockingly when they know the postmaster is gone, but that's neither as reliable as whoever wrote that had in mind (and in their defence, the logger is basically independent of shared memory state, so whether it should be exiting ASAP or draining final log statements is at least debatable; barring bugs, it's only going to block progress if your system is really hosed), nor entirely free of "limping along" syndrome, as Andres argues quite compellingly, so I cancelled that thought.