Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects
| From | Thomas Munro |
|---|---|
| Subject | Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects |
| Date | |
| Msg-id | CA+hUKGLVBGE2KkzLaDkKX9t7=t2BvjtOLXef5NnMv4cAZyoz7w@mail.gmail.com |
| In response to | [PATCH] Fix orphaned backend processes on Windows using Job Objects (Bryan Green <dbryan.green@gmail.com>) |
| Responses | Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects |
| List | pgsql-hackers |
On Tue, Nov 4, 2025 at 4:12 AM Bryan Green <dbryan.green@gmail.com> wrote:
> Current approaches (inherited event handles, shared memory flags) depend
> on the postmaster running code during exit. A segfault or kill bypasses
> all of that.

Huh. I thought PostmasterHandle should be signalled by the kernel, not by any code run by the postmaster itself, when taskkill /f calls something like TerminateProcess(). Is that not the case? Are you sure we haven't broken something on our side?

> My proposed solution is to use Windows Job Objects with KILL_ON_JOB_CLOSE.

I'm not a Windows guy but I had been researching this independently, and it does seem to be the standard approach, so +1 generally even though I'm still very curious to know *why* the existing coding doesn't work in such a simple case.

By coincidence, I'm getting close to posting a stack of patches for better postmaster death handling on Unix too, along with better subprocess and interrupt multiplexing and cleanup. That does more stuff, with connections to multithreading, interrupts, critical sections, a bunch of existing bugs we have with subprocess management on Unix. I will gladly delete my own attempt at Windows job objects from that effort and rebase on top of your patch, and see what review feedback ideas come up in that process.

In nearby threads that triggered my work on that, I was a bit worried about the change in behaviour on PM death in the syncrep and syslogger, but I'm beginning to suspect that the vast majority of Linux/systemd deployments probably just nuke the whole cgroup from orbit in this case, so it seems like exceptional behaviour is really just a recipe for fragility on rarer systems, not to mention that it probably has to be like that in a potential multithreaded mode anyway. We should probably just rip all such specialness out, which I'll show in my patch set with some explanations soon.
On the other hand, I was thinking of this as a v19 feature, where one can contemplate such changes, but you said:

> Job creation can fail if postgres runs under an existing job (service
> managers, debuggers). Windows 7 disallows nested jobs. We detect this
> with IsProcessInJob(), and if AssignProcessToJobObject() returns
> ERROR_ACCESS_DENIED, we log and continue without orphan protection.

We currently require Windows 10 (itself recently EOL'd), but PostgreSQL 13-15 sort of claim to work on Windows all the way back to 7, so I'm guessing you're imagining back-patching this?

> This patch does not include automated tests because the core
> functionality (orphan prevention on crash) requires simulating process
> termination, which is difficult to test reliably in CI.

Ah yes I ran into problems with that part too...
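
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of the Job Object approach. It is not the code from the patch. The names enable_orphan_protection, OrphanProtectionStatus and job_handle are hypothetical and chosen only for illustration, and KILL_ON_JOB_CLOSE in the quoted text is assumed to refer to the Win32 flag JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. The sketch creates a job, sets that flag, assigns the current process to it, and falls back to running without protection if the assignment fails. On non-Windows builds it compiles to a stub.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch (not from the patch). */
typedef enum
{
    ORPHAN_PROTECTION_ENABLED,      /* job created and process assigned */
    ORPHAN_PROTECTION_UNAVAILABLE,  /* a Win32 call failed; continue without it */
    ORPHAN_PROTECTION_UNSUPPORTED   /* not built for Windows */
} OrphanProtectionStatus;

#ifdef _WIN32
#include <windows.h>

/*
 * Kept open on purpose for the life of the process. When the process dies
 * for any reason, the kernel closes this handle; if it was the last handle
 * to the job, every process still in the job is terminated.
 */
static HANDLE job_handle = NULL;

OrphanProtectionStatus
enable_orphan_protection(void)
{
    JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits;
    BOOL        in_job = FALSE;
    HANDLE      job;

    /* Diagnostic only: are we already inside some job (service manager, debugger)? */
    if (IsProcessInJob(GetCurrentProcess(), NULL, &in_job) && in_job)
        fprintf(stderr, "note: process is already in a job\n");

    /* NULL security attributes: the handle is not inherited by children. */
    job = CreateJobObjectW(NULL, NULL);
    if (job == NULL)
        return ORPHAN_PROTECTION_UNAVAILABLE;

    ZeroMemory(&limits, sizeof(limits));
    limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
    if (!SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                                 &limits, sizeof(limits)))
    {
        CloseHandle(job);       /* job is still empty, so nothing is killed */
        return ORPHAN_PROTECTION_UNAVAILABLE;
    }

    /* Processes created later by this process join the job by default. */
    if (!AssignProcessToJobObject(job, GetCurrentProcess()))
    {
        /* e.g. ERROR_ACCESS_DENIED where nested jobs are not allowed */
        fprintf(stderr, "could not assign process to job: error %lu\n",
                (unsigned long) GetLastError());
        CloseHandle(job);       /* job is still empty, so nothing is killed */
        return ORPHAN_PROTECTION_UNAVAILABLE;
    }

    job_handle = job;
    return ORPHAN_PROTECTION_ENABLED;
}

#else

/* Stub so the sketch compiles elsewhere; job objects are Windows-only. */
OrphanProtectionStatus
enable_orphan_protection(void)
{
    return ORPHAN_PROTECTION_UNSUPPORTED;
}

#endif
```

The point of the design is that cleanup does not depend on the parent running any code at exit: the kernel closes the job handle when the process goes away, however that happens. The job handle must stay non-inheritable, since a copy held by a child would keep the job alive after the parent is gone.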