Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects
| From | Bryan Green |
|---|---|
| Subject | Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects |
| Date | |
| Msg-id | a7109b5f-6590-476c-810c-18f1af588238@gmail.com |
| In reply to | Re: [PATCH] Fix orphaned backend processes on Windows using Job Objects (Andres Freund <andres@anarazel.de>) |
| List | pgsql-hackers |
On 11/3/2025 9:29 AM, Andres Freund wrote:
> On 2025-11-03 09:25:11 -0600, Bryan Green wrote:
>> On 11/3/2025 9:19 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-11-03 09:12:03 -0600, Bryan Green wrote:
>>>> We just need to call CreateJobObject() in PostmasterMain(), configure
>>>> with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, and assign the postmaster.
>>>> Children inherit membership automatically. When the job handle closes on
>>>> postmaster exit, the kernel terminates all children atomically. This is
>>>> kernel-enforced with no polling and no race conditions.
>>>
>>> What happens if a postmaster child exits irregularly? Is postmaster terminated
>>> as well?
>>>
>>
>> No, Job Objects are unidirectional.
>
> Great.
>
>
>>>> The patch has been tested on Windows 10/11 with both MSVC and MinGW
>>>> builds. Nested jobs fail gracefully as expected. Clean shutdown is
>>>> unaffected. Crash tests with taskkill /F, debugger abort, and access
>>>> violations all correctly terminate children immediately with zero orphans.
>>>>
>>>> This patch does not include automated tests because the core
>>>> functionality (orphan prevention on crash) requires simulating process
>>>> termination, which is difficult to test reliably in CI.
>>>
>>> Why is it difficult to test in CI? We do some related tests in
>>> 013_crash_restart.pl, it doesn't seem like it ought to be hard to also add
>>> tests for postmaster?
>>>
>>
>> Fair point. I was hesitant because testing the actual orphan prevention
>> requires killing the postmaster while backends are active, which seemed
>> fragile. But you're right that we already test similar scenarios.
>>
>> I can add a test to 013_crash_restart.pl (or a new Windows-specific test
>> file) that:
>> 1. Starts server with active backend
>> 2. Kills postmaster ungracefully (taskkill /F)
>> 3. Verifies backend process terminates automatically
>> 4. Confirms clean restart
>>
>> Would that be sufficient, or do you have other test scenarios in mind?
>
> That's pretty much what I had in mind.
>
> Greetings,
>
> Andres Freund

I've implemented the test in 013_crash_restart.pl. The test passes on Windows 10/11 with both MSVC and MinGW builds. Backends are typically terminated within 100-200 ms after the postmaster is killed, confirming that the Job Object JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE mechanism works as intended.

Updated patch (v2) attached.

--
Bryan
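The mechanism discussed in the thread (create a job, set JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, assign the postmaster) can be sketched with the documented Win32 job APIs `CreateJobObject`, `SetInformationJobObject`, and `AssignProcessToJobObject`. This is a minimal illustration under stated assumptions, not code from the v2 patch: the helper `setup_postmaster_job_object()` and the global `postmaster_job` are hypothetical names, and non-Windows builds get a stub that simply returns false. The job is unnamed and its handle is created non-inheritable, so only the postmaster holds it; when the postmaster exits for any reason, the last handle closes and the kernel terminates the remaining job members.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical global: the job handle is deliberately never closed
 * explicitly; process exit (clean or crash) closes it. */
static HANDLE postmaster_job = NULL;

/* Hypothetical helper, assumed to run early in PostmasterMain(),
 * before any child process is started. */
static bool
setup_postmaster_job_object(void)
{
    JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits;
    /* NULL security attributes: handle is not inheritable, so children
     * cannot keep the job alive after the postmaster dies. */
    HANDLE job = CreateJobObject(NULL, NULL);

    if (job == NULL)
        return false;

    /* KILL_ON_JOB_CLOSE must be set through the extended limit class. */
    ZeroMemory(&limits, sizeof(limits));
    limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;

    if (!SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                                 &limits, sizeof(limits)) ||
        !AssignProcessToJobObject(job, GetCurrentProcess()))
    {
        /* E.g. already inside a job that cannot be nested: give up on
         * the protection and let the caller continue without it. */
        CloseHandle(job);
        return false;
    }

    /* Children created from now on join the job automatically. */
    postmaster_job = job;
    return true;
}
#else
/* Job Objects exist only on Windows; nothing to do elsewhere. */
static bool
setup_postmaster_job_object(void)
{
    return false;
}
#endif
```

In this sketch a `false` return is treated as non-fatal, matching the thread's description of nested jobs failing gracefully: the server still starts, just without the orphan protection. Because the limit is attached to the job rather than to individual processes, the relationship is one-way: a child that crashes does not affect the postmaster.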
Attachments