Обсуждение: Wake up autovacuum launcher from postmaster when a worker exits
When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the launcher. I propose moving that responsibility to the postmaster, because: * It's simpler IMHO * The postmaster is already responsible for sending the signal if fork() fails * It makes it consistent with background workers. When a background worker exits, the postmaster sends the signal to the launching process (if requested). * Postmaster doesn't need to worry about sending the signal to the wrong process if the launcher's PID is reused, because it always has up-to-date PID information, because the launcher is postmaster's child process. That risk was negligible to begin with, but this eliminates completely, so we don't need the comment excusing it it anymore. I'm a little surprised it wasn't done this way to begin with, so I wonder if I'm missing something? - Heikki
Вложения
On Thu, Jan 08, 2026 at 09:57:38PM +0200, Heikki Linnakangas wrote: > When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the launcher. I > propose moving that responsibility to the postmaster, because: This seems generally reasonable to me. So does the patch. > * It makes it consistent with background workers. When a background worker > exits, the postmaster sends the signal to the launching process (if > requested). I've wondered about making autovacuum workers proper background workers. > I'm a little surprised it wasn't done this way to begin with, so I wonder if > I'm missing something? This code dates back to commit e2a186b03c. I skimmed through the nearby thread [0] and didn't immediately notice any discussion about this. My guess is that it seemed simpler to directly alert the launcher, since it's the one that needs to take action. [0] https://postgr.es/m/flat/20070404233954.GK19251%40alvh.no-ip.org -- nathan
On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the > launcher. I propose moving that responsibility to the postmaster, because: > > * It's simpler IMHO > > * The postmaster is already responsible for sending the signal if fork() > fails > > * It makes it consistent with background workers. When a background > worker exits, the postmaster sends the signal to the launching process > (if requested). > > * Postmaster doesn't need to worry about sending the signal to the wrong > process if the launcher's PID is reused, because it always has > up-to-date PID information, because the launcher is postmaster's child > process. That risk was negligible to begin with, but this eliminates > completely, so we don't need the comment excusing it it anymore. It sounds reasonable to me too. +1. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
> On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > > When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the > > launcher. I propose moving that responsibility to the postmaster, because: > > > > * It's simpler IMHO > > > > * The postmaster is already responsible for sending the signal if > > fork() fails > > > > * It makes it consistent with background workers. When a background > > worker exits, the postmaster sends the signal to the launching process > > (if requested). > > > > * Postmaster doesn't need to worry about sending the signal to the > > wrong process if the launcher's PID is reused, because it always has > > up-to-date PID information, because the launcher is postmaster's child > > process. That risk was negligible to begin with, but this eliminates > > completely, so we don't need the comment excusing it it anymore. > > It sounds reasonable to me too. +1. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com > Hi all, I have completed the testing for this patch with a focus on the signaling logic from postmaster to launcher. Test Environment: - autovacuum_max_workers = 1 - autovacuum_naptime = 5s - Multiple databases (4 total) Observations: I performed cross-database vacuum tests. With the 1-worker limit, I observed that the launcher starts the next worker fora pending database immediately (within the ~1.25s scheduled stagger) after the previous worker exits. The logs show a seamless handover between worker processes (e.g., Worker A exits at 16:06:20.209, and the next scheduledWorker B starts at 16:06:21.447). 2026-01-09 16:06:20.209 CST [1918017] LOG: automatic vacuum of table "db_test2.public.t2": index scans: 0 pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned tuples: 50000 removed, 0 remain, 0 are dead but not yet removable removable cutoff: 813, which was 1 XIDs old when operation ended new relfrozenxid: 813, which is 3 XIDs ahead of previous value frozen: 0 pages from table (0.00% of total) had 0 tuples frozen visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible) index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 0.000 MB/s, avg write rate: 2.794 MB/s buffer usage: 697 hits, 0 reads, 4 dirtied WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each) system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s 2026-01-09 16:06:21.447 CST [1918022] DEBUG: autovacuum: processing database "template1" 2026-01-09 16:06:22.699 CST [1918027] DEBUG: autovacuum: processing database "postgres" 2026-01-09 16:06:23.946 CST [1918044] DEBUG: autovacuum: processing database "db_test1" 2026-01-09 16:06:23.958 CST [1918044] LOG: automatic vacuum of table "db_test1.public.t1": index scans: 0 pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned tuples: 50000 removed, 0 remain, 0 are dead but not yet removable removable cutoff: 814, which was 1 XIDs old when operation ended new relfrozenxid: 814, which is 7 XIDs ahead of previous value frozen: 0 pages from table (0.00% of total) had 0 tuples frozen visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible) index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 0.000 MB/s, avg write rate: 2.793 MB/s buffer usage: 697 hits, 0 reads, 4 dirtied WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each) system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s This confirms that the postmaster successfully notifies the launcher and the worker slot is freed appropriately before thenotification. The patch looks correct and robust. +1 from my side. Best Regards, Yuan Li(carol)
On 09/01/2026 00:20, Nathan Bossart wrote: > On Thu, Jan 08, 2026 at 09:57:38PM +0200, Heikki Linnakangas wrote: >> * It makes it consistent with background workers. When a background worker >> exits, the postmaster sends the signal to the launching process (if >> requested). > > I've wondered about making autovacuum workers proper background workers. Yeah, me too... >> I'm a little surprised it wasn't done this way to begin with, so I wonder if >> I'm missing something? > > This code dates back to commit e2a186b03c. I skimmed through the nearby > thread [0] and didn't immediately notice any discussion about this. My > guess is that it seemed simpler to directly alert the launcher, since it's > the one that needs to take action. > > [0] https://postgr.es/m/flat/20070404233954.GK19251%40alvh.no-ip.org Thanks for checking. I suspect we wanted to keep postmaster as dumb as possible, having all logic in the child process if at all possible. But we perhaps went a little overboard with that; signaling child processes seems squarely like postmaster's core business. On 09/01/2026 10:27, li carol wrote: >> On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> >>> When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the >>> launcher. I propose moving that responsibility to the postmaster, because: >>> >>> * It's simpler IMHO >>> >>> * The postmaster is already responsible for sending the signal if >>> fork() fails >>> >>> * It makes it consistent with background workers. When a background >>> worker exits, the postmaster sends the signal to the launching process >>> (if requested). >>> >>> * Postmaster doesn't need to worry about sending the signal to the >>> wrong process if the launcher's PID is reused, because it always has >>> up-to-date PID information, because the launcher is postmaster's child >>> process. That risk was negligible to begin with, but this eliminates >>> completely, so we don't need the comment excusing it it anymore. >> >> It sounds reasonable to me too. +1. > > I have completed the testing for this patch with a focus on the signaling logic from postmaster to launcher. > > ... > > This confirms that the postmaster successfully notifies the launcher and the worker slot is freed appropriately beforethe notification. > > The patch looks correct and robust. +1 from my side. Committed, thanks for the reviews and testing! - Heikki