Обсуждение: Wake up autovacuum launcher from postmaster when a worker exits

Поиск
Список
Период
Сортировка

Wake up autovacuum launcher from postmaster when a worker exits

От
Heikki Linnakangas
Дата:
When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the 
launcher. I propose moving that responsibility to the postmaster, because:

* It's simpler IMHO

* The postmaster is already responsible for sending the signal if fork() 
fails

* It makes it consistent with background workers. When a background 
worker exits, the postmaster sends the signal to the launching process 
(if requested).

* Postmaster doesn't need to worry about sending the signal to the wrong 
process if the launcher's PID is reused, because it always has 
up-to-date PID information, because the launcher is postmaster's child 
process. That risk was negligible to begin with, but this eliminates 
completely, so we don't need the comment excusing it it anymore.

I'm a little surprised it wasn't done this way to begin with, so I 
wonder if I'm missing something?

- Heikki

Вложения

Re: Wake up autovacuum launcher from postmaster when a worker exits

От
Nathan Bossart
Дата:
On Thu, Jan 08, 2026 at 09:57:38PM +0200, Heikki Linnakangas wrote:
> When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the launcher. I
> propose moving that responsibility to the postmaster, because:

This seems generally reasonable to me.  So does the patch.

> * It makes it consistent with background workers. When a background worker
> exits, the postmaster sends the signal to the launching process (if
> requested).

I've wondered about making autovacuum workers proper background workers.

> I'm a little surprised it wasn't done this way to begin with, so I wonder if
> I'm missing something?

This code dates back to commit e2a186b03c.  I skimmed through the nearby
thread [0] and didn't immediately notice any discussion about this.  My
guess is that it seemed simpler to directly alert the launcher, since it's
the one that needs to take action.

[0] https://postgr.es/m/flat/20070404233954.GK19251%40alvh.no-ip.org

-- 
nathan



Re: Wake up autovacuum launcher from postmaster when a worker exits

От
Masahiko Sawada
Дата:
On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the
> launcher. I propose moving that responsibility to the postmaster, because:
>
> * It's simpler IMHO
>
> * The postmaster is already responsible for sending the signal if fork()
> fails
>
> * It makes it consistent with background workers. When a background
> worker exits, the postmaster sends the signal to the launching process
> (if requested).
>
> * Postmaster doesn't need to worry about sending the signal to the wrong
> process if the launcher's PID is reused, because it always has
> up-to-date PID information, because the launcher is postmaster's child
> process. That risk was negligible to begin with, but this eliminates
> completely, so we don't need the comment excusing it it anymore.

It sounds reasonable to me too. +1.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



RE: Wake up autovacuum launcher from postmaster when a worker exits

От
li carol
Дата:
> On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the
> > launcher. I propose moving that responsibility to the postmaster, because:
> >
> > * It's simpler IMHO
> >
> > * The postmaster is already responsible for sending the signal if
> > fork() fails
> >
> > * It makes it consistent with background workers. When a background
> > worker exits, the postmaster sends the signal to the launching process
> > (if requested).
> >
> > * Postmaster doesn't need to worry about sending the signal to the
> > wrong process if the launcher's PID is reused, because it always has
> > up-to-date PID information, because the launcher is postmaster's child
> > process. That risk was negligible to begin with, but this eliminates
> > completely, so we don't need the comment excusing it it anymore.
> 
> It sounds reasonable to me too. +1.
> 
> Regards,
> 
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> 
Hi all,

I have completed the testing for this patch with a focus on the signaling logic from postmaster to launcher.

Test Environment:
- autovacuum_max_workers = 1
- autovacuum_naptime = 5s
- Multiple databases (4 total)

Observations:
I performed cross-database vacuum tests. With the 1-worker limit, I observed that the launcher starts the next worker
fora pending database immediately (within the ~1.25s scheduled stagger) after the previous worker exits.
 
The logs show a seamless handover between worker processes (e.g., Worker A exits at 16:06:20.209, and the next
scheduledWorker B starts at 16:06:21.447). 
 

2026-01-09 16:06:20.209 CST [1918017] LOG:  automatic vacuum of table "db_test2.public.t2": index scans: 0
    pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned
    tuples: 50000 removed, 0 remain, 0 are dead but not yet removable
    removable cutoff: 813, which was 1 XIDs old when operation ended
    new relfrozenxid: 813, which is 3 XIDs ahead of previous value
    frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
    visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible)
    index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
    avg read rate: 0.000 MB/s, avg write rate: 2.794 MB/s
    buffer usage: 697 hits, 0 reads, 4 dirtied
    WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full
    memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2026-01-09 16:06:21.447 CST [1918022] DEBUG:  autovacuum: processing database "template1"
2026-01-09 16:06:22.699 CST [1918027] DEBUG:  autovacuum: processing database "postgres"
2026-01-09 16:06:23.946 CST [1918044] DEBUG:  autovacuum: processing database "db_test1"
2026-01-09 16:06:23.958 CST [1918044] LOG:  automatic vacuum of table "db_test1.public.t1": index scans: 0
    pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned
    tuples: 50000 removed, 0 remain, 0 are dead but not yet removable
    removable cutoff: 814, which was 1 XIDs old when operation ended
    new relfrozenxid: 814, which is 7 XIDs ahead of previous value
    frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
    visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible)
    index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
    avg read rate: 0.000 MB/s, avg write rate: 2.793 MB/s
    buffer usage: 697 hits, 0 reads, 4 dirtied
    WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full
    memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s

This confirms that the postmaster successfully notifies the launcher and the worker slot is freed appropriately before
thenotification.
 

The patch looks correct and robust. +1 from my side.

Best Regards,
Yuan Li(carol)