Discussion: BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe

BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18992
Logged by:          Robins Tharakan
Email address:      tharakan@gmail.com
PostgreSQL version: 18beta2
Operating system:   Ubuntu
Description:

I couldn't reproduce the assert at will, but the test setup has crashed
three times since yesterday, hence this report. Although this was
on a recent commit (1e9b5140c44), the test had not been run for
a few weeks, so the issue may not be recent.


Error Log
=========
TRAP: failed
Assert("!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock,
DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))"), File: "dshash.c", Line:
400, PID: 147794
postgres: 1e9b5140c44@sqith: autovacuum worker
(ExceptionalCondition+0xbb)[0x5a609cb46036]
postgres: 1e9b5140c44@sqith: autovacuum worker
(dshash_find+0xab)[0x5a609c6a81f2]
postgres: 1e9b5140c44@sqith: autovacuum worker
(pgstat_drop_entry+0xc2)[0x5a609c968cb7]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x73c443)[0x5a609c95c443]
postgres: 1e9b5140c44@sqith: autovacuum worker
(shmem_exit+0xa6)[0x5a609c8ef83e]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x6cf6e2)[0x5a609c8ef6e2]
postgres: 1e9b5140c44@sqith: autovacuum worker
(proc_exit+0x74)[0x5a609c8ef626]
postgres: 1e9b5140c44@sqith: autovacuum worker
(AutoVacWorkerMain+0x19c)[0x5a609c8231ad]
postgres: 1e9b5140c44@sqith: autovacuum worker
(postmaster_child_launch+0x174)[0x5a609c82ad34]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612191)[0x5a609c832191]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612328)[0x5a609c832328]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x611dd5)[0x5a609c831dd5]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x60ec84)[0x5a609c82ec84]
postgres: 1e9b5140c44@sqith: autovacuum worker
(PostmasterMain+0x1546)[0x5a609c82e5e4]
postgres: 1e9b5140c44@sqith: autovacuum worker (main+0x38c)[0x5a609c6ca6f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7f726022a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f726022a28b]
postgres: 1e9b5140c44@sqith: autovacuum worker (_start+0x25)[0x5a609c307fb5]


2025-07-19 20:07:07.398 ACST [55365] LOG:  autovacuum worker (PID 147794)
was terminated by signal 6: Aborted



2025-07-20 06:17:50.376 ACST [1190828] FATAL:  can't attach the same segment
more than once
TRAP: failed
Assert("!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock,
DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))"), File: "dshash.c", Line:
400, PID: 1190928
postgres: 1e9b5140c44@sqith: autovacuum worker
(ExceptionalCondition+0xbb)[0x56d3cc97c036]
postgres: 1e9b5140c44@sqith: autovacuum worker
(dshash_find+0xab)[0x56d3cc4de1f2]
postgres: 1e9b5140c44@sqith: autovacuum worker
(pgstat_drop_entry+0xc2)[0x56d3cc79ecb7]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x73c443)[0x56d3cc792443]
postgres: 1e9b5140c44@sqith: autovacuum worker
(shmem_exit+0xa6)[0x56d3cc72583e]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x6cf6e2)[0x56d3cc7256e2]
postgres: 1e9b5140c44@sqith: autovacuum worker
(proc_exit+0x74)[0x56d3cc725626]
postgres: 1e9b5140c44@sqith: autovacuum worker
(AutoVacWorkerMain+0x19c)[0x56d3cc6591ad]
postgres: 1e9b5140c44@sqith: autovacuum worker
(postmaster_child_launch+0x174)[0x56d3cc660d34]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612191)[0x56d3cc668191]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612328)[0x56d3cc668328]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x611dd5)[0x56d3cc667dd5]
postgres: 1e9b5140c44@sqith: autovacuum worker (+0x60ec84)[0x56d3cc664c84]
postgres: 1e9b5140c44@sqith: autovacuum worker
(PostmasterMain+0x1546)[0x56d3cc6645e4]
postgres: 1e9b5140c44@sqith: autovacuum worker (main+0x38c)[0x56d3cc5006f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x758b0e42a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x758b0e42a28b]
postgres: 1e9b5140c44@sqith: autovacuum worker (_start+0x25)[0x56d3cc13dfb5]
2025-07-20 06:18:22.919 ACST [169020] LOG:  autovacuum worker (PID 1190928)
was terminated by signal 6: Aborted



Backtrace
=========
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimised
out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimised out>) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimised out>, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x00007f726024527e in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007f72602288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005a609cb46069 in ExceptionalCondition (conditionName=0x5a609cd3af40
"!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock,
DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))", fileName=0x5a609cd3af03
"dshash.c", lineNumber=400) at assert.c:66
#6  0x00005a609c6a81f2 in dshash_find (hash_table=0x5a60ab734b10,
key=0x7ffe06d72500, exclusive=true) at dshash.c:400
#7  0x00005a609c968cb7 in pgstat_drop_entry (kind=6, dboid=0, objid=5015) at
pgstat_shmem.c:988
#8  0x00005a609c95c443 in pgstat_shutdown_hook (code=0, arg=0) at
pgstat.c:622
#9  0x00005a609c8ef83e in shmem_exit (code=0) at ipc.c:243
#10 0x00005a609c8ef6e2 in proc_exit_prepare (code=0) at ipc.c:198
#11 0x00005a609c8ef626 in proc_exit (code=0) at ipc.c:111
#12 0x00005a609c8231ad in AutoVacWorkerMain (startup_data=0x0,
startup_data_len=0) at autovacuum.c:1456
#13 0x00005a609c82ad34 in postmaster_child_launch
(child_type=B_AUTOVAC_WORKER, child_slot=10002, startup_data=0x0,
startup_data_len=0, client_sock=0x0) at launch_backend.c:290
#14 0x00005a609c832191 in StartChildProcess (type=B_AUTOVAC_WORKER) at
postmaster.c:3973
#15 0x00005a609c832328 in StartAutovacuumWorker () at postmaster.c:4037
#16 0x00005a609c831dd5 in process_pm_pmsignal () at postmaster.c:3794
#17 0x00005a609c82ec84 in ServerLoop () at postmaster.c:1695
#18 0x00005a609c82e5e4 in PostmasterMain (argc=3, argv=0x5a60ab733940) at
postmaster.c:1400
#19 0x00005a609c6ca6f3 in main (argc=3, argv=0x5a60ab733940) at main.c:231


Backtrace Full
==============
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimised
out>) at ./nptl/pthread_kill.c:44
        tid = <optimised out>
        ret = 0
        pd = <optimised out>
        old_mask = {__val = {18446744073709551568}}
        ret = <optimised out>
        pd = <optimised out>
        old_mask = <optimised out>
        ret = <optimised out>
        tid = <optimised out>
        ret = <optimised out>
        resultvar = <optimised out>
        resultvar = <optimised out>
        __arg3 = <optimised out>
        __arg2 = <optimised out>
        __arg1 = <optimised out>
        _a3 = <optimised out>
        _a2 = <optimised out>
        _a1 = <optimised out>
        __futex = <optimised out>
        resultvar = <optimised out>
        __arg3 = <optimised out>
        __arg2 = <optimised out>
        __arg1 = <optimised out>
        _a3 = <optimised out>
        _a2 = <optimised out>
        _a1 = <optimised out>
        __futex = <optimised out>
        __private = <optimised out>
        __oldval = <optimised out>
#1  __pthread_kill_internal (signo=6, threadid=<optimised out>) at
./nptl/pthread_kill.c:78
No locals.
#2  __GI___pthread_kill (threadid=<optimised out>, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
No locals.
#3  0x00007f726024527e in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
        ret = <optimised out>
#4  0x00007f72602288ff in __GI_abort () at ./stdlib/abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction =
0x20}, sa_mask = {__val = {2, 140129217590784, 1, 140129217603397, 3,
140729013182628, 12, 140129217603401, 2, 3474071045457511480,
3846749418945733433, 140729013182720, 3833180526369726083, 140729013182784,
16669054559647844352, 140729013186696}}, sa_flags = 114765976, sa_restorer =
0x3}
#5  0x00005a609cb46069 in ExceptionalCondition (conditionName=0x5a609cd3af40
"!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock,
DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))", fileName=0x5a609cd3af03
"dshash.c", lineNumber=400) at assert.c:66
No locals.
#6  0x00005a609c6a81f2 in dshash_find (hash_table=0x5a60ab734b10,
key=0x7ffe06d72500, exclusive=true) at dshash.c:400
        hash = 147533127
        partition = 4
        item = 0x8cb2d4700000001
#7  0x00005a609c968cb7 in pgstat_drop_entry (kind=6, dboid=0, objid=5015) at
pgstat_shmem.c:988
        key = {kind = 6, dboid = 0, objid = 5015}
        shent = 0x5a609c8c83ec <pgaio_shutdown+406>
        freed = true
#8  0x00005a609c95c443 in pgstat_shutdown_hook (code=0, arg=0) at
pgstat.c:622
No locals.
#9  0x00005a609c8ef83e in shmem_exit (code=0) at ipc.c:243
        __func__ = "shmem_exit"
#10 0x00005a609c8ef6e2 in proc_exit_prepare (code=0) at ipc.c:198
        __func__ = "proc_exit_prepare"
#11 0x00005a609c8ef626 in proc_exit (code=0) at ipc.c:111
        __func__ = "proc_exit"
#12 0x00005a609c8231ad in AutoVacWorkerMain (startup_data=0x0,
startup_data_len=0) at autovacuum.c:1456
        local_sigjmp_buf = {{__jmpbuf = {140729013186712,
-5847007501798591079, 3, 0, 99370996794424, 140129240260608,
-5847007501689539175, -1880687092341131879}, __mask_was_saved = 1,
__saved_mask = {__val = {18446744066192964099, 99370993001878, 0,
99370993001878, 15511593002, 99370993001878, 0, 99371239819984,
99371239819904, 140729013184112, 16669054559647844352,
                140729013184224, 140129216851989, 140729013184144,
16669054559647844352, 16}}}}
        dbid = 4
        __func__ = "AutoVacWorkerMain"
#13 0x00005a609c82ad34 in postmaster_child_launch
(child_type=B_AUTOVAC_WORKER, child_slot=10002, startup_data=0x0,
startup_data_len=0, client_sock=0x0) at launch_backend.c:290
        pid = 0
#14 0x00005a609c832191 in StartChildProcess (type=B_AUTOVAC_WORKER) at
postmaster.c:3973
        pmchild = 0x7f7260f16378
        pid = 32766
        __func__ = "StartChildProcess"


Found using SQLSmith.

-
robins
https://robins.in


Re: BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe

From
Dilip Kumar
Date:
On Sun, Jul 20, 2025 at 9:35 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18992
> Logged by:          Robins Tharakan
> Email address:      tharakan@gmail.com
> PostgreSQL version: 18beta2
> Operating system:   Ubuntu
> Description:
>
> I couldn't reproduce the assert at will, but the test setup has crashed
> three times since yesterday, hence this report. Although this was
> on a recent commit (1e9b5140c44), the test had not been run for
> a few weeks, so the issue may not be recent.

The call stack shows that the assertion was hit while cleaning up
stats during proc_exit, following an error. I am not sure how easy this
will be to locate, but since this is an error path, could you share the
error reported in the log just before the assertion was hit?

--
Regards,
Dilip Kumar
Google



Re: BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe

From
Rahila Syed
Date:
Hi,

This appears to be a valid issue where the Autovacuum worker fails while already holding an
LWLock on one of the pgStatLocal.shared_hash partitions. As a result, when we attempt to
access this table again during proc_exit cleanup in dshash_find, the assert is triggered. I haven’t
yet checked exactly where the lock is acquired within the Autovacuum worker, but as Dilip mentioned,
reviewing where the error occurs in the Autovacuum worker would be helpful.
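
To make the suspected sequence concrete, here is a minimal sketch in plain C. It is not PostgreSQL code: every name in it (ToyTable, toy_find, toy_release, toy_worker_exit_ok, NUM_PARTITIONS) is hypothetical, and it only models the rule that the assertion in dshash_find enforces, namely that a lookup must not start while this process already holds any partition lock of the same table. Where the real code would trip the Assert, the toy lookup simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant standing in for DSHASH_NUM_PARTITIONS. */
#define NUM_PARTITIONS 8

/* Toy partition: only records whether this process holds its lock. */
typedef struct ToyPartition
{
    bool held_by_me;
} ToyPartition;

typedef struct ToyTable
{
    ToyPartition partitions[NUM_PARTITIONS];
} ToyTable;

/* Toy counterpart of the LWLockAnyHeldByMe() check over all partitions. */
static bool
toy_any_lock_held(const ToyTable *t)
{
    for (size_t i = 0; i < NUM_PARTITIONS; i++)
    {
        if (t->partitions[i].held_by_me)
            return true;
    }
    return false;
}

/*
 * Toy lookup. Returns false where the real code would fail its assertion
 * (a partition lock is already held); otherwise takes the partition lock
 * for the given hash and returns true.
 */
static bool
toy_find(ToyTable *t, unsigned hash)
{
    if (toy_any_lock_held(t))
        return false;
    t->partitions[hash % NUM_PARTITIONS].held_by_me = true;
    return true;
}

/* Releases the partition lock taken by a successful toy_find(). */
static void
toy_release(ToyTable *t, unsigned hash)
{
    size_t p = hash % NUM_PARTITIONS;

    assert(t->partitions[p].held_by_me);
    t->partitions[p].held_by_me = false;
}

/*
 * Simulates one worker: take a partition lock, optionally "error out"
 * before releasing it, then run exit-time cleanup that performs another
 * lookup. Returns true if the cleanup lookup passes the precondition.
 */
static bool
toy_worker_exit_ok(bool error_while_locked)
{
    ToyTable t = {0};
    bool     ok;

    toy_find(&t, 3);
    if (!error_while_locked)
        toy_release(&t, 3);

    ok = toy_find(&t, 5);       /* exit-time cleanup lookup */
    if (ok)
        toy_release(&t, 5);
    return ok;
}
```

Calling toy_worker_exit_ok(false) models a clean exit, where the cleanup lookup succeeds; toy_worker_exit_ok(true) models the case described above, where the lock taken before the error is still held when cleanup runs. The sketch says nothing about where the real lock is acquired, which is still the open question.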

Thank you,
Rahila Syed

On Mon, Jul 21, 2025 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sun, Jul 20, 2025 at 9:35 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18992
> Logged by:          Robins Tharakan
> Email address:      tharakan@gmail.com
> PostgreSQL version: 18beta2
> Operating system:   Ubuntu
> Description:
>
> I couldn't reproduce the assert at will, but the test setup has crashed
> three times since yesterday, hence this report. Although this was
> on a recent commit (1e9b5140c44), the test had not been run for
> a few weeks, so the issue may not be recent.
> ./nptl/pthread_kill.c:78
> No locals.
> #2  __GI___pthread_kill (threadid=<optimised out>, signo=signo@entry=6) at
> ./nptl/pthread_kill.c:89
> No locals.
> #3  0x00007f726024527e in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/posix/raise.c:26
>         ret = <optimised out>
> #4  0x00007f72602288ff in __GI_abort () at ./stdlib/abort.c:79
>         save_stage = 1
>         act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction =
> 0x20}, sa_mask = {__val = {2, 140129217590784, 1, 140129217603397, 3,
> 140729013182628, 12, 140129217603401, 2, 3474071045457511480,
> 3846749418945733433, 140729013182720, 3833180526369726083, 140729013182784,
> 16669054559647844352, 140729013186696}}, sa_flags = 114765976, sa_restorer =
> 0x3}
> #5  0x00005a609cb46069 in ExceptionalCondition (conditionName=0x5a609cd3af40
> "!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock,
> DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))", fileName=0x5a609cd3af03
> "dshash.c", lineNumber=400) at assert.c:66
> No locals.
> #6  0x00005a609c6a81f2 in dshash_find (hash_table=0x5a60ab734b10,
> key=0x7ffe06d72500, exclusive=true) at dshash.c:400
>         hash = 147533127
>         partition = 4
>         item = 0x8cb2d4700000001
> #7  0x00005a609c968cb7 in pgstat_drop_entry (kind=6, dboid=0, objid=5015) at
> pgstat_shmem.c:988
>         key = {kind = 6, dboid = 0, objid = 5015}
>         shent = 0x5a609c8c83ec <pgaio_shutdown+406>
>         freed = true
> #8  0x00005a609c95c443 in pgstat_shutdown_hook (code=0, arg=0) at
> pgstat.c:622
> No locals.
> #9  0x00005a609c8ef83e in shmem_exit (code=0) at ipc.c:243
>         __func__ = "shmem_exit"
> #10 0x00005a609c8ef6e2 in proc_exit_prepare (code=0) at ipc.c:198
>         __func__ = "proc_exit_prepare"
> #11 0x00005a609c8ef626 in proc_exit (code=0) at ipc.c:111
>         __func__ = "proc_exit"
> #12 0x00005a609c8231ad in AutoVacWorkerMain (startup_data=0x0,
> startup_data_len=0) at autovacuum.c:1456
>         local_sigjmp_buf = {{__jmpbuf = {140729013186712,
> -5847007501798591079, 3, 0, 99370996794424, 140129240260608,
> -5847007501689539175, -1880687092341131879}, __mask_was_saved = 1,
> __saved_mask = {__val = {18446744066192964099, 99370993001878, 0,
> 99370993001878, 15511593002, 99370993001878, 0, 99371239819984,
> 99371239819904, 140729013184112, 16669054559647844352,
>                 140729013184224, 140129216851989, 140729013184144,
> 16669054559647844352, 16}}}}
>         dbid = 4
>         __func__ = "AutoVacWorkerMain"
> #13 0x00005a609c82ad34 in postmaster_child_launch
> (child_type=B_AUTOVAC_WORKER, child_slot=10002, startup_data=0x0,
> startup_data_len=0, client_sock=0x0) at launch_backend.c:290
>         pid = 0
> #14 0x00005a609c832191 in StartChildProcess (type=B_AUTOVAC_WORKER) at
> postmaster.c:3973
>         pmchild = 0x7f7260f16378
>         pid = 32766
>         __func__ = "StartChildProcess"
>
>
> Found using SQLSmith.

So the call stack shows that the assertion was hit while cleaning up
stats during proc_exit, which was itself triggered by an earlier error.
I am not sure how easy that error is to locate, but since this is an
error path, could you share the error reported in the log just before
the assertion failure?

--
Regards,
Dilip Kumar
Google


Re: BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe

From
Michael Paquier
Date:
On Mon, Jul 21, 2025 at 12:51:14PM +0530, Rahila Syed wrote:
> This appears to be a valid issue where the Autovacuum worker fails while
> already holding an
> LWLock on one of the pgStatLocal.shared_hash partitions. As a result, when
> we attempt to
> access this table again during proc_exit cleanup in dshash_find, the assert
> is triggered. I haven’t
> yet checked exactly where the lock is acquired within the Autovacuum
> worker, but as Dilip mentioned,
> reviewing where the error occurs in the Autovacuum worker would be helpful.

Per dsm_attach@dsm.c, about the original FATAL message "can't attach
the same segment more than once" that triggers the assertion
afterwards:
     * If you're hitting this error, you probably want to attempt to find an
     * existing mapping via dsm_find_mapping() before calling dsm_attach() to
     * create a new one.
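
The pattern that comment recommends can be sketched with a toy registry.
This is only an illustration: every toy_* name below is hypothetical and
stands in for the real dsm_find_mapping()/dsm_attach() pair, and returning
NULL stands in for the error the real code raises on a double attach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPPINGS 4      /* hypothetical limit, for the sketch only */

typedef uint32_t toy_handle;

typedef struct ToySegment
{
    toy_handle  handle;
    int         in_use;
} ToySegment;

/* Per-process list of segments this process has already mapped. */
static ToySegment toy_mappings[TOY_MAX_MAPPINGS];

/* Look for an existing mapping of this handle; NULL if there is none. */
ToySegment *
toy_find_mapping(toy_handle h)
{
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++)
        if (toy_mappings[i].in_use && toy_mappings[i].handle == h)
            return &toy_mappings[i];
    return NULL;
}

/*
 * Create a new mapping.  Returns NULL if the handle is already mapped
 * (where the real code would raise an error) or if no slot is free.
 */
ToySegment *
toy_attach(toy_handle h)
{
    if (toy_find_mapping(h) != NULL)
        return NULL;            /* "can't attach the same segment more than once" */
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++)
    {
        if (!toy_mappings[i].in_use)
        {
            toy_mappings[i].handle = h;
            toy_mappings[i].in_use = 1;
            return &toy_mappings[i];
        }
    }
    return NULL;
}

/* Recommended order: reuse an existing mapping, attach only if none exists. */
ToySegment *
toy_attach_or_reuse(toy_handle h)
{
    ToySegment *seg = toy_find_mapping(h);

    assert(seg == NULL || seg->handle == h);
    return seg != NULL ? seg : toy_attach(h);
}
```

Calling toy_attach_or_reuse() twice with the same handle returns the same
mapping, whereas a second bare toy_attach() is refused.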

One thing that we could do is to upgrade this FATAL to a PANIC, to get
an idea of the stack where the original problem happens.

The stack is referencing a backend-level stats getting dropped by an
autovacuum worker as a result of pgstat_drop_entry() done in
pgstat_shutdown_hook(), so it looks like we are reaching a new error
state in v18 that could not happen before within the DSM, as an after
effect of the FATAL causing the autovacuum worker to stop.  Never seen
this one.  We're already doing stats reports in the
pgstat_report_stat() call with manipulations of the pgstats
dshash while shutting down.

objid at 5015 means that the procnum is set as such.  How many
max_connections do you have?  It seems like a high number points to a
better reproducibility.

Robins, is that your host with gcc experimental?  Could it be possible
to re-run the test with a patched build with the FATAL upgraded to
PANIC and see what happens?
--
Michael

Attachments

Re: BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe

From
Dilip Kumar
Date:
On Wed, Jul 23, 2025 at 7:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jul 21, 2025 at 12:51:14PM +0530, Rahila Syed wrote:
> > This appears to be a valid issue where the Autovacuum worker fails while
> > already holding an
> > LWLock on one of the pgStatLocal.shared_hash partitions. As a result, when
> > we attempt to
> > access this table again during proc_exit cleanup in dshash_find, the assert
> > is triggered. I haven’t
> > yet checked exactly where the lock is acquired within the Autovacuum
> > worker, but as Dilip mentioned,
> > reviewing where the error occurs in the Autovacuum worker would be helpful.
>
> Per dsm_attach@dsm.c, about the original FATAL message "can't attach
> the same segment more than once" that triggers the assertion
> afterwards:
>      * If you're hitting this error, you probably want to attempt to find an
>      * existing mapping via dsm_find_mapping() before calling dsm_attach() to
>      * create a new one.
>
> One thing that we could do is to upgrade this FATAL to a PANIC, to get
> an idea of the stack where the original problem happens.
>
> The stack is referencing a backend-level stats getting dropped by an
> autovacuum worker as a result of pgstat_drop_entry() done in
> pgstat_shutdown_hook(), so it looks like we are reaching a new error
> state in v18 that could not happen before within the DSM, as an after
> effect of the FATAL causing the autovacuum worker to stop.  Never seen
> this one.  We're already doing stats reports in the
> pgstat_report_stat() call with manipulations of the pgstats
> dshash while shutting down.
>
> objid at 5015 means that the procnum is set as such.  How many
> max_connections do you have?  It seems like a high number points to a
> better reproducibility.
>
> Robins, is that your host with gcc experimental?  Could it be possible
> to re-run the test with a patched build with the FATAL upgraded to
> PANIC and see what happens?

Yeah that would be a good idea.

I was looking into the vacuum code to see where we acquire this lock
and whether an error could be thrown without releasing it.  IIUC, we
acquire this lock while updating the vacuum stats in
pgstat_report_vacuum().  There we call pgstat_get_entry_ref_locked(),
and if we do not find a cached entry we acquire the partition lock of
the hash table for a short while and release it after increasing the
entry's refcount.  So in this particular path I don't see any
possibility of throwing an error while holding the lock.
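
To make the suspected failure mode concrete, here is a toy model of the
precondition that the assertion in dshash_find() checks.  It is a sketch
under stated assumptions, not the dshash.c code: all toy_* names and the
partition count are hypothetical, and a plain bool array stands in for the
per-process record of held LWLocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_PARTITIONS 8    /* hypothetical; not DSHASH_NUM_PARTITIONS */

/* Toy hash table: only tracks which partition locks "we" currently hold. */
typedef struct ToyHashTable
{
    bool        held_by_me[TOY_NUM_PARTITIONS];
} ToyHashTable;

/* True if this process holds any partition lock of the table. */
bool
toy_any_partition_held(const ToyHashTable *t)
{
    for (size_t i = 0; i < TOY_NUM_PARTITIONS; i++)
        if (t->held_by_me[i])
            return true;
    return false;
}

void
toy_lock_partition(ToyHashTable *t, size_t p)
{
    assert(p < TOY_NUM_PARTITIONS);
    t->held_by_me[p] = true;
}

void
toy_unlock_partition(ToyHashTable *t, size_t p)
{
    assert(p < TOY_NUM_PARTITIONS);
    t->held_by_me[p] = false;
}

/* The lookup may only start when no partition lock is already held. */
bool
toy_find_precondition_ok(const ToyHashTable *t)
{
    return !toy_any_partition_held(t);
}

/*
 * Simulate a worker that takes a partition lock, possibly errors out
 * before releasing it, and then runs an exit hook that does a lookup.
 * Returns whether the lookup's precondition holds in the exit hook.
 */
bool
toy_exit_hook_is_safe(bool error_before_release)
{
    ToyHashTable t = {{false}};

    toy_lock_partition(&t, 4);
    if (!error_before_release)
        toy_unlock_partition(&t, 4);
    return toy_find_precondition_ok(&t);
}
```

In this model toy_exit_hook_is_safe(false) holds and
toy_exit_hook_is_safe(true) does not, which is the shape of the problem
described above; it says nothing about which real code path leaves the
lock held.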

--
Regards,
Dilip Kumar
Google