Discussion: Segmentation fault on proc exit after dshash_find_or_insert
Hi,
If a process encounters a FATAL error after acquiring a dshash lock but before releasing it,
and it is not within a transaction, it can lead to a segmentation fault.
The FATAL error causes the backend to exit, triggering proc_exit() and similar functions.
In the absence of a transaction, LWLockReleaseAll() is delayed until ProcKill. ProcKill is
an on_shmem_exit callback, and dsm_backend_shutdown() is called before any
on_shmem_exit callbacks are invoked.
Consequently, if a dshash lock was acquired before the FATAL error occurred, the lock
will only be released after dsm_backend_shutdown() detaches the DSM segment containing
the lock, resulting in a segmentation fault.
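To make the ordering concrete, here is a simplified sketch of the exit sequence just described; it paraphrases the flow through shmem_exit() rather than reproducing the actual ipc.c source, and the callback-runner names are illustrative.
```c
/*
 * Simplified sketch of the shutdown ordering described above (illustrative
 * names, not the actual ipc.c code).  The held dshash partition LWLock
 * lives inside a DSM segment, so by the time ProcKill() gets around to
 * releasing it, the mapping is already gone.
 */
static void
shmem_exit_sketch(int code)
{
	/* 1. before_shmem_exit callbacks; nothing here releases LWLocks when
	 *    the FATAL error happened outside a transaction. */
	run_before_shmem_exit_callbacks(code);	/* illustrative name */

	/* 2. All DSM segments are detached, unmapping the memory that still
	 *    contains the held dshash partition lock. */
	dsm_backend_shutdown();

	/* 3. on_shmem_exit callbacks run, including ProcKill(), whose
	 *    LWLockReleaseAll() now writes to unmapped memory -> SIGSEGV. */
	run_on_shmem_exit_callbacks(code);		/* illustrative name */
}
```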
Please find a reproducer attached. I have modified the test_dsm_registry module to create
a background worker that does nothing but throw a FATAL error after acquiring the dshash lock.
The reason this must be executed in the background worker is to ensure it runs without a transaction.
To trigger the segmentation fault, apply the 0001-Reproducer* patch, run make install in the
test_dsm_registry module, specify test_dsm_registry as shared_preload_libraries in postgresql.conf,
and start the server.
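The attached reproducer itself is not reproduced here; a minimal sketch of what such a worker body could look like is below. The helper for obtaining the dshash table and the key are hypothetical; only dshash_find_or_insert() and ereport(FATAL, ...) are real APIs, and the actual 0001-Reproducer patch may differ.
```c
#include "postgres.h"
#include "lib/dshash.h"

/*
 * Sketch of a background worker body that triggers the crash: it acquires a
 * dshash partition lock and then errors out with FATAL while still holding
 * it.  get_registry_dshash() and the key type are hypothetical.
 */
void
TestDSMRegistryMain(Datum main_arg)
{
	dshash_table *htab = get_registry_dshash();	/* hypothetical helper */
	int			key = 1;
	bool		found;

	/* dshash_find_or_insert() returns with the entry's partition lock held
	 * exclusively; we deliberately never release it. */
	(void) dshash_find_or_insert(htab, &key, &found);

	/* Running outside any transaction, so no AbortCurrentTransaction()
	 * will release the lock on our behalf. */
	ereport(FATAL, (errmsg("simulated failure while holding a dshash lock")));
}
```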
Please find attached a fix to call LWLockReleaseAll() early in the shmem_exit() routine. This ensures
that the dshash lock is released before dsm_backend_shutdown() is called. This will also ensure that
any subsequent callbacks invoked in shmem_exit() will not fail to acquire any lock.
Please see the backtrace below.
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055a7515af56c in pg_atomic_fetch_sub_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
at ../../../../src/include/port/atomics/generic-gcc.h:218
218 return __sync_fetch_and_sub(&ptr->value, sub_);
(gdb) bt
#0 0x000055a7515af56c in pg_atomic_fetch_sub_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
at ../../../../src/include/port/atomics/generic-gcc.h:218
#1 0x000055a7515af625 in pg_atomic_sub_fetch_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
at ../../../../src/include/port/atomics/generic.h:232
#2 0x000055a7515af709 in pg_atomic_sub_fetch_u32 (ptr=0x7f92c4b334f4, sub_=262144)
at ../../../../src/include/port/atomics.h:441
#3 0x000055a7515b1583 in LWLockReleaseInternal (lock=0x7f92c4b334f0, mode=LW_EXCLUSIVE) at lwlock.c:1840
#4 0x000055a7515b1638 in LWLockRelease (lock=0x7f92c4b334f0) at lwlock.c:1902
#5 0x000055a7515b16e9 in LWLockReleaseAll () at lwlock.c:1951
#6 0x000055a7515ba63d in ProcKill (code=1, arg=0) at proc.c:953
#7 0x000055a7515913af in shmem_exit (code=1) at ipc.c:276
#8 0x000055a75159119b in proc_exit_prepare (code=1) at ipc.c:198
#9 0x000055a7515910df in proc_exit (code=1) at ipc.c:111
#10 0x000055a7517be71d in errfinish (filename=0x7f92ce41d062 "test_dsm_registry.c", lineno=187,
funcname=0x7f92ce41d160 <__func__.0> "TestDSMRegistryMain") at elog.c:596
#11 0x00007f92ce41ca62 in TestDSMRegistryMain (main_arg=0) at test_dsm_registry.c:187
#12 0x000055a7514db00c in BackgroundWorkerMain (startup_data=0x55a752dd8028, startup_data_len=1472)
at bgworker.c:846
#13 0x000055a7514de1e8 in postmaster_child_launch (child_type=B_BG_WORKER, child_slot=239,
startup_data=0x55a752dd8028, startup_data_len=1472, client_sock=0x0) at launch_backend.c:268
#14 0x000055a7514e530d in StartBackgroundWorker (rw=0x55a752dd8028) at postmaster.c:4168
#15 0x000055a7514e55a4 in maybe_start_bgworkers () at postmaster.c:4334
#16 0x000055a7514e4200 in LaunchMissingBackgroundProcesses () at postmaster.c:3408
#17 0x000055a7514e205b in ServerLoop () at postmaster.c:1728
#18 0x000055a7514e18b0 in PostmasterMain (argc=3, argv=0x55a752dd0e70) at postmaster.c:1403
#19 0x000055a75138eead in main (argc=3, argv=0x55a752dd0e70) at main.c:231
```
Thank you,
Rahila Syed
Attachments
Hi Rahila,
On Fri, Nov 21, 2025 at 8:45 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
> If a process encounters a FATAL error after acquiring a dshash lock but before releasing it,
> and it is not within a transaction, it can lead to a segmentation fault.
>
> The FATAL error causes the backend to exit, triggering proc_exit() and similar functions.
> In the absence of a transaction, LWLockReleaseAll() is delayed until ProcKill. ProcKill is
> an on_shmem_exit callback, and dsm_backend_shutdown() is called before any
> on_shmem_exit callbacks are invoked.
> Consequently, if a dshash lock was acquired before the FATAL error occurred, the lock
> will only be released after dsm_backend_shutdown() detaches the DSM segment containing
> the lock, resulting in a segmentation fault.
Thanks for the report.
I am not super familiar with this code path, but this raises a broader
question for me: are there other resources residing in DSM (besides
LWLocks) that might be accessed during the shutdown sequence?
We know dshash and dsa locks (LWLocks) face this risk because ProcKill
runs as an on_shmem_exit callback, which happens after
dsm_backend_shutdown() has already detached the memory.
This patch fixes the specific case for LWLocks, but if there are any
other on_shmem_exit callbacks that attempt to touch DSM memory, they
would still trigger a similar segfault. Do we need to worry about
other cleanup routines, or is ProcKill the only consumer of DSM data
at this stage?
> Please find a reproducer attached. I have modified the test_dsm_registry module to create
> a background worker that does nothing but throws a FATAL error after acquiring the dshash lock.
> The reason this must be executed in the background worker is to ensure it runs without a transaction.
>
> To trigger the segmentation fault, apply the 0001-Reproducer* patch, run make install in the
> test_dsm_registry module, specify test_dsm_registry as shared_preload_libraries in postgresql.conf,
> and start the server.
>
> Please find attached a fix to call LWLockReleaseAll() early in the shmem_exit() routine. This ensures
> that the dshash lock is released before dsm_backend_shutdown() is called. This will also ensure that
> any subsequent callbacks invoked in shmem_exit() will not fail to acquire any lock.
@@ -229,6 +230,14 @@ shmem_exit(int code)
{
shmem_exit_inprogress = true;
+ /*
+ * Make sure we release any pending locks so that any callbacks called
+ * subsequently do not fail to acquire any locks. This also fixes a seg
+ * fault due to releasing a dshash lock after the dsm segment containing
+ * the lock has been detached by dsm_backend_shutdown().
+ */
+ LWLockReleaseAll();
+
/*
* Call before_shmem_exit callbacks.
*
Again, not an expert, but I am concerned about placing
LWLockReleaseAll() at the very top, before before_shmem_exit()
callbacks run.
One of those callbacks might rely on locks being held or assume the
consistency of shared memory structures protected by those locks. It
seems safer to sandwich the release between the two callback lists:
after before_shmem_exit is done, but before dsm_backend_shutdown()
runs.
If we move it there, we should also reword the comment to focus on the
resource lifespan rather than just the symptom ("seg fault"):
/*
* Release all LWLocks before we tear down DSM segments.
*
* LWLocks that reside in dynamic shared memory (e.g., dshash partition
* locks) must be released before the backing segment is detached by
* dsm_backend_shutdown(). If we wait for ProcKill (via on_shmem_exit),
* the memory will already be unmapped, leading to access violations.
*/
--
Thanks, Amit Langote
On Tue, Dec 2, 2025 at 9:40 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Rahila,
>
> On Fri, Nov 21, 2025 at 8:45 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
> >
> > Hi,
> >
> > If a process encounters a FATAL error after acquiring a dshash lock but before releasing it,
> > and it is not within a transaction, it can lead to a segmentation fault.
> >
> > The FATAL error causes the backend to exit, triggering proc_exit() and similar functions.
> > In the absence of a transaction, LWLockReleaseAll() is delayed until ProcKill. ProcKill is
> > an on_shmem_exit callback, and dsm_backend_shutdown() is called before any
> > on_shmem_exit callbacks are invoked.
> > Consequently, if a dshash lock was acquired before the FATAL error occurred, the lock
> > will only be released after dsm_backend_shutdown() detaches the DSM segment containing
> > the lock, resulting in a segmentation fault.
>
> Thanks for the report.
>
> I am not super familiar with this code path, but this raises a broader
> question for me: are there other resources residing in DSM (besides
> LWLocks) that might be accessed during the shutdown sequence?
>
> We know dshash and dsa locks (LWLocks) face this risk because ProcKill
> runs as an on_shmem_exit callback, which happens after
> dsm_backend_shutdown() has already detached the memory.
>
> This patch fixes the specific case for LWLocks, but if there are any
> other on_shmem_exit callbacks that attempt to touch DSM memory, they
> would still trigger a similar segfault. Do we need to worry about
> other cleanup routines, or is ProcKill the only consumer of DSM data
> at this stage?
>
> > Please find a reproducer attached. I have modified the test_dsm_registry module to create
> > a background worker that does nothing but throws a FATAL error after acquiring the dshash lock.
> > The reason this must be executed in the background worker is to ensure it runs without a transaction.
> >
> > To trigger the segmentation fault, apply the 0001-Reproducer* patch, run make install in the
> > test_dsm_registry module, specify test_dsm_registry as shared_preload_libraries in postgresql.conf,
> > and start the server.
> >
> > Please find attached a fix to call LWLockReleaseAll() early in the shmem_exit() routine. This ensures
> > that the dshash lock is released before dsm_backend_shutdown() is called. This will also ensure that
> > any subsequent callbacks invoked in shmem_exit() will not fail to acquire any lock.
>
> @@ -229,6 +230,14 @@ shmem_exit(int code)
> {
> shmem_exit_inprogress = true;
>
> + /*
> + * Make sure we release any pending locks so that any callbacks called
> + * subsequently do not fail to acquire any locks. This also fixes a seg
> + * fault due to releasing a dshash lock after the dsm segment containing
> + * the lock has been detached by dsm_backend_shutdown().
> + */
> + LWLockReleaseAll();
> +
> /*
> * Call before_shmem_exit callbacks.
> *
>
> Again, not an expert, but I am concerned about placing
> LWLockReleaseAll() at the very top, before before_shmem_exit()
> callbacks run.
>
> One of those callbacks might rely on locks being held or assume the
> consistency of shared memory structures protected by those locks. It
> seems safer to sandwich the release between the two callback lists:
> after before_shmem_exit is done, but before dsm_backend_shutdown()
> runs.
I think it makes sense to place LWLockReleaseAll() right before
dsm_backend_shutdown(), which detaches the process from the DSM
segments.
--
Regards,
Dilip Kumar
Google
Hi,
On 2025-12-02 13:10:29 +0900, Amit Langote wrote:
> On Fri, Nov 21, 2025 at 8:45 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
> >
> > Hi,
> >
> > If a process encounters a FATAL error after acquiring a dshash lock but before releasing it,
> > and it is not within a transaction, it can lead to a segmentation fault.
> >
> > The FATAL error causes the backend to exit, triggering proc_exit() and similar functions.
> > In the absence of a transaction, LWLockReleaseAll() is delayed until ProcKill. ProcKill is
> > an on_shmem_exit callback, and dsm_backend_shutdown() is called before any
> > on_shmem_exit callbacks are invoked.
> > Consequently, if a dshash lock was acquired before the FATAL error occurred, the lock
> > will only be released after dsm_backend_shutdown() detaches the DSM segment containing
> > the lock, resulting in a segmentation fault.
>
> Thanks for the report.
>
> I am not super familiar with this code path, but this raises a broader
> question for me: are there other resources residing in DSM (besides
> LWLocks) that might be accessed during the shutdown sequence?
>
> We know dshash and dsa locks (LWLocks) face this risk because ProcKill
> runs as an on_shmem_exit callback, which happens after
> dsm_backend_shutdown() has already detached the memory.
>
> This patch fixes the specific case for LWLocks, but if there are any
> other on_shmem_exit callbacks that attempt to touch DSM memory, they
> would still trigger a similar segfault. Do we need to worry about
> other cleanup routines, or is ProcKill the only consumer of DSM data
> at this stage?
I don't think it's really right to frame it as ProcKill() being a consumer of
DSM data - it's just releasing all held lwlocks, and we happen to hold an
lwlock inside a DSM in the problematic case...
There are many other places that do LWLockReleaseAll()...
> > Please find a reproducer attached. I have modified the test_dsm_registry module to create
> > a background worker that does nothing but throws a FATAL error after acquiring the dshash lock.
> > The reason this must be executed in the background worker is to ensure it runs without a transaction.
> >
> > To trigger the segmentation fault, apply the 0001-Reproducer* patch, run make install in the
> > test_dsm_registry module, specify test_dsm_registry as shared_preload_libraries in postgresql.conf,
> > and start the server.
> >
> > Please find attached a fix to call LWLockReleaseAll() early in the shmem_exit() routine. This ensures
> > that the dshash lock is released before dsm_backend_shutdown() is called. This will also ensure that
> > any subsequent callbacks invoked in shmem_exit() will not fail to acquire any lock.
>
> @@ -229,6 +230,14 @@ shmem_exit(int code)
> {
> shmem_exit_inprogress = true;
>
> + /*
> + * Make sure we release any pending locks so that any callbacks called
> + * subsequently do not fail to acquire any locks. This also fixes a seg
> + * fault due to releasing a dshash lock after the dsm segment containing
> + * the lock has been detached by dsm_backend_shutdown().
> + */
> + LWLockReleaseAll();
> +
> /*
> * Call before_shmem_exit callbacks.
> *
>
> Again, not an expert, but I am concerned about placing
> LWLockReleaseAll() at the very top, before before_shmem_exit()
> callbacks run.
I think it's actually kind of required for correctness, independent of this
crash. If we errored out while holding an lwlock, we cannot reliably run
*any* code acquiring an lwlock, because lwlocks are not reentrant.
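A minimal illustration of that non-reentrancy (the lock pointer here is hypothetical, not taken from the patch):
```c
#include "postgres.h"
#include "storage/lwlock.h"

/*
 * Illustration only: LWLocks are not reentrant.  If the error path left
 * 'lock' held by this backend, the acquire below waits on our own lock
 * forever (a self-deadlock), so no exit-time code can safely take LWLocks
 * until LWLockReleaseAll() has run.
 */
static void
exit_time_cleanup_sketch(LWLock *lock)
{
	LWLockAcquire(lock, LW_EXCLUSIVE);	/* hangs if we already hold it */
	/* ... cleanup that needs the lock ... */
	LWLockRelease(lock);
}
```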
> One of those callbacks might rely on locks being held or assume the
> consistency of shared memory structures protected by those locks. It
> seems safer to sandwich the release between the two callback lists:
> after before_shmem_exit is done, but before dsm_backend_shutdown()
> runs.
I don't agree. You *cannot* rely on lwlocks working after an error before you
have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
before_shmem_exit is unsafe. The only reason we haven't really noticed that is
that most of the top-level error handlers (i.e. sigsetjmp()s) do an
AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
most lwlock acquisitions happen within a transaction. But if you ever do stuff
outside of a transaction, the AbortCurrentTransaction() won't do
LWLockReleaseAll(), and you're in trouble, as the case here.
IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
least in case of FATAL errors.
We probably should add a note to LWLockReleaseAll() indicating that we rely on
LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
hasn't yet been called...
Greetings,
Andres Freund
On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> On 2025-12-02 13:10:29 +0900, Amit Langote wrote:
> > On Fri, Nov 21, 2025 at 8:45 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
> > > If a process encounters a FATAL error after acquiring a dshash lock but before releasing it,
> > > and it is not within a transaction, it can lead to a segmentation fault.
> > >
> > > The FATAL error causes the backend to exit, triggering proc_exit() and similar functions.
> > > In the absence of a transaction, LWLockReleaseAll() is delayed until ProcKill. ProcKill is
> > > an on_shmem_exit callback, and dsm_backend_shutdown() is called before any
> > > on_shmem_exit callbacks are invoked.
> > > Consequently, if a dshash lock was acquired before the FATAL error occurred, the lock
> > > will only be released after dsm_backend_shutdown() detaches the DSM segment containing
> > > the lock, resulting in a segmentation fault.
> >
> > Thanks for the report.
> >
> > I am not super familiar with this code path, but this raises a broader
> > question for me: are there other resources residing in DSM (besides
> > LWLocks) that might be accessed during the shutdown sequence?
> >
> > We know dshash and dsa locks (LWLocks) face this risk because ProcKill
> > runs as an on_shmem_exit callback, which happens after
> > dsm_backend_shutdown() has already detached the memory.
> >
> > This patch fixes the specific case for LWLocks, but if there are any
> > other on_shmem_exit callbacks that attempt to touch DSM memory, they
> > would still trigger a similar segfault. Do we need to worry about
> > other cleanup routines, or is ProcKill the only consumer of DSM data
> > at this stage?
>
> I don't think it's really right to frame it as ProcKill() being a consumer of
> DSM data - it's just releasing all held lwlocks, and we happen to hold an
> lwlock inside a DSM in the problematic case...
>
> There are many other places that do LWLockReleaseAll()...
Sure, I was just wondering if there might be other stuff in these DSM
segments that could be accessed from on_shmem_exit callbacks. But
maybe we don't have to address all such risks in this patch.
> > > Please find a reproducer attached. I have modified the test_dsm_registry module to create
> > > a background worker that does nothing but throws a FATAL error after acquiring the dshash lock.
> > > The reason this must be executed in the background worker is to ensure it runs without a transaction.
> > >
> > > To trigger the segmentation fault, apply the 0001-Reproducer* patch, run make install in the
> > > test_dsm_registry module, specify test_dsm_registry as shared_preload_libraries in postgresql.conf,
> > > and start the server.
> > >
> > > Please find attached a fix to call LWLockReleaseAll() early in the shmem_exit() routine. This ensures
> > > that the dshash lock is released before dsm_backend_shutdown() is called. This will also ensure that
> > > any subsequent callbacks invoked in shmem_exit() will not fail to acquire any lock.
> >
> > @@ -229,6 +230,14 @@ shmem_exit(int code)
> > {
> > shmem_exit_inprogress = true;
> >
> > + /*
> > + * Make sure we release any pending locks so that any callbacks called
> > + * subsequently do not fail to acquire any locks. This also fixes a seg
> > + * fault due to releasing a dshash lock after the dsm segment containing
> > + * the lock has been detached by dsm_backend_shutdown().
> > + */
> > + LWLockReleaseAll();
> > +
> > /*
> > * Call before_shmem_exit callbacks.
> > *
> >
> > Again, not an expert, but I am concerned about placing
> > LWLockReleaseAll() at the very top, before before_shmem_exit()
> > callbacks run.
>
> I think it's actually kind of required for correctness, independent of this
> crash. If we errored out while holding an lwlock, we cannot reliably run
> *any* code acquiring an lwlock, because lwlocks are not reentrant.
>
> > One of those callbacks might rely on locks being held or assume the
> > consistency of shared memory structures protected by those locks. It
> > seems safer to sandwich the release between the two callback lists:
> > after before_shmem_exit is done, but before dsm_backend_shutdown()
> > runs.
>
> I don't agree. You *cannot* rely on lwlocks working after an error before you
> have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
> before_shmem_exit is unsafe. The only reason we haven't really noticed that is
> that most of the top-level error handlers (i.e. sigsetjmp()s) do an
> AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
> most lwlock acquisitions happen within a transaction. But if you ever do stuff
> outside of a transaction, the AbortCurrentTransaction() won't do
> LWLockReleaseAll(), and you're in trouble, as the case here.
>
> IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
> least in case of FATAL errors.
Oh, so not at the top of shmem_exit() as Rahila has proposed?
> We probably should add a note to LWLockReleaseAll() indicating that we rely on
> LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
> hasn't yet been called...
Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so
LWLockReleaseAll() would be a no-op.
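For reference, the body of LWLockReleaseAll() is essentially the following loop (paraphrased from lwlock.c), which is why an uninitialized backend simply falls through it:
```c
/* Paraphrased from src/backend/storage/lmgr/lwlock.c: release is driven
 * entirely by the backend-local held_lwlocks[] array, so the function is
 * a no-op when num_held_lwlocks is 0. */
void
LWLockReleaseAll(void)
{
	while (num_held_lwlocks > 0)
	{
		HOLD_INTERRUPTS();	/* match the upcoming RESUME_INTERRUPTS */

		LWLockRelease(held_lwlocks[num_held_lwlocks - 1].lock);
	}
}
```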
--
Thanks, Amit Langote
Hi,

On 2025-12-04 11:06:20 +0900, Amit Langote wrote:
> On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't agree. You *cannot* rely on lwlocks working after an error before you
> > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
> > before_shmem_exit is unsafe. The only reason we haven't really noticed that is
> > that most of the top-level error handlers (i.e. sigsetjmp()s) do an
> > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
> > most lwlock acquisitions happen within a transaction. But if you ever do stuff
> > outside of a transaction, the AbortCurrentTransaction() won't do
> > LWLockReleaseAll(), and you're in trouble, as the case here.
> >
> > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
> > least in case of FATAL errors.
>
> Oh, so not at the top of not shmem_exit() as Rahila has proposed?

Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it
should happen unconditionally at the start of exit processing, not just at the
tail.

> > We probably should add a note to LWLockReleaseAll() indicating that we rely on
> > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
> > hasn't yet been called...
>
> Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so
> LWLockReleaseAll() would be a no-op.

Right. I just meant we should add a comment noting that we rely on that
fact...

Greetings,

Andres Freund
Hi,

On Fri, Dec 5, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> On 2025-12-04 11:06:20 +0900, Amit Langote wrote:
> > On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> > > I don't agree. You *cannot* rely on lwlocks working after an error before you
> > > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
> > > before_shmem_exit is unsafe. The only reason we haven't really noticed that is
> > > that most of the top-level error handlers (i.e. sigsetjmp()s) do an
> > > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
> > > most lwlock acquisitions happen within a transaction. But if you ever do stuff
> > > outside of a transaction, the AbortCurrentTransaction() won't do
> > > LWLockReleaseAll(), and you're in trouble, as the case here.
> > >
> > > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
> > > least in case of FATAL errors.
> >
> > Oh, so not at the top of not shmem_exit() as Rahila has proposed?
>
> Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it
> should happen unconditionally at the start of exit processing, not just at the
> tail.

Ah, ok. I was talking about this with Rahila today and she pointed
out to me that whether we add it to the top of proc_exit() or
shmem_exit() doesn't really make any material difference to the fact
that it will get done at the start of exit processing as you say, at
least today. So I think we can keep it like Rahila originally
proposed.

> > We probably should add a note to LWLockReleaseAll() indicating that we rely on
> > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
> > hasn't yet been called...
>
> > Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so
> > LWLockReleaseAll() would be a no-op.
>
> Right. I just meant we should add a comment noting that we rely on that
> fact...

Ok, got it. Maybe like this:

@@ -1940,6 +1940,10 @@ LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
  * unchanged by this operation. This is necessary since InterruptHoldoffCount
  * has been set to an appropriate level earlier in error recovery. We could
  * decrement it below zero if we allow it to drop for each released lock!
+ *
+ * Note that this function must be safe to call even if the LWLock subsystem
+ * hasn't been initialized (e.g., during early startup error recovery).
+ * In that case, num_held_lwlocks will be 0, and we'll do nothing.
  */
 void
 LWLockReleaseAll(void)

--
Thanks, Amit Langote
On Fri, Dec 5, 2025 at 1:03 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Dec 5, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2025-12-04 11:06:20 +0900, Amit Langote wrote:
> > > On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> > > > I don't agree. You *cannot* rely on lwlocks working after an error before you
> > > > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
> > > > before_shmem_exit is unsafe. The only reason we haven't really noticed that is
> > > > that most of the top-level error handlers (i.e. sigsetjmp()s) do an
> > > > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
> > > > most lwlock acquisitions happen within a transaction. But if you ever do stuff
> > > > outside of a transaction, the AbortCurrentTransaction() won't do
> > > > LWLockReleaseAll(), and you're in trouble, as the case here.
> > > >
> > > > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
> > > > least in case of FATAL errors.
> > >
> > > Oh, so not at the top of not shmem_exit() as Rahila has proposed?
> >
> > Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it
> > should happen unconditionally at the start of exit processing, not just at the
> > tail.
>
> Ah, ok. I was talking about this with Rahila today and she pointed
> out to me that whether we add it to the top of proc_exit() or
> shmem_exit() doesn't really make any material difference to the fact
> that it will get done at the start of exit processing as you say, at
> least today. So I think we can keep it like Rahila originally
> proposed.
>
> > > > We probably should add a note to LWLockReleaseAll() indicating that we rely on
> > > > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
> > > > hasn't yet been called...
> > >
> > > Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so
> > > LWLockReleaseAll() would be a no-op.
> >
> > Right. I just meant we should add a comment noting that we rely on that
> > fact...
>
> Ok, got it. Maybe like this:
>
> @@ -1940,6 +1940,10 @@ LWLockReleaseClearVar(LWLock *lock,
> pg_atomic_uint64 *valptr, uint64 val)
> * unchanged by this operation. This is necessary since InterruptHoldoffCount
> * has been set to an appropriate level earlier in error recovery. We could
> * decrement it below zero if we allow it to drop for each released lock!
> + *
> + * Note that this function must be safe to call even if the LWLock subsystem
> + * hasn't been initialized (e.g., during early startup error recovery).
> + * In that case, num_held_lwlocks will be 0, and we'll do nothing.
> */
> void
> LWLockReleaseAll(void)
I have done that in the attached v2.
I also updated the comment that describes before_shmem_exit callbacks,
removing the "such as releasing lwlocks" example since we now do that
before the callbacks run:
/*
* Call before_shmem_exit callbacks.
*
* These should be things that need most of the system to still be up and
* working, such as cleanup of temp relations, which requires catalog
- * access; or things that need to be completed because later cleanup steps
- * depend on them, such as releasing lwlocks.
+ * access.
*/
elog(DEBUG3, "shmem_exit(%d): %d before_shmem_exit callbacks to make",
code, before_shmem_exit_index);
Does anyone see a problem with that change?
I've also drafted a commit message that explains the DSM detachment
issue and notes that this is safe because locks are in an
unpredictable state after errors anyway. Let me know if you'd like me
to adjust the emphasis.
Rahila mentioned adding a test case exercising the fixed code path,
but I don't think that's strictly necessary because reproducing the
exact failure scenario (background worker hitting FATAL while holding
a dshash lock) would be complex and potentially fragile as a
regression test. Happy to add one if others feel strongly about it,
though.
--
Thanks, Amit Langote
Attachments
On Tue, Dec 16, 2025 at 6:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Dec 5, 2025 at 1:03 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Fri, Dec 5, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2025-12-04 11:06:20 +0900, Amit Langote wrote:
> > > > On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
> > > > > I don't agree. You *cannot* rely on lwlocks working after an error before you
> > > > > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
> > > > > before_shmem_exit is unsafe. The only reason we haven't really noticed that is
> > > > > that most of the top-level error handlers (i.e. sigsetjmp()s) do an
> > > > > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
> > > > > most lwlock acquisitions happen within a transaction. But if you ever do stuff
> > > > > outside of a transaction, the AbortCurrentTransaction() won't do
> > > > > LWLockReleaseAll(), and you're in trouble, as the case here.
> > > > >
> > > > > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
> > > > > least in case of FATAL errors.
> > > >
> > > > Oh, so not at the top of not shmem_exit() as Rahila has proposed?
> > >
> > > Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it
> > > should happen unconditionally at the start of exit processing, not just at the
> > > tail.
> >
> > Ah, ok. I was talking about this with Rahila today and she pointed
> > out to me that whether we add it to the top of proc_exit() or
> > shmem_exit() doesn't really make any material difference to the fact
> > that it will get done at the start of exit processing as you say, at
> > least today. So I think we can keep it like Rahila originally
> > proposed.
> >
> > > > > We probably should add a note to LWLockReleaseAll() indicating that we rely on
> > > > > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
> > > > > hasn't yet been called...
> > > >
> > > > Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so
> > > > LWLockReleaseAll() would be a no-op.
> > >
> > > Right. I just meant we should add a comment noting that we rely on that
> > > fact...
> >
> > Ok, got it. Maybe like this:
> >
> > @@ -1940,6 +1940,10 @@ LWLockReleaseClearVar(LWLock *lock,
> > pg_atomic_uint64 *valptr, uint64 val)
> > * unchanged by this operation. This is necessary since InterruptHoldoffCount
> > * has been set to an appropriate level earlier in error recovery. We could
> > * decrement it below zero if we allow it to drop for each released lock!
> > + *
> > + * Note that this function must be safe to call even if the LWLock subsystem
> > + * hasn't been initialized (e.g., during early startup error recovery).
> > + * In that case, num_held_lwlocks will be 0, and we'll do nothing.
> > */
> > void
> > LWLockReleaseAll(void)
>
> I have done that in the attached v2.
>
> I also updated the comment that describes before_shmem_exit callbacks,
> removing the "such as releasing lwlocks" example since we now do that
> before the callbacks run:
>
> /*
> * Call before_shmem_exit callbacks.
> *
> * These should be things that need most of the system to still be up and
> * working, such as cleanup of temp relations, which requires catalog
> - * access; or things that need to be completed because later cleanup steps
> - * depend on them, such as releasing lwlocks.
> + * access.
> */
> elog(DEBUG3, "shmem_exit(%d): %d before_shmem_exit callbacks to make",
> code, before_shmem_exit_index);
>
> Does anyone see a problem with that change?
>
> I've also drafted a commit message that explains the DSM detachment
> issue and notes that this is safe because locks are in an
> unpredictable state after errors anyway. Let me know if you'd like me
> to adjust the emphasis.
>
> Rahila mentioned adding a test case exercising the fixed code path,
> but I don't think that's strictly necessary because reproducing the
> exact failure scenario (background worker hitting FATAL while holding
> a dshash lock) would be complex and potentially fragile as a
> regression test. Happy to add one if others feel strongly about it,
> though.

Oops, forgot an #include. Fixed in the attached.

--
Thanks, Amit Langote
Attachments
Hi Amit,
Oops, forgot an #include. Fixed in the attached.
Thank you for the updated patch and the detailed commit message. You have explained the problem
quite well and the changes look good to me.
I would just add one more comment, which I have attached as a separate patch with this email.
Please have a look.
Thank you,
Rahila Syed
Attachments
Hi Rahila,
On Wed, Dec 17, 2025 at 3:57 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi Amit,
>
>
>>
>> Oops, forgot an #include. Fixed in the attached.
>>
>
> Thank you for the updated patch and the detailed commit message. You have explained the problem
> quite well and the changes look good to me.
Thanks for looking.
> I would just add one more comment, which I have attached as a separate patch with this email.
> Please have a look.
/*
* Release any LWLocks we might be holding, before running callbacks that
- * may detach the memory containing those locks.
+ * may detach the memory containing those locks. Releasing all the locks
+ * ensures that any callbacks executed afterward will be able to acquire
+ * any lock.
*/
Hmm, I'm not sure I follow. Maybe it has to do with something you
were trying to do when you ran into this bug, but why would callbacks
be acquiring locks after an error and why would it be safe to do so?
Are you saying that LWLockReleaseAll() cleans up unsafe-to-access
locks so that new ones can be taken after that point?
--
Thanks, Amit Langote
Hi Amit,
/*
* Release any LWLocks we might be holding, before running callbacks that
- * may detach the memory containing those locks.
+ * may detach the memory containing those locks. Releasing all the locks
+ * ensures that any callbacks executed afterward will be able to acquire
+ * any lock.
*/
Hmm, I'm not sure I follow. Maybe it has to do with something you
were trying to do when you ran into this bug, but why would callbacks
be acquiring locks after an error and why would it be safe to do so?
It seems common for both before and on shmem exit callbacks to acquire locks.
For instance, RemoveTempRelationsCallback eventually calls performDeletion,
which acquires an LW lock.
Are you saying that LWLockReleaseAll() cleans up unsafe-to-access
locks so that new ones can be taken after that point?
Yes, what I mean is that acquiring a lock within a callback will work,
regardless of whether it was held when the error occurred, since we ensure
all locks are released before executing the callbacks.
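A self-contained sketch of that situation, with a hypothetical lock and callback standing in for the real RemoveTempRelationsCallback path:
```c
#include "postgres.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"

/* Hypothetical lock protecting some shared state; assigned during startup. */
static LWLock *MyCleanupLock;

/*
 * Hypothetical before_shmem_exit callback, invented for illustration.
 * Because LWLockReleaseAll() now runs at the top of shmem_exit(), this
 * acquire works even if the erroring code path was still holding
 * MyCleanupLock when the FATAL error was raised.
 */
static void
cleanup_shared_state(int code, Datum arg)
{
	LWLockAcquire(MyCleanupLock, LW_EXCLUSIVE);
	/* ... tidy up shared state protected by the lock ... */
	LWLockRelease(MyCleanupLock);
}

/* Registered once at backend startup, e.g.:
 *     before_shmem_exit(cleanup_shared_state, (Datum) 0);
 */
```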
Thank you,
Rahila Syed
On Wed, Dec 17, 2025 at 19:37 Rahila Syed <rahilasyed90@gmail.com> wrote:
> Hi Amit,
>
>> /*
>> * Release any LWLocks we might be holding, before running callbacks that
>> - * may detach the memory containing those locks.
>> + * may detach the memory containing those locks. Releasing all the locks
>> + * ensures that any callbacks executed afterward will be able to acquire
>> + * any lock.
>> */
>>
>> Hmm, I'm not sure I follow. Maybe it has to do with something you
>> were trying to do when you ran into this bug, but why would callbacks
>> be acquiring locks after an error and why would it be safe to do so?
>
> It seems common for both before and on shmem exit callbacks to acquire locks.
> For instance, RemoveTempRelationsCallback eventually calls performDeletion,
> which acquires an LW lock.
>
>> Are you saying that LWLockReleaseAll() cleans up unsafe-to-access
>> locks so that new ones can be taken after that point?
>
> Yes, what I mean is that acquiring a lock within a callback will work,
> regardless of whether it was held when the error occurred, since we ensure
> all locks are released before executing the callbacks.

I get confused easily it seems. :-)  You're right, this is how I
understood it too the last time we were here, but I remembered
something Andres wrote upthread and interpreted it wrongly in my head.

Also, now that I read the full comment, the first sentence doesn't
look quite right; callbacks don't detach segments. How about rewording
the comment a bit, like this:

/*
 * Release any LWLocks we might be holding before callbacks run. This
 * prevents accessing locks in detached DSM segments and allows callbacks
 * to acquire new locks.
 */

--
Thanks, Amit Langote
Hi Amit,
/*
* Release any LWLocks we might be holding before callbacks run. This
* prevents accessing locks in detached DSM segments and allows callbacks
* to acquire new locks.
*/
This looks good to me.
Thank you,
Rahila Syed
On Thu, Dec 18, 2025 at 2:01 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi Amit,
>
>> /*
>> * Release any LWLocks we might be holding before callbacks run. This
>> * prevents accessing locks in detached DSM segments and allows callbacks
>> * to acquire new locks.
>> */
>
> This looks good to me.

Thanks. Updated the commit message too to be more accurate in the
attached updated patch.

I'll wait a bit more (until next year :)) before committing this so
maybe others can comment on it.

--
Thanks, Amit Langote
Attachments
On 2025-Dec-18, Amit Langote wrote:
> Thanks. Updated the commit message too to be more accurate in the
> attached updated patch.

Looks good to me.

I would add an Assert(num_held_lwlocks == 0) at the end of
LWLockReleaseAll(), to make it clear that it's idempotent (which is
important for the case where ProcKill will call it again shortly
after).

Are you going to push this soon?

Looking at ProcKill, I notice that we do some LWLock ops after its
LWLockReleaseAll() call, which seems a bit silly. Why not do that right
after the "if (MyProc->lockGroupLeader != NULL)" block instead? Nothing
uses LWLocks from there on. This can be a separate commit.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
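A sketch of where that assertion would sit, building on the loop shown earlier (paraphrased; not the committed change):
```c
void
LWLockReleaseAll(void)
{
	while (num_held_lwlocks > 0)
	{
		HOLD_INTERRUPTS();

		LWLockRelease(held_lwlocks[num_held_lwlocks - 1].lock);
	}

	/* Nothing left; a repeat call (e.g. from ProcKill()) is a no-op. */
	Assert(num_held_lwlocks == 0);
}
```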
Hi Alvaro,

On Wed, Jan 14, 2026 at 6:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote
> On 2025-Dec-18, Amit Langote wrote:
>
> > Thanks. Updated the commit message too to be more accurate in the
> > attached updated patch.
>
> Looks good to me.

Thanks for looking.

> I would add an Assert(num_held_lwlocks == 0) at the
> end of LWLockReleaseAll(), to make it clear that it's idempotent (which
> is important for the case where ProcKill will call it again shortly
> after).

Makes sense. Will do.

> Are you going to push this soon?

Yes, I will try tomorrow.

> Looking at ProcKill, I notice that we do some LWLock ops after its
> LWLockReleaseAll() call, which seems a bit silly. Why not do that right
> after the "if (MyProc->lockGroupLeader != NULL)" block instead? Nothing
> uses LWLocks from there on. This can be a separate commit.

Just to confirm: you're suggesting moving the LWLockReleaseAll() call
to after the "if (MyProc->lockGroupLeader != NULL)" block? Makes sense
-- odd to release all locks right before then going ahead and
acquiring one. Agreed it should be a separate commit.

--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> writes:
> On Wed, Jan 14, 2026 at 6:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote
>> Looking at ProcKill, I notice that we do some LWLock ops after its
>> LWLockReleaseAll() call, which seems a bit silly. Why not do that right
>> after the "if (MyProc->lockGroupLeader != NULL)" block instead? Nothing
>> uses LWLocks from there on. This can be a separate commit.
> Just to confirm: you're suggesting moving the LWLockReleaseAll() call
> to after the "if (MyProc->lockGroupLeader != NULL)" block? Makes sense
> -- odd to release all locks right before then going ahead and
> acquiring one. Agreed it should be a separate commit.
I think the idea there might be to make sure that we have released
any pre-existing hold of that lock. Otherwise this could be
a self-deadlock.
regards, tom lane
On Thu, Jan 15, 2026 at 12:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Wed, Jan 14, 2026 at 6:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote
> >> Looking at ProcKill, I notice that we do some LWLock ops after its
> >> LWLockReleaseAll() call, which seems a bit silly. Why not do that right
> >> after the "if (MyProc->lockGroupLeader != NULL)" block instead? Nothing
> >> uses LWLocks from there on. This can be a separate commit.
>
> > Just to confirm: you're suggesting moving the LWLockReleaseAll() call
> > to after the "if (MyProc->lockGroupLeader != NULL)" block? Makes sense
> > -- odd to release all locks right before then going ahead and
> > acquiring one. Agreed it should be a separate commit.
>
> I think the idea there might be to make sure that we have released
> any pre-existing hold of that lock. Otherwise this could be
> a self-deadlock.

Hmm, good point. Though with this patch, which adds LWLockReleaseAll()
at the start of shmem_exit(), we would have already released any such
lock before we get to ProcKill().

But, probably best to leave ProcKill() alone given this subtlety.

--
Thanks, Amit Langote
On Thu, Jan 15, 2026 at 8:57 Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jan 15, 2026 at 12:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Wed, Jan 14, 2026 at 6:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote
> >> Looking at ProcKill, I notice that we do some LWLock ops after its
> >> LWLockReleaseAll() call, which seems a bit silly. Why not do that right
> >> after the "if (MyProc->lockGroupLeader != NULL)" block instead? Nothing
> >> uses LWLocks from there on. This can be a separate commit.
>
> > Just to confirm: you're suggesting moving the LWLockReleaseAll() call
> > to after the "if (MyProc->lockGroupLeader != NULL)" block? Makes sense
> > -- odd to release all locks right before then going ahead and
> > acquiring one. Agreed it should be a separate commit.
>
> I think the idea there might be to make sure that we have released
> any pre-existing hold of that lock. Otherwise this could be
> a self-deadlock.
Hmm, good point. Though with this patch, which adds LWLockReleaseAll()
at the start of shmem_exit(), we would have already released any such
lock before we get to ProcKill().
Scratch that. shmem_exit() is hardly the only caller of ProcKill() and
while the existing callers seem to be disciplined, any future callers
may not always release locks beforehand.
Hi Álvaro,

On Wed, Jan 14, 2026 at 10:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jan 14, 2026 at 6:36 PM Álvaro Herrera <alvherre@kurilemu.de> wrote
> > I would add an Assert(num_held_lwlocks == 0) at the
> > end of LWLockReleaseAll(), to make it clear that it's idempotent (which
> > is important for the case where ProcKill will call it again shortly
> > after).
>
> Makes sense. Will do.

Done and pushed. I struggled to write a comment to describe the Assert though.

--
Thanks, Amit Langote