Discussion: [PATCH] Fix ProcKill lock-group vs procLatch recycle race
Hello all,
Attached are patches and demonstration material for a concurrency bug
in ProcKill() that arises when a lock-group leader and a member exit
in parallel.
------------------------------------------------------------------------
Problem
------------------------------------------------------------------------
If a leader detaches from the lock group under leader_lwlock but
has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
last follower can still put the *leader* PGPROC on a free list, or
the leader and the follower can make inconsistent decisions about
*who* returns which PGPROC, so that a slot is linked into the free
list with procLatch still owned, or is pushed twice. A new backend
that recycles the slot can then hit:
PANIC: latch already owned by PID ...
A concrete interleaving (lock-group leader vs. the last member)
follows (PG15 code).
=== Lock group leader ===
L1: LWLockAcquire(leader_lwlock)
L2: dlist_delete(&MyProc->lockGroupLink)
The list still contains the follower.
L3: dlist_is_empty → false (the follower is still in the list)
L4: else if (leader != MyProc) → false, do nothing
L5: LWLockRelease(leader_lwlock)
*lwlock released*
>>>WINDOW OPEN HERE<<<
L6: SwitchBackToLocalLatch()
L7: pgstat_reset_wait_event_storage()
L8: proc = MyProc; MyProc = NULL;
L9: DisownLatch(&proc->procLatch)
*only here owner_pid = 0*
L10: SpinLockAcquire(ProcStructLock)
L11: if (proc->lockGroupLeader == NULL) → false, skip
L12: SpinLockRelease(ProcStructLock)
===========================
=== What the last lock group member does in that WINDOW ===
M1: LWLockAcquire(leader_lwlock)
Can proceed as soon as L5 releases it
M2: dlist_delete(&MyProc->lockGroupLink)
The list becomes EMPTY
M3: dlist_is_empty → true
M4: leader->lockGroupLeader = NULL
Clears the leader's "don't free me" flag
M5: leader != MyProc → true
M6: SpinLockAcquire(ProcStructLock)
M7: push leader's PGPROC onto freelist
!!! while leader's latch is still owned !!!
M8: SpinLockRelease(ProcStructLock)
M9: LWLockRelease(leader_lwlock)
===========================================================
At M7, the leader's PGPROC is back on freeProcs with
procLatch.owner_pid == leader_pid (non-zero). The leader is still
sitting between L5 and L9.
=== The new backend picks it up in InitProcess() ===
B1: SpinLockAcquire(ProcStructLock)
B2: MyProc = *procgloballist
pops the leader's PGPROC
B3: *procgloballist = MyProc->links.next
B4: SpinLockRelease(ProcStructLock)
B5: ...
B6: OwnLatch(&MyProc->procLatch)
→ owner_pid != 0
→ elog(PANIC, "latch already owned by PID %d", owner_pid);
=====================================================
Note that all PG versions since 9.6 are affected.
The fix is to run SwitchBackToLocalLatch(),
pgstat_reset_wait_event_storage() and DisownLatch() before the
lock-group block, and to make the push decisions (push_leader /
push_self) under leader_lwlock, with a single freelist-update section
at the end. This matches what we implemented; detailed commentary
is in the commit messages and in the patch.
Mainline parallel query usually avoids the race: the leader is not
expected to reach ProcKill() while another group member is still
exiting, in the way some extension-driven workloads can.
------------------------------------------------------------------------
Testing
------------------------------------------------------------------------
Technically, a regression test should be included. However, developing
one is non-trivial given the current PG18 injection point
implementation. Because ProcKill() partially de-initializes data
necessary for injection points to function, a significant workaround
would be required. Given the complexity of such a workaround, I would
like to discuss whether a test is mandatory for this patch.
I have provided versions without the test (0001, 0002) as well as a test
that uses an injection-point workaround to reproduce the bug
deterministically (0003). If a test for the bug fix is required, please
note that I can only provide it for PG17+, as earlier versions do not
support injection points.
------------------------------------------------------------------------
Attached patches
------------------------------------------------------------------------
1) 0001-ProcKill-REL15-lockgroup-freelist-race.patch
Core proc.c fix for REL_15_STABLE.
No TAP test: PG15 has no suitable injection-point support for a core
reproducer; this is fix-only for that branch.
2) 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch
Same *core* fix for PG18: proc.c only.
3) 0003-PG18-unfixed-repro-tap-injection-harness.patch
*Demonstration only.* Shows the PANIC; not in a shape to land after
the fix without reworking the test.
--
Best regards,
Vlad
Attachments
Hi Vlad,

Concurrency problems are the most difficult to discover and reproduce. Thanks for that work!

Since every patch is subject to testing, it is recommended to create a CommitFest entry for it. Once it is in a CF entry, the patch set is picked up by CFBot and thoroughly tested. You will also see whether the patch applies cleanly to the target branch or not.

You also might want to stick to a single PostgreSQL version in a thread, as all attached files are applied to the upstream during a CFBot job, which is impossible when a mail message includes patches for several PG versions simultaneously. Adhering to a single version per thread and using a CommitFest entry also make the reviewers' task easier, because they can see the version and the base commit to which the patch should be applied. Patches for other PG versions can probably be moved to separate threads.

That is all for now. I am still looking into the code of the patch; I hope I can give some useful comments on the solution itself later.

P.S. I tried to apply the patch to 3b28dad70e2. After 0002-ProcKill-PG18-core-lockgroup-freelist-race had been applied successfully, the subsequent apply of 0003-PG18-unfixed-repro-tap-injection-harness failed.

Best regards,
Evgeny Voropaev, Tantor Labs, LLC.
> On 27 Apr 2026, at 13:14, Vlad Lesin <vladlesin@gmail.com> wrote:
>
> Problem
> ------------------------------------------------------------------------
>
> If a leader detaches from the lock group under leader_lwlock but
> has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
> last follower can still put the *leader* PGPROC on a free list, or
> the leader and the follower can make inconsistent decisions about
> *who* returns which PGPROC, so that a slot is linked into the free
> list with procLatch still owned, or is pushed twice. A new backend
> that recycles the slot can then hit:
>
> PANIC: latch already owned by PID ...
>
> A concrete interleaving (lock group leader vs last member)
> is the following(PG15 code).

Yeah, the problem seems real to me. Moreover, we had related buildfarm failures [0], and Deep from GP reported observing the problem there too. Yugabyte folks also observed this [1].

The invariant that a latch should not be on the freelist until it is disowned seems reasonable to me.

But the test and the fix are both very confusing here. They are not patch steps, as someone might expect given the 0001, 0002, 0003 prefixes. They are not based on PG 18, as the filenames state.

To help resolve this confusion I'm posting the following sequence:

1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
This is the original test that is expected to demonstrate the problem.
It contains heavy injection-points refactoring; I assume it's not intended for commit.
This test was taken from the file 0003-PG18-unfixed-repro-tap-injection-harness.patch

2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
My changes needed to make the test runnable.

3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
Fix for the problem, proposed by the thread starter, rebased on current HEAD and the test patch.
The test passes after this step.

I would like to recommend that the author make the patch leaner and easier to review.

Best regards, Andrey Borodin.
[0] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB%2B-V%3Dg%40mail.gmail.com [1] https://github.com/yugabyte/yugabyte-db/issues/20309
Attachments
Andrey, thank you for your fixes.

On 5/5/26 12:07, Andrey Borodin wrote:
> To help resolve this confusion I'm posting following sequence:
>
> 1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
> This is an original test that is expected to demonstrate problem.
> It contains heavy injection points refactoring, I assume it's not intended for commit.
> This test was taken from a file 0003-PG18-unfixed-repro-tap-injection-harness.patch
>
> 2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
> My changes needed to make test runnable.
>
> 3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
> Fix for the problem, proposed by the thread starter, rebased on current HEAD
> and test patch.
> The test passes after this step.

Deferring the pgstat_reset_wait_event_storage() call in (3) enables the test in (1) to work once (2) is applied. Without this change, the test hangs.

It might make sense to commit the test.

--
Best regards,
Vlad
> On 5 May 2026, at 21:31, Vlad Lesin <vladlesin@gmail.com> wrote:
>
> It might make sense to commit the test.

Let's try to work on it; maybe we can bring it into good enough shape to consider for a commit.

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that injection_wait consumes to know which PID to block. The struct is currently private to injection_points.c, so the patch extracts it into a small header that prockill_race.c includes via a relative "../injection_points/" path. That works but feels non-idiomatic. Since injection_points grows organically to support new bug reproducers anyway, making the condition type part of its public header seems like a natural fit - but we are not sure the fix is committable as-is, so we wanted to ask before doing any more cleanup: is this refactor acceptable at all, and if so, would you prefer a proper installed header (as contrib/pg_plan_advice does) over the relative include? (Attached as step 0001; the other steps are not about injection-points infrastructure.)

To Vlad:

I simplified the test as much as I could. The hand-rolled polling loop is replaced with poll_query_until, the big inline comments explaining the controller-session trick and the PGPROC-scan rationale are moved into the C helper functions where the mechanism lives, and outcome classification is two plain ok() calls. The fix is intact. I did not find a way to make it simpler - early DisownLatch, decisions under leader_lwlock, deferred pushes under freeProcsLock, and the leader-skips-self-push case each address a distinct invariant the bug violated.

What I have not fully reasoned through is whether the new ordering affects any surrounding invariants I might have missed, so a second review of the bug fix would be good.

Also, maybe consider alternative names for prockill_race that would be idiomatic for test modules. It's not really relevant to the current stage of the patch, but anyway.

Thank you!
Best regards, Andrey Borodin.
Attachments
On Wed, May 06, 2026 at 02:51:00PM +0500, Andrey Borodin wrote:
> cc to Michael:
>
> prockill_race needs to build the same InjectionPointCondition payload that
> injection_wait consumes to know which PID to block. The struct is currently
> private to injection_points.c, so the patch extracts it into a small header
> that prockill_race.c includes via a relative "../injection_points/" path.
> That works but feels non-idiomatic. Since injection_points grows organically
> to support new bug reproducers anyway, making the condition type part of its
> public header seems like a natural fit - but we are not sure the fix is
> committable as-is, so we wanted to ask before doing any more cleanup: is
> this refactor acceptable at all, and if so, would you prefer a proper
> installed header (as contrib/pg_plan_advice does) over the relative include?

I did not look at the bug fix in detail, so this is a comment about the structure of the test.

+#include "../injection_points/injection_point_condition.h"

Hmm. I would not see a problem in just moving all that to the module injection_points instead, and keeping it there, including your TAP test. Noah has done something similar for his removable_cutoff() business, and we are living well with it. One issue with the structure you are proposing is that I suspect it makes some installcheck scenarios more iffy to deal with. More callbacks in the test module is fine.
--
Michael
Attachments
> On 7 May 2026, at 03:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> +#include "../injection_points/injection_point_condition.h"
>
> Hmm. I would not see a problem in just moving all that to the module
> injection_points instead, and keep it there, including your TAP test.
> Noah has done something similar for its removable_cutoff() business,
> and we are living well with it. One issue with the structure you are
> proposing is that I suspect that it makes some installcheck scenarios
> more iffy to deal with. More callbacks in the test module is fine.

Thanks, Michael! Done so. I still keep Vlad's injection_point_condition.h, though; it seems useful. But there are no more Meson/Makefile changes, and the new SQL stuff now lives near the removable_cutoff() SQL.

Vlad, there are 2 patches in the patchset again :)

Now we need an expert in ProcKill(). I will also review the patchset in more detail, but perhaps after pgconf.dev.

Best regards, Andrey Borodin.