Discussion: [PATCH] Fix ProcKill lock-group vs procLatch recycle race


[PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Vlad Lesin
Date:
Hello all,

Below are patches and demonstration material for a concurrency bug in
ProcKill() that can occur when a lock-group leader and a member exit in
parallel.

------------------------------------------------------------------------
Problem
------------------------------------------------------------------------

If a leader detaches from the lock group under leader_lwlock but
has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
last follower can still put the *leader* PGPROC on a free list, or
the leader and the follower can make inconsistent decisions about
*who* returns which PGPROC, so that a slot is linked into the free
list with procLatch still owned, or is pushed twice.  A new backend
that recycles the slot can then hit:

      PANIC: latch already owned by PID ...
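
For orientation, the relevant part of ProcKill() has roughly the
following shape today (a simplified sketch of the steps labelled in the
interleavings below, not verbatim source; details vary by branch):

    /* --- lock-group detach block (simplified) --- */
    if (MyProc->lockGroupLeader != NULL)
    {
        PGPROC   *leader = MyProc->lockGroupLeader;
        LWLock   *leader_lwlock = LockHashPartitionLockByProc(leader);

        LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
        dlist_delete(&MyProc->lockGroupLink);
        if (dlist_is_empty(&leader->lockGroupMembers))
        {
            leader->lockGroupLeader = NULL;
            if (leader != MyProc)
            {
                /* last member: push the already-exited leader's PGPROC
                 * onto its freelist under ProcStructLock (M6..M8) */
            }
        }
        else if (leader != MyProc)
        {
            /* member-side bookkeeping */
        }
        LWLockRelease(leader_lwlock);       /* window opens here (L5) */
    }

    /* --- latch handoff and self-push (simplified) --- */
    SwitchBackToLocalLatch();
    pgstat_reset_wait_event_storage();
    proc = MyProc;
    MyProc = NULL;
    DisownLatch(&proc->procLatch);          /* owner_pid cleared only here (L9) */

    SpinLockAcquire(ProcStructLock);
    if (proc->lockGroupLeader == NULL)
    {
        /* push our own PGPROC onto its freelist */
    }
    SpinLockRelease(ProcStructLock);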

A concrete interleaving (lock group leader vs last member)
follows (PG15 code).

=== Lock group leader ===
L1: LWLockAcquire(leader_lwlock)
L2: dlist_delete(&MyProc->lockGroupLink)
      The list still contains the follower.
L3: dlist_is_empty → false (the follower is still in)
L4: else if (leader != MyProc) → false, do nothing
L5: LWLockRelease(leader_lwlock)
      *lwlock released*

       >>>WINDOW OPEN HERE<<<

L6: SwitchBackToLocalLatch()
L7: pgstat_reset_wait_event_storage()
L8: proc = MyProc; MyProc = NULL;
L9: DisownLatch(&proc->procLatch)
      *only here is owner_pid set to 0*
L10: SpinLockAcquire(ProcStructLock)
L11: if (proc->lockGroupLeader == NULL) → false, skip
L12: SpinLockRelease(ProcStructLock)
===========================

=== What the last lock group member does in that WINDOW ===
M1: LWLockAcquire(leader_lwlock)
      Can proceed as soon as L5 releases it
M2: dlist_delete(&MyProc->lockGroupLink)
      The list becomes EMPTY
M3: dlist_is_empty → true
M4: leader->lockGroupLeader = NULL
      Flip leader's "don't free me" flag
M5: leader != MyProc → true
M6: SpinLockAcquire(ProcStructLock)
M7: push leader's PGPROC onto freelist
      !!! while leader's latch is still owned !!!
M8: SpinLockRelease(ProcStructLock)
M9: LWLockRelease(leader_lwlock)
===========================================================

At M7, the leader's PGPROC is back on freeProcs with
procLatch.owner_pid == leader_pid (non-zero). The leader is still
sitting between L5 and L9.

=== The new backend picks it up in InitProcess() ===
B1: SpinLockAcquire(ProcStructLock)
B2: MyProc = *procgloballist
      pops the leader's PGPROC
B3: *procgloballist = MyProc->links.next
B4: SpinLockRelease(ProcStructLock)
B5: ...
B6: OwnLatch(&MyProc->procLatch)
     → owner_pid != 0
     → elog(PANIC, "latch already owned by PID %d", owner_pid);
=====================================================

Note that all PG versions since 9.6 are affected.

The fix is to run SwitchBackToLocalLatch(),
pgstat_reset_wait_event_storage() and DisownLatch() before the
lock-group block, and to make the push decisions (push_leader /
push_self) under leader_lwlock, with a single freelist update
section at the end.  This matches what we implemented; detailed
commentary is in the commit messages and in the patch.
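
In outline, the reordered tail of ProcKill() then looks roughly like
this (a simplified sketch of the idea, not the patch itself;
declarations and error paths omitted):

    /* Disown the latch before any path that can publish this PGPROC. */
    SwitchBackToLocalLatch();
    pgstat_reset_wait_event_storage();
    proc = MyProc;
    MyProc = NULL;
    DisownLatch(&proc->procLatch);      /* owner_pid is 0 from here on */

    push_self = true;
    push_leader = false;
    if (proc->lockGroupLeader != NULL)
    {
        PGPROC   *leader = proc->lockGroupLeader;
        LWLock   *leader_lwlock = LockHashPartitionLockByProc(leader);

        LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
        dlist_delete(&proc->lockGroupLink);
        if (dlist_is_empty(&leader->lockGroupMembers))
        {
            leader->lockGroupLeader = NULL;
            if (leader != proc)
                push_leader = true;     /* last member; leader exited first */
        }
        else if (leader == proc)
            push_self = false;          /* members remain; the last one
                                         * returns our PGPROC later */
        LWLockRelease(leader_lwlock);
    }

    /* Single freelist update, after all decisions have been made. */
    SpinLockAcquire(ProcStructLock);
    if (push_leader)
    {
        /* link the leader's PGPROC onto leader->procgloballist;
         * its latch is guaranteed to be disowned by now */
    }
    if (push_self)
    {
        /* link proc onto proc->procgloballist */
    }
    SpinLockRelease(ProcStructLock);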

Mainline parallel query usually avoids the race: the leader is not
expected to reach ProcKill() with another group member still in play,
whereas some extension-driven workloads can end up in exactly that state.

------------------------------------------------------------------------
Testing
------------------------------------------------------------------------

Technically, a regression test should be included. However, developing 
one is non-trivial given the current PG18 injection point 
implementation. Because ProcKill() partially de-initializes data 
necessary for injection points to function, a significant workaround 
would be required. Given the complexity of such a workaround, I would 
like to discuss whether a test is mandatory for this patch.

I have provided versions without the test (0001, 0002) as well as a test 
that uses an injection point workaround to reproduce the bug 
deterministically (0003). If a test for the bug fix is required, please
note that I can only provide it for PG17+, as earlier versions do not 
support injection points.

------------------------------------------------------------------------
Attached patches
------------------------------------------------------------------------
   1) 0001-ProcKill-REL15-lockgroup-freelist-race.patch
       Core proc.c for REL_15_STABLE.
       No TAP test: PG15 has no suitable injection-point support for a
       core reproducer; this is a fix-only patch for that release line.
   2) 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch
       Same *core* fix for PG18: proc.c only.
   3) 0003-PG18-unfixed-repro-tap-injection-harness.patch
       *Demonstration only.* Shows the PANIC; not something to land
       alongside the fix without reworking the test.

-- 
Best regards,
Vlad
Attachments

Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Evgeny Voropaev
Date:
Hi Vlad,

Concurrency problems are the most difficult to discover and
reproduce. Thanks for that work!

Since every patch is subject to testing, it is recommended to create a
CommitFest card for it. Once the patch set is on a CF card, it is picked up
by CFBot and thoroughly tested. You will also see whether the patch
applies successfully to the target branch or not.

You also might want to stick to a single PostgreSQL version in a thread,
as all attached files are applied to the upstream during a CFBot job,
which is impossible when a mail message includes patches for several
PG versions simultaneously. Adhering to a single version per thread
and using a CommitFest card also makes the reviewers' task easier,
because they can see the version and the base commit to which the
patch should be applied. Patches for other PG versions can probably
be moved to separate threads.

That is all for now. I am still looking into the code of the patch. I
hope to give some useful comments on the solution itself later.

P.S. I tried to apply the patches on top of 3b28dad70e2. After
0002-ProcKill-PG18-core-lockgroup-freelist-race had applied
successfully, the subsequent application of
0003-PG18-unfixed-repro-tap-injection-harness failed.

Best regards,
Evgeny Voropaev,
Tantor Labs, LLC.



Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Andrey Borodin
Date:

> On 27 Apr 2026, at 13:14, Vlad Lesin <vladlesin@gmail.com> wrote:
>
> Problem
> ------------------------------------------------------------------------
>
> If a leader detaches from the lock group under leader_lwlock but
> has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
> last follower can still put the *leader* PGPROC on a free list, or
> the leader and the follower can make inconsistent decisions about
> *who* returns which PGPROC, so that a slot is linked into the free
> list with procLatch still owned, or is pushed twice.  A new backend
> that recycles the slot can then hit:
>
>     PANIC: latch already owned by PID ...
>
> A concrete interleaving (lock group leader vs last member)
> is the following(PG15 code).

Yeah, the problem seems real to me. Moreover, we had related buildfarm
failures [0], and Deep from GP reported observing the problem there too.
Yugabyte folks also observed this [1].

The invariant that a PGPROC should not be put on the freelist until its
latch is disowned seems reasonable to me.
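
One cheap way to enforce it going forward might be an assertion at each
freelist push site, something like (just a sketch):

    /* a PGPROC must never be published while its latch is still owned */
    Assert(proc->procLatch.owner_pid == 0);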

But the test and the fix are both very confusing here. They are not
incremental patch steps, as one might expect given the 0001, 0002, 0003
prefixes, and they are not based on PG 18 as the filenames state.

To help resolve this confusion, I'm posting the following sequence:

1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
This is the original test that is expected to demonstrate the problem.
It contains heavy injection-point refactoring; I assume it's not intended for commit.
This test was taken from the file 0003-PG18-unfixed-repro-tap-injection-harness.patch.

2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
My changes needed to make test runnable.

3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
Fix for the problem, proposed by the thread starter, rebased on current HEAD
and test patch.
The test passes after this step.

I would like to recommend that the author make the patch leaner and easier to review.


Best regards, Andrey Borodin.

[0]
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB%2B-V%3Dg%40mail.gmail.com
[1] https://github.com/yugabyte/yugabyte-db/issues/20309


Attachments

Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Vlad Lesin
Date:
Andrey, thank you for your fixes.

On 5/5/26 12:07, Andrey Borodin wrote:

> To help resolve this confusion I'm posting following sequence:
> 
> 1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
> This is an original test that is expected to demonstrate problem.
> It contains heavy injection points refactoring, I assume it's not intended for commit.
> This test was taken from a file 0003-PG18-unfixed-repro-tap-injection-harness.patch
> 
> 2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
> My changes needed to make test runnable.
> 
> 3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
> Fix for the problem, proposed by the thread starter, rebased on current HEAD
> and test patch.
> The test passes after this step.
Deferring the pgstat_reset_wait_event_storage() call in (3) enables the test
in (1) to work once (2) is applied. Without this change, the test hangs.
It might make sense to commit the test.

-- 
Best regards,
Vlad



Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Andrey Borodin
Date:

> On 5 May 2026, at 21:31, Vlad Lesin <vladlesin@gmail.com> wrote:
>
> It might make sense to commit the test.

Let's try to work on it; maybe we can bring it into good enough shape to
consider for commit.

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?

(attached as step 0001; the other steps are not about injection-point infrastructure)
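
For context, the payload in question is roughly the following, today
defined privately in injection_points.c (paraphrased from memory; the
exact names in the header proposal may differ):

    /* roughly what the extracted injection_point_condition.h carries */
    typedef enum InjectionPointConditionType
    {
        INJ_CONDITION_ALWAYS = 0,   /* run the callback unconditionally */
        INJ_CONDITION_PID           /* run only for a matching backend PID */
    } InjectionPointConditionType;

    typedef struct InjectionPointCondition
    {
        InjectionPointConditionType type;
        int         pid;            /* PID to match for INJ_CONDITION_PID */
    } InjectionPointCondition;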

To Vlad:

I simplified the test as much as I could.  The hand-rolled polling loop is
replaced with poll_query_until, big inline comments explaining the
controller-session trick and the PGPROC-scan rationale are moved into the
C helper functions where the mechanism lives, and outcome classification is
two plain ok() calls.

The fix is intact.  I did not find a way to make it simpler - early
DisownLatch, decisions under leader_lwlock, deferred pushes under
freeProcsLock, and the leader-skips-self-push case each address a distinct
invariant the bug violated.  What I have not fully reasoned through is
whether the new ordering affects any surrounding invariants I might have
missed, so a further review of the bug fix would be good.

Also, maybe consider alternative names for prockill_race that would be
more idiomatic for test modules... It's not really relevant to the current
stage of the patch, but anyway.

Thank you!


Best regards, Andrey Borodin.


Attachments

Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Michael Paquier
Date:
On Wed, May 06, 2026 at 02:51:00PM +0500, Andrey Borodin wrote:
> cc to Michael:
>
> prockill_race needs to build the same InjectionPointCondition payload that
> injection_wait consumes to know which PID to block. The struct is currently
> private to injection_points.c, so the patch extracts it into a small header
> that prockill_race.c includes via a relative "../injection_points/" path.
> That works but feels non-idiomatic. Since injection_points grows organically
> to support new bug reproducers anyway, making the condition type part of its
> public header seems like a natural fit - but we are not sure the fix is
> committable as-is, so we wanted to ask before doing any more cleanup: is
> this refactor acceptable at all, and if so, would you prefer a proper
> installed header (as contrib/pg_plan_advice does) over the relative include?

I did not look at the bug fix in detail, so this is a comment about
the structure of the test.

+#include "../injection_points/injection_point_condition.h"

Hmm.  I would not see a problem in just moving all that to the module
injection_points instead and keeping it there, including your TAP test.
Noah has done something similar for his removable_cutoff() business,
and we are living well with it.  One issue with the structure you are
proposing is that I suspect it makes some installcheck scenarios
more iffy to deal with.  More callbacks in the test module are fine.
--
Michael

Attachments

Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Andrey Borodin
Date:

> On 7 May 2026, at 03:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> +#include "../injection_points/injection_point_condition.h"
>
> Hmm.  I would not see a problem in just moving all that to the module
> injection_points instead, and keep it there, including your TAP test.
> Noah has done something similar for its removable_cutoff() business,
> and we are living well with it.  One issue with the structure you are
> proposing is that I suspect that it makes some installcheck scenarios
> more iffy to deal with.  More callbacks in the test module is fine.

Thanks, Michael!

Done. I still kept Vlad's injection_point_condition.h though; it seems
useful. But there are no more Meson/Makefile changes, and the new SQL stuff
now lives next to the removable_cutoff() SQL.

Vlad, there are 2 patches in the patchset again :)

Now we need an expert in ProcKill(). I will also review the patchset in more
detail, but perhaps after pgconf.dev.


Best regards, Andrey Borodin.


Attachments