Обсуждение: pgsql: Fix remaining race condition with CLOG truncation and LISTEN/NOT

Поиск
Список
Период
Сортировка

pgsql: Fix remaining race condition with CLOG truncation and LISTEN/NOT

От
Heikki Linnakangas
Дата:
Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY

Previous commit fixed a bug where VACUUM would truncate the CLOG
that's still needed to check the commit status of XIDs in the async
notify queue, but as mentioned in the commit message, it wasn't a full
fix. If a backend is executing asyncQueueReadAllNotifications() and
has just made a local copy of an async SLRU page which contains old
XIDs, vacuum can concurrently truncate the CLOG covering those XIDs,
and the backend still gets an error when it calls
TransactionIdDidCommit() on those XIDs in the local copy. This commit
fixes that race condition.

To fix, hold the SLRU bank lock across the TransactionIdDidCommit()
calls in NOTIFY processing.

Per Tom Lane's idea. Backpatch to all supported versions.

Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://www.postgresql.org/message-id/2759499.1761756503@sss.pgh.pa.us
Backpatch-through: 14

Branch
------
REL_16_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/44e8c60be66c7c174431c4cd7948c1ff015fe516

Modified Files
--------------
src/backend/commands/async.c | 123 ++++++++++++++++++++++---------------------
1 file changed, 62 insertions(+), 61 deletions(-)


Re: pgsql: Fix remaining race condition with CLOG truncation and LISTEN/NOT

От
Michael Paquier
Дата:
Hi Heikki,

On Wed, Nov 12, 2025 at 07:16:38PM +0000, Heikki Linnakangas wrote:
> Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY
>
> Previous commit fixed a bug where VACUUM would truncate the CLOG
> that's still needed to check the commit status of XIDs in the async
> notify queue, but as mentioned in the commit message, it wasn't a full
> fix. If a backend is executing asyncQueueReadAllNotifications() and
> has just made a local copy of an async SLRU page which contains old
> XIDs, vacuum can concurrently truncate the CLOG covering those XIDs,
> and the backend still gets an error when it calls
> TransactionIdDidCommit() on those XIDs in the local copy. This commit
> fixes that race condition.
>
> To fix, hold the SLRU bank lock across the TransactionIdDidCommit()
> calls in NOTIFY processing.

While reading through this set of changes, the addition of this commit
across the v14~v16 range has made me smile:

+    * Now that we have let go of the SLRU bank lock, send the notifications
+    * to our backend

SLRU bank locks have been introduced in v17 by Alvaro in 53c2a97a9266.
Just noticed while going through.
--
Michael

Вложения