Обсуждение: pgsql: Fix remaining race condition with CLOG truncation and LISTEN/NOT
Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY Previous commit fixed a bug where VACUUM would truncate the CLOG that's still needed to check the commit status of XIDs in the async notify queue, but as mentioned in the commit message, it wasn't a full fix. If a backend is executing asyncQueueReadAllNotifications() and has just made a local copy of an async SLRU page which contains old XIDs, vacuum can concurrently truncate the CLOG covering those XIDs, and the backend still gets an error when it calls TransactionIdDidCommit() on those XIDs in the local copy. This commit fixes that race condition. To fix, hold the SLRU bank lock across the TransactionIdDidCommit() calls in NOTIFY processing. Per Tom Lane's idea. Backpatch to all supported versions. Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Discussion: https://www.postgresql.org/message-id/2759499.1761756503@sss.pgh.pa.us Backpatch-through: 14 Branch ------ REL_16_STABLE Details ------- https://git.postgresql.org/pg/commitdiff/44e8c60be66c7c174431c4cd7948c1ff015fe516 Modified Files -------------- src/backend/commands/async.c | 123 ++++++++++++++++++++++--------------------- 1 file changed, 62 insertions(+), 61 deletions(-)
Re: pgsql: Fix remaining race condition with CLOG truncation and LISTEN/NOT
От
Michael Paquier
Дата:
Hi Heikki, On Wed, Nov 12, 2025 at 07:16:38PM +0000, Heikki Linnakangas wrote: > Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY > > Previous commit fixed a bug where VACUUM would truncate the CLOG > that's still needed to check the commit status of XIDs in the async > notify queue, but as mentioned in the commit message, it wasn't a full > fix. If a backend is executing asyncQueueReadAllNotifications() and > has just made a local copy of an async SLRU page which contains old > XIDs, vacuum can concurrently truncate the CLOG covering those XIDs, > and the backend still gets an error when it calls > TransactionIdDidCommit() on those XIDs in the local copy. This commit > fixes that race condition. > > To fix, hold the SLRU bank lock across the TransactionIdDidCommit() > calls in NOTIFY processing. While reading through this set of changes, the addition of this commit across the v14~v16 range has made me smile: + * Now that we have let go of the SLRU bank lock, send the notifications + * to our backend SLRU bank locks have been introduced in v17 by Alvaro in 53c2a97a9266. Just noticed while going through. -- Michael