On Wednesday, November 12, 2025 7:27 PM Pradeep Kumar <spradeepkumar29@gmail.com> wote:
> I've been investigating the assert failure in
> ProcArraySetReplicationSlotXmin() and would like to share my approach and get
> feedback. Instead of inverting the locks and what robert shared before [1].
> Instead of unconditionally updating procArray->replication_slot_xmin in
> ProcArraySetReplicationSlotXmin() in procarray.c, I made the updates
> conditional:
> 1) Only update if the incoming xmin is valid
> 2) Only update if it's older than the currently stored xmin
> 3) Do the same for procArray->replication_slot_catalog_xmin
...
> In above block of code ensures we always track the minimum xmin across all
> active replication slots without losing data. And also no need to worry about
> locks. And also while reproducing this issue [2] In SnapBuildInitialSnapshot()
> while we computing safexid by calling GetOldestSafeDecodingTransactionId(false)
> will enters into first case and update the oldestSafeXid =
> procArray->replication_slot_xmin. So it won't return nextXid. And also it solves
> this issue [2].
Thanks for evaluating new approach, but I think this approach could not work
because we expect replication_slot_xmin to be set to an invalid number when the
last slot is dropped, while this approach would disallow that, causing WALs to
be retained. For a detailed explanation, please refer to [1].
While testing the patches across all branches, I noticed that an additional lock
needs to be added in the launcher.c where
ReplicationSlotsComputeRequiredXmin(true) was recently added for conflict
detection slot. I have modified the original patch accordingly.
BTW, I am not adding a test using an injection point because it does not seem
practical to insert an injection point inner
ReplicationSlotsComputeRequiredXmin. The reason is that the injection point
function internally calls CHECK_FOR_INTERRUPTS(), but the key functions in the
patch holds the lwlock, holding holds interrupts.
I am sharing the patches for all branches for reference.
[1]
https://www.postgresql.org/message-id/TY4PR01MB169070EE618FA2908B3D2F2AE94C3A%40TY4PR01MB16907.jpnprd01.prod.outlook.com
Best Regards,
Hou zj