Re: Assertion failure in SnapBuildInitialSnapshot()
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Assertion failure in SnapBuildInitialSnapshot() |
| Date | |
| Msg-id | CAD21AoAKED+XSZA187x-uVv=PSM4-0b2R-zgNBMSw2tj9LEkZA@mail.gmail.com |
| In response to | Re: Assertion failure in SnapBuildInitialSnapshot() (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | RE: Assertion failure in SnapBuildInitialSnapshot() |
| List | pgsql-hackers |
On Mon, Nov 24, 2025 at 10:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Nov 24, 2025 at 1:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Nov 21, 2025 at 9:17 AM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Thursday, November 13, 2025 12:56 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> > >
> > > I have been thinking if there is a way to avoid holding ReplicationSlotControlLock
> > > exclusively in ReplicationSlotsComputeRequiredXmin() because that could cause
> > > lock contention when many slots exist and advancements occur frequently.
> > >
> > > Given that the bug arises from a race condition between slot creation and
> > > concurrent slot xmin computation, I think another way is that, we acquire the
> > > ReplicationSlotControlLock exclusively only during slot creation to do the
> > > initial update of the slot xmin. In ReplicationSlotsComputeRequiredXmin(), we
> > > still hold the ReplicationSlotControlLock in shared mode until the global slot
> > > xmin is updated in ProcArraySetReplicationSlotXmin(). This approach prevents
> > > concurrent computations and updates of new xmin horizons by other backends
> > > during the initial slot xmin update process, while it still permits concurrent
> > > calls to ReplicationSlotsComputeRequiredXmin().
> > >
> >
> > Yeah, this seems to work.
>
> +1

Given that the computation of xmin and catalog_xmin across all slots
could be executed concurrently, could the following scenario happen,
in which procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin retreat to an older valid
(non-invalid) XID?

1. Suppose the initial value of procArray->replication_slot_catalog_xmin
is 50.
2. Process-A updates its owned slot's catalog_xmin to 100, and
computes the new catalog_xmin as 100 while holding
ReplicationSlotControlLock in shared mode in
ReplicationSlotsComputeRequiredXmin(). But it doesn't update the
procArray's catalog_xmin value yet.
3. Process-B updates its owned slot's catalog_xmin to 150, and
computes the new catalog_xmin as 150.
4. Process-B updates procArray->replication_slot_catalog_xmin to 150.
5. Process-A updates procArray->replication_slot_catalog_xmin to 100,
overwriting the 150 set in step 4.

It might be worth adding an assertion to
ProcArraySetReplicationSlotXmin(), checking that the new xmin and
catalog_xmin values are each either >= the current value or
InvalidTransactionId.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
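
For illustration only, the check suggested above could look roughly like the following standalone C sketch. It is not PostgreSQL source: `ProcArrayModel`, `xid_is_valid()`, `xid_follows_or_equals()`, `xmin_update_is_allowed()` and `set_replication_slot_xmin_model()` are hypothetical names, the types are simplified stand-ins, and locking is omitted. It also assumes that any new value is acceptable while the current value is still InvalidTransactionId, which the message above does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's definitions (assumption, not tree code). */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical stand-in for the two procArray fields named in the thread. */
typedef struct ProcArrayModel
{
    TransactionId replication_slot_xmin;
    TransactionId replication_slot_catalog_xmin;
} ProcArrayModel;

static bool
xid_is_valid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

/* Wraparound-aware "a >= b" for two valid XIDs (modulo-2^32 comparison). */
static bool
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Proposed condition: the new value is InvalidTransactionId or does not
 * precede the current one. Treating an invalid current value as "anything
 * goes" is an assumption of this sketch.
 */
static bool
xmin_update_is_allowed(TransactionId current, TransactionId proposed)
{
    if (!xid_is_valid(proposed) || !xid_is_valid(current))
        return true;
    return xid_follows_or_equals(proposed, current);
}

/*
 * Model of the setter: stores both values unconditionally (as an unguarded
 * update would) and returns false where the proposed assertion would fire.
 */
static bool
set_replication_slot_xmin_model(ProcArrayModel *pa,
                                TransactionId xmin,
                                TransactionId catalog_xmin)
{
    bool ok = xmin_update_is_allowed(pa->replication_slot_xmin, xmin) &&
              xmin_update_is_allowed(pa->replication_slot_catalog_xmin,
                                     catalog_xmin);

    pa->replication_slot_xmin = xmin;
    pa->replication_slot_catalog_xmin = catalog_xmin;
    return ok;
}
```

Replaying steps 1 to 5 against this model: starting from a catalog_xmin of 50, Process-B's update to 150 passes the check, while Process-A's late update to 100 makes `set_replication_slot_xmin_model()` return false, which is the point at which the proposed assertion would fail.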