RE: Assertion failure in SnapBuildInitialSnapshot()
| From | Zhijie Hou (Fujitsu) |
|---|---|
| Subject | RE: Assertion failure in SnapBuildInitialSnapshot() |
| Date | |
| Msg-id | TY4PR01MB169070EE618FA2908B3D2F2AE94C3A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| In reply to | Re: Assertion failure in SnapBuildInitialSnapshot() (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | Re: Assertion failure in SnapBuildInitialSnapshot() |
| List | pgsql-hackers |
On Friday, November 7, 2025 2:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Nov 6, 2025 at 2:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Nov 6, 2025 at 12:03 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Thursday, October 30, 2025 7:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > Also, I think it's worth considering the idea Robert shared before[1]:
> > > >
> > > > ---
> > > > But what about just surgically preventing that?
> > > > ProcArraySetReplicationSlotXmin() could refuse to retreat the values,
> > > > perhaps? If it computes an older value than what's there, it just does nothing?
> > > > ---
> > > >
> > > > We did a similar fix for confirmed_flush LSN by commit ad5eaf390c582, and it
> > > > sounds reasonable to me that ProcArraySetReplicationSlotXmin() refuses to
> > > > retreat the values.
> > >
> > > I reviewed the thread and think that we could not straightforwardly apply a
> > > similar strategy to prevent the retreat of xmin/catalog_xmin here. This is
> > > because we maintain a central value
> > > (replication_slot_xmin/replication_slot_catalog_xmin) in
> > > ProcArraySetReplicationSlotXmin, where the value is expected to decrease when
> > > certain slots are dropped or invalidated.
> > >
> >
> > Good point. This can happen when the last slot is invalidated or dropped.
>
> After the last slot is invalidated or dropped, both slot_xmin and
> slot_catalog_xmin values are set InvalidTransactionId. Then in this
> case, these values are ignored when computing the oldest safe decoding
> XID in GetOldestSafeDecodingTransactionId(), no? Or do you mean that
> there is a case where slot_xmin and slot_catalog_xmin retreat to a
> valid XID?

I think when replication_slot_xmin is invalid, GetOldestSafeDecodingTransactionId would return nextXid, which can be greater than the original snap.xmin if some transaction IDs have been assigned.
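The fallback described in the previous paragraph can be sketched with a simplified, standalone model loosely based on GetOldestSafeDecodingTransactionId(). This is not PostgreSQL code: HorizonState, oldest_safe_decoding_xid() and initial_snapshot_is_safe() are hypothetical names, TransactionId and its helpers are simplified stand-ins, and locking, recovery handling and the catalog-only branch are omitted. The point it shows is that an invalid slot xmin is simply skipped, so with no older running XID nothing holds the result below nextXid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's XID type and helpers (assumption). */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
TransactionIdIsValid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

static bool
TransactionIdIsNormal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 comparison for normal XIDs, plain comparison otherwise. */
static bool
TransactionIdPrecedes(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

static bool
TransactionIdFollows(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a > b;
    return (int32_t) (a - b) > 0;
}

/* Hypothetical snapshot of the shared state the real function reads. */
typedef struct
{
    TransactionId next_xid;              /* next XID to be assigned */
    TransactionId replication_slot_xmin; /* aggregated slot xmin, or invalid */
    const TransactionId *running_xids;   /* XIDs of in-progress transactions */
    int n_running;
} HorizonState;

/* Hypothetical, simplified model of GetOldestSafeDecodingTransactionId(false). */
static TransactionId
oldest_safe_decoding_xid(const HorizonState *s)
{
    assert(s->n_running == 0 || s->running_xids != NULL);

    /* Start from nextXid: always safe, but pessimal. */
    TransactionId oldest = s->next_xid;

    /* An invalid slot xmin is skipped, so it cannot pull the result back. */
    if (TransactionIdIsValid(s->replication_slot_xmin) &&
        TransactionIdPrecedes(s->replication_slot_xmin, oldest))
        oldest = s->replication_slot_xmin;

    /* Any in-progress XID older than the current candidate wins. */
    for (int i = 0; i < s->n_running; i++)
    {
        TransactionId xid = s->running_xids[i];

        if (TransactionIdIsNormal(xid) && TransactionIdPrecedes(xid, oldest))
            oldest = xid;
    }
    return oldest;
}

/* Hypothetical helper: roughly the condition SnapBuildInitialSnapshot() expects. */
static bool
initial_snapshot_is_safe(TransactionId safe_xid, TransactionId snap_xmin)
{
    return !TransactionIdFollows(safe_xid, snap_xmin);
}
```

With made-up values next_xid = 110, replication_slot_xmin = InvalidTransactionId and no running transactions, oldest_safe_decoding_xid() returns 110, which follows a snapshot xmin of 100, so initial_snapshot_is_safe(110, 100) is false. With replication_slot_xmin = 100 the result is 100 and the check passes.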
After reviewing the report [1], the bug also appears to be reproducible when replication_slot_xmin is set to InvalidTransactionId (see [2] for specific reproduction steps). Therefore, if we adopt the approach of preventing these values from retreating, we would need to somehow avoid resetting replication_slot_xmin, but that seems to conflict with the existing behavior of resetting replication_slot_xmin when the last slot is dropped.

[1] https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com

Best Regards,
Hou zj
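The conflict between a "refuse to retreat" rule and the reset on dropping the last slot can be sketched as two hypothetical update rules for the aggregated replication_slot_xmin. Neither function exists in PostgreSQL; slot_xmin_update_strict() and slot_xmin_update_lenient() are illustrative names, and TransactionId and its helpers are simplified stand-ins. The strict rule treats the reset to InvalidTransactionId as a retreat and keeps a stale horizon pinned; the lenient rule lets the reset through, which is exactly the state in which the oldest safe XID falls back to nextXid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's XID type and helpers (assumption). */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
TransactionIdIsValid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

static bool
TransactionIdIsNormal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 comparison for normal XIDs, plain comparison otherwise. */
static bool
TransactionIdPrecedes(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical strict rule: refuse any retreat, including the reset to
 * InvalidTransactionId after the last slot is dropped or invalidated.
 * The old horizon then stays pinned even though no slot needs it.
 */
static TransactionId
slot_xmin_update_strict(TransactionId cur, TransactionId proposed)
{
    if (TransactionIdIsValid(cur) &&
        (!TransactionIdIsValid(proposed) || TransactionIdPrecedes(proposed, cur)))
        return cur;
    return proposed;
}

/*
 * Hypothetical lenient rule: refuse only older valid XIDs.  The reset to
 * InvalidTransactionId still goes through, so the invalid-xmin case that
 * leads back to nextXid remains reachable.
 */
static TransactionId
slot_xmin_update_lenient(TransactionId cur, TransactionId proposed)
{
    if (TransactionIdIsValid(cur) && TransactionIdIsValid(proposed) &&
        TransactionIdPrecedes(proposed, cur))
        return cur;
    return proposed;
}
```

For example, slot_xmin_update_strict(100, InvalidTransactionId) keeps 100, while slot_xmin_update_lenient(100, InvalidTransactionId) returns InvalidTransactionId; both rules refuse an update from 100 to 90.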