Re: walsender: Assert MyReplicationSlot is set before use
| От | Chao Li |
|---|---|
| Тема | Re: walsender: Assert MyReplicationSlot is set before use |
| Дата | |
| Msg-id | 9A9121BE-C7E1-4DC8-9D29-FCA73BBDB703@gmail.com обсуждение исходный текст |
| Ответ на | RE: walsender: Assert MyReplicationSlot is set before use ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
| Ответы |
Re: walsender: Assert MyReplicationSlot is set before use
|
| Список | pgsql-hackers |
> On Feb 2, 2026, at 15:37, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, February 2, 2026 3:16 PM Chao Li <li.evan.chao@gmail.com> wrote: >> >>> On Feb 2, 2026, at 14:43, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> >> wrote: >>> >>> On Monday, February 2, 2026 2:25 PM Chao Li <li.evan.chao@gmail.com> wrote: >>> >>>> >>>> Another related issue is in StartLogicalReplication(): the function >>>> initially asserts MyReplicationSlot == NULL, then calls >>>> ReplicationSlotAcquire(), which is expected to set up >>>> MyReplicationSlot. Since MyReplicationSlot is dereferenced later in >>>> this function, I think it would be clearer and safer to add an >>>> Assert(MyReplicationSlot != NULL) immediately after the call to >> ReplicationSlotAcquire(). >>> >>> I don't think the suggested Assert is unnecessary, as the primary >>> function of >>> ReplicationSlotAcquire() is to get MyReplicationSlot assigned a valid >>> value. Any issues that arise should be handled within the function >>> itself. I think an Assert placed after this function does not add additional >> safety. >>> >> >> Yes, ReplicationSlotAcquire() has a retry logic to ensure MyReplicationSlot to >> be set. So adding a NULL check here is a bit redundant. How about >> Assert(SlotIsLogical(MyReplicationSlot))? Does it make more sense? > > I think we cannot assume the slot type here. A suitable checking might > be: If a physical slot was acquired during logical replication, report an error, > just like we do in StartReplication(). Good point. In StartReplication(), we check MyReplicationSlot is not logical, correspondingly, in StartLogicalReplication(),we should check MyReplicationSlot is not physical. > > I haven't verified whether this error will occur in practice, but I think we > could reproduce this by dropping the original logical slot and then creating a > new physical slot with the same name concurrently. > Without sufficient load, this race is hard to trigger manually, so I added a TAP test using an injection point. With theinjection point in place, the TAP test can hit the error: ``` # +++ tap check in src/test/recovery +++ t/052_logical_slot_type_race.pl .. ok All tests successful. Files=1, Tests=2, 3 wallclock secs ( 0.00 usr 0.00 sys + 0.11 cusr 0.36 csys = 0.47 CPU) Result: PASS ``` I think this test demonstrates that adding a sanity check in StartLogicalReplication() is necessary. I’ve put the TAP script in 0002 and will leave it to the committer to decide whether to accept the test. PFA v2: * 0001 changes the Assert to explicitly verify that MyReplicationSlot is a logical slot. * 0002 adds a TAP test covering a race where a logical slot is dropped while a physical slot with the same name is createdconcurrently. If 0002 is accepted, I’m fine with squashing it into 0001. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Вложения
В списке pgsql-hackers по дате отправления: