Re: Physical replication slot advance is not persistent

Поиск
Список
Период
Сортировка
От Alexey Kondratov
Тема Re: Physical replication slot advance is not persistent
Дата
Msg-id 682b1960-cb89-8738-dbdf-8033662fb80f@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Physical replication slot advance is not persistent  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Physical replication slot advance is not persistent  (Alexey Kondratov <a.kondratov@postgrespro.ru>)
Список pgsql-hackers
On 26.12.2019 11:33, Kyotaro Horiguchi wrote:
> At Wed, 25 Dec 2019 20:28:04 +0300, Alexey Kondratov <a.kondratov@postgrespro.ru> wrote in
>>> Yep, it helps with physical replication slot persistence after
>>> advance, but the whole validation (moveto <= endlsn) does not make
>>> sense for me. The value of moveto should be >= than minlsn ==
>>> confirmed_flush / restart_lsn, while endlsn == retlsn is also always
>>> initialized with confirmed_flush / restart_lsn. Thus, your condition
>>> seems to be true in any case, even if it was no-op one, which we were
>>> intended to catch.
> ...
>> If I get it correctly, then we already keep previous slot position in
>> the minlsn, so we just have to compare endlsn with minlsn and treat
>> endlsn <= minlsn as a no-op without slot state flushing.
> I think you're right about the condition. (endlsn cannot be less than
> minlsn, though) But I came to think that we shouldn't use locations in
> that decision.
>
>> Attached is a patch that does this, so it fixes the bug without
>> affecting any user-facing behavior. Detailed comment section and DEBUG
>> output are also added. What do you think now?
>>
>> I have also forgotten to mention that all versions down to 11.0 should
>> be affected with this bug.
> pg_replication_slot_advance is the only caller of
> pg_logical/physical_replication_slot_advacne so there's no apparent
> determinant on who-does-what about dirtying and other housekeeping
> calculation like *ComputeRequired*() functions, but the current shape
> seems a kind of inconsistent between logical and physical.
>
> I think pg_logaical/physical_replication_slot_advance should dirty the
> slot if they actually changed anything. And
> pg_replication_slot_advance should do the housekeeping if the slots
> are dirtied.  (Otherwise both the caller function should dirty the
> slot in lieu of the two.)
>
> The attached does that.

Both approaches looks fine for me: my last patch with as minimal 
intervention as possible and yours refactoring. I think that it is a 
right direction to let everyone who modifies slot->data also mark slot 
as dirty.

I found some comment section in your code as rather misleading:

+        /*
+         * We don't need to dirty the slot only for the above change, 
but dirty
+         * this slot for the same reason with
+         * pg_logical_replication_slot_advance.
+         */

We just modified MyReplicationSlot->data, which is "On-Disk data of a 
replication slot, preserved across restarts.", so it definitely should 
be marked as dirty, not because pg_logical_replication_slot_advance does 
the same.

Also I think that using this transient variable in 
ReplicationSlotIsDirty is not necessary. MyReplicationSlot is already a 
pointer to the slot in shared memory.

+    ReplicationSlot *slot = MyReplicationSlot;
+
+    Assert(MyReplicationSlot != NULL);
+
+    SpinLockAcquire(&slot->mutex);

Otherwise it looks fine for me, so attached is the same diff, but with 
these proposed corrections.

Another concern is that ReplicationSlotIsDirty is added with the only 
one user. It also cannot be used by SaveSlotToPath due to the 
simultaneous usage of both flags dirty and just_dirtied there.

In that way, I hope that we should call ReplicationSlotSave 
unconditionally in the pg_replication_slot_advance, so slot will be 
saved or not automatically based on the slot->dirty flag. In the same 
time, ReplicationSlotsComputeRequiredXmin and 
ReplicationSlotsComputeRequiredLSN should be called by anyone, who 
modifies xmin and LSN fields in the slot. Otherwise, currently we are 
getting some leaky abstractions.


Regards

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Fix comment typos.
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Assert failure due to "drop schema pg_temp_3 cascade" fortemporary tables and \d+ is not showing any info after drooping temp tableschema