Re: Physical replication slot advance is not persistent

Поиск
Список
Период
Сортировка
От Alexey Kondratov
Тема Re: Physical replication slot advance is not persistent
Дата
Msg-id 82163ef4a380c5255e5004659d02559e@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Physical replication slot advance is not persistent  (Andres Freund <andres@anarazel.de>)
Ответы Re: Physical replication slot advance is not persistent  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On 2020-06-09 20:19, Andres Freund wrote:
> Hi,
> 
> On 2020-01-28 17:01:14 +0900, Michael Paquier wrote:
>> So attached is an updated patch which addresses the problem just by
>> marking a physical slot as dirty if any advancing is done.  Some
>> documentation is added about the fact that an advance is persistent
>> only at the follow-up checkpoint.  And the tests are fixed to not use
>> a fake LSN but instead advance to the latest LSN position produced.
> 
>> 
>> -    /* Update the on disk state when lsn was updated. */
>> -    if (XLogRecPtrIsInvalid(endlsn))
>> -    {
>> -        ReplicationSlotMarkDirty();
>> -        ReplicationSlotsComputeRequiredXmin(false);
>> -        ReplicationSlotsComputeRequiredLSN();
>> -        ReplicationSlotSave();
>> -    }
>> -
> 
> I am quite confused by the wholesale removal of these lines. That 
> wasn't
> in previous versions of the patch. As far as I can tell not calling
> ReplicationSlotsComputeRequiredLSN() for the physical slot leads to the
> global minimum LSN never beeing advanced, and thus WAL reserved by the
> slot not being removable.  Only if there's some independent call to
> ReplicationSlotsComputeRequiredLSN() XLogSetReplicationSlotMinimumLSN()
> will be called, allowing for slots to advance.
> 
> I realize this stuff has been broken since the introduction in
> 9c7d06d6068 (due to the above being if (XLogRecPtrIsInvalid()) rather
> than if (!XLogRecPtrIsInvalid()) , but this seems to make it even 
> worse?
> 

Yes, there was a ReplicationSlotsComputeRequiredLSN() call inside 
pg_physical_replication_slot_advance() in the v5 of the patch:

@@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr 
moveto)
          MyReplicationSlot->data.restart_lsn = moveto;
          SpinLockRelease(&MyReplicationSlot->mutex);
          retlsn = moveto;
+
+        ReplicationSlotMarkDirty();
+
+        /* We moved retart_lsn, update the global value. */
+        ReplicationSlotsComputeRequiredLSN();

But later it has been missed and we have not noticed that.

I think that adding it back as per attached will be enough.

> 
> I find it really depressing how much obviously untested stuff gets
> added in this area.
> 

Prior to this patch pg_replication_slot_advance was not being tested at 
all. Unfortunately, added tests appeared to be not enough to cover all 
cases. It seems that the whole machinery of WAL holding and trimming is 
worth to be tested more thoroughly.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: elog(DEBUG2 in SpinLocked section.
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: Disallow cancellation of waiting for synchronous replication