Re: Physical replication slot advance is not persistent


From	Alexey Kondratov
Subject	Re: Physical replication slot advance is not persistent
Date	29 December 2019 15:12:16
Msg-id	175c2760666a78205e053207794c0f8f@postgrespro.ru
In reply to	Re: Physical replication slot advance is not persistent (Alexey Kondratov <a.kondratov@postgrespro.ru>)
Replies	Re: Physical replication slot advance is not persistent (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List	pgsql-hackers


On 2019-12-26 16:35, Alexey Kondratov wrote:
> 
> Another concern is that ReplicationSlotIsDirty is added with only
> one user. It also cannot be used by SaveSlotToPath, because that
> function uses both flags, dirty and just_dirtied, at the same time.
> 
> So I think we should call ReplicationSlotSave unconditionally in
> pg_replication_slot_advance, so that the slot will be saved or not
> automatically based on the slot->dirty flag. At the same time,
> ReplicationSlotsComputeRequiredXmin and
> ReplicationSlotsComputeRequiredLSN should be called by anyone who
> modifies the xmin and LSN fields in the slot. Otherwise, we end up
> with leaky abstractions, as we do now.
> 
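The "call ReplicationSlotSave unconditionally and let the dirty flag decide" idea can be sketched as below. This is a simplified C model under stated assumptions: SlotModel, slot_mark_dirty, slot_save and slot_advance are hypothetical names, not PostgreSQL APIs, and the model ignores the just_dirtied flag and all locking that the real SaveSlotToPath deals with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified slot model; not PostgreSQL's ReplicationSlot. */
typedef struct SlotModel
{
    uint64_t restart_lsn;       /* current value in shared memory */
    uint64_t saved_restart_lsn; /* value last written to the state file */
    bool     dirty;             /* set whenever restart_lsn changes */
    int      writes;            /* how many times the state file was written */
} SlotModel;

static void slot_mark_dirty(SlotModel *s)
{
    s->dirty = true;
}

/* Safe to call unconditionally: it only "writes" when the dirty flag is set. */
static void slot_save(SlotModel *s)
{
    if (!s->dirty)
        return;                 /* nothing changed since the last save */
    assert(s->restart_lsn >= s->saved_restart_lsn); /* slots only move forward */
    s->saved_restart_lsn = s->restart_lsn;
    s->writes++;
    s->dirty = false;
}

/* The caller never checks whether anything changed; slot_save() decides. */
static uint64_t slot_advance(SlotModel *s, uint64_t moveto)
{
    if (moveto > s->restart_lsn)
    {
        s->restart_lsn = moveto;
        slot_mark_dirty(s);
    }
    slot_save(s);
    return s->restart_lsn;
}
```

With this shape the advance function needs no separate "is it dirty?" helper: an advance to an older position leaves the flag clear, so the save call is a no-op.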

It seems that there was even a race in the order of actions inside 
pg_replication_slot_advance, which did the following:

- ReplicationSlotMarkDirty();
- ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
- ReplicationSlotSave();

1) Mark the slot as dirty, which does nothing immediately except set 
the dirty flag;
2) Compute the new global required LSN;
3) Flush the slot state to disk.

If someone (e.g. a checkpoint) removes the old WAL and a crash then 
happens between steps 2) and 3), we restart with the old value of 
restart_lsn, but without the WAL it requires. I do not know how to 
reproduce it reliably without gdb and a power-off, so the chance is 
pretty low, but it could still happen.
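The race can be illustrated with a small C simulation. It is a sketch under simplifying assumptions, not PostgreSQL code: the Model struct and all function names are hypothetical, a "checkpoint" is assumed to remove all WAL below the computed global required LSN, and a "crash" simply reloads restart_lsn from the on-disk copy. It compares the old order (recompute, then save) with saving first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one physical slot plus the WAL it protects. */
typedef struct Model
{
    uint64_t mem_restart_lsn;   /* slot's restart_lsn in shared memory */
    uint64_t disk_restart_lsn;  /* restart_lsn in the on-disk slot state */
    uint64_t required_lsn;      /* global minimum computed from memory */
    uint64_t oldest_wal;        /* oldest LSN still present in pg_wal */
} Model;

static void compute_required_lsn(Model *m) { m->required_lsn = m->mem_restart_lsn; }

static void save_slot(Model *m)
{
    assert(m->mem_restart_lsn >= m->disk_restart_lsn); /* slots only move forward */
    m->disk_restart_lsn = m->mem_restart_lsn;
}

/* Assumption: a checkpoint may drop all WAL older than the required LSN. */
static void checkpoint_removes_wal(Model *m)
{
    if (m->required_lsn > m->oldest_wal)
        m->oldest_wal = m->required_lsn;
}

/* After a crash the slot is reloaded from its on-disk state. */
static void crash_and_restart(Model *m) { m->mem_restart_lsn = m->disk_restart_lsn; }

static bool slot_wal_available(const Model *m) { return m->mem_restart_lsn >= m->oldest_wal; }

/* Old order: steps 1) and 2) run, then checkpoint + crash before step 3). */
static bool old_order_survives(uint64_t start, uint64_t target)
{
    Model m = {start, start, start, start};
    m.mem_restart_lsn = target;     /* advance + mark dirty */
    compute_required_lsn(&m);       /* step 2) */
    checkpoint_removes_wal(&m);     /* concurrent checkpoint */
    crash_and_restart(&m);          /* crash before step 3) */
    return slot_wal_available(&m);
}

/* Save first, then recompute: the on-disk value never lags removed WAL. */
static bool save_first_survives(uint64_t start, uint64_t target)
{
    Model m = {start, start, start, start};
    m.mem_restart_lsn = target;
    save_slot(&m);
    compute_required_lsn(&m);
    checkpoint_removes_wal(&m);
    crash_and_restart(&m);
    return slot_wal_available(&m);
}
```

In the old order the restarted slot points at restart_lsn 100 while WAL before 200 is already gone; saving before recomputing the global minimum avoids that window.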

Again, logical slots were not affected, since 
LogicalConfirmReceivedLocation already performs these operations in the 
proper order (with comments) and flushes the slot itself.

Thus, in the attached patch I decided not to flush the slot in 
pg_replication_slot_advance at all and to do it in 
pg_physical_replication_slot_advance instead, as 
LogicalConfirmReceivedLocation does.
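The split of responsibilities described above can be sketched as a call-order model in C. This is only an illustration under assumptions: nothing here calls PostgreSQL, the strings merely label the corresponding PostgreSQL functions, and physical_advance_sketch / slot_advance_sketch are hypothetical stand-ins for pg_physical_replication_slot_advance and pg_replication_slot_advance as the patch is described in this message (not necessarily the code that was eventually committed).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trace recording which step ran, in order. */
#define MAX_STEPS 8
typedef struct Trace { const char *steps[MAX_STEPS]; int n; } Trace;

static void record(Trace *t, const char *step)
{
    assert(t->n < MAX_STEPS);
    t->steps[t->n++] = step;
}

typedef struct Slot { uint64_t restart_lsn; } Slot;

/* Physical advance after the patch: it persists the slot itself, and only
 * then recomputes the global required LSN. */
static uint64_t physical_advance_sketch(Slot *s, uint64_t moveto, Trace *t)
{
    if (moveto > s->restart_lsn)
    {
        s->restart_lsn = moveto;
        record(t, "ReplicationSlotMarkDirty");
        record(t, "ReplicationSlotSave");
        record(t, "ReplicationSlotsComputeRequiredLSN");
    }
    return s->restart_lsn;
}

/* Generic wrapper after the patch: it no longer flushes the slot. */
static uint64_t slot_advance_sketch(Slot *s, uint64_t moveto, Trace *t)
{
    return physical_advance_sketch(s, moveto, t);
}

/* Position of a step in the trace, or -1 if it never ran. */
static int step_index(const Trace *t, const char *step)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->steps[i], step) == 0)
            return i;
    return -1;
}
```

Keeping the flush next to the code that changes restart_lsn means each advance routine owns its own "persist, then publish" ordering, which is the pattern LogicalConfirmReceivedLocation already follows for logical slots.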

Since this bugfix has not moved forward during the week, I will put it 
on the 01.2020 commitfest. Kyotaro, if you do not object, I will add you 
as a reviewer, since you have already given a lot of feedback. Thank you 
for that!

Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachments:

v3-0001-Make-physical-slot-advance-to-be-persistent.patch
