Re: Issue with logical replication slot during switchover

From	Fabrice Chapuis
Subject	Re: Issue with logical replication slot during switchover
Date	November 11, 18:56:56
Msg-id	CAA5-nLASa+dhSXkifQJgisBB+c_pZyN_faYmH0nrEy05CSJoGQ@mail.gmail.com
In reply to	Re: Issue with logical replication slot during switchover (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Issue with logical replication slot during switchover
List	pgsql-hackers

Hi Amit,

If I summarize your scenario:

1. A standby S has a failover slot slot1 synchronized with slot1 on primary P

2. We promote S

3. On P we drop slot1 and create slot1 again with failover enabled (for example, because a subscriber exists on another instance)

4. A rewind is performed on P, the former primary, so that it can rejoin S, the former standby

5. On P, slot1 is automatically dropped and recreated so that it can be synchronized

In what context could this kind of scenario happen?

Isn't the goal to find a solution for a switchover carried out for maintenance on a Postgres cluster? The aim is to find a compromise that covers the most likely scenarios.
Do you think we should go back to the allow_overwrite flag approach, or look for another solution?

Best Regards,

Fabrice

On Mon, Nov 10, 2025 at 1:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 31, 2025 at 2:58 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Instead of dropping such slots, what we actually need is a way to safely set synced=false->true and continue operating.
>
> Operating logical replication setups is already extremely complex and error-prone — this is not theoretical, it’s something many of us face daily.
> So rather than adding more speculative features or workarounds, I think we should focus on addressing real operational pain points and the inconsistencies in the current design.
>
> A slot created on the primary (which later becomes a standby) with failover=true has a very clear purpose. The failover flag already indicates that purpose; synced shouldn’t override it.
>

I think this is not as clear as you are saying when compared to WAL. In
failover cases, we bump the WAL timeline on the new primary and also have
facilities like pg_rewind to ensure that the old primary can follow the
new primary after divergence. For slots, there is no such facility.
Now, there is an argument that for slots it is sufficient to match
the name and failover property to say that it is okay to overwrite the
slot on the old primary. However, it is not clear whether it is always
safe to do so. For example, if the old primary ran for some time after
divergence and someone re-created the slot with the same name and
failover property, it will no longer be the same slot. Unlike WAL, we
don't maintain the slot's history, so it is not equally clear that we
can overwrite the old primary's slots as they are.

--
With Regards,
Amit Kapila.
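
The concern in the quoted message can be illustrated with a toy model. This is a minimal Python sketch, not PostgreSQL code: the Slot class, the matches_by_name_and_failover function and the creation_id field are all hypothetical names introduced here. In particular, creation_id stands in for the kind of slot history that, per the message above, is not maintained today.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical creation counter used only in this toy model;
# it represents slot history that is not actually tracked.
_creation_ids = count(1)


@dataclass
class Slot:
    name: str
    failover: bool
    synced: bool = False
    # Illustrative only: a unique identity assigned when the slot is created.
    creation_id: int = field(default_factory=lambda: next(_creation_ids))


def matches_by_name_and_failover(local: Slot, remote: Slot) -> bool:
    """The check under discussion: same name and same failover property."""
    return local.name == remote.name and local.failover == remote.failover


# Steps 1-2: S holds a synced copy of P's slot1, then S is promoted.
original_on_p = Slot("slot1", failover=True)
on_s = Slot("slot1", failover=True, synced=True,
            creation_id=original_on_p.creation_id)

# Step 3: after divergence, slot1 is dropped and re-created on P.
recreated_on_p = Slot("slot1", failover=True)

# The name/failover check cannot tell the two situations apart...
same_before = matches_by_name_and_failover(original_on_p, on_s)
same_after = matches_by_name_and_failover(recreated_on_p, on_s)

# ...while the hypothetical history marker can.
really_same_after = recreated_on_p.creation_id == on_s.creation_id
```

In this model the name/failover comparison succeeds both before and after the slot is re-created, whereas the hypothetical creation_id comparison fails after re-creation. Whether that distinction matters for a planned maintenance switchover is the open question in this thread.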
