Re: Issue with logical replication slot during switchover
| From | Fabrice Chapuis |
|---|---|
| Subject | Re: Issue with logical replication slot during switchover |
| Date | |
| Msg-id | CAA5-nLASa+dhSXkifQJgisBB+c_pZyN_faYmH0nrEy05CSJoGQ@mail.gmail.com |
| In response to | Re: Issue with logical replication slot during switchover (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses | Re: Issue with logical replication slot during switchover |
| List | pgsql-hackers |
Hi Amit,
If I summarize your scenario:
1. A standby S has a failover slot slot1 synchronized with slot1 on the primary P
2. We promote S
3. On P, we drop slot1 and create it again with failover enabled (for example, because a subscriber exists on another instance)
4. A rewind is performed on P, the former primary, so that it can rejoin S, the former standby
5. On P, slot1 is automatically dropped and recreated so that it can be synchronized
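To make the concern behind this scenario concrete, here is a small toy model in Python. It is only a sketch and does not use any PostgreSQL API: the `Slot` class, `create_slot`, `looks_like_same_slot` and especially the `creation_id` field are hypothetical names invented for illustration. PostgreSQL keeps no such creation history for slots, which is exactly why the two slots cannot be told apart.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical creation counter: PostgreSQL does not keep any slot history.
_creation_ids = count(1)


@dataclass
class Slot:
    name: str
    failover: bool
    creation_id: int  # illustrative only, not a real slot attribute


def create_slot(name: str, failover: bool) -> Slot:
    """Create a new toy slot with a fresh (hypothetical) creation id."""
    return Slot(name, failover, next(_creation_ids))


def looks_like_same_slot(a: Slot, b: Slot) -> bool:
    """Compare slots by name and failover property only."""
    return a.name == b.name and a.failover == b.failover


# Step 1: P owns slot1, and S holds a synchronized copy of it.
p_slot = create_slot("slot1", failover=True)
s_slot = Slot(p_slot.name, p_slot.failover, p_slot.creation_id)

# Step 2: S is promoted (nothing changes in this toy model).

# Step 3: on the diverged P, slot1 is dropped and created again.
p_slot = create_slot("slot1", failover=True)

# Steps 4-5: after the rewind, P has to decide whether its slot1 may be
# overwritten by the one coming from S.
match_by_name = looks_like_same_slot(p_slot, s_slot)
same_origin = p_slot.creation_id == s_slot.creation_id

print(match_by_name, same_origin)
```

In this model the name-and-failover comparison succeeds even though the slot on P was created after the divergence, so the comparison alone cannot distinguish the two cases.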
In which context could this kind of scenario happen?
Isn't the goal to find a solution for a switchover carried out for maintenance on a Postgres cluster? The aim is to find a compromise that covers the most likely scenarios.
Do you think we should go back to the allow_overwrite flag approach, or consider another solution?
Best Regards,
Fabrice
On Mon, Nov 10, 2025 at 1:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 31, 2025 at 2:58 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Instead of dropping such slots, what we actually need is a way to safely set synced=false->true and continue operating.
>
> Operating logical replication setups is already extremely complex and error-prone — this is not theoretical, it’s something many of us face daily.
> So rather than adding more speculative features or workarounds, I think we should focus on addressing real operational pain points and the inconsistencies in the current design.
>
> A slot created on the primary (which later becomes a standby) with failover=true has a very clear purpose. The failover flag already indicates that purpose; synced shouldn’t override it.
>
I think this is not as clear as you are saying when compared to WAL. In
failover cases, we bump the WAL timeline on the new primary and also have
facilities like pg_rewind to ensure that the old primary can follow the
new primary after divergence. For slots, there is no such facility.
Now, there is an argument that for slots it is sufficient to match
the name and failover property to say that it is okay to overwrite the slot on
the old primary. However, it is not clear whether it is always safe to do
so. For example, if the old primary ran for some time after divergence
and someone re-created the slot with the same name and failover property,
it will no longer be the same slot. Unlike WAL, we don't maintain the
slot's history, so it is not equally clear that we can overwrite the old
primary's slots as is.
--
With Regards,
Amit Kapila.