Re: Issue with logical replication slot during switchover

From Fabrice Chapuis
Subject Re: Issue with logical replication slot during switchover
Msg-id CAA5-nLCx5qL=Gk7DYFmtyoOtASAEtsvDpo=ZMyYFnuMv=qVFuQ@mail.gmail.com
In response to Re: Issue with logical replication slot during switchover  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi Masahiko,

I agree with your analysis. But why not take into account the parameter synchronized_standby_slots (it guarantees that the slot on the target has received the WAL before the decoded changes are played)?
Let me take your setup.

node1 => node2 => node3
node1 is a primary
node2 and node3 are standbys
slot1 is a failover slot created on node2

node2: synchronized_standby_slots=node3,primary_conninfo=node1,slot1: failover=true
node3: synchronized_standby_slots=node2,primary_conninfo=node2,slot1: failover=true

Case 1) switchover between node2 and node3
node1 => node3 => node2
node2: synchronized_standby_slots=node3,primary_conninfo=node3,slot1:failover=true => the slot could be overwritten: node2 now connects to node3, which is the same node that is listed in synchronized_standby_slots

Case 2) Restart of node2
node1 => node2 => node3
node2: synchronized_standby_slots=node3,primary_conninfo=node1,slot1:failover=true => the slot could not be overwritten because primary_conninfo <> synchronized_standby_slots
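
The rule behind the two cases above could be sketched as follows. This is only an illustration in Python, not PostgreSQL code: the function name slot_can_be_overwritten is hypothetical, and it assumes that primary_conninfo and synchronized_standby_slots can both be reduced to plain node names, as in the shorthand used in this mail.

```python
def slot_can_be_overwritten(primary_conninfo, synchronized_standby_slots):
    """Hypothetical check for the proposed rule.

    A local failover slot may be overwritten by slot synchronization only
    if the node this server streams from (primary_conninfo) is also listed
    in synchronized_standby_slots. Both are simplified to node names here.
    """
    return primary_conninfo in synchronized_standby_slots


# Case 1: after the switchover, node2 streams from node3.
case1 = slot_can_be_overwritten("node3", ["node3"])

# Case 2: after a plain restart, node2 still streams from node1.
case2 = slot_can_be_overwritten("node1", ["node3"])

print("case 1, overwrite allowed:", case1)
print("case 2, overwrite allowed:", case2)
```

Under these assumptions, Case 1 allows the overwrite and Case 2 does not.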

Regards,

Fabrice

On Thu, Nov 20, 2025 at 9:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Nov 20, 2025 at 6:26 AM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> > I think we need to clarify that suppose the standby has a slot with
> > failover=true and synced=false and the primary has the slot with the
> > same name, failover=true, and synced=true...
> I'm not sure I understand the semantics of the `synced` flag, but why can the `synced` flag be true on a primary instance? AFAICS if `synced=true` then it means that the slot is inactive and it is synchronized with a slot on a remote instance. On a primary, what is the meaning of having the flag synced set to true?

I think that the synced flag can be true on the primary if the slot was
previously synced and the instance is now working as the primary. But
the synced flag being true doesn't mean anything on the primary. It
works only on the standby.

> There's already an open thread dealing with this issue [1].
> The problem I see is being able to distinguish between 2 situations:
> 1) A failover slot has been created on a standby (failover=true and synced=false) in a context of cascading standby. In this case the slot must not be deleted.
> 2) A former primary has a slot (failover=true and synced=false) that must be resynchronized and that can be overwritten.

Right.

> Why not use a slot's metadata (allow_overwrite) to treat these two situations separately?

I'm not sure that the allow_overwrite idea is the best approach. For
example, suppose that in a cascading replication setup (node1 ->
node2 -> node3) we create a failover slot on node2 (failover=true,
synced=false, and allow_overwrite=false), and the slot is synchronized
to node3 (failover=true, synced=true, allow_overwrite=false). If we
do a switchover between node2 and node3, node3 joins the primary,
node1, and node2 now joins node3 as a cascaded standby (i.e.,
replication setup is now node1 -> node3 -> node2). I guess that in
this case the slot on node2 should be overwritten by the one on
node3, but that is not allowed because the slot on node2 has
allow_overwrite=false.
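
The scenario above can be replayed in a few lines of Python. This is a rough sketch, not the patch under discussion: allow_overwrite is the flag proposed in this thread, and the rule in may_overwrite (slot sync may replace a local slot only if it is already synced or has allow_overwrite=true) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    failover: bool
    synced: bool
    allow_overwrite: bool  # proposed flag, not an existing slot property


def may_overwrite(local_slot):
    """Assumed rule: slot sync may replace a local slot of the same name
    only if that slot is itself a synced copy or is marked allow_overwrite."""
    return local_slot.synced or local_slot.allow_overwrite


# Before the switchover: the slot is created on node2 and synced to node3.
node2_slot = Slot(failover=True, synced=False, allow_overwrite=False)
node3_slot = Slot(failover=True, synced=True, allow_overwrite=False)

# After the switchover (node1 -> node3 -> node2), node2 tries to sync the
# slot from node3, but its own copy still carries the original flags.
blocked = not may_overwrite(node2_slot)
print("overwrite on node2 blocked:", blocked)
```

With these assumed flags the overwrite on node2 is refused, which is the problem described above.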

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
