Re: Issue with logical replication slot during switchover

From Fabrice Chapuis
Subject Re: Issue with logical replication slot during switchover
Date
Msg-id CAA5-nLAPrE729BiCGKKUU_9b+CA2nxrKLyc-W+SbmU2ojFeehQ@mail.gmail.com
In reply to Re: Issue with logical replication slot during switchover  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: Issue with logical replication slot during switchover
List pgsql-hackers
Thanks for your detailed analysis and explanation, Alexander. If we go in the direction you propose and keep the option of a supplementary allow_overwrite flag, may I suggest some modifications to the v0 patch you attached:
 
 Why modify this function?
 
 drop_local_obsolete_slots(List *remote_slot_list)
 
 List *local_slots = get_local_synced_slots(); => since the failover slot we are checking has synced = false, it is not in this list, so the loop never reaches it and the slot is not dropped
 
 If we want to resynchronize the slot properly, why not drop it and recreate it cleanly, instead of setting the synced flag to true while the slot is not actually synchronized? Here is the code I wrote in patch version 6, with the check on the failover flag.
 
retry:
    /* Search for the named slot */
    if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
    {
        bool        synced;
        bool        failover;

        /* Read both flags under the slot's spinlock */
        SpinLockAcquire(&slot->mutex);
        synced = slot->data.synced;
        failover = slot->data.failover;
        SpinLockRelease(&slot->mutex);

        if (!synced)
        {
            /* User-created slot with the same name exists, raise ERROR. */
            if (!failover)
                ereport(ERROR,
                        errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                        errmsg("exiting from slot synchronization because same"
                               " name slot \"%s\" already exists on the standby",
                               remote_slot->name));

            /*
             * At some point we were a primary, where synced = false and
             * failover = true is the expected state. To resynchronize the
             * slot, drop it and retry so that it is recreated cleanly.
             */
            ereport(LOG,
                    errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                    errmsg("slot \"%s\" already exists on the standby but will"
                           " be dropped and recreated to be resynchronized",
                           remote_slot->name));

            /* Get rid of a replication slot that is no longer wanted */
            ReplicationSlotAcquire(remote_slot->name, true, false);
            ReplicationSlotDropAcquired();
            goto retry;
        }
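
For readers who want the three outcomes of the excerpt above spelled out, here is a small standalone model of the decision. It is only a sketch and not PostgreSQL code: the enum, its values, and the function name decide_sync_action are hypothetical names introduced for illustration, and the "already synced" branch is an assumption about what happens in the part of the function not shown in the excerpt.

```c
#include <stdbool.h>

/* Hypothetical outcome names, for illustration only. */
typedef enum SlotSyncAction
{
    SYNC_UPDATE_EXISTING,       /* already a synced slot: keep syncing it (assumed) */
    SYNC_ERROR_USER_SLOT,       /* user-created slot with the same name: ERROR */
    SYNC_DROP_AND_RECREATE      /* leftover failover slot from a former primary */
} SlotSyncAction;

/*
 * Model of what the slot sync worker would do with a local slot that has
 * the same name as a remote failover slot, given the local slot's flags.
 */
static SlotSyncAction
decide_sync_action(bool synced, bool failover)
{
    if (synced)
        return SYNC_UPDATE_EXISTING;

    /* synced = false from here on */
    if (!failover)
        return SYNC_ERROR_USER_SLOT;

    /* synced = false, failover = true: the node used to be a primary */
    return SYNC_DROP_AND_RECREATE;
}
```

The only case that changes compared with raising an error for every non-synced slot is synced = false with failover = true, which is exactly the state a former primary is left in after a switchover.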

Thanks for your assessment and feedback.

Regards,

Fabrice

On Fri, Oct 31, 2025 at 10:28 AM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Hi,

On Fri, 31 Oct 2025 at 09:16, Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Hi, 
I did propose a solution at the top of this thread that modifies only the value of the synced attribute, but the discussion was redirected to adding an extra parameter to the function pg_create_logical_replication_slot() to overwrite a failover slot.
We had discussed this point in another thread, please see [1]. After
discussion it was decided to not go this way.

[1]: https://www.postgresql.org/message-id/OS0PR01MB57161FF469DE049765DD53A89475A%40OS0PR01MB5716.jpnprd01.prod.outlook.com


I’ve read through the referenced discussion, and my impression is that we might be trying to design a solution around assumptions that are unlikely to hold in practice.
There was an argument that at some point we might allow creating logical failover slots on cascading standbys. However, if we consider all practical scenarios, it seems very unlikely that such a feature could work reliably with the current design.
Let me try to explain why.

Consider the following setup:
node1 - primary  
node2 - standby, replicating from node1  
node3 - standby, replicating from node1, has logical slot foo (failover=true, synced=false)  
node4 - standby, replicating from node3, has logical slot foo (failover=true, synced=true)

1) If node1 fails, we could promote either node2 or node3:
1.a) If we promote node2, we must first create a physical slot for node3, update primary_conninfo on node3 to point to node2, wait until node3 connects, and until catalog_xmin on the physical slot becomes non-NULL. Only then would it be safe to promote node2. This introduces unnecessary steps, complexity, and waiting — increasing downtime, which defeats the goal of high availability.
1.b) If we promote node3, promotion itself is fast, but subscribers will still be using the slot on the original primary. This again defeats the purpose of doing logical replication from a standby, and it won’t be possible to switch subscribers to node4 (see below).
2) If node3 fails, we might want to replace it with node4. But node4 has a slot with failover=true and synced=true, and synced=true prevents it from being used for streaming because it’s a standby.

In other words, with the current design, allowing creation of logical failover slots on standbys doesn’t bring any real benefit — such “synced” slots can’t actually be used later.
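
The rule behind point 2 can be written down as a one-line predicate. The following is a simplified standalone sketch, not the actual server check: the struct SlotState and the function slot_usable_for_streaming are hypothetical names, and the model assumes that the only relevant inputs are whether the node is in recovery and whether the slot is marked synced.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one logical slot on one node. */
typedef struct SlotState
{
    bool        in_recovery;    /* the node holding the slot is a standby */
    bool        failover;       /* slot was created with failover = true */
    bool        synced;         /* slot is maintained by slot synchronization */
} SlotState;

/*
 * A slot marked synced cannot be used for streaming while its node is
 * still a standby; every other combination is allowed in this model.
 */
static bool
slot_usable_for_streaming(SlotState s)
{
    return !(s.in_recovery && s.synced);
}
```

Applied to the setup above, node3's slot foo (standby, synced=false) passes the check, while node4's copy (standby, synced=true) does not, so subscribers cannot be moved to node4 while it remains a standby.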

One could argue that we could add a function to switch synced=true->false on a standby, but that would just add another workaround on top of an already fragile design, increasing operational complexity without solving the underlying issue.

The same applies to proposals like allow_overwrite. If such a flag is introduced, in practice it will almost always be used unconditionally, e.g.:
SELECT pg_create_logical_replication_slot('<name>', '<plugin>', failover := true, allow_overwrite := true);

Right now, logical failover slots can’t be reused after a switchover, which is a perfectly normal operation.
The only current workaround is to detect standbys with failover=true, synced=false and drop those slots, hoping they’ll be resynchronized. But resynchronization is asynchronous, unpredictable, and may take an unbounded amount of time. If the primary fails during that window, there might be no standby with ready logical slots.

Instead of dropping such slots, what we actually need is a way to safely set synced=false->true and continue operating.

Operating logical replication setups is already extremely complex and error-prone — this is not theoretical, it’s something many of us face daily.
So rather than adding more speculative features or workarounds, I think we should focus on addressing real operational pain points and the inconsistencies in the current design.

A slot created on the primary (which later becomes a standby) with failover=true has a very clear purpose. The failover flag already indicates that purpose; synced shouldn’t override it.

Regards,
--
Alexander Kukushkin
