Victor Sudakov wrote:
>
> Do you perchance know what the correct procedure is for temporarily
> taking down a replica in a Patroni cluster, e.g. for 5-10 minutes of
> hardware maintenance?
>
> The problem is that after stopping the patroni process (service) on a
> replica, patroni removes the corresponding physical replication slot
> from the leader, and unless the wal_keep_size value is insanely high,
> the replica, when up again, cannot restart streaming because the WAL
> segments are already gone from the leader.
>
> Well, you all know:
> <%%%>LOG: started streaming WAL from primary at B4A0/E2000000 on timeline 8
> <%%%>FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000080000B4A0000000E2 has already been removed
> <%%%>LOG: waiting for WAL to become available at B4A0/E2002000
>
> Do you think there is a way to tell Patroni that a replica is down
> temporarily and its replication slot should not be removed?
>
> Or, what am I missing?

As WAL archiving (wal-g) is enabled in this cluster anyway, do you
think adding "postgresql.parameters.restore_command" to the Patroni
config will help in this situation?

restore_command works very well in regular Postgres clusters when a
replica is catching up from a large replication lag, and it allows
running with wal_keep_size=0. However, does anyone know of any
Patroni-specific reasons not to use restore_command under Patroni?
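
To be concrete, the setting I have in mind would look roughly like the
fragment below. This is an untested sketch, not a verified config: the
wal-g invocation assumes wal-g already gets its storage settings from
the environment of the postgres user, and the key placement simply
follows the "postgresql.parameters" path mentioned above. Patroni also
has a "postgresql.recovery_conf" section, and I am not sure which of
the two is the preferred place.

```yaml
# Patroni YAML fragment (sketch, assumptions as described above)
postgresql:
  parameters:
    # %f = name of the WAL file to fetch, %p = path to copy it to
    restore_command: 'wal-g wal-fetch "%f" "%p"'
```

The idea is that a replica which cannot stream a segment from the
leader would fall back to fetching it from the archive.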
--
Victor Sudakov VAS4-RIPE
http://vas.tomsk.ru/
2:5005/49@fidonet