Now, how can I investigate this further? My wal_keep_segments is set to 100, but 261 WAL segments have been generated since Friday, so I guess I have no option but to resync the node. However, I want to understand why it happened. What would you advise me to check?
What does the network connectivity between the nodes look like, and are there any firewalls in between? What are the settings for wal_sender_timeout and wal_receiver_timeout? Why not use a replication slot, or have the standby fall back to the archived WALs instead of doing a full database restore?
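A rough sketch of why wal_keep_segments = 100 could not cover the gap described in the question. It assumes the default 16 MB WAL segment size (the cluster was not initialized with a custom segment size) and treats wal_keep_segments as the only guaranteed retention. In practice the primary may still hold a few extra segments that have not been recycled yet, and a replication slot or WAL archive changes the picture entirely. The helper names are hypothetical. Note that PostgreSQL 13 and later replace wal_keep_segments with wal_keep_size.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_MB = 16


def retained_wal_mb(wal_keep_segments: int, segment_mb: int = WAL_SEGMENT_MB) -> int:
    """Minimum amount of WAL (in MB) the primary guarantees to keep for standbys."""
    return wal_keep_segments * segment_mb


def standby_guaranteed_to_catch_up(segments_generated: int, wal_keep_segments: int) -> bool:
    """Hypothetical check: True only if every segment written since the standby
    stopped receiving is still guaranteed to be on the primary.

    Simplification: ignores replication slots, WAL archiving and segments that
    simply have not been recycled yet, all of which can retain more WAL.
    """
    return segments_generated <= wal_keep_segments


if __name__ == "__main__":
    keep, generated = 100, 261  # figures from the question
    print(f"guaranteed retention: {retained_wal_mb(keep)} MB")
    print(f"WAL generated since Friday: {generated * WAL_SEGMENT_MB} MB")
    print("standby guaranteed to catch up:", standby_guaranteed_to_catch_up(generated, keep))
```

With these numbers the guaranteed window is 1600 MB, while roughly 4176 MB of WAL were written. Once the standby falls that far behind, the segments it needs are likely already gone. A replication slot avoids this by having the primary retain WAL until the standby has received it, and a restore_command pointing at the WAL archive lets the standby fetch missing segments from there.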
There should be other messages in the PostgreSQL logs.