It happens from time to time. At first I thought it was a router problem, so I changed the host in repmgr's configuration from its intranet address to 127.0.0.1, but the problem persists. Here is what happened according to repmgrd:
2023-10-17 00:21:50+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-10-17 00:21:50.471347+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
2023-10-17 10:23:05+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-10-17 10:23:05.473997+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
2023-10-17 13:22:23+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-10-17 13:22:23.552278+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
So I checked the PostgreSQL side, and its log reports the following at the same times:
2023-10-17 00:21:50.415 CST [2210264] LOG: could not receive data from client: Connection reset by peer
2023-10-17 10:23:05.420 CST [2249257] LOG: could not receive data from client: Connection reset by peer
2023-10-17 13:22:23.486 CST [2260546] LOG: could not receive data from client: Connection reset by peer
I have already enabled the TCP keepalive settings in postgresql.conf to prevent the router from cutting off idle TCP connections:
tcp_keepalives_idle = 20
tcp_keepalives_interval = 10
tcp_keepalives_count = 3
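
For reference, here is a rough sketch of the timing I understand these three settings to imply, assuming the usual TCP keepalive semantics (first probe after the idle period, then one probe per interval until the count is exhausted). The function name `keepalive_timing` is only for illustration and is not part of PostgreSQL or repmgr:

```python
def keepalive_timing(idle: int, interval: int, count: int) -> dict:
    """Return the timing (in seconds) implied by TCP keepalive settings.

    Assumption: the first probe is sent after `idle` seconds without traffic,
    and the peer is declared dead after `count` unanswered probes sent
    `interval` seconds apart.
    """
    return {
        "first_probe_after": idle,
        "dead_peer_detected_after": idle + interval * count,
    }


# Values from my postgresql.conf
timing = keepalive_timing(idle=20, interval=10, count=3)
print(timing)
```

If that reading is right, an idle connection gets a probe after 20 seconds and an unresponsive peer is dropped after roughly 50 seconds, which I would expect to be frequent enough to keep a connection from being treated as idle.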
Do you have any idea what went wrong? BTW, the PostgreSQL version is 15.4 and the repmgr version is 5.4dev.