On 20/11/18 10:48 μ.μ., Rui DeSousa wrote:
Hey, I was reading the docs, it seems it means :
net.ipv4.tcp_keepalive_time + net.ipv4.tcp_keepalive_intvl * net.ipv4.tcp_keepalive_probes = 2hrs 11 Mins 15 Secs, rather than 18 Hrs
Yeah, that’s correct. I wonder why it didn’t terminate.
Most probably because there was another created clone, cloud migration magic, that's my theory, albeit not confirmed by the provider. Logical worker (walreceiver) was still alive and happy even after the primary crushed. I have the logs from the other standby and it immediately detected the problem (PANIC on the primary) and retried. No firewall dropping packets, in every test I did, the logical bgworker detects any problems *instantly*, and retries after 5 secs by default.