Supporting TCP_SYNCNT in libpq
From | Francesco Canovai |
---|---|
Subject | Supporting TCP_SYNCNT in libpq |
Date | |
Msg-id | CAJNLuXDoMhyTvQT9d3WkxQmOuc-=PQfB0khGcmHDzibp1ti7zQ@mail.gmail.com |
Responses |
Re: Supporting TCP_SYNCNT in libpq
Re: Supporting TCP_SYNCNT in libpq |
List | pgsql-hackers |
This patch introduces support for a `tcp_syn_count` parameter in libpq, allowing control over the number of SYN retransmissions when initiating a connection. The primary goal is to prevent the walreceiver from getting stuck resending SYNs for as long as `net.ipv4.tcp_syn_retries` allows (6 retries by default, which is about 127 seconds) in the event of network disruptions.

A specific scenario where this can occur is during a failover in Kubernetes:

* The primary node fails, a standby is promoted, and the other standbys attempt to reconnect to the service representing the new primary.
* The `primary_conninfo` of the standby points to a service, usually managed via iptables rules.
* If the walreceiver's initial SYN is dropped because those rules are outdated, the connection attempt may remain stranded until the system timeout is reached.
* As a result, a second standby may reattach only after a couple of minutes. With synchronous replication, this can block writes from the application.

In this scenario, `tcp_user_timeout` could close a connection that is still retrying SYNs (the documentation does not suggest that it does, but in practice it works). However, that parameter affects the entire connection, not just its establishment. `connect_timeout` doesn't work with `PQconnectPoll`, so it won't prevent the walreceiver from timing out.

Thank you,
Francesco
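For readers unfamiliar with the underlying socket option, here is a minimal sketch of what limiting SYN retransmissions looks like at the socket level on Linux. It is not the code from the patch: the helper names `set_syn_count`, `get_syn_count`, and `syn_timeout_seconds` are hypothetical, and the timeout estimate assumes the usual Linux values of a 1-second initial retransmission timeout that doubles on each retry, capped at 120 seconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: limit the number of SYN retransmissions for a
 * socket that has not been connected yet (Linux-specific TCP_SYNCNT).
 * Returns 0 on success, -1 on error with errno set by setsockopt().
 */
int
set_syn_count(int fd, int count)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &count, sizeof(count));
}

/*
 * Hypothetical helper: read back the SYN retransmission limit that
 * applies to the socket. Returns -1 on error.
 */
int
get_syn_count(int fd)
{
    int         count = 0;
    socklen_t   len = sizeof(count);

    if (getsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &count, &len) < 0)
        return -1;
    return count;
}

/*
 * Rough estimate of how long connect() keeps trying before giving up.
 * Assumption: initial RTO of 1 second, doubled after every attempt and
 * capped at 120 seconds. The initial SYN plus `retries` retransmissions
 * each wait one RTO.
 */
int
syn_timeout_seconds(int retries)
{
    int         rto = 1;
    int         total = 0;

    for (int i = 0; i <= retries; i++)
    {
        total += rto;
        rto = (rto * 2 > 120) ? 120 : rto * 2;
    }
    return total;
}
```

Under these assumptions, `syn_timeout_seconds(6)` gives the 127 seconds mentioned above, while a limit of 2 retransmissions gives up after about 7 seconds. The option has to be set after `socket()` and before `connect()` to have any effect on the handshake.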