Re: Fwd: libpq: indefinite block on poll during network problems

From: Dmitry Samonenko
Subject: Re: Fwd: libpq: indefinite block on poll during network problems
Msg-id: CAFKp+3cbU3s-V-HEUvg-n+Qx4G4kCD6=n8jxuvK1ORV6K_uayQ@mail.gmail.com
In reply to: Re: Fwd: libpq: indefinite block on poll during network problems (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses: Re: Fwd: libpq: indefinite block on poll during network problems
List: pgsql-general
Guys, first of all: thank you for your help and cooperation. I have received several emails suggesting tweaks to tcp_keepalive and the use of PostgreSQL server functions/features (cancel, statement timeout), but, as I said, they won't help.

I have reproduced the problem scenario. The logs are attached; I'll walk you through them.

== Setup ==
Client and server applications are placed on separate hosts: client = 192.168.15.4, server = 192.168.15.7. Both are on the same local network, and both are synchronized with a third-party NTP server. Let's look at strace_export.txt: the top 8 lines are the socket setup, and the keepalive option is set. The client's OS keepalive parameters:

[root@krr2srv1wsn1 dtp_generator]# sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 10

This means that after 10 seconds of an idle connection, the first TCP keep-alive probe is sent. If 3 probes sent at 5-second intervals go unanswered, the connection should be considered dead.

The server configuration is in the attached postgresql.conf.

== Part 1. TCP Keep Alive ==
At 11:25:35.847138 a connection to the server is made and the first query is sent. The response arrives quickly, at 11:25:35.858582. No other queries were made for the next minute, so that keep-alive packets could be captured. Wireshark 1.8.2 marks frames 13-36 as Keep-Alive, so we can see that keepalive is configured correctly and definitely works.

== Part 2. The Problem ==
At 11:26:40.933017, query generation starts on the client side. The client is configured to send 1 request per second. After an arbitrary delay, the following command is executed on the server node:
[root@cluster1]# date && iptables -A OUTPUT -p tcp --sport 5432 -j DROP && iptables -A INPUT -p tcp --dport 5432 -j DROP

11:26:47 is printed to the console. As you can see in the client trace file, this time corresponds to frame 55, the last query sent. strace shows the send and poll syscalls. And... that's it. The client got blocked on poll.

== Part 3. The aftermath ==
The client was blocked for ~2 minutes. I killed the application with SIGTERM, which you can see in the strace output. At that point the application was still waiting in libpq's poll. The pcap file shows no keep-alive packets after the server was isolated with the iptables rules. As I said earlier, TCP keep-alive is performed only on an idle connection; once TCP retransmission kicks in, no keep-alive probes are sent.


Let me repeat: the problem is NOT with the server. The problem is with libpq's PQgetResult, which ultimately ends up in a very optimistic poll routine.

Thank you.

With regards, Dmitry Samonenko.


Attachments
