Discussion: libpq: PQexec may block indefinitely

libpq: PQexec may block indefinitely

From
Dmitry Samonenko
Date:
Greetings.

I have an application that uses libpq to interact with a remote PostgreSQL 9.2 server. The client and server nodes run Linux, and the connection is established over TCP/IPv4. The client application has some basic fault-tolerance features, which are activated when server-related problems are encountered.

One day something went wrong with the network hardware and, long story short, the host running the PostgreSQL server became isolated. None of the TCP messages routed to the server node were delivered or acknowledged in any way, yet none of the fault-tolerance features were triggered. Once the network problem had been resolved, I started investigating why the clients had become completely inoperable.

I have successfully reproduced the problem in a lab environment. Run these iptables commands on the server node after the client and server have been interacting for a while:

# iptables -A OUTPUT -p tcp --sport 5432 -j DROP
# iptables -A INPUT  -p tcp --dport 5432 -j DROP

After this, according to the debugger, my client blocks inside libpq code. I took a quick look at the libpq sources on the master branch, and some questions arose:

1. The connection to the PostgreSQL server is made without any option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is that? Is setting socket timeouts considered harmful?
2. PQexec ultimately leads to pqWait, which after a few more function calls "lands" in pqSocketCheck and pqSocketPoll. Both of these functions take an end_time parameter. In the PQexec scenario it is set to -1, which results in an infinite poll timeout in pqSocketPoll. Is it possible to implement a configurable timeout for PQexec calls? Are there any existing features that should be used to handle a situation like this?

Currently, I have tuned the Linux kernel's TCP/IPv4 retransmission counters so that the OS actually closes the socket after some period. The poll() call in pqSocketPoll detects this, and libpq handles the situation correctly: the error is reported to my application. But this is just a workaround.
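The kernel tuning described above applies to every TCP connection on the host. On Linux (2.6.37 and later), a comparable limit can be set on a single socket with the TCP_USER_TIMEOUT option. This option bounds how long transmitted data may remain unacknowledged before the kernel aborts the connection. The following is a minimal sketch that assumes Linux headers defining TCP_USER_TIMEOUT. The helper set_tcp_user_timeout is hypothetical and is meant to be applied to the descriptor returned by PQsocket() once the connection is established. Later libpq releases (PostgreSQL 12 and newer) expose the same option as the tcp_user_timeout connection parameter.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical, Linux-specific helper: ask the kernel to drop this connection
 * if data it has sent stays unacknowledged for more than timeout_ms
 * milliseconds. Returns 0 on success, -1 on failure (errno is set).
 */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
#ifdef TCP_USER_TIMEOUT
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof timeout_ms);
#else
    (void) fd;
    (void) timeout_ms;
    errno = ENOPROTOOPT;    /* option not known to these headers */
    return -1;
#endif
}
```

Usage would look like `set_tcp_user_timeout(PQsocket(conn), 30000);` right after PQconnectdb succeeds. This is a per-connection alternative to changing the system-wide retransmission sysctls.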

So this infinite poll looks like a flaw to me, and I think it should be fixed. I look forward to your comments: is it a bug or a feature?

With regards,
  Dmitry Samonenko

Re: libpq: PQexec may block indefinitely

From
Amit Kapila
Date:
On Mon, May 26, 2014 at 1:34 PM, Dmitry Samonenko <shreddingwork@gmail.com> wrote:
> 1. The connection to the PostgreSQL server is made without any option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is that? Is setting socket timeouts considered harmful?
> 2. PQexec ultimately leads to pqWait, which after a few more function calls "lands" in pqSocketCheck and pqSocketPoll. Both of these functions take an end_time parameter. In the PQexec scenario it is set to -1, which results in an infinite poll timeout in pqSocketPoll. Is it possible to implement a configurable timeout for PQexec calls? Are there any existing features that should be used to handle a situation like this?

Have you tried using the cancel functionality?
http://www.postgresql.org/docs/9.4/static/libpq-cancel.html

> Currently, I have tuned the Linux kernel's TCP/IPv4 retransmission counters so that the OS actually closes the socket after some period. The poll() call in pqSocketPoll detects this, and libpq handles the situation correctly: the error is reported to my application. But this is just a workaround.

There are certain TCP parameters that can be configured for connections:

tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count
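The names above are the server-side (postgresql.conf) settings. On the client side, libpq accepts the analogous connection parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count. With these, a client can detect a dead peer without system-wide sysctl changes. The following is a minimal sketch that appends them to an existing conninfo string. The helper add_keepalive_params and the example values are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append libpq's client-side TCP keepalive parameters to
 * `base` (an existing conninfo string, possibly empty). idle_s and interval_s
 * are in seconds; count is how many keepalive probes may be lost before the
 * connection is considered dead. Returns 0 on success, -1 if `buf` is too small.
 */
int add_keepalive_params(char *buf, size_t len, const char *base,
                         int idle_s, int interval_s, int count)
{
    const char *sep = strlen(base) > 0 ? " " : "";
    int n = snprintf(buf, len,
                     "%s%skeepalives=1 keepalives_idle=%d "
                     "keepalives_interval=%d keepalives_count=%d",
                     base, sep, idle_s, interval_s, count);

    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

The resulting string can be passed to PQconnectdb(). On Linux, keepalive probes are only sent while the connection has no unacknowledged outgoing data. If the query itself is never acknowledged, the kernel's retransmission limits still decide when the socket is closed.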



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com