Discussion: server crash => libpq poll() hangs forever (Linux)

server crash => libpq poll() hangs forever (Linux)

From: Marinos Yannikos
Date:
Hi,

we had a kernel panic crash our DB server today, and all libpq clients (C and
Perl clients) stayed stuck in poll() for hours even after the server was back
up, i.e. longer than any TCP timeout should allow:

#0  0x00002b2283b31c8f in poll () from /lib/libc.so.6
#1  0x00002b228446f4af in PQmblen () from /usr/lib/libpq.so.4
#2  0x00002b228446f590 in pqWaitTimed () from /usr/lib/libpq.so.4
#3  0x00002b228446ee72 in PQgetResult () from /usr/lib/libpq.so.4
#4  0x00002b228446ef4e in PQgetResult () from /usr/lib/libpq.so.4
#5  0x00002b2284341ffe in pg_st_prepare_statement ()
    from /usr/local/lib/perl/5.8.8/auto/DBD/Pg/Pg.so
#6  0x00002b228434eb25 in pg_st_execute ()
[...]

It seems that poll() never receives a connection closed notification under Linux
(https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008335.html -
very old report, I can't find any newer information), so I am unsure how to
handle such a case gracefully. I guess I'm having the same problem as reported in

http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg105844.html

but there's no real conclusion there. Any suggestions? Can libpq be configured
to use epoll or select perhaps? Is the libpq (8.1.19-0etch1) too old?

Server version is 8.4.4, using tcp (no SSL).

Regards,
  Marinos
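
Editorial sketch, not part of the original message: the backtrace shows the client blocked in poll() while waiting for a result. One plausible reading is that a host that dies from a kernel panic sends neither FIN nor RST, so the client's socket simply stays silent and a poll() without a timeout has nothing to wake it. The C fragment below illustrates only that mechanism, under stated assumptions: `wait_readable` and `silent_peer_demo` are hypothetical names, and a Unix-domain socketpair stands in for the TCP connection to the vanished server.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait until fd becomes readable or timeout_ms elapses.
 * Returns 1 if poll() reports an event (data, EOF or error pending),
 * 0 on timeout, -1 if poll() itself fails.
 * A negative timeout_ms blocks indefinitely, as in the backtrace above.
 * Note: a retry after EINTR restarts the full timeout. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Hypothetical demo: a connected peer that never writes and never closes,
 * standing in for a server that disappeared without notifying the client.
 * Returns whatever wait_readable() reports for that socket. */
int silent_peer_demo(int timeout_ms)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int rc = wait_readable(sv[0], timeout_ms);

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With a finite timeout the caller gets control back and can decide to give up on the connection; with a negative timeout the same call would never return in this scenario.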




Re: server crash => libpq poll() hangs forever (Linux)

From: Tom Lane
Date:
Marinos Yannikos <mjy@geizhals.at> writes:
> It seems that poll() never receives a connection closed notification under Linux
> (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008335.html -
> very old report,

"very old report" is right.  What makes you think that has anything to
do with modern kernel versions?

            regards, tom lane

Re: server crash => libpq poll() hangs forever (Linux)

From: björn lundin
Date:
On 9 June, 16:37, t...@sss.pgh.pa.us (Tom Lane) wrote:
> Marinos Yannikos <m...@geizhals.at> writes:
> > It seems that poll() never receives a connection closed notification under Linux
> > (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008...-
> > very old report,
>
> "very old report" is right.  What makes you think that has anything to
> do with modern kernel versions?

Interesting. The bug report includes a short code snippet which
compiles to a c program,
that shows the bug is still present. I'm on

bnl@tova:~$ uname -a
Linux tova 2.6.31-22-generic #60-Ubuntu SMP Thu May 27 00:22:23 UTC
2010 i686 GNU/Linux

Is the bug really still valid, or does the code snippet show something
else?

/Björn


Re: server crash => libpq poll() hangs forever (Linux)

From: Alvaro Herrera
Date:
Excerpts from björn lundin's message of Wed Jun 09 16:17:57 -0400 2010:
> On 9 June, 16:37, t...@sss.pgh.pa.us (Tom Lane) wrote:
> > Marinos Yannikos <m...@geizhals.at> writes:
> > > It seems that poll() never receives a connection closed notification under Linux
> > > (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008...-
> > > very old report,
> >
> > "very old report" is right.  What makes you think that has anything to
> > do with modern kernel versions?
>
> Interesting. The bug report includes a short code snippet which
> compiles to a c program,
> that shows the bug is still present. I'm on

That test program uses UDP sockets.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: server crash => libpq poll() hangs forever (Linux)

From: Tom Lane
Date:
björn lundin <b.f.lundin@gmail.com> writes:
> On 9 June, 16:37, t...@sss.pgh.pa.us (Tom Lane) wrote:
>> "very old report" is right.  What makes you think that has anything to
>> do with modern kernel versions?

> Interesting. The bug report includes a short code snippet which
> compiles to a c program,
> that shows the bug is still present. I'm on

Mph.  Reading the bug report and the code snippet more closely, the
complaint is totally irrelevant to libpq anyway.  What he's complaining
about is a case where another thread of a multithreaded application
close()s the descriptor that a poll() is using.  That is *not* related
to the other end of the connection closing the connection, which is the
case the OP was concerned about.

            regards, tom lane
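
Editorial sketch, not part of the original message: the distinction drawn above can be checked for the "other end closes the connection" case. The C fragment below is a minimal illustration under stated assumptions: `poll_after_peer_close` is a hypothetical name, and a Unix-domain socketpair stands in for a TCP connection whose remote end performs an orderly close. It does not model a server that crashes without closing anything.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close one end of a connected socket pair, then poll() the other end.
 * Returns 0 if poll() woke up and read() then reported end-of-file,
 * 1 if poll() timed out, -1 on any other outcome. */
int poll_after_peer_close(int timeout_ms)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);  /* orderly close by the "remote" end */

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    int result = -1;

    if (rc == 0) {
        result = 1;                      /* nothing happened in time */
    } else if (rc > 0) {
        char buf[1];
        ssize_t n = read(sv[0], buf, sizeof buf);
        if (n == 0)
            result = 0;                  /* EOF: the close was noticed */
    }

    close(sv[0]);
    return result;
}
```

An orderly close is visible to the waiting side. The case in the original post differs in that the crashed machine never got to close anything.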

Re: server crash => libpq poll() hangs forever (Linux)

From: Marinos Yannikos
Date:
On 09.06.2010 16:37, Tom Lane wrote:
> Marinos Yannikos <mjy@geizhals.at> writes:
>> It seems that poll() never receives a connection closed notification under Linux
>> (https://lists.linux-foundation.org/pipermail/bugme-new/2003-April/008335.html -
>> very old report,
>
> "very old report" is right.  What makes you think that has anything to
> do with modern kernel versions?

Mainly that I have no other explanation for multiple clients hanging in libpq's
poll() on multiple client machines (all on kernel 2.6.18, some of them Xen
instances; the server runs 2.6.26) for up to 2 days 10 hours after the server
crash, until we found and restarted them. 2.6.18 is probably not "modern", but
it is reliable. Does anyone have more information regarding poll() changes in
more recent kernels? We'll upgrade some boxes and see if anything changes.

Regards,
  Marinos
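
Editorial sketch, not part of the original message: clients that stay blocked for days suggest that nothing on the idle connection ever probed the dead peer. One common way to bound that wait on Linux is TCP keepalive. The C fragment below is a sketch under stated assumptions: `enable_tcp_keepalive` and `keepalive_demo` are hypothetical helpers, the timing values are arbitrary, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are Linux-specific. In a libpq client the descriptor would come from PQsocket(conn); whether that suits a given application is not something this thread establishes.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on TCP keepalive for fd (Linux-specific options).
 * idle_s:     seconds of silence before the first probe
 * interval_s: seconds between unanswered probes
 * count:      unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_tcp_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
    return 0;
}

/* Hypothetical demo: apply the settings to a fresh, unconnected TCP socket
 * and read the idle time back. Returns the idle time in seconds, or -1. */
int keepalive_demo(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int idle = -1;
    socklen_t len = sizeof idle;

    if (enable_tcp_keepalive(fd, 60, 10, 5) != 0 ||
        getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, &len) != 0)
        idle = -1;

    close(fd);
    return idle;
}
```

With the illustrative values above (60 s idle, 5 probes 10 s apart), an unreachable peer would be detected after roughly 110 seconds, at which point a blocked poll() on the socket reports an error instead of waiting indefinitely. Newer libpq releases also accept keepalive-related connection parameters, which may be preferable where available.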