Обсуждение: Connections getting stuck sending data to client

Поиск
Список
Период
Сортировка

Connections getting stuck sending data to client

От
Chris Butler
Дата:
I've been having intermittent problems with our DB server (running
postgresql 8.3.3) reaching its connection limit, all because of a SELECT
statement that's stuck while sending data. This gets stuck because there's a
transaction waiting to do an ALTER TABLE, then the subsequent SELECTs wait
for the ALTER.

The problem seems to be that the client connection drops while the server is
sending data to the client. On the server I end up with:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
[...]
tcp        1  17376 db.zedcore.com:postgres     portia.healthjobsuk.c:36520 CLOSE_WAIT

By the time our monitoring system has picked up the problem, netstat on the
client contains no trace of the offending port.

As you can see, the Send-Q is quite full. Also, if I strace the postgres
process, I get:

[root@db2:~]# strace -fp 13572
Process 13572 attached - interrupt to quit
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 1) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0

It appears that the server is blocking on the send operation, I guess
because the send buffer is already full. Also, I was unable to kill the
backend process with either SIGINT or TERM (possibly a bug?).

A tcpdump of traffic for that port:

14:34:07.975493 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 2086302610:2086304058(1448) ack 381974125
win78 <nop,nop,timestamp 3025282443 595580714> 
14:34:07.976483 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable,
length556 
14:36:07.960308 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 0:1448(1448) ack 1 win 78
<nop,nop,timestamp3025402443 595580714> 
14:36:07.961565 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable,
length556 


A bit of googling led me towards changing the TCP keepalive settings in
postgresql.conf, although I don't think they've made any difference.

Does anyone have any ideas what's happening, and whether there's anything I
can do to stop the problem?

--
Chris Butler
Zedcore Systems Ltd
UK tel: 0114 238 1828 ext 72

Re: Connections getting stuck sending data to client

От
"Scott Marlowe"
Дата:
On Fri, Oct 31, 2008 at 9:36 AM, Chris Butler <chrisb@zedcore.com> wrote:
> I've been having intermittent problems with our DB server (running
> postgresql 8.3.3) reaching its connection limit, all because of a SELECT
> statement that's stuck while sending data. This gets stuck because there's a
> transaction waiting to do an ALTER TABLE, then the subsequent SELECTs wait
> for the ALTER.
>
> The problem seems to be that the client connection drops while the server is
> sending data to the client.

Are the clients windows?  I know there's a well known and published
bug in windows where it freaks out on blocking ports and the
connection hangs.  There's a kb article on it on the ms site.  It's
been around since windows 95 or so and there appears to be no plans to
fix it.

If it's not that I'm not sure what it would be.

Re: Connections getting stuck sending data to client (FIXED)

От
Chris Butler
Дата:
[Forgot to send this to the list first time]

On Fri, Oct 31, 2008 at 12:27:27PM -0600, Scott Marlowe wrote:
> On Fri, Oct 31, 2008 at 9:36 AM, Chris Butler <chrisb@zedcore.com> wrote:
> > I've been having intermittent problems with our DB server (running
> > postgresql 8.3.3) reaching its connection limit, all because of a SELECT
> > statement that's stuck while sending data. This gets stuck because there's a
> > transaction waiting to do an ALTER TABLE, then the subsequent SELECTs wait
> > for the ALTER.
> >
> > The problem seems to be that the client connection drops while the server is
> > sending data to the client.
> Are the clients windows?  I know there's a well known and published
> bug in windows where it freaks out on blocking ports and the
> connection hangs.  There's a kb article on it on the ms site.  It's
> been around since windows 95 or so and there appears to be no plans to
> fix it.

The client is a PHP webapp running on a Debian VPS.

Shortly after sending my previous message, I did spot an error in the
client's firewall. It was sending back an ICMP port-unreachable when
rejecting TCP packets, rather than a TCP Reset, which I guess is why the
connection stayed open rather than closing immediately.

Still unsure why it's dropping the connection, but at least it doesn't get
stuck any more.

--
Chris Butler
Zedcore Systems Ltd
UK tel: 0114 238 1828 ext 72