Discussion: Hanging queries and I/O exceptions

Hanging queries and I/O exceptions

From: Jan de Visser
Date: Monday 06 March 2006 09:38
Hello,

While doing performance tests on Windows Server 2003, we observed the following
two problems.

Environment: a J2EE application running in the JBoss application server against
a PostgreSQL 8.1 database. The load comes from a smallish number of (very)
complex transactions, typically about 5-10 running concurrently.

The first problem, which bothers me the most, is that after about 6-8 hours the
application stops processing. Neither the JDBC driver nor the server reports
any errors, but when I kill the application server, I see that all my
connections are stuck in SQL statements that never seem to return:

2006-03-03 08:17:12 4504 6632560 LOG:  duration: 45087000.000 ms  statement: EXECUTE <unnamed>  [PREPARE:  SELECT objID FROM objects WHERE objID = $1 FOR UPDATE]
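For scale, the duration in this log line is in milliseconds. A minimal Java sketch (the class and method names are hypothetical, and Java 9+ is assumed for `Duration.toMinutesPart()`) converts it into hours, minutes, and seconds, showing that the logged statement ran for roughly 12 and a half hours.

```java
import java.time.Duration;
import java.util.Locale;

public class LoggedDuration {
    // Turn a "duration: N ms" value from the PostgreSQL log into h/m/s.
    // Fractional milliseconds are truncated; Java 9+ is needed for to*Part().
    public static String describe(double millis) {
        Duration d = Duration.ofMillis((long) millis);
        return String.format(Locale.ROOT, "%dh %02dm %02ds",
                d.toHours(), d.toMinutesPart(), d.toSecondsPart());
    }

    public static void main(String[] args) {
        // Value copied from the log line: "duration: 45087000.000 ms"
        System.out.println(describe(45087000.000)); // prints "12h 31m 27s"
    }
}
```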

I think I can reliably reproduce this by loading the app and waiting a couple
of hours.



The second problem is less predictable:

JDBC exception:

An I/O error occured while sending to the backend.
org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:214)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)


In my server log, I have:

2006-03-02 12:31:02 5692 6436342 LOG:  could not receive data from client: A non-blocking socket operation could not be completed immediately.

At the time, my box is fairly heavily loaded but still responsive. The server
and the JBoss app server run on the same dual 2 GHz Opteron machine.

A quick Google search told me that:

1. More people have seen this.
2. Nobody has posted a solution.
3. The server message appears to indicate an unhandled WSAEWOULDBLOCK Winsock
error on recv(), which, according to MSDN, is to be expected on a non-blocking
socket and should be retried.
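The retry behaviour MSDN describes can be illustrated with Java NIO, where a read on a non-blocking channel that has no data yet returns 0 instead of failing; that is the Java counterpart of `EWOULDBLOCK`/`WSAEWOULDBLOCK`. This is only an analogue of the pattern, not the PostgreSQL backend's (C) socket code; the class `RetryOnWouldBlock`, its methods, and the in-process `Pipe` standing in for a client socket are all hypothetical. The sketch treats a 0-byte read as "try again later", waits on a `Selector` until the channel is readable, and retries.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class RetryOnWouldBlock {
    // Fill buf from a non-blocking channel. A read() result of 0 means
    // "would block": no data is available yet, which is not an error.
    public static void readFully(Pipe.SourceChannel ch, ByteBuffer buf) throws IOException {
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_READ);
            while (buf.hasRemaining()) {
                int n = ch.read(buf);
                if (n < 0) {
                    throw new IOException("peer closed the channel");
                }
                if (n == 0) {
                    selector.select();               // wait until readable, then retry
                    selector.selectedKeys().clear();
                }
            }
        }
    }

    // Hypothetical demo: the "client" writes only after a delay, so the
    // reader's first read() typically returns 0 and has to be retried.
    public static String demo() {
        try {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            Thread writer = new Thread(() -> {
                try {
                    Thread.sleep(100);
                    pipe.sink().write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)));
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            writer.start();
            ByteBuffer buf = ByteBuffer.allocate(5);
            readFully(pipe.source(), buf);
            writer.join();
            pipe.source().close();
            pipe.sink().close();
            return new String(buf.array(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello"
    }
}
```

A real server would usually also bound the wait, for example with `Selector.select(long timeout)`, so that a vanished peer cannot block the reader forever.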

Is this a known bug?

jan


--
--------------------------------------------------------------
Jan de Visser                     jdevisser@digitalfairway.com

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------

Re: Hanging queries on dual CPU windows

From: Jan de Visser
Date:
I have more information on this issue.

First off, the problem now happens after about 1-2 hours, as opposed to the 6-8
I mentioned earlier. Yay for shorter test cycles.

Furthermore, it does not happen on Linux machines, either single- or dual-CPU,
nor on single-CPU Windows machines. We can only reproduce it on a dual-CPU
Windows machine, and if we take one CPU out, it does not happen.

I executed the following after it hung:

db=# select l.pid, c.relname, l.mode, l.granted, l.page, l.tuple
from pg_locks l, pg_class c where c.oid = l.relation order by l.pid;

This showed me that several transactions were waiting for a particular row
that was locked by another transaction. That transaction had no pending lock
requests of its own (so there is no deadlock); it simply never completes and
hence never releases the lock.
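The "no pending locks, so no deadlock" reasoning is the usual wait-for-graph argument: a deadlock needs a cycle of transactions, each waiting for the next, whereas here every chain of waiters ends at a transaction that waits for nothing and simply never finishes. A minimal Java sketch, assuming the waiter/holder pairs have already been extracted from `pg_locks` (that step is not shown); the `WaitForGraph` class and the pids are hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WaitForGraph {
    // Edge waiter -> holder: waiter is blocked on a lock that holder has been granted.
    private final Map<Integer, Set<Integer>> waitsFor = new HashMap<>();

    public void addWait(int waiterPid, int holderPid) {
        waitsFor.computeIfAbsent(waiterPid, k -> new HashSet<>()).add(holderPid);
    }

    // A deadlock exists only if following "waits for" edges returns to a pid already on the path.
    public boolean hasDeadlock() {
        Set<Integer> done = new HashSet<>();
        Set<Integer> onPath = new HashSet<>();
        for (int pid : waitsFor.keySet()) {
            if (cycleFrom(pid, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean cycleFrom(int pid, Set<Integer> done, Set<Integer> onPath) {
        if (onPath.contains(pid)) return true;   // back edge: cycle found
        if (done.contains(pid)) return false;    // already fully explored
        onPath.add(pid);
        for (int next : waitsFor.getOrDefault(pid, Set.of())) {
            if (cycleFrom(next, done, onPath)) return true;
        }
        onPath.remove(pid);
        done.add(pid);
        return false;
    }

    // Pids that others wait for but that wait for nothing themselves:
    // the transactions holding everyone else up.
    public Set<Integer> blockersNotWaiting() {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> holders : waitsFor.values()) {
            for (int h : holders) {
                if (!waitsFor.containsKey(h)) result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical pids mirroring the situation described: three waiters, one stuck holder.
        WaitForGraph g = new WaitForGraph();
        g.addWait(101, 100);
        g.addWait(102, 100);
        g.addWait(103, 100);
        System.out.println(g.hasDeadlock());        // prints "false"
        System.out.println(g.blockersNotWaiting()); // prints "[100]"
    }
}
```

PostgreSQL runs its own deadlock check after `deadlock_timeout` and cancels one member of any cycle it finds, so a wait that never resolves and never raises an error is consistent with the no-cycle case above.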

What gives? Has anybody ever heard of problems like this on dual-CPU Windows
machines?

jan



On Monday 06 March 2006 09:38, Jan de Visser wrote:
> Hello,
>
> While doing performance tests on Windows Server 2003 we observed the
> following two problems.
>
> Environment: J2EE application running in JBoss application server, against
> pgsql 8.1 database. Load is caused by a smallish number of (very) complex
> transactions, typically about 5-10 concurrently.
>
> The first one, which bothers me the most, is that after about 6-8 hours the
> application stops processing. No errors are reported, neither by the JDBC
> driver nor by the server, but when I kill the application server, I see
> that all my connections hang in SQL statements (which never seem to
> return):
>
> 2006-03-03 08:17:12 4504 6632560 LOG:  duration: 45087000.000 ms
>  statement: EXECUTE <unnamed>  [PREPARE:  SELECT objID FROM objects WHERE
> objID = $1 FOR UPDATE]
>
> I think I can reliably reproduce this by loading the app, and waiting a
> couple of hours.

Re: Hanging queries on dual CPU windows

From: Tom Lane
Date: Thursday 09 March 2006 15:10
Jan de Visser <jdevisser@digitalfairway.com> writes:
> Furthermore, it does not happen on Linux machines, both single CPU and dual
> CPU, nor on single CPU windows machines. We can only reproduce on a dual CPU
> windows machine, and if we take one CPU out, it does not happen.
> ...
> Which showed me that several transactions were waiting for a particular row
> which was locked by another transaction. This transaction had no pending
> locks (so no deadlock), but just does not complete and hence never
> relinquishes the lock.

Is the stuck transaction still consuming CPU time, or just stopped?

Is it possible to get a stack trace from the stuck process?  I dunno
if you've got anything gdb-equivalent under Windows, but that's the
first thing I'd be interested in ...

            regards, tom lane

Re: Hanging queries on dual CPU windows

From: Jan de Visser
Date:
On Thursday 09 March 2006 15:10, Tom Lane wrote:
> Jan de Visser <jdevisser@digitalfairway.com> writes:
> > Furthermore, it does not happen on Linux machines, both single CPU and
> > dual CPU, nor on single CPU windows machines. We can only reproduce on a
> > dual CPU windows machine, and if we take one CPU out, it does not happen.
> > ...
> > Which showed me that several transactions were waiting for a particular
> > row which was locked by another transaction. This transaction had no
> > pending locks (so no deadlock), but just does not complete and hence
> > never relinquishes the lock.
>
> Is the stuck transaction still consuming CPU time, or just stopped?

CPU usage drops off. In fact, that's my main clue that something's wrong ;-)

>
> Is it possible to get a stack trace from the stuck process?  I dunno
> if you've got anything gdb-equivalent under Windows, but that's the
> first thing I'd be interested in ...

I wouldn't know. I'm hardly a Windows expert and prefer not to touch the stuff
myself. I can do some research, though...

>
>             regards, tom lane

jan

Re: Hanging queries on dual CPU windows

From: Jan de Visser
Date:
On Thursday 09 March 2006 15:10, Tom Lane wrote:
> Is it possible to get a stack trace from the stuck process?  I dunno
> if you've got anything gdb-equivalent under Windows, but that's the
> first thing I'd be interested in ...

Here ya go:

http://www.devisser-siderius.com/stack1.jpg
http://www.devisser-siderius.com/stack2.jpg
http://www.devisser-siderius.com/stack3.jpg

There are three threads in the process. I guess thread 1 (stack1.jpg) is the
most interesting.

I also noticed that cranking up the concurrency in my app reproduces the
problem in about 4 minutes ;-)

With thanks to Magnus Hagander for the Process Explorer hint.

jan
