Обсуждение: How to diagnose application "hangs" in pg_receive?

Поиск
Список
Период
Сортировка

How to diagnose application "hangs" in pg_receive?

От
Clemens Eisserer
Дата:
Hi,

A hibernate based application I wrote suffers from "hangs" - the
thread which should fetch the data is stuck at pg_receive (the
function reading back data from the socket).
My first idea were threading errors, however after careful code review
I couldn't find suspicious code.
The application doesn't do any explicit locking, an excerpt of the
locking table is at the end of the email.

Any idea what could be going wrong here?
Except a broken connction, what could be the reason for never arriving
reply from the server?

Thank you in advance, Clemens


   datname   |            relname             |           mode
  | granted | usename  |             substr             |
query_start          |       age       | procpid

-------------+--------------------------------+--------------------------+---------+----------+--------------------------------+-------------------------------+-----------------+---------
 khbldb_prod | startkladde_pkey               | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | startkladde                    |
ShareUpdateExclusiveLock | t       | postgres | autovacuum: VACUUM
public.star | 2011-05-21 11:19:51.780587+01 | 03:43:21.367065 |
18241
 khbldb_prod | kladde_flugnr_index            | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_schleppid_index         | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_lehrer_index            | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_nachplatzid_index       | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_bmok_index              | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_vonplatzid_index        | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod |                                | ExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_datum_index             | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | statusindex                    | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_pilot_index             | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | bzbisindex                     | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_flugzeug_index          | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241
 khbldb_prod | kladde_geschlepptid_index      | RowExclusiveLock
  | t       | postgres | autovacuum: VACUUM public.star | 2011-05-21
11:19:51.780587+01 | 03:43:21.367065 |   18241

//No eclusive locks are held by the application:
 khbldb_prod | verkauf_type_index             | AccessShareLock
  | t       | khbl     | <IDLE> in transaction          | 2011-05-21
13:04:53.930505+01 | 01:58:19.217147 |   18637
 khbldb_prod | rechnungsstatus_pkey           | AccessShareLock
  | t       | khbl     | <IDLE> in transaction          | 2011-05-21
13:04:53.930505+01 | 01:58:19.217147 |   18637
 khbldb_prod | bzbisindex                     | AccessShareLock
  | t       | khbl     | <IDLE> in transaction          | 2011-05-21
13:04:53.930505+01 | 01:58:19.217147 |   18637
 khbldb_prod | rechnungsstatus                | AccessShareLock
  | t       | khbl     | <IDLE> in transaction          | 2011-05-21
13:04:53.930505+01 | 01:58:19.217147 |   18637
 khbldb_prod | kladde_lehrer_index            | AccessShareLock
  | t       | khbl     | <IDLE> in transaction          | 2011-05-21
13:04:53.930505+01 | 01:58:19.217147 |   18637

Re: How to diagnose application "hangs" in pg_receive?

От
Hannu Krosing
Дата:
On Sat, 2011-05-21 at 16:08 +0200, Clemens Eisserer wrote:
> Hi,
>
> A hibernate based application I wrote suffers from "hangs" - the
> thread which should fetch the data is stuck at pg_receive (the
> function reading back data from the socket).
> My first idea were threading errors, however after careful code review
> I couldn't find suspicious code.
> The application doesn't do any explicit locking, an excerpt of the
> locking table is at the end of the email.
>
> Any idea what could be going wrong here?

I'm not an expert in JDBC so all the debugging advice is on server
side :)

> Except a broken connction, what could be the reason for never arriving
> reply from the server?

Maybe too small connection pool somewhere on the way, so the statement
is waiting for connection, not for query answer ?

to see if this is the case, set full logging in postgresql.conf and see
if the query reaches the database at all.

You could also start by setting statement_timeout to some sensible value
so that you can rule out a hang statement / query

maybe

set statement_timeout to '5s';

would be a good value to start.

> Thank you in advance, Clemens


--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/


Re: How to diagnose application "hangs" in pg_receive?

От
"Kevin Grittner"
Дата:
Clemens Eisserer  wrote:

> A hibernate based application I wrote suffers from "hangs" - the
> thread which should fetch the data is stuck at pg_receive (the
> function reading back data from the socket).

You have two things happening which look bad, and could cause you to
experience extreme bloat.  It may not be that your application is
blocked, but that your database has gotten into such bad shape that
queries are taking A Very Long Time.

First, all those autovacuum processes which have been running for
hours make me wonder if you're on a very old version of PostgreSQL
-- we had some bugs for a while which might explain those if you
haven't kept up with bug-fix minor releases:

http://www.postgresql.org/support/versioning

Please post the output from running the query on this page:

http://wiki.postgresql.org/wiki/Server_Configuration

Second, those " in transaction" connections which are hours
old will prevent autovacuum (even if you don't have an old version
with bugs) from cleaning up old versions of rows, making your
queries progressively slower.  These would be caused by poor
programming technique in the application or the application
framework, and should be considered to be serious bugs.

-Kevin



Re: How to diagnose application "hangs" in pg_receive?

От
Clemens Eisserer
Дата:
Hi Kevin,

The tables are so small and the queries so simple, I am very sure its
not caused by bad shape of the database.
(and the queries run fast, iff they run)

I know that using those long-running transactions are very ugly,
however its a low-client-count app (2-5 clients) which all disconnect
at night where VACUUM and ANALYZE are performed once a week as cron
job.

Thanks, Clemens

2011/5/21 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Clemens Eisserer  wrote:
>
>> A hibernate based application I wrote suffers from "hangs" - the
>> thread which should fetch the data is stuck at pg_receive (the
>> function reading back data from the socket).
>
> You have two things happening which look bad, and could cause you to
> experience extreme bloat.  It may not be that your application is
> blocked, but that your database has gotten into such bad shape that
> queries are taking A Very Long Time.
>
> First, all those autovacuum processes which have been running for
> hours make me wonder if you're on a very old version of PostgreSQL
> -- we had some bugs for a while which might explain those if you
> haven't kept up with bug-fix minor releases:
>
> http://www.postgresql.org/support/versioning
>
> Please post the output from running the query on this page:
>
> http://wiki.postgresql.org/wiki/Server_Configuration
>
> Second, those " in transaction" connections which are hours
> old will prevent autovacuum (even if you don't have an old version
> with bugs) from cleaning up old versions of rows, making your
> queries progressively slower.  These would be caused by poor
> programming technique in the application or the application
> framework, and should be considered to be serious bugs.
>
> -Kevin
>
>
>

Re: How to diagnose application "hangs" in pg_receive?

От
Maciek Sakrejda
Дата:
> Any idea what could be going wrong here?
> Except a broken connction, what could be the reason for never arriving
> reply from the server?

Stupid question, but could this be related to the issues discussed
here: http://archives.postgresql.org/pgsql-jdbc/2011-04/msg00080.php ?
Whatever happened to that patch? The e-mail thread peters out...
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: How to diagnose application "hangs" in pg_receive?

От
"Kevin Grittner"
Дата:
Please read this carefully:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

You haven't told us what version of PostgreSQL you're running, in
spite of my asking that you post the results of a query which would
have told us that and other useful information.  There are a number
of things this could be, alone or in combination, but you have not
provided anywhere near enough factual information to rule much in
or out.  The only evidence that you have provided suggests that at
least part of your problem has been bloat, but you dismissed that
out of hand without showing any sign that you understand that issue
or have checked for it.

In follow-ups, please try to avoid hand-wavy terms like "forever";
tell us whether you waited a minute, an hour, ten days, whatever
"forever" actually was, along with any observations on CPU and disk
usage during this time.  You didn't show the active query in your
listings -- were those taking during the time the query was hanging?

-Kevin




Re: How to diagnose application "hangs" in pg_receive?

От
Clemens Eisserer
Дата:
Hi Kevin,

If you feel personally attacked, just because I highlighted that I am
sure the problem is not caused by a query taking a long time to
execute, I am sorry.  Nontheless, please find my detailed answers
below.

Thanks, Clemens


> You haven't told us what version of PostgreSQL you're running,
PostgreSQL 8.4.7
postgresql-8.4-701.jdbc3.jar


> spite of my asking that you post the results of a query which would
> have told us that and other useful information.
Because as I said, the problem is not the execution of the query.
Last time I experienced such a hang it was a simple "select .... from
... where id=x" query, which executes usually in a few ms.


> The only evidence that you have provided suggests that at
> least part of your problem has been bloat, but you dismissed that
> out of hand without showing any sign that you understand that issue
> or have checked for it.
The database has about 2MB of data stored and is mostly read-only.
I know what bloat is, understand the issue and have given a clear
indication that I know bloat is not the problem.
Therefor narrowing down the possibilities, and to direct guessing into
another direction.

> In follow-ups, please try to avoid hand-wavy terms like "forever";
> tell us whether you waited a minute, an hour, ten days, whatever
> "forever" actually was, along with any observations on CPU and disk
> usage during this time.  You didn't show the active query in your
> listings -- were those taking during the time the query was hanging?

Forever means, what forever usually means - as long as I terminate the client.
I took this hand-wavy term to make clear that my understanding of the
problem is that its not a slowly running query.

Re: How to diagnose application "hangs" in pg_receive?

От
"Kevin Grittner"
Дата:
Clemens Eisserer  wrote:

> If you feel personally attacked

Far from it -- I just feel like my time is being wasted.  If you provide
the information I requested in the prior emails, I'll probably take
another shot at helping.

-Kevin