Обсуждение: Postgres entering zombie state once a week in production evnvironment

Поиск
Список
Период
Сортировка

Postgres entering zombie state once a week in production evnvironment

От
Bryce Nesbitt
Дата:
We have a medium scale installation of Postgres 8.3 that is freezing 
about once a week.  I'm looking for any hints on how to diagnose the 
situation, as nothing is logged.

The system is matched pair of Sunfire servers, running Debian Etch with 
a 2.6.18-6-amd64 kernel, PostgreSQL 8.3.4, and DRBD 8.0.13.

During a failed state, pg_stat_activity will show hundreds of statements 
pending.  query_start will show the statements arriving at a normal rate 
(a few per second), but clearly they never complete.  The bulk of these 
statement are a simple select that starts each web session, a statement 
that generally completes in tenths of milliseconds.  Restarting postgres 
restores normal operation, at the loss of all chance of figuring out 
what was wrong.

postgresql.conf has customized:
log_destination = 'syslog'
log_min_error_statement = error
log_min_duration_statement = 5000

Where can I go from here?  I'm turning on log_checkpoints now, though 
all I see is "postgres[14412]: [4-1] LOG:  checkpoint starting: time" 
with no actual time printed.



Re: Postgres entering zombie state once a week in production evnvironment

От
Achilleas Mantzios
Дата:
Στις Tuesday 14 April 2009 09:25:54 ο/η Bryce Nesbitt έγραψε:
> We have a medium scale installation of Postgres 8.3 that is freezing
> about once a week.  I'm looking for any hints on how to diagnose the
> situation, as nothing is logged.
>
> The system is matched pair of Sunfire servers, running Debian Etch with
> a 2.6.18-6-amd64 kernel, PostgreSQL 8.3.4, and DRBD 8.0.13.
>
> During a failed state, pg_stat_activity will show hundreds of statements
> pending.  query_start will show the statements arriving at a normal rate
> (a few per second), but clearly they never complete.  The bulk of these
> statement are a simple select that starts each web session, a statement
> that generally completes in tenths of milliseconds.  Restarting postgres
> restores normal operation, at the loss of all chance of figuring out
> what was wrong.
>
> postgresql.conf has customized:
> log_destination = 'syslog'
> log_min_error_statement = error
> log_min_duration_statement = 5000
>
> Where can I go from here?  I'm turning on log_checkpoints now, though
> all I see is "postgres[14412]: [4-1] LOG:  checkpoint starting: time"
> with no actual time printed.
>
>

Did you check for locks?
SELECT * from pg_locks;
SELECT l.locktype,c.relname,l.pid,l.mode from pg_locks l,pg_class c where l.relation=c.oid ;
Were you able to issue other select commands during the problem? Were you able to start psql and do some simple
command?
Have you ruled out the possibility of the application being locked for other reasons? (limits, network failure,etc....)


--
Achilleas Mantzios


Re: Postgres entering zombie state once a week in production evnvironment

От
Scott Marlowe
Дата:
On Tue, Apr 14, 2009 at 12:25 AM, Bryce Nesbitt <bryce2@obviously.com> wrote:
> We have a medium scale installation of Postgres 8.3 that is freezing about
> once a week.  I'm looking for any hints on how to diagnose the situation, as
> nothing is logged.
>
> The system is matched pair of Sunfire servers, running Debian Etch with a
> 2.6.18-6-amd64 kernel, PostgreSQL 8.3.4, and DRBD 8.0.13.
>
> During a failed state, pg_stat_activity will show hundreds of statements
> pending.  query_start will show the statements arriving at a normal rate (a
> few per second), but clearly they never complete.  The bulk of these
> statement are a simple select that starts each web session, a statement that
> generally completes in tenths of milliseconds.  Restarting postgres restores
> normal operation, at the loss of all chance of figuring out what was wrong.

What does pg_locks say during this time?  Specifically about locks
that aren't granted?


Re: Postgres entering zombie state once a week in production evnvironment

От
Bryce Nesbitt
Дата:
Thanks for the thoughts on what to check.   Unfortunately, the priority 
of the people responding to the incidents has been to get the system 
live again.  I will add these items to a list that, hopefully, will be 
run through prior to restarting Postgres.

Achilleas Mantzios wrote:
> Did you check for locks?
> SELECT * from pg_locks;
> SELECT l.locktype,c.relname,l.pid,l.mode from pg_locks l,pg_class c where l.relation=c.oid ;
> Were you able to issue other select commands during the problem? Were you able to start psql and do some simple
command?
> Have you ruled out the possibility of the application being locked for other reasons? (limits, network
failure,etc....


Re: Postgres entering zombie state once a week in production evnvironment

От
Bryce Nesbitt
Дата:
Scott Marlowe wrote:
> What does pg_locks say during this time?  Specifically about locks
> that aren't granted?
I don't know, yet.  Though these events go for 15-30 minutes before 
postgres restart, and no deadlocks are detected, so I don't think it is 
locks.


Re: Postgres entering zombie state once a week in production evnvironment

От
Scott Marlowe
Дата:
On Tue, Apr 14, 2009 at 2:59 PM, Bryce Nesbitt <bryce2@obviously.com> wrote:
> Scott Marlowe wrote:
>>
>> What does pg_locks say during this time?  Specifically about locks
>> that aren't granted?
>
> I don't know, yet.  Though these events go for 15-30 minutes before postgres
> restart, and no deadlocks are detected, so I don't think it is locks.

Ummm, deadlocks <> locks blocking other queries.