Discussion: Excess idle processes, leaking WAL logs?
I'm a sysadmin, not a DBA, and I inherited a legacy app after the
developer left the company. It's 4 early mod_perl (1.29) + early DBI
(1.43) app servers, going against postgres 7.4.6.

The DB just crapped itself a few days ago. In the postmortem, we found
out that the number of processes on the server had been climbing as the
server ran, going from some 100 processes to about 350, which we believe
were mostly idle postgres processes (someone else got the page-out for
support). The uptime was around 6 months.

In the 3 days since it died, I've been watching it and, once again, the
processes are climbing slowly. They're idle postgres processes, fairly
evenly distributed across the app servers, and more interestingly, if I
do lsof|grep postgres, I see a large number of lines (now 55) like:

postmaste 24521 postgres 55u REG 58,0 16777216 2899982 /a/path/pdm/pgsql/data/pg_xlog/0000030F000000E2 (deleted)

I looked in the 7.4 docs about WAL. checkpoint_timeout is 300,
checkpoint_segments is 8, and there's plenty of space in pg_xlog, but
there are 18 files in there, some a few hours old, which I suspect would
not be the case if it were checkpointing properly.

Am I barking up the right tree over these leaking processes/connections,
or is the WAL logs thing just a red herring? Sadly, I can't edit the app
code, and I doubt I could upgrade the DB unless I can really
business-justify that a minor version change would fix it (I'd probably
have to stay in the 7.4 series due to timid managers). But if there are
little baby tweaks I'm missing, or if I can say "yeah, this version of
postgres leaks, plan for 3-month reboots or for moving to 7.4.x", that'd
work for me.

Thanks for any help!

-Greg
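For reference, a minimal sketch of watching both symptoms from a shell.
The data directory path comes from the lsof output above; the "idle"
process title and the exact lsof formatting are assumptions that vary by
platform, so treat these as starting points rather than exact commands:

    # Count postgres backends whose process title ends in "idle"
    # (a connected client with no query running). Title format is
    # platform-dependent; adjust the pattern to match your ps output.
    ps -eo args | grep -c '^postgres:.*idle$'

    # Count WAL segments that have been deleted on disk but are still
    # held open by postgres processes (the "(deleted)" lines above).
    lsof -u postgres | grep -c 'pg_xlog.*(deleted)'

Running both from cron every few minutes and logging the counts would
show whether the first number climbs steadily while the WAL count stays
flat, which would point at leaked connections rather than WAL trouble.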
I highly recommend moving to the latest 7.4.x. 7.4.6 is well over 2
years old, and the numerous security issues alone should be convincing
enough to upgrade. Updating this far appears to be a multi-step process,
though. See the release notes.

There was a race condition prior to 7.4.10 in transaction logging:
http://www.postgresql.org/docs/7.4/interactive/release-7-4-10.html

There's also a pg_clog race condition fixed in 7.4.12, but that doesn't
seem obviously related to me.

You didn't say if you're using clustering, but you may need to use
pg_resetxlog to correct this issue:
http://www.postgresql.org/docs/7.4/interactive/app-pgresetxlog.html

I am by no means an expert with PostgreSQL; however, it seems likely
that at least some of the problems you're having have been identified
and corrected.

--
Brandon Aiken
CS/IT Systems Engineer
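If you do go down the minor-upgrade road, it's worth being able to
confirm what's actually running before and after. A sketch, assuming
psql superuser access and the data directory path from the original
post:

    # What the running server reports
    psql -U postgres -c 'SELECT version();'

    # What the data directory was initialized with. Minor releases in
    # the same major series share the same on-disk catalog version,
    # which is why a 7.4.x update is normally a binary swap plus
    # restart (barring any extra steps the release notes call out).
    pg_controldata /a/path/pdm/pgsql/data | grep 'Catalog version'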
"Greg Cox" <ratness@hotmail.com> writes: > In the 3 days since it died, I've been watching it and, once again, the > processes are climbing slowly again. They're idle postgres processes, > fairly evenly distributed against the app servers, and more interestingly, This isn't a postgres bug, it's a problem in your application code; it's not closing sessions it doesn't need. Possibly a problem in a connection-pooling layer? > I looked in the 7.4 docs about WAL. checkpoint_timeout is 300, > checkpoint_segments is 8, plenty of space in pg_xlog, but there's 18 files > in there, some a few hours old, which I would suspect would not be the case > if it were checkpointing properly. No, that doesn't sound out of line. The steady state is to have about 2 * checkpoint_segments + 1 WAL files. You might have one or so more than I'd have expected but that's not enough to evidence a problem. If you are concerned about whether checkpoints are really happening, use pg_controldata to keep an eye on the last-checkpoint time in pg_control. I concur with the other comment that you really ought to be on 7.4.latest but that's not going to magically fix your connection-pooling issue. regards, tom lane