Обсуждение: [Solved] 8.3 Stats Collector Stuck at 100% CPU

Поиск
Список
Период
Сортировка

[Solved] 8.3 Stats Collector Stuck at 100% CPU

От
Josh Kupershmidt
Дата:
Hi all,

I wanted to share the solution to $SUBJECT. I noticed that the stats
collector process for one of our servers was pegged around 80-100% CPU
usage, and looked to have been that way for several days or weeks.
Server info:
 * Postgres 8.3.4
 * Linux 2.6.18-92.1.13.el5 #1 SMP x86_64
 * CentOS release 5.2 (Final)
 * 7 databases, total of ~4300 rows in pg_class tables across all
these databases

I found several previous threads complaining about the same problem
with 8.3. A few concluded this was a FreeBSD-specific problem (it
might have been, for them), and one was with 8.4, but was from having
650 or so databases in a single instance.

The solution for me was simply to run pg_stat_reset(), and the problem
went away within a few seconds. The $PGDATA/global/pgstat.stat file
was 1.2GB before the reset, and went down to ~250KB after the reset.
It looks like this file is adding on 4MB or so per day at the current
rate, so I'll likely have to do this again in a few months.

It might be worthwhile to document this fix somewhere, perhaps at
http://www.postgresql.org/docs/8.3/static/monitoring-stats.html#MONITORING-STATS-FUNCS-TABLE
since I had to search around quite a bit to find the pg_stat_reset()
solution below, thanks to Tom Lane:

"100% of CPU utilization postgres process":
http://archives.postgresql.org/pgsql-general/2010-01/msg01079.php

Josh

Re: [Solved] 8.3 Stats Collector Stuck at 100% CPU

От
Tom Lane
Дата:
Josh Kupershmidt <schmiddy@gmail.com> writes:
> The solution for me was simply to run pg_stat_reset(), and the problem
> went away within a few seconds. The $PGDATA/global/pgstat.stat file
> was 1.2GB before the reset, and went down to ~250KB after the reset.
> It looks like this file is adding on 4MB or so per day at the current
> rate, so I'll likely have to do this again in a few months.

Hm.  It sounds like you are "leaking" stats collector table entries for
some reason.  It would be good to fix the underlying problem rather than
just resign yourself to a manual workaround.  Is there anything unusual
about your workload that might trigger this?

            regards, tom lane

Re: [Solved] 8.3 Stats Collector Stuck at 100% CPU

От
Josh Kupershmidt
Дата:
> Hm.  It sounds like you are "leaking" stats collector table entries for
> some reason.  It would be good to fix the underlying problem rather than
> just resign yourself to a manual workaround.  Is there anything unusual
> about your workload that might trigger this?

Don't think the database setup is that unusual. I did overestimate how
quickly pgstat.stat is growing; it's only gone up to 800KB in the 20
hours since I ran pg_stat_reset(). I thought it was growing 4MB per
day because last night it had grown to 500KB in just 3 hours. Also, it
had ballooned to 1.2 GB after running for around a year, I think.

Relevant stats-related info I can think of:
 * default_statistics_target = 10
 * I don't think any tables have had ALTER TABLE SET STATISTICS done
 * the two really active databases have 2300 and 490 rows in pg_class
 * mostly bulk updates/inserts
 * PGDATA is 1.6 TB

If there's some useful debugging info on the stats collector process I
can gather from the server, I'd be happy to try.

Josh

Re: [Solved] 8.3 Stats Collector Stuck at 100% CPU

От
Tom Lane
Дата:
Josh Kupershmidt <schmiddy@gmail.com> writes:
>> Hm. �It sounds like you are "leaking" stats collector table entries for
>> some reason. �It would be good to fix the underlying problem rather than
>> just resign yourself to a manual workaround. �Is there anything unusual
>> about your workload that might trigger this?

> Don't think the database setup is that unusual. I did overestimate how
> quickly pgstat.stat is growing; it's only gone up to 800KB in the 20
> hours since I ran pg_stat_reset(). I thought it was growing 4MB per
> day because last night it had grown to 500KB in just 3 hours. Also, it
> had ballooned to 1.2 GB after running for around a year, I think.

What would be expected is for it to ramp up fairly quickly to one entry
per table in the database, and then stabilize.  Maybe the 1.2GB figure
represents the fallout from some strange event rather than a gradual
leakage.

> Relevant stats-related info I can think of:
>  * default_statistics_target = 10
>  * I don't think any tables have had ALTER TABLE SET STATISTICS done
>  * the two really active databases have 2300 and 490 rows in pg_class
>  * mostly bulk updates/inserts
>  * PGDATA is 1.6 TB

> If there's some useful debugging info on the stats collector process I
> can gather from the server, I'd be happy to try.

Do you have a copy of the 1.2GB file and would you be willing to send me
it if so?  There shouldn't be any especially private info in there, just
table OIDs and access counts.  (1.2GB would be a lot of data to mail but
I bet it gzips down to a lot less.)

            regards, tom lane

Re: [Solved] 8.3 Stats Collector Stuck at 100% CPU

От
Josh Kupershmidt
Дата:
On Thu, Apr 1, 2010 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Do you have a copy of the 1.2GB file and would you be willing to send me
> it if so?  There shouldn't be any especially private info in there, just
> table OIDs and access counts.  (1.2GB would be a lot of data to mail but
> I bet it gzips down to a lot less.)

I didn't keep a copy last night, and I don't think we have a
filesystem-level backup for this machine :-(

I will try to remember to keep a copy if this happens again.
Josh