Обсуждение: [Solved] 8.3 Stats Collector Stuck at 100% CPU
Hi all, I wanted to share the solution to $SUBJECT. I noticed that the stats collector process for one of our servers was pegged around 80-100% CPU usage, and looked to have been that way for several days or weeks. Server info: * Postgres 8.3.4 * Linux 2.6.18-92.1.13.el5 #1 SMP x86_64 * CentOS release 5.2 (Final) * 7 databases, total of ~4300 rows in pg_class tables across all these databases I found several previous threads complaining about the same problem with 8.3. A few concluded this was a FreeBSD-specific problem (it might have been, for them), and one was with 8.4, but was from having 650 or so databases in a single instance. The solution for me was simply to run pg_stat_reset(), and the problem went away within a few seconds. The $PGDATA/global/pgstat.stat file was 1.2GB before the reset, and went down to ~250KB after the reset. It looks like this file is adding on 4MB or so per day at the current rate, so I'll likely have to do this again in a few months. It might be worthwhile to document this fix somewhere, perhaps at http://www.postgresql.org/docs/8.3/static/monitoring-stats.html#MONITORING-STATS-FUNCS-TABLE since I had to search around quite a bit to find the pg_stat_reset() solution below, thanks to Tom Lane: "100% of CPU utilization postgres process": http://archives.postgresql.org/pgsql-general/2010-01/msg01079.php Josh
Josh Kupershmidt <schmiddy@gmail.com> writes: > The solution for me was simply to run pg_stat_reset(), and the problem > went away within a few seconds. The $PGDATA/global/pgstat.stat file > was 1.2GB before the reset, and went down to ~250KB after the reset. > It looks like this file is adding on 4MB or so per day at the current > rate, so I'll likely have to do this again in a few months. Hm. It sounds like you are "leaking" stats collector table entries for some reason. It would be good to fix the underlying problem rather than just resign yourself to a manual workaround. Is there anything unusual about your workload that might trigger this? regards, tom lane
> Hm. It sounds like you are "leaking" stats collector table entries for > some reason. It would be good to fix the underlying problem rather than > just resign yourself to a manual workaround. Is there anything unusual > about your workload that might trigger this? Don't think the database setup is that unusual. I did overestimate how quickly pgstat.stat is growing; it's only gone up to 800KB in the 20 hours since I ran pg_stat_reset(). I thought it was growing 4MB per day because last night it had grown to 500KB in just 3 hours. Also, it had ballooned to 1.2 GB after running for around a year, I think. Relevant stats-related info I can think of: * default_statistics_target = 10 * I don't think any tables have had ALTER TABLE SET STATISTICS done * the two really active databases have 2300 and 490 rows in pg_class * mostly bulk updates/inserts * PGDATA is 1.6 TB If there's some useful debugging info on the stats collector process I can gather from the server, I'd be happy to try. Josh
Josh Kupershmidt <schmiddy@gmail.com> writes: >> Hm. �It sounds like you are "leaking" stats collector table entries for >> some reason. �It would be good to fix the underlying problem rather than >> just resign yourself to a manual workaround. �Is there anything unusual >> about your workload that might trigger this? > Don't think the database setup is that unusual. I did overestimate how > quickly pgstat.stat is growing; it's only gone up to 800KB in the 20 > hours since I ran pg_stat_reset(). I thought it was growing 4MB per > day because last night it had grown to 500KB in just 3 hours. Also, it > had ballooned to 1.2 GB after running for around a year, I think. What would be expected is for it to ramp up fairly quickly to one entry per table in the database, and then stabilize. Maybe the 1.2GB figure represents the fallout from some strange event rather than a gradual leakage. > Relevant stats-related info I can think of: > * default_statistics_target = 10 > * I don't think any tables have had ALTER TABLE SET STATISTICS done > * the two really active databases have 2300 and 490 rows in pg_class > * mostly bulk updates/inserts > * PGDATA is 1.6 TB > If there's some useful debugging info on the stats collector process I > can gather from the server, I'd be happy to try. Do you have a copy of the 1.2GB file and would you be willing to send me it if so? There shouldn't be any especially private info in there, just table OIDs and access counts. (1.2GB would be a lot of data to mail but I bet it gzips down to a lot less.) regards, tom lane
On Thu, Apr 1, 2010 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Do you have a copy of the 1.2GB file and would you be willing to send me > it if so? There shouldn't be any especially private info in there, just > table OIDs and access counts. (1.2GB would be a lot of data to mail but > I bet it gzips down to a lot less.) I didn't keep a copy last night, and I don't think we have a filesystem-level backup for this machine :-( I will try to remember to keep a copy if this happens again. Josh