Discussion: stats collector process high CPU utilization
Greetings,

Since upgrading to 8.2.3 yesterday, the stats collector process has had
very high CPU utilization; it is consuming roughly 80-90% of one CPU.
The server seems a lot more sluggish than it was before.  Is this normal
operation for 8.2 or something I should look into correcting?

stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = false

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
Benjamin Minshall <minshall@intellicon.biz> writes:
> Since upgrading to 8.2.3 yesterday, the stats collector process has had
> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
> The server seems a lot more sluggish than it was before.  Is this normal
> operation for 8.2 or something I should look into correcting?

What version did you update from, and what platform is this?

			regards, tom lane
Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Since upgrading to 8.2.3 yesterday, the stats collector process has had
>> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
>> The server seems a lot more sluggish than it was before.  Is this normal
>> operation for 8.2 or something I should look into correcting?
>
> What version did you update from, and what platform is this?
>
> 			regards, tom lane

I upgraded from 8.1.5.  The system is a dual Xeon 2.4GHz, 4GB RAM
running a Linux 2.6 series kernel.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> Benjamin Minshall <minshall@intellicon.biz> writes:
>>> Since upgrading to 8.2.3 yesterday, the stats collector process has had
>>> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
>>> The server seems a lot more sluggish than it was before.  Is this normal
>>> operation for 8.2 or something I should look into correcting?
>> What version did you update from, and what platform is this?

> I upgraded from 8.1.5.  The system is a dual Xeon 2.4Ghz, 4Gb RAM
> running linux kernel 2.6 series.

OK, I was trying to correlate it with post-8.2.0 patches but evidently
that's the wrong tree to bark up.  No, this isn't an expected behavior.

Is there anything unusual about your database (huge numbers of tables,
or some such)?  Can you gather some info about what it's doing?
strace'ing the stats collector might prove interesting, also if you have
built it with --enable-debug then oprofile results would be helpful.

			regards, tom lane
Tom Lane wrote:
> OK, I was trying to correlate it with post-8.2.0 patches but evidently
> that's the wrong tree to bark up.  No, this isn't an expected behavior.

I talked with a co-worker and discovered that we went from 8.1.5 to
8.2.2, ran a few hours, then went to 8.2.3 after the patch was released.
I do not know if the high utilization was a problem during the few hours
on 8.2.2.

> Is there anything unusual about your database (huge numbers of tables,
> or some such)?

Nothing unusual.  I have a few databases of about 10GB each; the
workload is mostly inserts using COPY or parameterized INSERTs inside
transaction blocks.

> Can you gather some info about what it's doing?
> strace'ing the stats collector might prove interesting, also if you have
> built it with --enable-debug then oprofile results would be helpful.

I will gather some strace info later today when I have a chance to
shut down the server.  Thanks.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> Can you gather some info about what it's doing?
>> strace'ing the stats collector might prove interesting, also if you have
>> built it with --enable-debug then oprofile results would be helpful.

> I will gather some strace info later today when I have a chance to
> shutdown the server.

I don't see why you'd need to shut anything down.  Just run
	strace -p stats-process-ID
for a few seconds or minutes (enough to gather maybe a few thousand
lines of output).

			regards, tom lane
Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Tom Lane wrote:
>>> Can you gather some info about what it's doing?
>>> strace'ing the stats collector might prove interesting, also if you have
>>> built it with --enable-debug then oprofile results would be helpful.
>
>> I will gather some strace info later today when I have a chance to
>> shutdown the server.
>
> I don't see why you'd need to shut anything down.  Just run
> 	strace -p stats-process-ID
> for a few seconds or minutes (enough to gather maybe a few thousand
> lines of output).

Seems the problem may be related to a huge global/pgstat.stat file.
Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.

I ran strace for 60 seconds:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.71    1.119004       48652        23           rename
  4.29    0.050128           0    508599           write
  0.00    0.000019           0       249        22 poll
  0.00    0.000000           0        23           open
  0.00    0.000000           0        23           close
  0.00    0.000000           0        34           getppid
  0.00    0.000000           0        23           munmap
  0.00    0.000000           0        23           setitimer
  0.00    0.000000           0        23        22 sigreturn
  0.00    0.000000           0        23           mmap2
  0.00    0.000000           0        23           fstat64
  0.00    0.000000           0       216           recv
------ ----------- ----------- --------- --------- ----------------
100.00    1.169151                509282        44 total

I attached an excerpt of the full strace with the many thousands of
write calls filtered.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

. . .
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T\274\355"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0/O\227\27\230"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3363) = 3363
close(3) = 0
munmap(0xacdfc000, 4096) = 0
rename("global/pgstat.tmp", "global/pgstat.stat") = 0
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\1\0\0\0\0\0\0\0\301"..., 1000, 0) = 976
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 500}}, NULL) = 0
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\0\0\0\0\0\0\0\0\322"..., 1000, 0) = 976
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0H\3\0\0:\204\30\0\f\0\0\0\1\0\0\0\0\0\0\0\301\204"..., 1000, 0) = 840
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\1\0\0\0\0\0\0\0\301"..., 1000, 0) = 976
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\0\0\0\0\0\0\0\000"..., 1000, 0) = 976
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\240\0\0\0:\204\30\0\2\0\0\0\0\0\0\0\0\0\0\0\303"..., 1000, 0) = 160
poll([{fd=5, events=POLLIN|POLLERR}], 1, 2000) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ?  (mask now [])
getppid() = 22447
open("global/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xacdfc000
write(3, "\226\274\245\1D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "Z\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
. . .
Benjamin Minshall <minshall@intellicon.biz> writes:
> Seems the problem may be related to a huge global/pgstat.stat file.
> Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.

Yoi.  We didn't do anything that would bloat that file if it were
storing the same information as before.  What I'm betting is that it's
storing info on a whole lot more tables than before.

Did you decide to start running autovacuum when you updated to 8.2?
How many tables are visible in the pg_stats views?

			regards, tom lane
Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Seems the problem may be related to a huge global/pgstat.stat file.
>> Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.
>
> Yoi.  We didn't do anything that would bloat that file if it were
> storing the same information as before.  What I'm betting is that it's
> storing info on a whole lot more tables than before.

The server is running on the same actual production data, schema and
workload as before.

> Did you decide to start running autovacuum when you updated to 8.2?

Autovacuum was on and functioning before the update.

> How many tables are visible in the pg_stats views?

There are about 15 databases in the cluster, each with around 90 tables.
A count of pg_stats yields between 500 and 800 rows in each database.

select count(*) from (select distinct tablename from pg_stats) as i;
 count
-------
    92
(1 row)

select count(*) from pg_stats;
 count
-------
   628
(1 row)

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> How many tables are visible in the pg_stats views?

> There are about 15 databases in the cluster each with around 90 tables.
> A count of pg_stats yields between 500 and 800 rows in each database.

Sorry, I was imprecise.  The view "pg_stats" doesn't have anything to do
with the stats collector; what I was interested in was the contents of
the "pg_stat_xxx" and "pg_statio_xxx" views.  It'd be enough to check
pg_stat_all_indexes and pg_stat_all_tables, probably.  Also, do you have
the 8.1 installation still available to get the comparable counts there?

			regards, tom lane
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Tom Lane wrote:
>>> How many tables are visible in the pg_stats views?
>
>> There are about 15 databases in the cluster each with around 90 tables.
>> A count of pg_stats yields between 500 and 800 rows in each database.
>
> Sorry, I was imprecise.  The view "pg_stats" doesn't have anything to do
> with the stats collector; what I was interested in was the contents of
> the "pg_stat_xxx" and "pg_statio_xxx" views.  It'd be enough to check
> pg_stat_all_indexes and pg_stat_all_tables, probably.  Also, do you have
> the 8.1 installation still available to get the comparable counts there?

I checked all 15 databases on both 8.1 and 8.2; they were all quite
consistent:

pg_stat_all_indexes has about 315 rows per database
pg_stat_all_tables has about 260 rows per database

The pg_statio_* views match in count to the pg_stat_* views as well.

While exploring this problem, I've noticed that one of the frequent
insert processes creates a few temporary tables to do post-processing.
Is it possible that the stats collector is getting bloated with stats
from these short-lived temporary tables?  During periods of high
activity it could be creating temporary tables as often as two per
second.
minshall@intellicon.biz writes:
> While exploring this problem, I've noticed that one of the frequent insert
> processes creates a few temporary tables to do post-processing.  Is it
> possible that the stats collector is getting bloated with stats from these
> short-lived temporary tables?  During periods of high activity it could be
> creating temporary tables as often as two per second.

Hmmm ... that's an interesting point, but offhand I don't see why it'd
cause more of a problem in 8.2 than 8.1.  Alvaro, any thoughts?

			regards, tom lane
Tom Lane wrote:
> minshall@intellicon.biz writes:
> > While exploring this problem, I've noticed that one of the frequent insert
> > processes creates a few temporary tables to do post-processing.  Is it
> > possible that the stats collector is getting bloated with stats from these
> > short-lived temporary tables?  During periods of high activity it could be
> > creating temporary tables as often as two per second.
>
> Hmmm ... that's an interesting point, but offhand I don't see why it'd
> cause more of a problem in 8.2 than 8.1.  Alvaro, any thoughts?

No idea.  I do have a very crude piece of code to read a pgstat.stat
file and output some info about what it finds (table OIDs basically
IIRC).  Maybe it can be helpful to examine what's in the bloated stat
file.

Regarding temp tables, I'd think that the pgstat entries should be
getting dropped at some point in both releases.  Maybe there's a bug
preventing that in 8.2?

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Regarding temp tables, I'd think that the pgstat entries should be
> getting dropped at some point in both releases.  Maybe there's a bug
> preventing that in 8.2?

Hmmm ... I did rewrite the backend-side code for that just recently for
performance reasons ... could I have broken it?  Anyone want to take a
second look at
http://archives.postgresql.org/pgsql-committers/2007-01/msg00171.php

			regards, tom lane
I wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases.  Maybe there's a bug
>> preventing that in 8.2?

> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?

I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can get
some more info?  Alternatively, if Benjamin wants to send me a copy of
his stats file (off-list), I'd be happy to take a look.

			regards, tom lane
Tom Lane wrote:
> I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
> still seems to do what it's supposed to, so that theory falls down.
>
> Alvaro, could you send Benjamin your stat-file-dumper tool so we can
> get some more info?  Alternatively, if Benjamin wants to send me a copy
> of his stats file (off-list), I'd be happy to take a look.
>
> 			regards, tom lane

When I checked on the server this morning, the huge stats file had
returned to a normal size.  I set up a script to track CPU usage and
stats file size, and it appears to have decreased from 90MB down to
about 2MB over roughly 6 hours last night.  The CPU usage of the stats
collector also decreased accordingly.

The application logs indicate that there was no variation in the
workload over this time period; however, the file size started to
decrease soon after the nightly pg_dump backups completed.  Coincidence
perhaps?

Nonetheless, I would appreciate a copy of Alvaro's stat file tool just
to see if anything stands out in the collected stats.

Thanks for your help, Tom.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
Benjamin Minshall <minshall@intellicon.biz> writes:
> When I checked on the server this morning, the huge stats file has
> returned to a normal size.  I set up a script to track CPU usage and
> stats file size, and it appears to have decreased from 90MB down to
> about 2MB over roughly 6 hours last night.  The CPU usage of the stats
> collector also decreased accordingly.

> The application logs indicate that there was no variation in the
> workload over this time period, however the file size started to
> decrease soon after the nightly pg_dump backups completed.  Coincidence
> perhaps?

Well, that's pretty interesting.  What are your vacuuming arrangements
for this installation?  Could the drop in file size have coincided with
VACUUM operations?  Because the ultimate backstop against bloated stats
files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
clean out any entries that shouldn't be there anymore.

It's sounding like what you had was just transient bloat, in which case
it might be useful to inquire whether anything out-of-the-ordinary had
been done to the database right before the excessive-CPU-usage problem
started.

			regards, tom lane
Tom Lane wrote:
> Well, that's pretty interesting.  What are your vacuuming arrangements
> for this installation?  Could the drop in file size have coincided with
> VACUUM operations?  Because the ultimate backstop against bloated stats
> files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
> clean out any entries that shouldn't be there anymore.

VACUUM and ANALYZE are done by autovacuum only, no cron jobs.
autovacuum_naptime is 30 seconds, so it should make it to each database
every 10 minutes or so.  Do you think that more aggressive vacuuming
would prevent future swelling of the stats file?

> It's sounding like what you had was just transient bloat, in which case
> it might be useful to inquire whether anything out-of-the-ordinary had
> been done to the database right before the excessive-CPU-usage problem
> started.

I don't believe that there was any unusual activity on the server, but I
have set up some more detailed logging to hopefully identify a pattern
if the problem resurfaces.  Thanks.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.

> I don't believe that there was any unusual activity on the server, but I
> have set up some more detailed logging to hopefully identify a pattern
> if the problem resurfaces.

A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(.  The larger the stats file, the more obvious
this problem gets.

If you want to patch this before 8.2.4, try this...

Index: pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140.2.2
diff -c -r1.140.2.2 pgstat.c
*** pgstat.c	26 Jan 2007 20:07:01 -0000	1.140.2.2
--- pgstat.c	1 Mar 2007 20:04:50 -0000
***************
*** 1689,1695 ****
  	/* Preset the delay between status file writes */
  	MemSet(&write_timeout, 0, sizeof(struct itimerval));
  	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
! 	write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
  
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to
--- 1689,1695 ----
  	/* Preset the delay between status file writes */
  	MemSet(&write_timeout, 0, sizeof(struct itimerval));
  	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
! 	write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
  
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to

			regards, tom lane
Tom Lane wrote:
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(.  The larger the stats file, the more obvious
> this problem gets.
>
> If you want to patch this before 8.2.4, try this...

Thanks for the follow-up on this issue, Tom.  I was able to link the
original huge stats file problem to some long(ish) running transactions
which blocked VACUUM, but this patch will really help.

Thanks.

-Ben
On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(.  The larger the stats file, the more obvious
> this problem gets.

I think this explains the trigger that was blowing up my FC4 box.

merlin
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(.  The larger the stats file, the more obvious
>> this problem gets.

> I think this explains the trigger that was blowing up my FC4 box.

I dug in the archives a bit and couldn't find the report you're
referring to?

			regards, tom lane
On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> A further report led us to realize that 8.2.x in fact has a nasty bug
> >> here: the stats collector is supposed to dump its stats to a file at
> >> most every 500 milliseconds, but the code was actually waiting only
> >> 500 microseconds :-(.  The larger the stats file, the more obvious
> >> this problem gets.
>
> > I think this explains the trigger that was blowing up my FC4 box.
>
> I dug in the archives a bit and couldn't find the report you're
> referring to?

I was referring to this:
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Even though the fundamental reason was obvious (and btw, I inherited
this server less than two months ago), I was still curious what was
making 8.2 blow up a box that was handling a million tps/hour for over
a year. :-)

merlin
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>> I think this explains the trigger that was blowing up my FC4 box.
>>
>> I dug in the archives a bit and couldn't find the report you're
>> referring to?

> I was referring to this:
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Oh, the kernel-panic thing.  Hm, I wouldn't have thought that replacing
a file at a huge rate would induce a kernel panic ... but who knows?
Do you want to try installing the one-liner patch and see if the panic
goes away?

Actually I was wondering a bit if that strange Windows error discussed
earlier today could be triggered by this behavior:
http://archives.postgresql.org/pgsql-general/2007-03/msg00000.php

			regards, tom lane
Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>>> I think this explains the trigger that was blowing up my FC4 box.
>>> I dug in the archives a bit and couldn't find the report you're
>>> referring to?
>
>> I was referring to this:
>> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php
>
> Oh, the kernel-panic thing.  Hm, I wouldn't have thought that replacing
> a file at a huge rate would induce a kernel panic ... but who knows?
> Do you want to try installing the one-liner patch and see if the panic
> goes away?
>
> Actually I was wondering a bit if that strange Windows error discussed
> earlier today could be triggered by this behavior:
> http://archives.postgresql.org/pgsql-general/2007-03/msg00000.php

I think that's very likely.  If we're updating the file *that* often,
we're certainly doing something that's very unusual for the Windows
filesystem, and possibly for the hardware as well :-)

//Magnus
Sorry, I introduced this bug.

---------------------------------------------------------------------------

Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(.  The larger the stats file, the more obvious
> this problem gets.
>
> If you want to patch this before 8.2.4, try this...
>
> Index: pgstat.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
> retrieving revision 1.140.2.2
> diff -c -r1.140.2.2 pgstat.c
> *** pgstat.c	26 Jan 2007 20:07:01 -0000	1.140.2.2
> --- pgstat.c	1 Mar 2007 20:04:50 -0000
> ***************
> *** 1689,1695 ****
>   	/* Preset the delay between status file writes */
>   	MemSet(&write_timeout, 0, sizeof(struct itimerval));
>   	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> ! 	write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>
>   	/*
>   	 * Read in an existing statistics stats file or initialize the stats to
> --- 1689,1695 ----
>   	/* Preset the delay between status file writes */
>   	MemSet(&write_timeout, 0, sizeof(struct itimerval));
>   	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> ! 	write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
>
>   	/*
>   	 * Read in an existing statistics stats file or initialize the stats to
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Sorry, I introduced this bug.

To the gallows with you! :)

Don't feel bad, there were several hackers that missed the math on that
one.

Joshua D. Drake

> Tom Lane wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(.  The larger the stats file, the more obvious
>> this problem gets.
>>
>> If you want to patch this before 8.2.4, try this...
>>
>> Index: pgstat.c
>> ===================================================================
>> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
>> retrieving revision 1.140.2.2
>> diff -c -r1.140.2.2 pgstat.c
>> *** pgstat.c	26 Jan 2007 20:07:01 -0000	1.140.2.2
>> --- pgstat.c	1 Mar 2007 20:04:50 -0000
>> ***************
>> *** 1689,1695 ****
>>   	/* Preset the delay between status file writes */
>>   	MemSet(&write_timeout, 0, sizeof(struct itimerval));
>>   	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
>> ! 	write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>>
>>   	/*
>>   	 * Read in an existing statistics stats file or initialize the stats to
>> --- 1689,1695 ----
>>   	/* Preset the delay between status file writes */
>>   	MemSet(&write_timeout, 0, sizeof(struct itimerval));
>>   	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
>> ! 	write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
>>
>>   	/*
>>   	 * Read in an existing statistics stats file or initialize the stats to
>>
>> 			regards, tom lane

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/