Обсуждение: stats collector process high CPU utilization

Поиск
Список
Период
Сортировка

stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Greetings,

Since upgrading to 8.2.3 yesterday, the stats collector process has had
very high CPU utilization; it is consuming roughly 80-90% of one CPU.
The server seems a lot more sluggish than it was before.  Is this normal
operation for 8.2 or something I should look into correcting?

stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = false

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> Since upgrading to 8.2.3 yesterday, the stats collector process has had
> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
> The server seems a lot more sluggish than it was before.  Is this normal
> operation for 8.2 or something I should look into correcting?

What version did you update from, and what platform is this?

            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Since upgrading to 8.2.3 yesterday, the stats collector process has had
>> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
>> The server seems a lot more sluggish than it was before.  Is this normal
>> operation for 8.2 or something I should look into correcting?
>
> What version did you update from, and what platform is this?
>
>             regards, tom lane

I upgraded from 8.1.5.  The system is a dual Xeon 2.4Ghz, 4Gb RAM
running linux kernel 2.6 series.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> Benjamin Minshall <minshall@intellicon.biz> writes:
>>> Since upgrading to 8.2.3 yesterday, the stats collector process has had
>>> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
>>> The server seems a lot more sluggish than it was before.  Is this normal
>>> operation for 8.2 or something I should look into correcting?

>> What version did you update from, and what platform is this?

> I upgraded from 8.1.5.  The system is a dual Xeon 2.4Ghz, 4Gb RAM
> running linux kernel 2.6 series.

OK, I was trying to correlate it with post-8.2.0 patches but evidently
that's the wrong tree to bark up.  No, this isn't an expected behavior.
Is there anything unusual about your database (huge numbers of tables,
or some such)?  Can you gather some info about what it's doing?
strace'ing the stats collector might prove interesting, also if you have
built it with --enable-debug then oprofile results would be helpful.

            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
>
> OK, I was trying to correlate it with post-8.2.0 patches but evidently
> that's the wrong tree to bark up.  No, this isn't an expected behavior.

I talked with a co-worker and discovered that we went from 8.1.5 to
8.2.2, ran a few hours then went to 8.2.3 after the patch was released.
  I do not know if the high utilization was a problem during the few
hours on 8.2.2.

> Is there anything unusual about your database (huge numbers of tables,
> or some such)?

Nothing unusual.  I have a few databases of about 10GB each; the
workload is mostly inserts using COPY or parameterized INSERTS inside
transaction blocks.

> Can you gather some info about what it's doing?
> strace'ing the stats collector might prove interesting, also if you have
> built it with --enable-debug then oprofile results would be helpful.

I will gather some strace info later today when I have a chance to
shutdown the server.

Thanks.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> Can you gather some info about what it's doing?
>> strace'ing the stats collector might prove interesting, also if you have
>> built it with --enable-debug then oprofile results would be helpful.

> I will gather some strace info later today when I have a chance to
> shutdown the server.

I don't see why you'd need to shut anything down.  Just run
    strace -p stats-process-ID
for a few seconds or minutes (enough to gather maybe a few thousand
lines of output).

            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Tom Lane wrote:
>>> Can you gather some info about what it's doing?
>>> strace'ing the stats collector might prove interesting, also if you have
>>> built it with --enable-debug then oprofile results would be helpful.
>
>> I will gather some strace info later today when I have a chance to
>> shutdown the server.
>
> I don't see why you'd need to shut anything down.  Just run
>     strace -p stats-process-ID
> for a few seconds or minutes (enough to gather maybe a few thousand
> lines of output).
>

Seems the problem may be related to a huge global/pgstat.stat file.
Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.

I ran strace for 60 seconds:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  95.71    1.119004       48652        23           rename
   4.29    0.050128           0    508599           write
   0.00    0.000019           0       249        22 poll
   0.00    0.000000           0        23           open
   0.00    0.000000           0        23           close
   0.00    0.000000           0        34           getppid
   0.00    0.000000           0        23           munmap
   0.00    0.000000           0        23           setitimer
   0.00    0.000000           0        23        22 sigreturn
   0.00    0.000000           0        23           mmap2
   0.00    0.000000           0        23           fstat64
   0.00    0.000000           0       216           recv
------ ----------- ----------- --------- --------- ----------------
100.00    1.169151                509282        44 total

I attached an excerpt of the full strace with the many thousands of
write calls filtered.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
.
.
.
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T\274\355"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0/O\227\27\230"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3363) = 3363
close(3)                                = 0
munmap(0xacdfc000, 4096)                = 0
rename("global/pgstat.tmp", "global/pgstat.stat") = 0
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\1\0\0\0\0\0\0\0\301"..., 1000, 0) = 976
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 500}}, NULL) = 0
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\0\0\0\0\0\0\0\0\322"..., 1000, 0) = 976
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0H\3\0\0:\204\30\0\f\0\0\0\1\0\0\0\0\0\0\0\301\204"..., 1000, 0) = 840
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\1\0\0\0\0\0\0\0\301"..., 1000, 0) = 976
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\320\3\0\0:\204\30\0\16\0\0\0\0\0\0\0\0\0\0\000"..., 1000, 0) = 976
poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
recv(5, "\1\0\0\0\240\0\0\0:\204\30\0\2\0\0\0\0\0\0\0\0\0\0\0\303"..., 1000, 0) = 160
poll([{fd=5, events=POLLIN|POLLERR}], 1, 2000) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
getppid()                               = 22447
open("global/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xacdfc000
write(3, "\226\274\245\1D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "Z\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
.
.
.

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> Seems the problem may be related to a huge global/pgstat.stat file.
> Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.

Yoi.  We didn't do anything that would bloat that file if it were
storing the same information as before.  What I'm betting is that it's
storing info on a whole lot more tables than before.  Did you decide
to start running autovacuum when you updated to 8.2?  How many tables
are visible in the pg_stats views?

            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Seems the problem may be related to a huge global/pgstat.stat file.
>> Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.
>
> Yoi.  We didn't do anything that would bloat that file if it were
> storing the same information as before.  What I'm betting is that it's
> storing info on a whole lot more tables than before.

The server is running on the same actual production data, schema and
workload as before.

> Did you decide to start running autovacuum when you updated to 8.2?

Autovacuum was on and functioning before the update.

> How many tables are visible in the pg_stats views?

There are about 15 databases in the cluster each with around 90 tables.
  A count of pg_stats yields between 500 and 800 rows in each database.

select count(*) from (select distinct tablename from pg_stats) as i;
  count
-------
     92
(1 row)

select count(*) from pg_stats;
  count
-------
    628
(1 row)

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> How many tables are visible in the pg_stats views?

> There are about 15 databases in the cluster each with around 90 tables.
>   A count of pg_stats yields between 500 and 800 rows in each database.

Sorry, I was imprecise.  The view "pg_stats" doesn't have anything to do
with the stats collector; what I was interested in was the contents of
the "pg_stat_xxx" and "pg_statio_xxx" views.  It'd be enough to check
pg_stat_all_indexes and pg_stat_all_tables, probably.  Also, do you have
the 8.1 installation still available to get the comparable counts there?

            regards, tom lane

Re: stats collector process high CPU utilization

От
minshall@intellicon.biz
Дата:
> Benjamin Minshall <minshall@intellicon.biz> writes:
>> Tom Lane wrote:
>>> How many tables are visible in the pg_stats views?
>
>> There are about 15 databases in the cluster each with around 90 tables.
>>   A count of pg_stats yields between 500 and 800 rows in each database.
>
> Sorry, I was imprecise.  The view "pg_stats" doesn't have anything to do
> with the stats collector; what I was interested in was the contents of
> the "pg_stat_xxx" and "pg_statio_xxx" views.  It'd be enough to check
> pg_stat_all_indexes and pg_stat_all_tables, probably.  Also, do you have
> the 8.1 installation still available to get the comparable counts there?
>

I checked all 15 databases on both 8.1 and 8.2; they were all quite
consistent:

pg_stat_all_indexes has about 315 rows per database
pg_stat_all_tables has about 260 rows per database

The pg_statio_* views match in count to the pg_stat_* views as well.

While exploring this problem, I've noticed that one of the frequent insert
processes creates a few temporary tables to do post-processing.  Is it
possible that the stats collector is getting bloated with stats from these
short-lived temporary tables?  During periods of high activity it could be
creating temporary tables as often as two per second.

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
minshall@intellicon.biz writes:
> While exploring this problem, I've noticed that one of the frequent insert
> processes creates a few temporary tables to do post-processing.  Is it
> possible that the stats collector is getting bloated with stats from these
> short-lived temporary tables?  During periods of high activity it could be
> creating temporary tables as often as two per second.

Hmmm ... that's an interesting point, but offhand I don't see why it'd
cause more of a problem in 8.2 than 8.1.  Alvaro, any thoughts?

            regards, tom lane

Re: stats collector process high CPU utilization

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> minshall@intellicon.biz writes:
> > While exploring this problem, I've noticed that one of the frequent insert
> > processes creates a few temporary tables to do post-processing.  Is it
> > possible that the stats collector is getting bloated with stats from these
> > short-lived temporary tables?  During periods of high activity it could be
> > creating temporary tables as often as two per second.
>
> Hmmm ... that's an interesting point, but offhand I don't see why it'd
> cause more of a problem in 8.2 than 8.1.  Alvaro, any thoughts?

No idea.  I do have a very crude piece of code to read a pgstat.stat
file and output some info about what it finds (table OIDs basically
IIRC).  Maybe it can be helpful to examine what's in the bloated stat
file.

Regarding temp tables, I'd think that the pgstat entries should be
getting dropped at some point in both releases.  Maybe there's a bug
preventing that in 8.2?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Regarding temp tables, I'd think that the pgstat entries should be
> getting dropped at some point in both releases.  Maybe there's a bug
> preventing that in 8.2?

Hmmm ... I did rewrite the backend-side code for that just recently for
performance reasons ... could I have broken it?  Anyone want to take a
second look at
http://archives.postgresql.org/pgsql-committers/2007-01/msg00171.php

            regards, tom lane

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
I wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases.  Maybe there's a bug
>> preventing that in 8.2?

> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?

I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can
get some more info?  Alternatively, if Benjamin wants to send me a copy
of his stats file (off-list), I'd be happy to take a look.

            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Regarding temp tables, I'd think that the pgstat entries should be
>>> getting dropped at some point in both releases.  Maybe there's a bug
>>> preventing that in 8.2?
>
>> Hmmm ... I did rewrite the backend-side code for that just recently for
>> performance reasons ... could I have broken it?
>
> I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
> still seems to do what it's supposed to, so that theory falls down.
>
> Alvaro, could you send Benjamin your stat-file-dumper tool so we can
> get some more info?

> Alternatively, if Benjamin wants to send me a copy
> of his stats file (off-list), I'd be happy to take a look.
>
>             regards, tom lane

When I checked on the server this morning, the huge stats file has
returned to a normal size.  I set up a script to track CPU usage and
stats file size, and it appears to have decreased from 90MB down to
about 2MB over roughly 6 hours last night.  The CPU usage of the stats
collector also decreased accordingly.

The application logs indicate that there was no variation in the
workload over this time period, however the file size started to
decrease soon after the nightly pg_dump backups completed.  Coincidence
perhaps?

Nonetheless, I would appreciate a copy of Alvaro's stat file tool just
to see if anything stands out in the collected stats.

Thanks for your help, Tom.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> When I checked on the server this morning, the huge stats file has
> returned to a normal size.  I set up a script to track CPU usage and
> stats file size, and it appears to have decreased from 90MB down to
> about 2MB over roughly 6 hours last night.  The CPU usage of the stats
> collector also decreased accordingly.

> The application logs indicate that there was no variation in the
> workload over this time period, however the file size started to
> decrease soon after the nightly pg_dump backups completed.  Coincidence
> perhaps?

Well, that's pretty interesting.  What are your vacuuming arrangements
for this installation?  Could the drop in file size have coincided with
VACUUM operations?  Because the ultimate backstop against bloated stats
files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
clean out any entries that shouldn't be there anymore.

It's sounding like what you had was just transient bloat, in which case
it might be useful to inquire whether anything out-of-the-ordinary had
been done to the database right before the excessive-CPU-usage problem
started.

            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
> Well, that's pretty interesting.  What are your vacuuming arrangements
> for this installation?  Could the drop in file size have coincided with
> VACUUM operations?  Because the ultimate backstop against bloated stats
> files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
> clean out any entries that shouldn't be there anymore.

VACUUM and ANALYZE are done by autovacuum only, no cron jobs.
autovacuum_naptime is 30 seconds so it should make it to each database
every 10 minutes or so.  Do you think that more aggressive vacuuming
would prevent future swelling of the stats file?

> It's sounding like what you had was just transient bloat, in which case
> it might be useful to inquire whether anything out-of-the-ordinary had
> been done to the database right before the excessive-CPU-usage problem
> started.

I don't believe that there was any unusual activity on the server, but I
have set up some more detailed logging to hopefully identify a pattern
if the problem resurfaces.

Thanks.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

Вложения

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.

> I don't believe that there was any unusual activity on the server, but I
> have set up some more detailed logging to hopefully identify a pattern
> if the problem resurfaces.

A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(.  The larger the stats file, the more obvious
this problem gets.

If you want to patch this before 8.2.4, try this...

Index: pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140.2.2
diff -c -r1.140.2.2 pgstat.c
*** pgstat.c    26 Jan 2007 20:07:01 -0000    1.140.2.2
--- pgstat.c    1 Mar 2007 20:04:50 -0000
***************
*** 1689,1695 ****
      /* Preset the delay between status file writes */
      MemSet(&write_timeout, 0, sizeof(struct itimerval));
      write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!     write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;

      /*
       * Read in an existing statistics stats file or initialize the stats to
--- 1689,1695 ----
      /* Preset the delay between status file writes */
      MemSet(&write_timeout, 0, sizeof(struct itimerval));
      write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!     write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;

      /*
       * Read in an existing statistics stats file or initialize the stats to


            regards, tom lane

Re: stats collector process high CPU utilization

От
Benjamin Minshall
Дата:
Tom Lane wrote:
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(.  The larger the stats file, the more obvious
> this problem gets.
>
> If you want to patch this before 8.2.4, try this...
>

Thanks for the follow-up on this issue, Tom.  I was able to link the
original huge stats file problem to some long(ish) running transactions
which blocked VACUUM, but this patch will really help.  Thanks.

-Ben

Вложения

Re: stats collector process high CPU utilization

От
"Merlin Moncure"
Дата:
On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(.  The larger the stats file, the more obvious
> this problem gets.

I think this explains the trigger that was blowing up my FC4 box.

merlin

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(.  The larger the stats file, the more obvious
>> this problem gets.

> I think this explains the trigger that was blowing up my FC4 box.

I dug in the archives a bit and couldn't find the report you're
referring to?

            regards, tom lane

Re: stats collector process high CPU utilization

От
"Merlin Moncure"
Дата:
On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> A further report led us to realize that 8.2.x in fact has a nasty bug
> >> here: the stats collector is supposed to dump its stats to a file at
> >> most every 500 milliseconds, but the code was actually waiting only
> >> 500 microseconds :-(.  The larger the stats file, the more obvious
> >> this problem gets.
>
> > I think this explains the trigger that was blowing up my FC4 box.
>
> I dug in the archives a bit and couldn't find the report you're
> referring to?

I was referring to this:
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Even though the fundamental reason was obvious (and btw, I inherited
this server less than two months ago), I was still curious what was
making 8.2 blow up a box that was handling a million tps/hour for over
a year. :-)

merlin

Re: stats collector process high CPU utilization

От
Tom Lane
Дата:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>> I think this explains the trigger that was blowing up my FC4 box.
>>
>> I dug in the archives a bit and couldn't find the report you're
>> referring to?

> I was referring to this:
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Oh, the kernel-panic thing.  Hm, I wouldn't have thought that replacing
a file at a huge rate would induce a kernel panic ... but who knows?
Do you want to try installing the one-liner patch and see if the panic
goes away?

Actually I was wondering a bit if that strange Windows error discussed
earlier today could be triggered by this behavior:
http://archives.postgresql.org/pgsql-general/2007-03/msg00000.php

            regards, tom lane

Re: stats collector process high CPU utilization

От
Magnus Hagander
Дата:
Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>>> I think this explains the trigger that was blowing up my FC4 box.
>>> I dug in the archives a bit and couldn't find the report you're
>>> referring to?
>
>> I was referring to this:
>> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php
>
> Oh, the kernel-panic thing.  Hm, I wouldn't have thought that replacing
> a file at a huge rate would induce a kernel panic ... but who knows?
> Do you want to try installing the one-liner patch and see if the panic
> goes away?
>
> Actually I was wondering a bit if that strange Windows error discussed
> earlier today could be triggered by this behavior:
> http://archives.postgresql.org/pgsql-general/2007-03/msg00000.php

I think that's very likely. If we're updaitng the file *that* often,
we're certainly doing something that's very unusual for the windows
filesystem, and possibly for the hardware as well :-)

//Magnus

Re: stats collector process high CPU utilization

От
Bruce Momjian
Дата:
Sorry, I introduced this bug.

---------------------------------------------------------------------------

Tom Lane wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(.  The larger the stats file, the more obvious
> this problem gets.
>
> If you want to patch this before 8.2.4, try this...
>
> Index: pgstat.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
> retrieving revision 1.140.2.2
> diff -c -r1.140.2.2 pgstat.c
> *** pgstat.c    26 Jan 2007 20:07:01 -0000    1.140.2.2
> --- pgstat.c    1 Mar 2007 20:04:50 -0000
> ***************
> *** 1689,1695 ****
>       /* Preset the delay between status file writes */
>       MemSet(&write_timeout, 0, sizeof(struct itimerval));
>       write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> !     write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>
>       /*
>        * Read in an existing statistics stats file or initialize the stats to
> --- 1689,1695 ----
>       /* Preset the delay between status file writes */
>       MemSet(&write_timeout, 0, sizeof(struct itimerval));
>       write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> !     write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
>
>       /*
>        * Read in an existing statistics stats file or initialize the stats to
>
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: stats collector process high CPU utilization

От
"Joshua D. Drake"
Дата:
Bruce Momjian wrote:
> Sorry, I introduced this bug.

To the gallows with you! :) Don't feel bad, there were several hackers
that missed the math on that one.

Joshua D. Drake



>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
>> Benjamin Minshall <minshall@intellicon.biz> writes:
>>> Tom Lane wrote:
>>>> It's sounding like what you had was just transient bloat, in which case
>>>> it might be useful to inquire whether anything out-of-the-ordinary had
>>>> been done to the database right before the excessive-CPU-usage problem
>>>> started.
>>> I don't believe that there was any unusual activity on the server, but I
>>> have set up some more detailed logging to hopefully identify a pattern
>>> if the problem resurfaces.
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(.  The larger the stats file, the more obvious
>> this problem gets.
>>
>> If you want to patch this before 8.2.4, try this...
>>
>> Index: pgstat.c
>> ===================================================================
>> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
>> retrieving revision 1.140.2.2
>> diff -c -r1.140.2.2 pgstat.c
>> *** pgstat.c    26 Jan 2007 20:07:01 -0000    1.140.2.2
>> --- pgstat.c    1 Mar 2007 20:04:50 -0000
>> ***************
>> *** 1689,1695 ****
>>       /* Preset the delay between status file writes */
>>       MemSet(&write_timeout, 0, sizeof(struct itimerval));
>>       write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
>> !     write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>>
>>       /*
>>        * Read in an existing statistics stats file or initialize the stats to
>> --- 1689,1695 ----
>>       /* Preset the delay between status file writes */
>>       MemSet(&write_timeout, 0, sizeof(struct itimerval));
>>       write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
>> !     write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
>>
>>       /*
>>        * Read in an existing statistics stats file or initialize the stats to
>>
>>
>>             regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>        subscribe-nomail command to majordomo@postgresql.org so that your
>>        message can get through to the mailing list cleanly
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/