Thread: PG 8.3 and server load

From:
Phoenix Kiula
Date:

I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
PG version is 8.3.7, compiled as 64bit.
The memory is 8GB.
It's a 2 x Dual Core Intel 5310.
Hard disks are Raid 1, SCSI 15 rpm.

The server is running just one website. So there's Apache 2.2.11,
MySQL (for some small tasks, almost negligible).

And then there's PG, which in the "top" command shows up as the main beast.

My server load is going to 64, 63, 65, and so on.

Where should I start debugging? What should I see? TOP command does
not yield anything meaningful. I mean, even if it shows that postgres
user for "postmaster" and nobody user for "httpd" (apache) are the
main resource hogs, what should I start with in terms of debugging?

From:
Ivan Voras
Date:

Phoenix Kiula wrote:
> I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
> PG version is 8.3.7, compiled as 64bit.
> The memory is 8GB.
> It's a 2 x Dual Core Intel 5310.
> Hard disks are Raid 1, SCSI 15 rpm.
>
> The server is running just one website. So there's Apache 2.2.11,
> MySQL (for some small tasks, almost negligible).
>
> And then there's PG, which in the "top" command shows up as the main beast.
>
> My server load is going to 64, 63, 65, and so on.
>
> Where should I start debugging? What should I see? TOP command does
> not yield anything meaningful. I mean, even if it shows that postgres
> user for "postmaster" and nobody user for "httpd" (apache) are the
> main resource hogs, what should I start with in terms of debugging?

If postgres or apache are the reason for the high load, it means you
have lots of simultaneous users hitting either server.

The only thing you can do (except of course denying service to the
users) is investigate which requests / queries take the most time and
optimize them.

pgtop (http://pgfoundry.org/projects/pgtop/) might help you see what
your database is doing. You will also probably need to use something like
pqa (http://pqa.projects.postgresql.org/) to find the longest-running queries.

Unfortunately, if you cannot significantly optimize your queries, there
is not much else you can do with the hardware you have.

From:
Guillaume Cottenceau
Date:

Ivan Voras <ivoras 'at' freebsd.org> writes:

> pgtop (http://pgfoundry.org/projects/pgtop/) might help you see what
> is your database doing.

A simpler (but most probably less powerful) method would be to
make sure "track_activities = on" is set in the server configuration
(this setting was called "stats_command_string" before 8.3), then
run this query to view the currently running queries:

SELECT procpid, datname, current_query, query_start FROM pg_stat_activity WHERE current_query <> '<IDLE>';

That may also be interesting.

--
Guillaume Cottenceau

From:
Andy Colson
Date:

Phoenix Kiula wrote:
> I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
> PG version is 8.3.7, compiled as 64bit.
> The memory is 8GB.
> It's a 2 x Dual Core Intel 5310.
> Hard disks are Raid 1, SCSI 15 rpm.
>
> The server is running just one website. So there's Apache 2.2.11,
> MySQL (for some small tasks, almost negligible).
>
> And then there's PG, which in the "top" command shows up as the main beast.
>
> My server load is going to 64, 63, 65, and so on.
>
> Where should I start debugging? What should I see? TOP command does
> not yield anything meaningful. I mean, even if it shows that postgres
> user for "postmaster" and nobody user for "httpd" (apache) are the
> main resource hogs, what should I start with in terms of debugging?
>

1) Check if you are using swap space.  Use free and make sure swap/used
is a small number.  Check vmstat and see if swpd is moving up and down.
(Posting a handful of lines from vmstat might help us.)

2) Check 'ps ax|grep postgres' and make sure nothing says "idle in
transaction".

3) I had a web box where the number of apache clients was set very high,
and the box was brought to its knees by the sheer number of connections.
Check "ps ax|grep http|wc --lines" and make sure it's not too big
(perhaps less than 100).

-Andy
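
Checks 2 and 3 above can be scripted. What follows is a minimal Python sketch, assuming `ps ax` output in the column layout shown elsewhere in this thread. The function names and the sample text are illustrative assumptions, not part of any existing tool.

```python
def idle_in_transaction(ps_output):
    """Return the ps lines for postgres backends that are idle in transaction."""
    return [
        line for line in ps_output.splitlines()
        if "postgres:" in line and "idle in transaction" in line
    ]


def count_httpd(ps_output):
    """Count httpd processes, ignoring the grep process itself."""
    return sum(
        1 for line in ps_output.splitlines()
        if "httpd" in line and "grep" not in line
    )


# Hypothetical sample in the format printed by `ps ax`.
SAMPLE = """\
25352 ?        Ss     0:07 postgres: writer process
23483 ?        Ss     0:00 postgres: snipurl snipurl 127.0.0.1(51622) idle in transaction
 9922 ?        S      0:00 /usr/sbin/httpd
 9630 ?        S      0:00 /usr/sbin/httpd
23485 pts/12   S+     0:00 grep httpd
"""

if __name__ == "__main__":
    print(len(idle_in_transaction(SAMPLE)), "backend(s) idle in transaction")
    print(count_httpd(SAMPLE), "httpd process(es)")
```

On a live box the text would come from something like `subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout` instead of the sample string.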

From:
Karl Denninger
Date:

Andy Colson wrote:
> Phoenix Kiula wrote:
>> I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
>> PG version is 8.3.7, compiled as 64bit.
>> The memory is 8GB.
>> It's a 2 x Dual Core Intel 5310.
>> Hard disks are Raid 1, SCSI 15 rpm.
>>
>> The server is running just one website. So there's Apache 2.2.11,
>> MySQL (for some small tasks, almost negligible).
>>
>> And then there's PG, which in the "top" command shows up as the main
>> beast.
>>
>> My server load is going to 64, 63, 65, and so on.
>>
>> Where should I start debugging? What should I see? TOP command does
>> not yield anything meaningful. I mean, even if it shows that postgres
>> user for "postmaster" and nobody user for "httpd" (apache) are the
>> main resource hogs, what should I start with in terms of debugging?
>>
>
> 1) check if you are using swap space.  Use free and make sure
> swap/used is a small number.  Check vmstat and see if swpd is moving
> up and down.  (Posting a handful of lines from vmstat might help us).
>
> 2) check 'ps ax|grep postgres' and make sure nothing says "idle in
> transaction"
>
> 3) I had a web box where the number of apache clients was set very
> high, and the box was brought to its knees by the sheer number of
> connections.  check "ps ax|grep http|wc --lines" and make sure its not
> too big. (perhaps less than 100)
>
> -Andy
>
I will observe that in some benchmark tests I've done on my application
(a VERY heavy Postgres user) CentOS was RADICALLY inferior in terms of
carrying capacity and performance to FreeBSD on the same hardware.

I have no idea why - you wouldn't expect this sort of result, but it is
what it is.  The test platform in my case was a Core i7 box (8 cores
SMP) with 6GB of memory running 64-bit code across the board.  Disks
were on a 3Ware coprocessor board.

I was quite surprised by this, given that in general CentOS seems to be
comparable to FreeBSD for base Apache (web service) use, but because of
this I strongly recommend FreeBSD for applications where web
service + PostgreSQL are the intended application mix.

-- Karl

From:
Andy Colson
Date:

Phoenix Kiula wrote:
> Thanks, but swap is not changing, there is no idle transaction, and
> number of connections are 28/29.
>
> Here are some command line stamps...any other ideas?
>
>
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:00:37 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  3  1  20920  25736  60172 7594988    0    0    74   153    0     3 10  5 74 12
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:00:40 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0  1  20920  34696  60124 7593996    0    0    74   153    0     3 10  5 74 12
>
> [MYSITE] ~ > ps ax|grep postgres
> 25302 ?        Ss     0:00 postgres: logger process
> 25352 ?        Ss     0:07 postgres: writer process
> 25353 ?        Ss     4:21 postgres: stats collector process
> 23483 ?        Ds     0:00 postgres: snipurl_snipurl snipurl
> 127.0.0.1(51622) UPDATE
> 23485 pts/12   S+     0:00 grep postgres
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:00:55 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0  0  20920  49464  60272 7597748    0    0    74   153    0     3 10  5 74 12
>
> [MYSITE] ~ > ps ax|grep http|wc --lines
> 28
>
> [MYSITE] ~ > ps ax|grep http|wc --lines
> 29
>
> [MYSITE] ~ > ps ax|grep postgres
> 25302 ?        Ss     0:00 postgres: logger process
> 25352 ?        Ss     0:07 postgres: writer process
> 25353 ?        Ss     4:21 postgres: stats collector process
> 24718 pts/12   S+     0:00 grep postgres
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:01:23 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0  0  20920 106376  59220 7531016    0    0    74   153    0     3 10  5 74 12
>
>
>
>
> On Wed, Aug 19, 2009 at 10:01 PM, Andy Colson<> wrote:
>> Phoenix Kiula wrote:
>>> I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
>>> PG version is 8.3.7, compiled as 64bit.
>>> The memory is 8GB.
>>> It's a 2 x Dual Core Intel 5310.
>>> Hard disks are Raid 1, SCSI 15 rpm.
>>>
>>> The server is running just one website. So there's Apache 2.2.11,
>>> MySQL (for some small tasks, almost negligible).
>>>
>>> And then there's PG, which in the "top" command shows up as the main
>>> beast.
>>>
>>> My server load is going to 64, 63, 65, and so on.
>>>
>>> Where should I start debugging? What should I see? TOP command does
>>> not yield anything meaningful. I mean, even if it shows that postgres
>>> user for "postmaster" and nobody user for "httpd" (apache) are the
>>> main resource hogs, what should I start with in terms of debugging?
>>>
>> 1) check if you are using swap space.  Use free and make sure swap/used is a
>> small number.  Check vmstat and see if swpd is moving up and down.  (Posting
>> a handful of lines from vmstat might help us).
>>
>> 2) check 'ps ax|grep postgres' and make sure nothing says "idle in
>> transaction"
>>
>> 3) I had a web box where the number of apache clients was set very high, and
>> the box was brought to its knees by the sheer number of connections.  check
>> "ps ax|grep http|wc --lines" and make sure its not too big. (perhaps less
>> than 100)
>>
>> -Andy
>>
>>

The first line of vmstat is an average since bootup.  Kinda useless.
Run it as:  'vmstat 4'

It will print a line every 4 seconds, which will be a summary of
everything that happened in the last 4 seconds.

Since boot, you've written out an average of 153 blocks per second (the
bo column).  That's very small, so you're not I/O bound.

But... you have an average of 74% idle CPU.  So you're not CPU bound either?

Hmm.  I'm not sure what that means.  Maybe I'm reading something wrong?

-Andy
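
The point about the first vmstat row can be made concrete. Below is a rough Python sketch that parses `vmstat 4` output and averages a column over the interval rows only, dropping the since-boot row. The sample rows are copied from the `vmstat 4` output posted later in this thread; the function names are assumptions made for illustration.

```python
def parse_vmstat(text):
    """Parse `vmstat N` output into a list of dicts, one per sample row."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line starts with "r"
            header = fields
        elif header and fields[0].isdigit() and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def interval_average(rows, column):
    """Average a column over the interval samples, skipping the since-boot row."""
    samples = rows[1:]
    return sum(r[column] for r in samples) / len(samples)


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  2  16128  35056  62800 7697428    0    0    74   153    0     3 10  5 74 12
 0  0  16128  38256  62836 7698172    0    0   166   219 1386  1440  7  4 85  4
 0  1  16128  34704  62872 7698916    0    0   119   314 1441  1589  7  4 85  5
 0  0  16128  29544  62912 7699396    0    0   142   144 1443  1418  6  3 88  2
"""

if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    print("idle %:", interval_average(rows, "id"))
    print("iowait %:", interval_average(rows, "wa"))
```

Dropping the first row matters here: the since-boot row reports 12% iowait, while the interval rows stay in the low single digits.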

From:
Phoenix Kiula
Date:

On Wed, Aug 19, 2009 at 11:25 PM, Andy Colson wrote:

....<snip>.....


>
> the first line of vmstat is an average since bootup.  Kinda useless. run it
> as:  'vmstat 4'
>
> it will print a line every 4 seconds, which will be a summary of everything
> that happened in the last 4 seconds.
>
> since boot, you've written out an average of 153 blocks (the bo column).
>  Thats very small, so your not io bound.
>
> but... you have average 74% idle cpu.  So your not cpu bound either?
>
> Ahh?  I'm not sure what that means.  Maybe I'm reading something wrong?
>
> -Andy
>




~ > vmstat 4
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  2  16128  35056  62800 7697428    0    0    74   153    0     3 10  5 74 12
 0  0  16128  38256  62836 7698172    0    0   166   219 1386  1440  7  4 85  4
 0  1  16128  34704  62872 7698916    0    0   119   314 1441  1589  7  4 85  5
 0  0  16128  29544  62912 7699396    0    0   142   144 1443  1418  6  3 88  2
 7  1  16128  26784  62832 7692196    0    0   343   241 1492  1671  8  5 83  4
 0  0  16128  32840  62880 7693188    0    0   253   215 1459  1511  7  4 85  4
 0  0  16128  30112  62940 7693908    0    0   187   216 1395  1282  6  3 87  4

From:
"Kevin Grittner"
Date:

Andy Colson wrote:
> Phoenix Kiula wrote:

>>>> It's a 2 x Dual Core Intel 5310.

> you have average 74% idle cpu.  So your not cpu bound either?

Or one CPU is pegged and the other three are idle....

-Kevin

From:
Andy Colson
Date:

Kevin Grittner wrote:
> Andy Colson <> wrote:
>> Phoenix Kiula wrote:
>
>>>>> It's a 2 x Dual Core Intel 5310.
>
>> you have average 74% idle cpu.  So your not cpu bound either?
>
> Or one CPU is pegged and the other three are idle....
>
> -Kevin

Ahh, yeah...

Phoenix:  run top again, and hit the '1' key.  It'll show you stats for
each CPU.  Is one pegged and the others idle?

Also do a 'cat /proc/cpuinfo' and make sure your OS is seeing all your CPUs.

-Andy
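
Kevin's scenario is easy to check numerically: one CPU at 0% idle and three at 100% idle average out to 75% idle, close to the 74% that vmstat reported. Here is a small Python sketch, assuming the per-CPU header format printed by the top version used in this thread ("Cpu0  : ... 74.0% id"); newer top versions format these lines differently. The sample values and the 5% threshold are hypothetical.

```python
import re

# Matches lines such as "Cpu0  : 17.7% us, ... 74.0% id, ..."
CPU_LINE = re.compile(r"^Cpu(\d+)\s*:.*?([\d.]+)%\s*id", re.MULTILINE)


def per_cpu_idle(top_text):
    """Map CPU number -> idle percentage from top's per-CPU header lines."""
    return {int(n): float(idle) for n, idle in CPU_LINE.findall(top_text)}


def pegged_cpus(idle_by_cpu, threshold=5.0):
    """Return the CPUs whose idle percentage is at or below the threshold."""
    return sorted(cpu for cpu, idle in idle_by_cpu.items() if idle <= threshold)


# Hypothetical sample: one CPU fully busy, three fully idle.
SAMPLE = """\
Cpu0  : 98.0% us,  2.0% sy,  0.0% ni,   0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu3  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
"""

if __name__ == "__main__":
    idle = per_cpu_idle(SAMPLE)
    print("average idle %:", sum(idle.values()) / len(idle))
    print("pegged CPUs:", pegged_cpus(idle))
```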

From:
Matthew Wakeling
Date:

On Wed, 19 Aug 2009, Phoenix Kiula wrote:
> ~ > vmstat 4
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 0  2  16128  35056  62800 7697428    0    0    74   153    0     3 10  5 74 12
> 0  0  16128  38256  62836 7698172    0    0   166   219 1386  1440  7  4 85  4
> 0  1  16128  34704  62872 7698916    0    0   119   314 1441  1589  7  4 85  5
> 0  0  16128  29544  62912 7699396    0    0   142   144 1443  1418  6  3 88  2
> 7  1  16128  26784  62832 7692196    0    0   343   241 1492  1671  8  5 83  4
> 0  0  16128  32840  62880 7693188    0    0   253   215 1459  1511  7  4 85  4
> 0  0  16128  30112  62940 7693908    0    0   187   216 1395  1282  6  3 87  4

As far as I can see from this, your machine isn't very busy at all.

> [MYSITE] ~ > ps ax|grep postgres
> 25302 ?        Ss     0:00 postgres: logger process
> 25352 ?        Ss     0:07 postgres: writer process
> 25353 ?        Ss     4:21 postgres: stats collector process
> 24718 pts/12   S+     0:00 grep postgres

Moreover, Postgres isn't doing anything either.

So, what is the problem that you are seeing? What do you want to change?

Matthew

--
Surely the value of C++ is zero, but C's value is now 1?
  -- map36, commenting on the "No, C++ isn't equal to D. 'C' is undeclared
  [...] C++ should really be called 1" response to "C++ -- shouldn't it
  be called D?"

From:
Phoenix Kiula
Date:

On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson wrote:

>
> Phoenix:  run top again, and hit the '1' key.  It'll show you stats for each
> cpu.  Is one pegged and the others idle?
>


top - 10:38:53 up 29 days, 5 min,  1 user,  load average: 64.99, 65.17, 65.06
Tasks: 568 total,   1 running, 537 sleeping,   6 stopped,  24 zombie
Cpu0  : 17.7% us,  7.7% sy,  0.0% ni, 74.0% id,  0.7% wa,  0.0% hi,  0.0% si
Cpu1  :  6.3% us,  5.6% sy,  0.0% ni, 84.4% id,  3.6% wa,  0.0% hi,  0.0% si
Cpu2  :  5.6% us,  5.9% sy,  0.0% ni, 86.8% id,  1.7% wa,  0.0% hi,  0.0% si
Cpu3  :  5.6% us,  4.0% sy,  0.0% ni, 74.2% id, 16.2% wa,  0.0% hi,  0.0% si
Mem:   8310256k total,  8277416k used,    32840k free,    61944k buffers
Swap:  2096440k total,    16128k used,  2080312k free,  7664224k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                            
9922 nobody    15   0 49024  16m 7408 S  3.0  0.2   0:00.52 httpd                                                                              
9630 nobody    15   0 49020  16m 7420 S  2.3  0.2   0:00.60 httpd                                                                              
9848 nobody    16   0 48992  16m 7372 S  2.3  0.2   0:00.51 httpd                                                                              
10995 nobody    15   0 49024  16m 7304 S  2.3  0.2   0:00.35 httpd                                                                              
11031 nobody    15   0 48860  16m 7104 S  2.3  0.2   0:00.34 httpd                                                                              
6701 nobody    15   0 49028  17m 7576 S  2.0  0.2   0:01.50 httpd                                                                              
10996 nobody    15   0 48992  16m 7328 S  2.0  0.2   0:00.31 httpd                                                                              
12232 nobody    15   0 48860  16m 7004 S  1.7  0.2   0:00.05 httpd                                                                              
9876 nobody    15   0 48992  16m 7400 S  1.3  0.2   0:00.73 httpd                                                                              
12231 nobody    15   0 48860  16m 6932 S  1.3  0.2   0:00.04 httpd                                                                              
12233 nobody    16   0 48860  16m 6960 S  1.3  0.2   0:00.04 httpd                                                                              
20315 postgres  19   0  325m 9732 9380 S  1.0  0.1   0:10.39 postmaster                                                                          
31573 nobody    15   0 49024  17m 7664 S  1.0  0.2   0:03.14 httpd                                                                              
7954 nobody    15   0 49032  16m 7400 S  1.0  0.2   0:01.14 httpd                                                                              
9918 nobody    15   0 48956  16m 7344 S  1.0  0.2   0:00.44 httpd                                                                              
12298 nobody    16   0 48860  16m 6780 S  1.0  0.2   0:00.03 httpd                                                                              
6479 nobody    16   0 49040  16m 7412 S  0.7  0.2   0:01.20 httpd                                                                              
7950 nobody    15   0 49020  16m 7388 S  0.7  0.2   0:00.83 httpd                                                                              
7951 nobody    15   0 49032  16m 7384 S  0.7  0.2   0:01.03 httpd                                                                              
9875 nobody    15   0 48948  16m 7096 S  0.7  0.2   0:00.51 httpd                                                                              
9916 nobody    16   0 48860  16m 7124 S  0.7  0.2   0:00.59 httpd                                                                              
10969 nobody    15   0 49036  16m 7380 S  0.7  0.2   0:00.29 httpd                                                                              
11752 root      16   0  3620 1288  772 R  0.7  0.0   0:00.14 top                                                                                
12309 nobody    16   0 48860  16m 6844 S  0.7  0.2   0:00.02 httpd                                                                              
20676 mysql     15   0  182m  20m 2916 S  0.3  0.3   0:00.95 mysqld                                                                              
20811 root      21   0 47920  14m 5872 S  0.3  0.2   0:00.71 httpd                                                                              
7952 nobody    15   0 49024  16m 7524 S  0.3  0.2   0:00.96 httpd                                                                              
11036 nobody    15   0 48992  16m 7320 S  0.3  0.2   0:00.36 httpd                                                                              
12230 nobody    15   0 48860  16m 6956 S  0.3  0.2   0:00.01 httpd                                                                              
12297 nobody    16   0 48860  16m 6932 S  0.3  0.2   0:00.01 httpd                                                                              
12299 nobody    16   0 48992  16m 7120 S  0.3  0.2   0:00.01 httpd                                                                              
12301 nobody    20   0 48860  16m 6816 S  0.3  0.2   0:00.01 httpd                                                                              
12307 nobody    15   0 48860  16m 6880 S  0.3  0.2   0:00.01 httpd    
           



> do a 'cat /proc/cpuinfo' and make sure your os is seeing all your cpus.
>



I guess it's using all 4? 



From:
Andy Colson
Date:

Phoenix Kiula wrote:
> On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson wrote:
>
>  >
>  > Phoenix:  run top again, and hit the '1' key.  It'll show you stats for each
>  > cpu.  Is one pegged and the others idle?
>  >
>
>
> top - 10:38:53 up 29 days, 5 min,  1 user,  load average: 64.99, 65.17, 65.06
> Tasks: 568 total,   1 running, 537 sleeping,   6 stopped,  24 zombie
> Cpu0  : 17.7% us,  7.7% sy,  0.0% ni, 74.0% id,  0.7% wa,  0.0% hi,  0.0% si
> Cpu1  :  6.3% us,  5.6% sy,  0.0% ni, 84.4% id,  3.6% wa,  0.0% hi,  0.0% si
> Cpu2  :  5.6% us,  5.9% sy,  0.0% ni, 86.8% id,  1.7% wa,  0.0% hi,  0.0% si
> Cpu3  :  5.6% us,  4.0% sy,  0.0% ni, 74.2% id, 16.2% wa,  0.0% hi,  0.0% si
> Mem:   8310256k total,  8277416k used,    32840k free,    61944k buffers
> Swap:  2096440k total,    16128k used,  2080312k free,  7664224k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 9922 nobody    15   0 49024  16m 7408 S  3.0  0.2   0:00.52 httpd
> 9630 nobody    15   0 49020  16m 7420 S  2.3  0.2   0:00.60 httpd
> 9848 nobody    16   0 48992  16m 7372 S  2.3  0.2   0:00.51 httpd
> 10995 nobody    15   0 49024  16m 7304 S  2.3  0.2   0:00.35 httpd
> 11031 nobody    15   0 48860  16m 7104 S  2.3  0.2   0:00.34 httpd
> 6701 nobody    15   0 49028  17m 7576 S  2.0  0.2   0:01.50 httpd
> 10996 nobody    15   0 48992  16m 7328 S  2.0  0.2   0:00.31 httpd
> 12232 nobody    15   0 48860  16m 7004 S  1.7  0.2   0:00.05 httpd
> 9876 nobody    15   0 48992  16m 7400 S  1.3  0.2   0:00.73 httpd
> 12231 nobody    15   0 48860  16m 6932 S  1.3  0.2   0:00.04 httpd
> 12233 nobody    16   0 48860  16m 6960 S  1.3  0.2   0:00.04 httpd
> 20315 postgres  19   0  325m 9732 9380 S  1.0  0.1   0:10.39 postmaster
> 31573 nobody    15   0 49024  17m 7664 S  1.0  0.2   0:03.14 httpd
> 7954 nobody    15   0 49032  16m 7400 S  1.0  0.2   0:01.14 httpd
> 9918 nobody    15   0 48956  16m 7344 S  1.0  0.2   0:00.44 httpd
> 12298 nobody    16   0 48860  16m 6780 S  1.0  0.2   0:00.03 httpd
> 6479 nobody    16   0 49040  16m 7412 S  0.7  0.2   0:01.20 httpd
> 7950 nobody    15   0 49020  16m 7388 S  0.7  0.2   0:00.83 httpd
> 7951 nobody    15   0 49032  16m 7384 S  0.7  0.2   0:01.03 httpd
> 9875 nobody    15   0 48948  16m 7096 S  0.7  0.2   0:00.51 httpd
> 9916 nobody    16   0 48860  16m 7124 S  0.7  0.2   0:00.59 httpd
> 10969 nobody    15   0 49036  16m 7380 S  0.7  0.2   0:00.29 httpd
> 11752 root      16   0  3620 1288  772 R  0.7  0.0   0:00.14 top
> 12309 nobody    16   0 48860  16m 6844 S  0.7  0.2   0:00.02 httpd
> 20676 mysql     15   0  182m  20m 2916 S  0.3  0.3   0:00.95 mysqld
> 20811 root      21   0 47920  14m 5872 S  0.3  0.2   0:00.71 httpd
> 7952 nobody    15   0 49024  16m 7524 S  0.3  0.2   0:00.96 httpd
> 11036 nobody    15   0 48992  16m 7320 S  0.3  0.2   0:00.36 httpd
> 12230 nobody    15   0 48860  16m 6956 S  0.3  0.2   0:00.01 httpd
> 12297 nobody    16   0 48860  16m 6932 S  0.3  0.2   0:00.01 httpd
> 12299 nobody    16   0 48992  16m 7120 S  0.3  0.2   0:00.01 httpd
> 12301 nobody    20   0 48860  16m 6816 S  0.3  0.2   0:00.01 httpd
> 12307 nobody    15   0 48860  16m 6880 S  0.3  0.2   0:00.01 httpd
>
>
>
>
>  > do a 'cat /proc/cpuinfo' and make sure your os is seeing all your cpus.
>  >
>
>
>
> I guess it's using all 4?

Yeah.

You aren't serving data from a shared drive (SMB or NFS), are you?  You
have a bunch of httpd processes just sitting around doing very little.

Or do you have any php/perl/python/whatever turning around and doing
network stuff?

Check your NICs for errors (run ifconfig) and look at these stats:

RX packets:15606269 errors:0 dropped:0 overruns:0 frame:0
TX packets:13173940 errors:5 dropped:0 overruns:0 carrier:10
           collisions:0 txqueuelen:1000
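
If you want to watch those counters over time rather than eyeball them, a short Python sketch along these lines could pull them out. It assumes the older ifconfig layout shown above ("RX packets:... errors:..."); the function name is made up for illustration, and the sample is the output quoted above.

```python
import re

# Matches "name:number" pairs such as "errors:5".
COUNTER = re.compile(r"(\w+):(\d+)")


def nic_counters(ifconfig_text):
    """Return {'RX': {...}, 'TX': {...}} from old-style ifconfig output."""
    result = {}
    for line in ifconfig_text.splitlines():
        line = line.strip()
        if line.startswith(("RX packets", "TX packets")):
            direction = line[:2]
            result[direction] = {k: int(v) for k, v in COUNTER.findall(line)}
    return result


SAMPLE = """\
RX packets:15606269 errors:0 dropped:0 overruns:0 frame:0
TX packets:13173940 errors:5 dropped:0 overruns:0 carrier:10
           collisions:0 txqueuelen:1000
"""

if __name__ == "__main__":
    for direction, counters in nic_counters(SAMPLE).items():
        print(direction, "errors:", counters["errors"], "of", counters["packets"], "packets")
```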


The load average is a summary of a bunch of things, including what's
waiting on something else.  I'll bet your httpds are sitting around
waiting on something (it's not CPU or disk, so it must be something else),
which is causing the load average to spike up.

-Andy
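
On Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D), which is how a high load can coexist with idle CPUs. The Python sketch below tallies process states from `ps ax` output; the sample is the `ps ax|grep postgres` output posted earlier in this thread, and the function names are illustrative assumptions.

```python
from collections import Counter


def state_counts(ps_output):
    """Count processes by the first letter of the STAT column of `ps ax`."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric PID; the header row does not.
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2][0]] += 1
    return counts


def load_contributors(counts):
    """Processes that count toward the Linux load average: states R and D."""
    return counts["R"] + counts["D"]


SAMPLE = """\
25302 ?        Ss     0:00 postgres: logger process
25352 ?        Ss     0:07 postgres: writer process
25353 ?        Ss     4:21 postgres: stats collector process
23483 ?        Ds     0:00 postgres: snipurl_snipurl snipurl 127.0.0.1(51622) UPDATE
23485 pts/12   S+     0:00 grep postgres
"""

if __name__ == "__main__":
    counts = state_counts(SAMPLE)
    print(dict(counts))
    print("contributing to load:", load_contributors(counts))
```

Run against the full `ps ax` output, a large and steady count of D-state processes would point at whatever those processes are blocked on.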

From:
Tom Lane
Date:

Phoenix Kiula writes:
> top - 10:38:53 up 29 days, 5 min,  1 user,  load average: 64.99, 65.17,
> 65.06
> Tasks: 568 total,   1 running, 537 sleeping,   6 stopped,  24 zombie
> Cpu0  : 17.7% us,  7.7% sy,  0.0% ni, 74.0% id,  0.7% wa,  0.0% hi,  0.0% si
> Cpu1  :  6.3% us,  5.6% sy,  0.0% ni, 84.4% id,  3.6% wa,  0.0% hi,  0.0% si
> Cpu2  :  5.6% us,  5.9% sy,  0.0% ni, 86.8% id,  1.7% wa,  0.0% hi,  0.0% si
> Cpu3  :  5.6% us,  4.0% sy,  0.0% ni, 74.2% id, 16.2% wa,  0.0% hi,  0.0% si
> Mem:   8310256k total,  8277416k used,    32840k free,    61944k buffers
> Swap:  2096440k total,    16128k used,  2080312k free,  7664224k cached

It sure looks from here like your box is not under any particular
stress.  The only thing that suggests a problem is the high load
average, but since that doesn't agree with any other measurements,
I'm inclined to think that the load average is simply wrong.
Do you have any actual evidence of a problem (like slow response)?

(I've seen load averages that had nothing to do with observable
reality on other Unixes, though not before on RHEL.)

            regards, tom lane

From:
Guillaume Cottenceau
Date:

Phoenix Kiula <phoenix.kiula 'at' gmail.com> writes:

> Tasks: 568 total,   1 running, 537 sleeping,   6 stopped,  24 zombie

The stopped and zombie processes look odd. Any reason for these?

--
Guillaume Cottenceau
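
One way to chase the zombies is to group them by parent PID, since a zombie stays around until its parent reaps it. Below is a minimal Python sketch, assuming input in the format of `ps -eo pid,ppid,stat,comm`; the sample data and the function name are hypothetical.

```python
from collections import Counter


def zombies_by_parent(ps_output):
    """Count zombie (state Z) processes per parent PID.

    Expects the output of `ps -eo pid,ppid,stat,comm`.
    """
    parents = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[2].startswith("Z"):
            parents[int(fields[1])] += 1
    return parents


# Hypothetical sample: two defunct httpd children of PID 20811.
SAMPLE = """\
  PID  PPID STAT COMMAND
20811     1 Ss   httpd
 9922 20811 Z    httpd <defunct>
 9630 20811 Z    httpd <defunct>
11752  4000 R+   top
"""

if __name__ == "__main__":
    for ppid, n in zombies_by_parent(SAMPLE).most_common():
        print(f"parent {ppid}: {n} zombie(s)")
```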

From:
Scott Marlowe
Date:

On Wed, Aug 19, 2009 at 9:40 AM, Phoenix Kiula wrote:
> On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson wrote:
>
>>
>> Phoenix:  run top again, and hit the '1' key.  It'll show you stats for
>> each
>> cpu.  Is one pegged and the others idle?
>
> top - 10:38:53 up 29 days, 5 min,  1 user,  load average: 64.99, 65.17,
> 65.06
> Tasks: 568 total,   1 running, 537 sleeping,   6 stopped,  24 zombie
> Cpu0  : 17.7% us,  7.7% sy,  0.0% ni, 74.0% id,  0.7% wa,  0.0% hi,  0.0% si
> Cpu1  :  6.3% us,  5.6% sy,  0.0% ni, 84.4% id,  3.6% wa,  0.0% hi,  0.0% si
> Cpu2  :  5.6% us,  5.9% sy,  0.0% ni, 86.8% id,  1.7% wa,  0.0% hi,  0.0% si
> Cpu3  :  5.6% us,  4.0% sy,  0.0% ni, 74.2% id, 16.2% wa,  0.0% hi,  0.0% si
> Mem:   8310256k total,  8277416k used,    32840k free,    61944k buffers
> Swap:  2096440k total,    16128k used,  2080312k free,  7664224k cached
>

OK, nothing looks odd except, as pointed out, the stopped, zombie and
high load.  The actual amount of stuff running is minimal.

I'm wondering if you've got something causing apache children to crash
and go zombie.  What parts of this setup are compiled by hand?  Are
you sure that you don't have something like apache compiled against
one version of zlib and php-mysql against another?  Not that exact
problem, but it's one of many ways to make a crash prone apache.

From:
Ivan Voras
Date:

Scott Marlowe wrote:
> On Wed, Aug 19, 2009 at 9:40 AM, Phoenix Kiula<> wrote:
>> On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson<> wrote:
>>
>>> Phoenix:  run top again, and hit the '1' key.  It'll show you stats for
>>> each
>>> cpu.  Is one pegged and the others idle?
>> top - 10:38:53 up 29 days, 5 min,  1 user,  load average: 64.99, 65.17,
>> 65.06
>> Tasks: 568 total,   1 running, 537 sleeping,   6 stopped,  24 zombie
>> Cpu0  : 17.7% us,  7.7% sy,  0.0% ni, 74.0% id,  0.7% wa,  0.0% hi,  0.0% si
>> Cpu1  :  6.3% us,  5.6% sy,  0.0% ni, 84.4% id,  3.6% wa,  0.0% hi,  0.0% si
>> Cpu2  :  5.6% us,  5.9% sy,  0.0% ni, 86.8% id,  1.7% wa,  0.0% hi,  0.0% si
>> Cpu3  :  5.6% us,  4.0% sy,  0.0% ni, 74.2% id, 16.2% wa,  0.0% hi,  0.0% si
>> Mem:   8310256k total,  8277416k used,    32840k free,    61944k buffers
>> Swap:  2096440k total,    16128k used,  2080312k free,  7664224k cached
>>
>
> OK, nothing looks odd except, as pointed out, the stopped, zombie and
> high load.  The actual amount of stuff running is minimal.
>
> I'm wondering if you've got something causing apache children to crash
> and go zombie.  What parts of this setup are compiled by hand?  Are

Good point. Does Linux have a "last PID" field in top? If so, you could
monitor it to see whether it's changing rapidly.
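
On Linux, the fifth field of /proc/loadavg is the PID of the most recently created process, so it can be sampled directly. The following is a rough Python sketch under that assumption; the two sample lines are hypothetical, and the rate is only approximate because it ignores PID wraparound and because threads also consume PIDs.

```python
def parse_loadavg(text):
    """Split a /proc/loadavg line into its fields."""
    one, five, fifteen, tasks, last_pid = text.split()
    runnable, total = tasks.split("/")
    return {
        "load": (float(one), float(five), float(fifteen)),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def forks_per_second(before, after, seconds):
    """Approximate process creation rate between two samples.

    Ignores PID wraparound, so a negative result means the counter wrapped.
    """
    return (after["last_pid"] - before["last_pid"]) / seconds


# Hypothetical samples taken 4 seconds apart.
FIRST = "64.99 65.17 65.06 1/568 12309"
SECOND = "64.90 65.10 65.05 2/570 12709"

if __name__ == "__main__":
    a, b = parse_loadavg(FIRST), parse_loadavg(SECOND)
    print("new processes per second:", forks_per_second(a, b, 4))
```

On a live system the two lines would be read from /proc/loadavg a few seconds apart; a consistently high rate would fit the theory of apache children crashing and being respawned.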