Thread: heavy load - high CPU utilization

heavy load - high CPU utilization

From: Filippos
Date:
Dear all,

First of all, congratulations on your great work here; from time to time
I've found many answers to my problems. Unfortunately, for this specific
problem I didn't find much relevant information, so I would ask for your
guidance in dealing with the following situation:

We have a dedicated server (PostgreSQL 8.4.4 on Red Hat) with 24 CPUs and
36 GB of RAM. I would say that the traffic on the server is huge, and the
CPU utilization is pretty high too (avg ~75%, except during the nights when
it is much lower). I am trying to tune the server a bit to handle this
problem. The incoming data volume is about 30-40 GB/day.

At first, checkpoint_segments was set to 50, checkpoint_timeout to 15 min,
and checkpoint_completion_target to 0.5 sec.

I noticed that the utilization of the server was higher when it was close to
making a checkpoint, and since full_page_writes is ON, I changed the
parameters mentioned above (after reading a lot of material online) to:
checkpoint_segments -> 250
checkpoint_timeout -> 40min
checkpoint_completion_target -> 0.8

But the CPU utilization is not significantly lower. Another parameter I will
certainly change is wal_buffers, which is now set to 64 kB; I plan to raise
it to 16 MB. Could this parameter account for a significant part of the
problem?

Are there any suggestions for how I can tune the server better? I can
provide any information you find relevant about the configuration of the
server, the OS, the storage, etc.

Thank you in advance.

Re: heavy load - high CPU utilization

From: Scott Marlowe
Date:
On Mon, Jul 25, 2011 at 12:00 PM, Filippos <filippos.kal@gmail.com> wrote:
> Dear all,
>
> First of all, congratulations on your great work here; from time to time
> I've found many answers to my problems. Unfortunately, for this specific
> problem I didn't find much relevant information, so I would ask for your
> guidance in dealing with the following situation:
>
> We have a dedicated server (PostgreSQL 8.4.4 on Red Hat) with 24 CPUs and
> 36 GB of RAM. I

There are known data-eating bugs in 8.4.4; you should upgrade to the latest
8.4.x minor release as soon as possible.
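
A quick way to confirm exactly what is running, before and after the upgrade:

    -- prints the full server version string, e.g. "PostgreSQL 8.4.4 on ..."
    SELECT version();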

> would say that the traffic on the server is huge, and the
> CPU utilization is pretty high too (avg ~75%, except during the nights when
> it is much lower). I am trying to tune the server a bit to handle this
> problem. The incoming data volume is about 30-40 GB/day.

So you're either CPU or IO bound.  We need to see which.

Look at these two pages:
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
http://wiki.postgresql.org/wiki/SlowQueryQuestions

to get started.

> At first, checkpoint_segments was set to 50, checkpoint_timeout to 15 min,
> and checkpoint_completion_target to 0.5 sec.

checkpoint_completion_target is not in seconds; it's the fraction of the
checkpoint interval by which the checkpoint's writes should be complete. A
checkpoint_completion_target of 1.0 means the background writer should write
out data just fast enough to flush everything out of the WAL to disk right
as you reach checkpoint_timeout. The more aggressive this is, the more of
the data will already be flushed to disk when the timeout occurs. However,
this comes at the expense of more IO overall, as multiple updates to the
same block result in multiple writes instead of just one.
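
One way to watch this in practice is the pg_stat_bgwriter view (available in
8.4); sampling it before and after a busy period shows roughly how much
write work checkpoints are doing. A sketch:

    -- cumulative counters since server start: timed vs. requested checkpoints,
    -- and buffers written by checkpoints vs. directly by backends
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_backend
      FROM pg_stat_bgwriter;

If checkpoints_req climbs much faster than checkpoints_timed, checkpoints
are being forced by WAL segment consumption, i.e. checkpoint_segments is
still too small for your write volume.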

> I noticed that the utilization of the server was higher when it was close to
> making a checkpoint, and since full_page_writes is ON, I changed the
> parameters mentioned above (after reading a lot of material online) to:
> checkpoint_segments -> 250
> checkpoint_timeout -> 40min
> checkpoint_completion_target -> 0.8
>
> But the CPU utilization is not significantly lower. Another parameter I will
> certainly change is wal_buffers, which is now set to 64 kB; I plan to raise
> it to 16 MB. Could this parameter account for a significant part of the
> problem?

Most of the work done by checkpointing / background writing is IO
intensive, not CPU intensive.
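
If you want to see how long each checkpoint actually takes and how much it
writes, you can turn on checkpoint logging (a config reload is enough, no
restart needed):

    # postgresql.conf
    log_checkpoints = on   # logs buffers written plus write/sync/total times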

> Are there any suggestions for how I can tune the server better? I can
> provide any information you find relevant about the configuration of the
> server, the OS, the storage, etc.

First you need to identify the problem more accurately. Tools like iostat,
vmstat, top, and so forth can help you figure out whether you're IO bound
or CPU bound. It's also possible you've got a thundering-herd issue, where
too many processes are vying for the limited number of cores at the same
time. If you see more than 30k to 50k context switches per second in
vmstat, it's likely you've got too many things trying to run at once.
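
For example (iostat is in the sysstat package on Red Hat):

    vmstat 1 10      # "cs" column = context switches/sec; us/sy/id/wa split CPU time
    iostat -xk 2     # per-device stats; sustained high %util or await means IO bound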

Re: heavy load - high CPU utilization

From: Filippos
Date:
Thanks a lot for your answer.
I will provide some stats, so if you could help me figure out the source of
the problem, that would be great.

-*top -c*
Tasks: 1220 total,  49 running, 1171 sleeping,   0 stopped,   0 zombie
Cpu(s): *84.1%us*,  2.8%sy,  0.0%ni, 12.3%id,  0.1%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  98846996k total, 98632044k used,   214952k free,   134320k buffers
Swap: 50331640k total,   116312k used, 50215328k free, 89445208k cached

-SELECT count(procpid) FROM pg_stat_activity -> *422*
-SELECT count(procpid) FROM pg_stat_activity WHERE (NOW() - query_start) >
INTERVAL '1 MINUTES' AND current_query = '<IDLE>' -> *108*
-SELECT count(procpid) FROM pg_stat_activity WHERE (NOW() - query_start) >
INTERVAL '5 MINUTES' AND current_query = '<IDLE>' -> *45*

-*vmstat -n 1 10*
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
41  1 116300 347008 134176 89608912    0    0   143   210     0     0 11  1 88  0  0
20  0 116300 423556 134116 89581840    0    0  8336  3038 11118 21139 81  5 13  0  0
24  0 116300 412904 134108 89546840    0    0  8488  9025 10621 22921 81  4 15  0  0
23  0 116300 409388 134084 89513728    0    0  8320   548 11386 20226 82  4 14  0  0
34  0 116300 403688 134088 89509520    0    0  6336     0  9552 20994 83  3 14  0  0
22  1 116300 337972 134104 89518624    0    0  8792    28  8980 20455 83  4 13  0  0
37  0 116300 303956 134116 89528720    0    0  8440   536  9644 20492 84  3 13  0  0
17  1 116300 293212 134112 89532816    0    0  5864  8240  9527 19771 85  3 12  0  0
14  0 116300 282168 134116 89540720    0    0  7772   752 10141 21780 84  3 13  0  0
44  0 116300 278684 134100 89536080    0    0  7352   555  9856 21539 85  2 13  0  0

-*vmstat -s*
     98846992  total memory
     98685392  used memory
     40342200  active memory
     52644588  inactive memory
       161604  free memory
       129960  buffer memory
     89421936  swap cache
     50331640  total swap
       116300  used swap
     50215340  free swap
   2258553017 non-nice user cpu ticks
      1125281 nice user cpu ticks
    146638389 system cpu ticks
  17789847697 idle cpu ticks
     83090716 IO-wait cpu ticks
      5045742 IRQ cpu ticks
     38895985 softirq cpu ticks
            0 stolen cpu ticks
  29142450583 pages paged in
  42731005078 pages paged out
        39784 pages swapped in
      3395187 pages swapped out
   1338370564 interrupts
   1176640487 CPU context switches
   1305704895 boot time
     24471946 forks

(after 30 sec)
-*vmstat -s*
     98846992  total memory
     98367312  used memory
     39959952  active memory
     52957104  inactive memory
       479684  free memory
       129720  buffer memory
     89410640  swap cache
     50331640  total swap
       116296  used swap
     50215344  free swap
   2258645091 non-nice user cpu ticks
      1125282 nice user cpu ticks
    146640181 system cpu ticks
  17789863186 idle cpu ticks
     83090856 IO-wait cpu ticks
      5045855 IRQ cpu ticks
     38896749 softirq cpu ticks
            0 stolen cpu ticks
  29142861271 pages paged in
  42731249289 pages paged out
        39784 pages swapped in
      3395187 pages swapped out
   1338808821 interrupts
   1177463384 CPU context switches
   1305704895 boot time
     24472003 forks

From the above -> context switches/s = (1177463384 - 1176640487)/30 = *27429*

Thanks in advance for any advice.

Re: heavy load - high CPU utilization

From: Jim Nasby
Date:
On Jul 30, 2011, at 3:02 PM, Filippos wrote:
> Thanks a lot for your answer.
> I will provide some stats, so if you could help me figure out the source of
> the problem, that would be great.
>
> -*top -c*
> Tasks: 1220 total,  49 running, 1171 sleeping,   0 stopped,   0 zombie
> Cpu(s): *84.1%us*,  2.8%sy,  0.0%ni, 12.3%id,  0.1%wa,  0.1%hi,  0.6%si,  0.0%st
> Mem:  98846996k total, 98632044k used,   214952k free,   134320k buffers
> Swap: 50331640k total,   116312k used, 50215328k free, 89445208k cached

84% CPU isn't horrible, and you do have idle CPU time available. So you
don't look to be too CPU-bound, although you need to keep in mind that one
process might be CPU-intensive and taking a long time to run, thereby
blocking other processes that depend on its results.
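
One rough way to spot such a process, given that 8.4's pg_stat_activity
still uses the procpid/current_query columns:

    -- longest-running non-idle statements first
    SELECT procpid, usename, now() - query_start AS runtime, current_query
      FROM pg_stat_activity
     WHERE current_query <> '<IDLE>'
     ORDER BY runtime DESC
     LIMIT 10;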

> -SELECT count(procpid) FROM pg_stat_activity -> *422*
> -SELECT count(procpid) FROM pg_stat_activity WHERE (NOW() - query_start) >
> INTERVAL '1 MINUTES' AND current_query = '<IDLE>' -> *108*
> -SELECT count(procpid) FROM pg_stat_activity WHERE (NOW() - query_start) >
> INTERVAL '5 MINUTES' AND current_query = '<IDLE>' -> *45*

It would be good to look at getting some connection pooling happening.
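
For example, a minimal pgbouncer setup might look like the sketch below
(pgbouncer is just one option; the database name, paths, and pool sizes are
placeholders you would need to adapt):

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    default_pool_size = 48    ; roughly 2x your 24 cores, not 400+ live backends
    max_client_conn = 1000

With ~420 backends competing for 24 cores you spend a lot of time context
switching; a pool sized near the core count usually gives better throughput.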

Your vmstat output shows you generally have CPU available. Can you provide some output from iostat -xk 2?

> -*vmstat -n 1 10*
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
> 41  1 116300 347008 134176 89608912    0    0   143   210     0     0 11  1 88  0  0
> 20  0 116300 423556 134116 89581840    0    0  8336  3038 11118 21139 81  5 13  0  0
> 24  0 116300 412904 134108 89546840    0    0  8488  9025 10621 22921 81  4 15  0  0
> 23  0 116300 409388 134084 89513728    0    0  8320   548 11386 20226 82  4 14  0  0
> 34  0 116300 403688 134088 89509520    0    0  6336     0  9552 20994 83  3 14  0  0
> 22  1 116300 337972 134104 89518624    0    0  8792    28  8980 20455 83  4 13  0  0
> 37  0 116300 303956 134116 89528720    0    0  8440   536  9644 20492 84  3 13  0  0
> 17  1 116300 293212 134112 89532816    0    0  5864  8240  9527 19771 85  3 12  0  0
> 14  0 116300 282168 134116 89540720    0    0  7772   752 10141 21780 84  3 13  0  0
> 44  0 116300 278684 134100 89536080    0    0  7352   555  9856 21539 85  2 13  0  0
>
> -*vmstat -s*
>    98846992  total memory
>    98685392  used memory
>    40342200  active memory
>    52644588  inactive memory
>      161604  free memory
>      129960  buffer memory
>    89421936  swap cache
>    50331640  total swap
>      116300  used swap
>    50215340  free swap
>  2258553017 non-nice user cpu ticks
>     1125281 nice user cpu ticks
>   146638389 system cpu ticks
> 17789847697 idle cpu ticks
>    83090716 IO-wait cpu ticks
>     5045742 IRQ cpu ticks
>    38895985 softirq cpu ticks
>           0 stolen cpu ticks
> 29142450583 pages paged in
> 42731005078 pages paged out
>       39784 pages swapped in
>     3395187 pages swapped out
>  1338370564 interrupts
>  1176640487 CPU context switches
>  1305704895 boot time
>    24471946 forks
>
> (after 30 sec)
> -*vmstat -s*
>    98846992  total memory
>    98367312  used memory
>    39959952  active memory
>    52957104  inactive memory
>      479684  free memory
>      129720  buffer memory
>    89410640  swap cache
>    50331640  total swap
>      116296  used swap
>    50215344  free swap
>  2258645091 non-nice user cpu ticks
>     1125282 nice user cpu ticks
>   146640181 system cpu ticks
> 17789863186 idle cpu ticks
>    83090856 IO-wait cpu ticks
>     5045855 IRQ cpu ticks
>    38896749 softirq cpu ticks
>           0 stolen cpu ticks
> 29142861271 pages paged in
> 42731249289 pages paged out
>       39784 pages swapped in
>     3395187 pages swapped out
>  1338808821 interrupts
>  1177463384 CPU context switches
>  1305704895 boot time
>    24472003 forks
>
> From the above -> context switches/s = (1177463384 - 1176640487)/30 = *27429*
>
> Thanks in advance for any advice.

--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net