Обсуждение: High CPU load on Postgres Server during Peak times!!!!

Поиск

Список

Период

Сортировка

High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

22 сентября 2009 г., 13:55:00

Dear all

I am having a problem of high cpu loads in my postgres server during peak time. Following are the
details of my setup (details as per the postgres wiki) .

* PostgreSQL version
o Run "select pg_version();" in psql or PgAdmin III and provide the full, exact output.

clusternode2:~ # rpm -qa | grep postgres
postgresql-devel-8.1.9-1.2
postgresql-8.1.9-1.2
postgresql-docs-8.1.9-1.2
postgresql-server-8.1.9-1.2
postgresql-libs-64bit-8.1.9-1.2
postgresql-libs-8.1.9-1.2
postgresql-jdbc-8.1-12.2
postgresql-contrib-8.1.9-1.2

* A description of what you are trying to achieve and what results you expect.

To keep the CPU Load below 10 , Now during peak times the load is nearing to 40
At that time , it is not possible to access the data.

* The EXACT text of the query you ran, if any

* The EXACT output of that query if it's short enough to be reasonable to post
o If you think the output is wrong, what you think should've been produced instead

* The EXACT error message you get, if there is one

As of now , i am unable to locate the exact query, the load shoots up abnormally during
peak time is the main problem .

* What program you're using to connect to PostgreSQL

Jakarta Tomcat - Struts with JSP

* What version of the ODBC/JDBC driver you're using, if any

postgresql-jdbc-8.1-12.2

* What you were doing when the error happened / how to cause the error. Describe in as much detail as possible, step by step, including command lines, SQL output, etc.

When certain tables with more than 3 lakh items are concurrently accessed by more than 300
users, the CPU load shoots up .

* Is there anything remotely unusual in the PostgreSQL server logs?
o On Windows these are in your data directory. On a default PostgreSQL install that'll be in C:\Program Files\PostgreSQL\8.4\data\pg_log (assuming you're using 8.4)

The log file /var/log/postgresql has no data .

o On Linux this depends a bit on distro, but you'll usually find them in /var/log/postgresql/.
* Operating system and version
o Linux users:
+ Linux distro and version
+ Kernel details (run "uname -a" on the terminal)

SLES 10 SP3
clusternode2:~ # uname -a
Linux clusternode2 2.6.16.46-0.12-ppc64 #1 SMP Thu May 17 14:00:09 UTC 2007 ppc64 ppc64 ppc64 GNU/Linux

* What kind of hardware you have.
o CPU manufacturer and model, eg "AMD Athlon X2" or "Intel Core 2 Duo"
o Amount and size of RAM installed, eg "2GB RAM"

High Availability Cluster with two IBM P Series Server and one DS4700 Storage

IBM P series P52A with 2-core 2.1 Ghz POWER5+ Processor Card , 36 MB L3 Cache ,16 GB of RAM,
73.4 GB 10,000 RPM Ultra320 SCSI Drive for Operating System .

o Storage details (important for performance and corruption questions)
+ Do you use a RAID controller? If so, what type of controller? eg "3Ware Escalade 8500-8"
# Does it have a battery backed cache module?
# Is write-back caching enabled?
+ Do you use software RAID? If so, what software and what version? eg "Linux software RAID (md) 2.6.18-5-686 SMP mod_unload 686 REGPARM gcc-4.1".
# In the case of Linux software RAID you can get the details from the "modinfo md_mod" command
+ Is your PostgreSQL database on a SAN?
# Who made it, what kind, etc? Provide what details you can.
+ How many hard disks are connected to the system and what types are they? You need to say more than just "6 disks". At least give maker, rotational speed and interface type, eg "6 15,000rpm Seagate SAS disks".
+ How are your disks arranged for storage? Are you using RAID? If so, what RAID level(s)? What PostgreSQL data is on what disks / disk sets? What file system(s) are in use?
# eg: "Two disks in RAID 1, with all PostgreSQL data and programs stored on one ext3 file system."
# eg: "4 disks in RAID 5 holding the pg data directory on an ext3 file system. 2 disks in RAID 1 holding pg_clog, pg_xlog, the temporary tablespace, and the sort scratch space, also on ext3.".
# eg: "Default Windows install of PostgreSQL"
+ In case of corruption data reports:
# Have you had any unexpected power loss lately?
# Have you run a file system check? (chkdsk / fsck)
# Are there any error messages in the system logs? (unix/linux: "dmesg", "/var/log/syslog" ; Windows: Event Viewer in Control Panel -> Administrative Tools )

IBM SAN DS4700 Storage with Fibre Channel HDD (73.4 GB * 10)
Two Partitions - 73.4 GB * 3 RAID 5 - 134 GB storage partitions (One holding Jakarata tomcat
application server and other holding Postgresql Database) .
Four Hard disk RAID 5 with ext3 file systems hold the pgdata on SAN .
Hard disk rotational speed is 73 GB 15K IBM 2 GB Fibre channel

No power loss, filesystem check also fine, No errors on /var/log/syslog

Following is the output of TOP command during offpeak time.

top - 18:36:56 up 77 days, 20:33, 1 user, load average: 12.99, 9.22, 10.37
Tasks: 142 total, 12 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 46.1%us, 1.9%sy, 0.0%ni, 6.1%id, 3.0%wa, 0.0%hi, 0.1%si, 42.9%st
Mem: 16133676k total, 13657396k used, 2476280k free, 450908k buffers
Swap: 14466492k total, 124k used, 14466368k free, 11590056k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22458 postgres 19 0 2473m 477m 445m R 40 3.0 0:15.49 postmaster
22451 postgres 15 0 2442m 447m 437m S 33 2.8 0:30.44 postmaster
22464 postgres 17 0 2443m 397m 383m R 28 2.5 0:13.78 postmaster
22484 postgres 16 0 2448m 431m 412m S 20 2.7 0:02.73 postmaster
22465 postgres 17 0 2440m 461m 449m R 15 2.9 0:03.52 postmaster
22452 postgres 16 0 2450m 727m 706m R 13 4.6 0:23.46 postmaster
22476 postgres 16 0 2437m 413m 405m S 13 2.6 0:06.11 postmaster
22485 postgres 16 0 2439m 230m 222m R 7 1.5 0:05.72 postmaster
22481 postgres 15 0 2436m 175m 169m S 7 1.1 0:04.44 postmaster
22435 postgres 17 0 2438m 371m 361m R 6 2.4 1:17.92 postmaster
22440 postgres 17 0 2445m 497m 483m R 5 3.2 1:44.50 postmaster
22486 postgres 17 0 2432m 84m 81m R 4 0.5 0:00.76 postmaster
3 root 34 19 0 0 0 R 0 0.0 1:47.50 ksoftirqd/0
4726 root 15 0 29540 8776 3428 S 0 0.1 140:02.98 X
24950 root 15 0 0 0 0 S 0 0.0 0:30.96 pdflush
1 root 16 0 812 316 280 S 0 0.0 0:13.29 init
2 root RT 0 0 0 0 S 0 0.0 0:01.46 migration/0
4 root RT 0 0 0 0 S 0 0.0 0:00.78 migration/1
5 root 34 19 0 0 0 S 0 0.0 1:36.79 ksoftirqd/1
6 root RT 0 0 0 0 S 0 0.0 0:01.46 migration/2
7 root 34 19 0 0 0 R 0 0.0 1:49.83 ksoftirqd/2
8 root RT 0 0 0 0 S 0 0.0 0:00.79 migration/3
9 root 34 19 0 0 0 S 0 0.0 1:38.18 ksoftirqd/3
10 root 10 -5 0 0 0 S 0 0.0 1:02.11 events/0
11 root 10 -5 0 0 0 S 0 0.0 1:03.27 events/1
12 root 10 -5 0 0 0 S 0 0.0 1:01.76 events/2
13 root 10 -5 0 0 0 S 0 0.0 1:02.29 events/3
14 root 10 -5 0 0 0 S 0 0.0 0:00.01 khelper
1016 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
1054 root 10 -5 0 0 0 S 0 0.0 0:03.08 kblockd/0
1055 root 10 -5 0 0 0 S 0 0.0 0:02.83 kblockd/1
1056 root 10 -5 0 0 0 S 0 0.0 0:03.19 kblockd/2

The CPU Load shoots upto 40 during peak time.

Following is my postgresql.conf (without comments)

hba_file = '/var/lib/pgsql/data/pg_hba.conf'
listen_addresses = '*'
port = 5432
max_connections = 1800
shared_buffers = 300000
max_fsm_relations = 1000
effective_cache_size = 200000
log_destination = 'stderr'
redirect_stderr = on
log_rotation_age = 0
log_rotation_size = 10240
silent_mode = onlog_line_prefix = '%t %d %u '
autovacuum = on
datestyle = 'iso, dmy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

User Access
Total Number of Users is 500
Maximum number of Concurrent users will be 500 during peak time
Off Peak time the maximum number of concurrent user will be around 150 to 200.

Please let me know your suggestions to improve the performance.

Regards

Shiva Raman

Re: High CPU load on Postgres Server during Peak times!!!!

От

Merlin Moncure

Дата:

22 сентября 2009 г., 14:18:26

On Tue, Sep 22, 2009 at 9:54 AM, Shiva Raman <raman.shivag@gmail.com> wrote:
> Dear all
>
>   I am having a problem of high cpu loads in my postgres server during peak
> time. Following are the
> details of my setup (details as per the postgres wiki) .
>
> * PostgreSQL version
>          o Run "select pg_version();" in psql or PgAdmin III and provide the
> full, exact output.
>
>
> clusternode2:~ # rpm -qa | grep postgres
> postgresql-devel-8.1.9-1.2
> postgresql-8.1.9-1.2
> postgresql-docs-8.1.9-1.2
> postgresql-server-8.1.9-1.2
> postgresql-libs-64bit-8.1.9-1.2
> postgresql-libs-8.1.9-1.2
> postgresql-jdbc-8.1-12.2
> postgresql-contrib-8.1.9-1.2
>
>
> * A description of what you are trying to achieve and what results you
> expect.
>
> To keep the CPU Load below 10 , Now during peak times the load is nearing to
> 40
> At that time , it is not possible to access the data.
>
>    * The EXACT text of the query you ran, if any
>
>
>    * The EXACT output of that query if it's short enough to be reasonable to
> post
>          o If you think the output is wrong, what you think should've been
> produced instead
>
>    * The EXACT error message you get, if there is one
>
> As of now , i am unable to locate the exact query, the load shoots up
> abnormally during
> peak time is the main problem .
>
>
>    * What program you're using to connect to PostgreSQL
>
>         Jakarta Tomcat - Struts with JSP
>
>
>    * What version of the ODBC/JDBC driver you're using, if any
>
> postgresql-jdbc-8.1-12.2
>
>    * What you were doing when the error happened / how to cause the error.
> Describe in as much detail as possible, step by step, including command
> lines, SQL output, etc.
>
> When certain tables with more than 3 lakh items are concurrently accessed by
> more than 300
> users, the CPU load shoots up .
>
>    * Is there anything remotely unusual in the PostgreSQL server logs?
>          o On Windows these are in your data directory. On a default
> PostgreSQL install that'll be in C:\Program Files\PostgreSQL\8.4\data\pg_log
> (assuming you're using 8.4)
>
> The log file /var/log/postgresql has no data .
>
>          o On Linux this depends a bit on distro, but you'll usually find
> them in /var/log/postgresql/.
>    * Operating system and version
>          o Linux users:
>                + Linux distro and version
>                + Kernel details (run "uname -a" on the terminal)
>
> SLES 10 SP3
> clusternode2:~ # uname -a
> Linux clusternode2 2.6.16.46-0.12-ppc64 #1 SMP Thu May 17 14:00:09 UTC 2007
> ppc64 ppc64 ppc64 GNU/Linux
>
>
>
>    * What kind of hardware you have.
>          o CPU manufacturer and model, eg "AMD Athlon X2" or "Intel Core 2
> Duo"
>          o Amount and size of RAM installed, eg "2GB RAM"
>
> High Availability Cluster with two IBM P Series Server and one DS4700
> Storage
>
> IBM P series P52A with 2-core 2.1 Ghz POWER5+ Processor Card , 36 MB L3
> Cache ,16 GB of RAM,
> 73.4 GB 10,000 RPM Ultra320 SCSI Drive for Operating System .
>
>
>
>          o Storage details (important for performance and corruption
> questions)
>                + Do you use a RAID controller? If so, what type of
> controller? eg "3Ware Escalade 8500-8"
>                      # Does it have a battery backed cache module?
>                      # Is write-back caching enabled?
>                + Do you use software RAID? If so, what software and what
> version? eg "Linux software RAID (md) 2.6.18-5-686 SMP mod_unload 686
> REGPARM gcc-4.1".
>                      # In the case of Linux software RAID you can get the
> details from the "modinfo md_mod" command
>                + Is your PostgreSQL database on a SAN?
>                      # Who made it, what kind, etc? Provide what details you
> can.
>                + How many hard disks are connected to the system and what
> types are they? You need to say more than just "6 disks". At least give
> maker, rotational speed and interface type, eg "6 15,000rpm Seagate SAS
> disks".
>                + How are your disks arranged for storage? Are you using
> RAID? If so, what RAID level(s)? What PostgreSQL data is on what disks /
> disk sets? What file system(s) are in use?
>                      # eg: "Two disks in RAID 1, with all PostgreSQL data
> and programs stored on one ext3 file system."
>                      # eg: "4 disks in RAID 5 holding the pg data directory
> on an ext3 file system. 2 disks in RAID 1 holding pg_clog, pg_xlog, the
> temporary tablespace, and the sort scratch space, also on ext3.".
>                      # eg: "Default Windows install of PostgreSQL"
>                + In case of corruption data reports:
>                      # Have you had any unexpected power loss lately?
>                      # Have you run a file system check? (chkdsk / fsck)
>                      # Are there any error messages in the system logs?
> (unix/linux: "dmesg", "/var/log/syslog" ; Windows: Event Viewer in Control
> Panel -> Administrative Tools )
>
>
> IBM SAN DS4700 Storage with Fibre Channel HDD (73.4 GB * 10)
> Two Partitions - 73.4 GB * 3 RAID 5 - 134 GB storage partitions (One holding
> Jakarata tomcat
> application server and other holding Postgresql Database) .
> Four Hard disk RAID 5 with ext3 file systems hold the pgdata on SAN .
> Hard disk rotational speed is 73 GB 15K IBM 2 GB Fibre channel
>
> No power loss, filesystem check also fine, No errors on /var/log/syslog
>
> Following is the output of TOP command during offpeak time.
>
>
> top - 18:36:56 up 77 days, 20:33,  1 user,  load average: 12.99, 9.22, 10.37
> Tasks: 142 total,  12 running, 130 sleeping,   0 stopped,   0 zombie
> Cpu(s): 46.1%us,  1.9%sy,  0.0%ni,  6.1%id,  3.0%wa,  0.0%hi,  0.1%si,
> 42.9%st
> Mem:  16133676k total, 13657396k used,  2476280k free,   450908k buffers
> Swap: 14466492k total,      124k used, 14466368k free, 11590056k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>
> 22458 postgres  19   0 2473m 477m 445m R   40  3.0   0:15.49 postmaster
>
> 22451 postgres  15   0 2442m 447m 437m S   33  2.8   0:30.44 postmaster
>
> 22464 postgres  17   0 2443m 397m 383m R   28  2.5   0:13.78 postmaster
>
> 22484 postgres  16   0 2448m 431m 412m S   20  2.7   0:02.73 postmaster
>
> 22465 postgres  17   0 2440m 461m 449m R   15  2.9   0:03.52 postmaster
>
> 22452 postgres  16   0 2450m 727m 706m R   13  4.6   0:23.46 postmaster
>
> 22476 postgres  16   0 2437m 413m 405m S   13  2.6   0:06.11 postmaster
>
> 22485 postgres  16   0 2439m 230m 222m R    7  1.5   0:05.72 postmaster
>
> 22481 postgres  15   0 2436m 175m 169m S    7  1.1   0:04.44 postmaster
>
> 22435 postgres  17   0 2438m 371m 361m R    6  2.4   1:17.92 postmaster
>
> 22440 postgres  17   0 2445m 497m 483m R    5  3.2   1:44.50 postmaster
>
> 22486 postgres  17   0 2432m  84m  81m R    4  0.5   0:00.76 postmaster
>
>    3 root      34  19     0    0    0 R    0  0.0   1:47.50 ksoftirqd/0
>
> 4726 root      15   0 29540 8776 3428 S    0  0.1 140:02.98 X
>
> 24950 root      15   0     0    0    0 S    0  0.0   0:30.96 pdflush
>
>    1 root      16   0   812  316  280 S    0  0.0   0:13.29 init
>
>    2 root      RT   0     0    0    0 S    0  0.0   0:01.46 migration/0
>
>    4 root      RT   0     0    0    0 S    0  0.0   0:00.78 migration/1
>
>    5 root      34  19     0    0    0 S    0  0.0   1:36.79 ksoftirqd/1
>
>    6 root      RT   0     0    0    0 S    0  0.0   0:01.46 migration/2
>
>    7 root      34  19     0    0    0 R    0  0.0   1:49.83 ksoftirqd/2
>
>    8 root      RT   0     0    0    0 S    0  0.0   0:00.79 migration/3
>
>    9 root      34  19     0    0    0 S    0  0.0   1:38.18 ksoftirqd/3
>
>   10 root      10  -5     0    0    0 S    0  0.0   1:02.11 events/0
>
>   11 root      10  -5     0    0    0 S    0  0.0   1:03.27 events/1
>
>   12 root      10  -5     0    0    0 S    0  0.0   1:01.76 events/2
>
>   13 root      10  -5     0    0    0 S    0  0.0   1:02.29 events/3
>
>   14 root      10  -5     0    0    0 S    0  0.0   0:00.01 khelper
>
> 1016 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
>
> 1054 root      10  -5     0    0    0 S    0  0.0   0:03.08 kblockd/0
>
> 1055 root      10  -5     0    0    0 S    0  0.0   0:02.83 kblockd/1
>
> 1056 root      10  -5     0    0    0 S    0  0.0   0:03.19 kblockd/2
>
>
>
>
> The CPU Load shoots upto 40 during peak time.
>
> Following is my postgresql.conf (without comments)
>
> hba_file = '/var/lib/pgsql/data/pg_hba.conf'
> listen_addresses = '*'
> port = 5432
> max_connections = 1800
> shared_buffers = 300000
> max_fsm_relations = 1000
> effective_cache_size = 200000
> log_destination = 'stderr'
> redirect_stderr = on
> log_rotation_age = 0
> log_rotation_size = 10240
> silent_mode = onlog_line_prefix = '%t %d %u '
> autovacuum = on
> datestyle = 'iso, dmy'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
>
> User Access
> Total Number of Users is 500
> Maximum number of Concurrent users will be 500 during peak time
> Off Peak time the maximum number of concurrent user will be around 150 to
> 200.
>
>
> Please let me know your suggestions to improve the performance.

The very first step is to determine if you are cpu bound or i/o bound.
 You need to monitor top or vmstat during high load period and report
the results here.  Is the DS4700 direct attached?  Sometimes using a
SAN can throw the iowait numbers off a bit.  I bet you are simply
underpowered in I/O department.

merlin

Re: High CPU load on Postgres Server during Peak times!!!!

От

Andy Colson

Дата:

22 сентября 2009 г., 14:19:57

Shiva Raman wrote:
> Dear all
>
>   I am having a problem of high cpu loads in my postgres server during
> peak time. Following are the
> details of my setup (details as per the postgres wiki) .
>
>
> *Following is the output of TOP command during offpeak time.*
>
>
> top - 18:36:56 up 77 days, 20:33,  1 user,  load average: 12.99, 9.22, 10.37
> Tasks: 142 total,  12 running, 130 sleeping,   0 stopped,   0 zombie
> Cpu(s): 46.1%us,  1.9%sy,  0.0%ni,  6.1%id,  3.0%wa,  0.0%hi,  0.1%si,
> 42.9%st
> Mem:  16133676k total, 13657396k used,  2476280k free,   450908k buffers
> Swap: 14466492k total,      124k used, 14466368k free, 11590056k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>
> 22458 postgres  19   0 2473m 477m 445m R   40  3.0   0:15.49 postmaster
>
> 22451 postgres  15   0 2442m 447m 437m S   33  2.8   0:30.44 postmaster
>
> 22464 postgres  17   0 2443m 397m 383m R   28  2.5   0:13.78 postmaster
>
> 22484 postgres  16   0 2448m 431m 412m S   20  2.7   0:02.73 postmaster
>
> 22465 postgres  17   0 2440m 461m 449m R   15  2.9   0:03.52 postmaster
>
> 22452 postgres  16   0 2450m 727m 706m R   13  4.6   0:23.46 postmaster
>
> 22476 postgres  16   0 2437m 413m 405m S   13  2.6   0:06.11 postmaster
>
> 22485 postgres  16   0 2439m 230m 222m R    7  1.5   0:05.72 postmaster
>
> 22481 postgres  15   0 2436m 175m 169m S    7  1.1   0:04.44 postmaster
>
> 22435 postgres  17   0 2438m 371m 361m R    6  2.4   1:17.92 postmaster
>
> 22440 postgres  17   0 2445m 497m 483m R    5  3.2   1:44.50 postmaster
>
> 22486 postgres  17   0 2432m  84m  81m R    4  0.5   0:00.76 postmaster
>

First off, nice report.

I see you are on a pretty old version of pg.  Are you vacuuming regularly?

If you run a 'ps ax|grep post' do you see anything that says 'idle in
transaction'?  (I hope that old of version will show it.  my processes
show up as postgres not postmaster)

The top looks like you are cpu bound.  Have you tried enabling logging
slow queries? (again, I hope your version supports that)  It could be
you have a query or two that are not using indexes, and slowing
everything down.

Also on the top, it has this: 42.9%st.  Are you in a vm?  or running
vm's on the box?

Its weird, you have 6.1% idle and 3.0% waiting for disk and yet you have
a load of 13.  Load usually means somebody is waiting for something.
But you have a little cpu idle time... and you have very low disk
waits... you are using very little swap.  hum... odd...

-Andy

Re: High CPU load on Postgres Server during Peak times!!!!

От

Andy Colson

Дата:

22 сентября 2009 г., 14:35:37

Andy Colson wrote:
> Shiva Raman wrote:
>> Dear all
>>
>>   I am having a problem of high cpu loads in my postgres server during
>> peak time. Following are the
>> details of my setup (details as per the postgres wiki) .
>>
>>
>> *Following is the output of TOP command during offpeak time.*
>>
>>
>> top - 18:36:56 up 77 days, 20:33,  1 user,  load average: 12.99, 9.22,
>> 10.37
>> Tasks: 142 total,  12 running, 130 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 46.1%us,  1.9%sy,  0.0%ni,  6.1%id,  3.0%wa,  0.0%hi,  0.1%si,
>> 42.9%st
>> Mem:  16133676k total, 13657396k used,  2476280k free,   450908k buffers
>> Swap: 14466492k total,      124k used, 14466368k free, 11590056k cached
>>
>
>
> First off, nice report.
>
> I see you are on a pretty old version of pg.  Are you vacuuming regularly?
>
> If you run a 'ps ax|grep post' do you see anything that says 'idle in
> transaction'?  (I hope that old of version will show it.  my processes
> show up as postgres not postmaster)
>
> The top looks like you are cpu bound.  Have you tried enabling logging
> slow queries? (again, I hope your version supports that)  It could be
> you have a query or two that are not using indexes, and slowing
> everything down.
>
> Also on the top, it has this: 42.9%st.  Are you in a vm?  or running
> vm's on the box?
>
> Its weird, you have 6.1% idle and 3.0% waiting for disk and yet you have
> a load of 13.  Load usually means somebody is waiting for something. But
> you have a little cpu idle time... and you have very low disk waits...
> you are using very little swap.  hum... odd...
>
> -Andy
>

Looks like I missed an important point.  You said this was top during
off peak time.  So ignore my high load ramblings.

But... if this is off peak, and you only have 6% idle cpu... I'd say
your cpu bound.  (I'm still not sure what the 42.9%st is, so maybe I'm
off base with the 6% idle too)

-Andy

Re: High CPU load on Postgres Server during Peak times!!!!

От

"Fernando Hevia"

Дата:

22 сентября 2009 г., 15:28:53


> -----Mensaje original-----
> De: Shiva Raman
> Enviado el: Martes, 22 de Septiembre de 2009 10:55
> Para: pgsql-performance@postgresql.org
> Asunto: [PERFORM] High CPU load on Postgres Server during
> Peak times!!!!
>
> Dear all
>
>   I am having a problem of high cpu loads in my postgres
> server during peak time.


Some quick advice:

>
> clusternode2:~ # rpm -qa | grep postgres
> postgresql-devel-8.1.9-1.2
> postgresql-8.1.9-1.2
> postgresql-docs-8.1.9-1.2
> postgresql-server-8.1.9-1.2
> postgresql-libs-64bit-8.1.9-1.2
> postgresql-libs-8.1.9-1.2
> postgresql-jdbc-8.1-12.2
> postgresql-contrib-8.1.9-1.2
>
>

8.1 is quite old. Consider upgrading as newer versions are faster.
Current Postgres version is 8.4.

>
> High Availability Cluster with two IBM P Series Server and
> one DS4700 Storage
>
> IBM P series P52A with 2-core 2.1 Ghz POWER5+ Processor Card
> , 36 MB L3 Cache ,16 GB of RAM,
> 73.4 GB 10,000 RPM Ultra320 SCSI Drive for Operating System .
>

Sounds you are underpowered on cpu for 500 concurrent users.
Of course this really depends on what they are doing.

>
> IBM SAN DS4700 Storage with Fibre Channel HDD (73.4 GB * 10)
> Two Partitions - 73.4 GB * 3 RAID 5 - 134 GB storage
> partitions (One holding Jakarata tomcat
> application server and other holding Postgresql Database) .
> Four Hard disk RAID 5 with ext3 file systems hold the pgdata on SAN .
> Hard disk rotational speed is 73 GB 15K IBM 2 GB Fibre channel
>

A more suitable partitioning for an OLTP database would be:

2 x 73.4 GB RAID 1 for App Server + Postgresql and pg_xlog
8 x 73.4 GB RAID 10 for pgdata

RAID 5 is strongly discouraged.

>
> Following is the output of TOP command during offpeak time.
>
>
> top - 18:36:56 up 77 days, 20:33,  1 user,  load average:
> 12.99, 9.22, 10.37
> Tasks: 142 total,  12 running, 130 sleeping,   0 stopped,   0 zombie
> Cpu(s): 46.1%us,  1.9%sy,  0.0%ni,  6.1%id,  3.0%wa,  0.0%hi,
>  0.1%si, 42.9%st
> Mem:  16133676k total, 13657396k used,  2476280k free,
> 450908k buffers
> Swap: 14466492k total,      124k used, 14466368k free,
> 11590056k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
> 22458 postgres  19   0 2473m 477m 445m R   40  3.0   0:15.49
> postmaster
> 22451 postgres  15   0 2442m 447m 437m S   33  2.8   0:30.44
> postmaster
> 22464 postgres  17   0 2443m 397m 383m R   28  2.5   0:13.78
> postmaster
> 22484 postgres  16   0 2448m 431m 412m S   20  2.7   0:02.73
> postmaster
> 22465 postgres  17   0 2440m 461m 449m R   15  2.9   0:03.52
> postmaster
> 22452 postgres  16   0 2450m 727m 706m R   13  4.6   0:23.46
> postmaster
> 22476 postgres  16   0 2437m 413m 405m S   13  2.6   0:06.11
> postmaster
> 22485 postgres  16   0 2439m 230m 222m R    7  1.5   0:05.72
> postmaster
> 22481 postgres  15   0 2436m 175m 169m S    7  1.1   0:04.44
> postmaster
> 22435 postgres  17   0 2438m 371m 361m R    6  2.4   1:17.92
> postmaster
> 22440 postgres  17   0 2445m 497m 483m R    5  3.2   1:44.50
> postmaster
> 22486 postgres  17   0 2432m  84m  81m R    4  0.5   0:00.76
> postmaster
>

Are you running several Postgres clusters on this hardware?
Please post Top output showing cmd line arguments (press 'c')


>
> User Access
> Total Number of Users is 500
> Maximum number of Concurrent users will be 500 during peak time
> Off Peak time the maximum number of concurrent user will be
> around 150 to 200.
>

A connection pooler like pgpool or pgbouncer would considerably reduce the
burden on your system.


Regards,
Fernando.

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

23 сентября 2009 г., 18:25:20

Thanks a lot for the reply.

I see you are on a pretty old version of pg. Are you vacuuming regularly?

Yes, Vaccuuming is done every day morning at 06 am

It is running perfectly fine.

If you run a 'ps ax|grep post' do you see anything that says 'idle in transaction'? (I hope that old of version will show it. my processes show up as postgres not postmaster)

Lots of requests shows as 'idle in transaction'.

Currently i am restarting the database using a cron job every 30 minutes during offpeak time

and every 15 minutes during the peak time.

The top looks like you are cpu bound.

Have you tried enabling logging slow queries? (again, I hope your version supports that) It could be you have a query or two that are not using indexes, and slowing everything down.

Exactly right, thanks for the tip.

I indexed few tables frequently accessed which are not indexed. After indexing the load has come down to 50 % during Peak time its between 10 and 20 and during offpeak its between 4 and 8 .

The PowerPC cpu is having some virtual layer that is shown in the Steal value.

Its weird, you have 6.1% idle and 3.0% waiting for disk and yet you have a load of 13. Load usually means somebody is waiting for something. But you have a little cpu idle time... and you have very low disk waits... you are using very little swap. hum... odd...

As per the concurrency of 300 to 400 users, the following parameters are changed in

postgresql conf based on the calculation provided in the postgresql documentation.

Max connections = 1800 ( Too much open connections will result in unwanted memory wastage)

Shared Buffers = 375 000 ( 375000 * 8 * 1024 /100 = 3072 MB ) # proposed value is 1/4 the actual memory

Effective Cache Size = 266000 ( 266000 * 8 * 1024 /100 = 2179 MB ) # proposed value is 1/3 memory after OS Allocation

work_mem = 3000 ( 3000 * max connections * 1024 = 3000 * 1800 * 1024 = 5529 MB ( this is the working memory for postgres) )

max_fsm_pages = 20000 ( This has to be analyzed and can be increased to 40000, this can be done after one or two day observation)

Postgresql.conf

---------------

hba_file = '/var/lib/pgsql/data/pg_hba.conf'

listen_addresses = '*'

port = 5432

max_connections = 1800

shared_buffers = 300000

max_fsm_relations = 1000

effective_cache_size = 200000

log_destination = 'stderr'

redirect_stderr = on

log_rotation_age = 0

log_rotation_size = 10240

silent_mode = onlog_line_prefix = '%t %d %u '

autovacuum = on

datestyle = 'iso, dmy'

lc_messages = 'en_US.UTF-8'

lc_monetary = 'en_US.UTF-8'

lc_numeric = 'en_US.UTF-8'

lc_time = 'en_US.UTF-8'

Any modifications i have to do in this values ?

Regds

Shiva Raman .

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

23 сентября 2009 г., 18:28:50

Hi

Thanks for your mail.

Some quick advice:

> clusternode2:~ # rpm -qa | grep postgres

> postgresql-devel-8.1.9-1.2

> postgresql-8.1.9-1.2

> postgresql-docs-8.1.9-1.2

> postgresql-server-8.1.9-1.2

> postgresql-libs-64bit-8.1.9-1.2

> postgresql-libs-8.1.9-1.2

> postgresql-jdbc-8.1-12.2

> postgresql-contrib-8.1.9-1.2

8.1 is quite old. Consider upgrading as newer versions are faster.

Current Postgres version is 8.4.

> High Availability Cluster with two IBM P Series Server and

> one DS4700 Storage

> IBM P series P52A with 2-core 2.1 Ghz POWER5+ Processor Card

> , 36 MB L3 Cache ,16 GB of RAM,

> 73.4 GB 10,000 RPM Ultra320 SCSI Drive for Operating System .

Sounds you are underpowered on cpu for 500 concurrent users.

Of course this really depends on what they are doing.

> IBM SAN DS4700 Storage with Fibre Channel HDD (73.4 GB * 10)

> Two Partitions - 73.4 GB * 3 RAID 5 - 134 GB storage

> partitions (One holding Jakarata tomcat

> application server and other holding Postgresql Database) .

> Four Hard disk RAID 5 with ext3 file systems hold the pgdata on SAN .

> Hard disk rotational speed is 73 GB 15K IBM 2 GB Fibre channel

A more suitable partitioning for an OLTP database would be:

2 x 73.4 GB RAID 1 for App Server + Postgresql and pg_xlog

8 x 73.4 GB RAID 10 for pgdata

RAID 5 is strongly discouraged.

- Show quoted text -

> Following is the output of TOP command during offpeak time.

> top - 18:36:56 up 77 days, 20:33, 1 user, load average:

> 12.99, 9.22, 10.37

> Tasks: 142 total, 12 running, 130 sleeping, 0 stopped, 0 zombie

> Cpu(s): 46.1%us, 1.9%sy, 0.0%ni, 6.1%id, 3.0%wa, 0.0%hi,

> 0.1%si, 42.9%st

> Mem: 16133676k total, 13657396k used, 2476280k free,

> 450908k buffers

> Swap: 14466492k total, 124k used, 14466368k free,

> 11590056k cached

> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+

> COMMAND

> 22458 postgres 19 0 2473m 477m 445m R 40 3.0 0:15.49

> postmaster

> 22451 postgres 15 0 2442m 447m 437m S 33 2.8 0:30.44

> postmaster

> 22464 postgres 17 0 2443m 397m 383m R 28 2.5 0:13.78

> postmaster

> 22484 postgres 16 0 2448m 431m 412m S 20 2.7 0:02.73

> postmaster

> 22465 postgres 17 0 2440m 461m 449m R 15 2.9 0:03.52

> postmaster

> 22452 postgres 16 0 2450m 727m 706m R 13 4.6 0:23.46

> postmaster

> 22476 postgres 16 0 2437m 413m 405m S 13 2.6 0:06.11

> postmaster

> 22485 postgres 16 0 2439m 230m 222m R 7 1.5 0:05.72

> postmaster

> 22481 postgres 15 0 2436m 175m 169m S 7 1.1 0:04.44

> postmaster

> 22435 postgres 17 0 2438m 371m 361m R 6 2.4 1:17.92

> postmaster

> 22440 postgres 17 0 2445m 497m 483m R 5 3.2 1:44.50

> postmaster

> 22486 postgres 17 0 2432m 84m 81m R 4 0.5 0:00.76

> postmaster

Are you running several Postgres clusters on this hardware?

Please post Top output showing cmd line arguments (press 'c')

NO Only single Postgres instance

> User Access

> Total Number of Users is 500

> Maximum number of Concurrent users will be 500 during peak time

> Off Peak time the maximum number of concurrent user will be

> around 150 to 200.

A connection pooler like pgpool or pgbouncer would considerably reduce the

burden on your system.

I am already using connection pooling in tomcat web server, so installing pgpool

will help enhancing the performance ?Any changes i have to do in my application to

include pgpool?

Regds

Shiva raman

Re: High CPU load on Postgres Server during Peak times!!!!

От

Andy Colson

Дата:

23 сентября 2009 г., 18:53:24

Shiva Raman wrote:
> /If you run a 'ps ax|grep post' do you see anything that says 'idle in
> transaction'? (I hope that old of version will show it. my processes
> show up as postgres not postmaster)/
>
>
> Lots of requests shows as 'idle in transaction'.
>

Eww.  I think that's bad.  A connection that has a transaction open will
cause lots of row versions, which use up ram, and make it slower to step
through the table (even with an index).  You really need to fix up your
code and make sure you commit transactions.  (any statement (select,
insert, update) will start a new transaction that you need to explicitly
commit).

>
> Currently i am restarting the database using a cron job every 30 minutes
> during offpeak time
>
> and every 15 minutes during the peak time.

do you get lots of update/deletes?  Or are there mostly selects?  If its
mostly update/delete then the 'idle in transactions' is killing you.  If
you have mostly selects then its probably something else.

> work_mem = 3000 ( 3000 * max connections * 1024 = 3000 * 1800 * 1024 =
> 5529 MB ( this is the working memory for postgres) )

work_mem is per connection.  If you changed this to get a better query
plan then ok, but dont change it just for the sake of changing it.
Ick... I just went back and checked, you have 16G of ram... this
probably isn't a problem.  Nevermind.

-Andy

Re: High CPU load on Postgres Server during Peak times!!!!

От

"Fernando Hevia"

Дата:

23 сентября 2009 г., 20:49:05

>>>
>>> User Access
>>> Total Number of Users is 500
>>> Maximum number of Concurrent users will be 500 during peak time
>>> Off Peak time the maximum number of concurrent user will be
>>> around 150 to 200.
>>>
>>
>>A connection pooler like pgpool or pgbouncer would considerably reduce the
>>burden on your system.
>>
>
>I am already using connection pooling in tomcat web server, so installing
pgpool
>will help enhancing the performance ?Any changes i have to do in my
application to
>include pgpool?
>

There shouldn't be need for another pooling solution.
Anyway, you probably dont want 1800 concurrent connections on your database
server, nor even get near that number.

Check the number of actual connections with:
  select count(*) from pg_stat_activity;

A vmstat run during high loads could provide a hindsight to if the number of
connections is straining your server.

If the number of connections is high (say over 200-300), try reducing the
pool size in Tomcat and see what happens.
You possibly could do fine with something between 50 and 100 connections.


Regards,
Fernando.

Re: High CPU load on Postgres Server during Peak times!!!!

От

Karl Denninger

Дата:

23 сентября 2009 г., 20:53:04

Fernando Hevia wrote:

User Access
Total Number of Users is 500
Maximum number of Concurrent users will be 500 during peak time
Off Peak time the maximum number of concurrent user will be
around 150 to 200.

A connection pooler like pgpool or pgbouncer would considerably reduce the
burden on your system.

I am already using connection pooling in tomcat web server, so installing

pgpool

will help enhancing the performance ?Any changes i have to do in my

application to

include pgpool?

There shouldn't be need for another pooling solution.
Anyway, you probably dont want 1800 concurrent connections on your database
server, nor even get near that number.

Check the number of actual connections with:  select count(*) from pg_stat_activity;

A vmstat run during high loads could provide a hindsight to if the number of
connections is straining your server.

If the number of connections is high (say over 200-300), try reducing the
pool size in Tomcat and see what happens.
You possibly could do fine with something between 50 and 100 connections.

I can second this - I have an EXTREMELY busy forum system using pgpool and during peak hours it runs very well within around 100 connections in use.

-- Karl

Вложения

karl.vcf

Re: High CPU load on Postgres Server during Peak times!!!!

От

Scott Marlowe

Дата:

23 сентября 2009 г., 20:55:37

On Wed, Sep 23, 2009 at 12:25 PM, Shiva Raman <raman.shivag@gmail.com> wrote:

First let me say that upgrading to a later version is likely going to
help as much as anything else you're likely to pick up from this
discussion.  Not that this discussion isn't worthwhile, it is.

> If you run a 'ps ax|grep post' do you see anything that says 'idle in
> transaction'? (I hope that old of version will show it. my processes show up
> as postgres not postmaster)
>
> Lots of requests shows as 'idle in transaction'.
>
> Currently i am restarting the database using a cron job every 30 minutes
> during offpeak time
>
> and every 15 minutes during the peak time.

Wow.  It'd be way better if you could fix your application /
connection layer to not do that.

> As per the concurrency of 300 to 400 users, the following parameters are
> changed in
>
> postgresql conf based on the calculation provided in the postgresql
> documentation.
>
> Max connections = 1800 ( Too much open connections will result in unwanted
> memory wastage)

This is very high.  If you only need 400 users, you might want to
consider setting this to 500 or so.

> Shared Buffers = 375 000 ( 375000 * 8 * 1024 /100 = 3072 MB ) # proposed
> value is 1/4 the actual memory

Reasonable, but don't just blindly use 1/4 memory.  For transactional
loads smaller is often better.  For reporting dbs, larger is often
better.  Test it to see what happens with your load and varying
amounts of shared_buffers

> Effective Cache Size = 266000 ( 266000 * 8 * 1024 /100 = 2179 MB ) #
> proposed value is 1/3 memory after OS Allocation

Better to add the cache / buffer amount of OS and shared_buffers to
get it.  Which would be much higher.   Generally it's in the 3/4 of
memory on most machines.

> work_mem = 3000 ( 3000 * max connections * 1024 = 3000 * 1800 * 1024 = 5529
> MB ( this is the working memory for postgres) )

This is the max work_mem per sort or hash aggregate.  Note that if all
of your maximum backends connected and each did 2 sorts and one hash
aggregate at once, you could use max_connections * 3 * work_mem memory
at once.  Machine swaps til it dies.

Assuming this is 3000 8k blocks that 24Meg which is high but not unreasonable.

> max_fsm_pages = 20000 ( This has to be analyzed and can be increased to
> 40000, this can be done after one or two day observation)

To see what you need here, log into the postgres database as a
superuser and issue the command:

vacuum verbose;

and see what the last 5 or so lines have to say.  They'll look like this:

INFO:  free space map contains 339187 pages in 18145 relations
DETAIL:  A total of 623920 page slots are in use (including overhead).
623920 page slots are required to track all free space.
Current limits are:  10000000 page slots, 500000 relations, using 109582 kB.

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

24 сентября 2009 г., 12:50:42

Hi

Today the load observed very high load . I am pasting the top.

TOP
top - 12:45:23 up 79 days, 14:42, 1 user, load average: 45.84, 33.13, 25.84
Tasks: 394 total, 48 running, 346 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.2%us, 0.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 50.0%st
Mem: 16133676k total, 14870736k used, 1262940k free,   475484k buffers
Swap: 14466492k total,      124k used, 14466368k free, 11423616k cached

PID USER      PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
4152 postgres 17   0 2436m 176m 171m R   16 1.1   0:03.09 postgres: postgres dbEnterpriser_09_10 192.168.10.
4122 postgres 17   0 2431m 20m 17m R   12 0.1   0:06.38 postgres: postgres dbEnterpriser_09_10 192.168.10.
4007 postgres 16   0 2434m 80m 75m R   11 0.5   0:26.46 postgres: postgres dbEnterpriser_09_10 192.168.10.
3994 postgres 16   0 2432m 134m 132m R   10 0.9   0:43.40 postgres: postgres dbEnterpriser_09_10 192.168.10.
4166 postgres 16   0 2433m 12m 8896 R    9 0.1   0:02.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
4110 postgres 15   0 2436m 224m 217m S    8 1.4   0:06.83 postgres: postgres dbEnterpriser_09_10 192.168.10.
4061 postgres 16   0 2446m 491m 473m R    8 3.1   0:17.32 postgres: postgres dbEnterpriser_09_10 192.168.10.
4113 postgres 16   0 2432m 68m 65m R    8 0.4   0:11.03 postgres: postgres dbEnterpriser_09_10 192.168.10.
4071 postgres 16   0 2435m 200m 194m R    7 1.3   0:13.69 postgres: postgres dbEnterpriser_09_10 192.168.10.
4169 postgres 15   0 2436m 122m 117m R    7 0.8   0:00.93 postgres: postgres dbEnterpriser_09_10 192.168.10.
4178 postgres 16   0 2432m 77m 75m R    7 0.5   0:00.56 postgres: postgres dbEnterpriser_09_10 192.168.10.
4108 postgres 16   0 2437m 301m 293m R    6 1.9   0:11.94 postgres: postgres dbEnterpriser_09_10 192.168.10.
4155 postgres 16   0 2438m 252m 244m S    5 1.6   0:02.80 postgres: postgres dbEnterpriser_09_10 192.168.10.
4190 postgres 15   0 2432m 10m 8432 R    5 0.1   0:00.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
3906 postgres 16   0 2433m 124m 119m R    5 0.8   0:57.28 postgres: postgres dbEnterpriser_09_10 192.168.10.
3970 postgres 16   0 2442m 314m 304m R    5 2.0   0:16.43 postgres: postgres dbEnterpriser_09_10 192.168.10.
4130 postgres 17   0 2433m 76m 72m R    5 0.5   0:03.76 postgres: postgres dbEnterpriser_09_10 192.168.10.
4179 postgres 16   0 2432m 105m 102m R    5 0.7   0:01.11 postgres: postgres dbEnterpriser_09_10 192.168.10.
4125 postgres 17   0 2436m 398m 391m R    4 2.5   0:05.62 postgres: postgres dbEnterpriser_09_10 192.168.10.
4162 postgres 16   0 2432m 125m 122m R    4 0.8   0:01.01 postgres: postgres dbEnterpriser_09_10 192.168.10.
217m S    8 1.4   0:06.83 postgres: postgres dbEnterpriser_09_10 192.168.10. dbEnterpriser_09_10 192.168.10.
4061 postgres 16   0 2446m 491m 473m R    8 3.1   0:17.32 postgres: postgres dbEnterpriser_09_10 192.168.10.
4113 postgres 16   0 2432m 68m 65m R    8 0.4   0:11.03 postgres: postgres dbEnterpriser_09_10 192.168.10.
4071 postgres 16   0 2435m 200m 194m R    7 1.3   0:13.69 postgres: postgres dbEnterpriser_09_10 192.168.10.
4169 postgres 15   0 2436m 122m 117m R    7 0.8   0:00.93 postgres: postgres dbEnterpriser_09_10 192.168.10.
4178 postgres 16   0 2432m 77m 75m R    7 0.5   0:00.56 postgres: postgres dbEnterpriser_09_10 192.168.10.
4108 postgres 16   0 2437m 301m 293m R    6 1.9   0:11.94 postgres: postgres dbEnterpriser_09_10 192.168.10.
4155 postgres 16   0 2438m 252m 244m S    5 1.6   0:02.80 postgres: postgres dbEnterpriser_09_10 192.168.10.
4190 postgres 15   0 2432m 10m 8432 R    5 0.1   0:00.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
3906 postgres 16   0 2433m 124m 119m R    5 0.8   0:57.28 postgres: postgres dbEnterpriser_09_10 192.168.10.
3970 postgres 16   0 2442m 314m 304m R    5 2.0   0:16.43 postgres: postgres dbEnterpriser_09_10 192.168.10.
4130 postgres 17   0 2433m 76m 72m R    5 0.5   0:03.76 postgres: postgres dbEnterpriser_09_10 192.168.10.
4179 postgres 16   0 2432m 105m 102m R    5 0.7   0:01.11 postgres: postgres dbEnterpriser_09_10 192.168.10.
4125 postgres 17   0 2436m 398m 391m R    4 2.5   0:05.62 postgres: postgres dbEnterpriser_09_10 192.168.10.
4162 postgres 16   0 2432m 125m 122m R    4 0.8   0:01.01 postgres: postgres dbEnterpriser_09_10 192.168.10.
4185 postgres 1

OUTPUT OF IOSTAT 1 5 (is SAN becoming a bottleneck,shows 50% CPU usage?)

clusternode2:~ # iostat 1 5
Linux 2.6.16.46-0.12-ppc64 (clusternode2)       09/24/2009      _ppc64_ (4 CPU)

avg-cpu: %user   %nice %system %iowait %steal   %idle
          16.00    0.00    0.68    0.61   10.72   71.99

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.08         1.96        22.54   13505448 155494808
sdb               0.00         0.20         0.45    1410179    3099920
sdc               0.00         0.05         0.01     357404      78840
scd0              0.00         0.00         0.00        136          0
sdd              12.20        77.69       343.49 535925176 2369551848
sde               0.00         0.00         0.00       1120          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          29.46    0.00    0.25    0.00    7.43   62.87

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          46.17    0.00    0.99    0.00   38.52   14.32

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               3.96         0.00       118.81          0        120
sde               0.00         0.00         0.00          0          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          48.88    0.00    0.99    0.00   49.88    0.25

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          47.86    0.00    2.14    0.00   50.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0

Andy Colson Wrote : ,
Eww. I think that's bad. A connection that has a transaction open will cause lots of row versions, which use up ram, and make it slower to step through the table (even with an index). You really need to fix up your code and make sure you commit transactions. (any statement (select, insert, update) will start a new transaction that you need to explicitly commit).

With reference to this suggestion by Andy Colson, we checked the application code and found that onlyINSERT, UPDATE has COMMIT and SELECT has no commit, We are using a lot of "Ajax Suggest" in the all the forms accessed for fetching the data using SELECT statements which are not explicitly commited. We have started updating the code on this.

Thanks for this suggestion.

Again thanks to suggestion of Scott Marlowe in reducing the number of connections. This was now reducted to 500 .

As i mentioned in the mail, i am restarting the database every 30 minutes. I found a shell script in the wiki which could the idle in transaction pids. This is the code. The code will kill all old pids in the server.

This is the script

/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \


  /usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'

and this is the link where the script was provided.

http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem

I tried it run it as test in the server, but the script is not executing. Even i see many of the "Idle in transaction " PIDs are showing R (RUnning status) , but most of them are showing S(Sleep ) status. Please suggest anyway i can resolve this idle transaction issue.

Regards

Shiva Raman

Re: High CPU load on Postgres Server during Peak times!!!!

От

Praveen DS

Дата:

24 сентября 2009 г., 13:36:16

For 'idle in transaction' issues, you have to fix your code. I faced this issue couple of months back. How good is your exception handling? Are you rollingback/comitting your transactions while exceptions are thrown, during the course of db operations?

Honestly I wouldn't go for these scripts which kill processes.

On Thu, Sep 24, 2009 at 6:20 PM, Shiva Raman <raman.shivag@gmail.com> wrote:

Hi

Today the load observed very high load . I am pasting the top.

TOP
top - 12:45:23 up 79 days, 14:42, 1 user, load average: 45.84, 33.13, 25.84
Tasks: 394 total, 48 running, 346 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.2%us, 0.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 50.0%st
Mem: 16133676k total, 14870736k used, 1262940k free,   475484k buffers
Swap: 14466492k total,      124k used, 14466368k free, 11423616k cached

PID USER      PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
4152 postgres 17   0 2436m 176m 171m R   16 1.1   0:03.09 postgres: postgres dbEnterpriser_09_10 192.168.10.
4122 postgres 17   0 2431m 20m 17m R   12 0.1   0:06.38 postgres: postgres dbEnterpriser_09_10 192.168.10.
4007 postgres 16   0 2434m 80m 75m R   11 0.5   0:26.46 postgres: postgres dbEnterpriser_09_10 192.168.10.
3994 postgres 16   0 2432m 134m 132m R   10 0.9   0:43.40 postgres: postgres dbEnterpriser_09_10 192.168.10.
4166 postgres 16   0 2433m 12m 8896 R    9 0.1   0:02.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
4110 postgres 15   0 2436m 224m 217m S    8 1.4   0:06.83 postgres: postgres dbEnterpriser_09_10 192.168.10.
4061 postgres 16   0 2446m 491m 473m R    8 3.1   0:17.32 postgres: postgres dbEnterpriser_09_10 192.168.10.
4113 postgres 16   0 2432m 68m 65m R    8 0.4   0:11.03 postgres: postgres dbEnterpriser_09_10 192.168.10.
4071 postgres 16   0 2435m 200m 194m R    7 1.3   0:13.69 postgres: postgres dbEnterpriser_09_10 192.168.10.
4169 postgres 15   0 2436m 122m 117m R    7 0.8   0:00.93 postgres: postgres dbEnterpriser_09_10 192.168.10.
4178 postgres 16   0 2432m 77m 75m R    7 0.5   0:00.56 postgres: postgres dbEnterpriser_09_10 192.168.10.
4108 postgres 16   0 2437m 301m 293m R    6 1.9   0:11.94 postgres: postgres dbEnterpriser_09_10 192.168.10.
4155 postgres 16   0 2438m 252m 244m S    5 1.6   0:02.80 postgres: postgres dbEnterpriser_09_10 192.168.10.
4190 postgres 15   0 2432m 10m 8432 R    5 0.1   0:00.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
3906 postgres 16   0 2433m 124m 119m R    5 0.8   0:57.28 postgres: postgres dbEnterpriser_09_10 192.168.10.
3970 postgres 16   0 2442m 314m 304m R    5 2.0   0:16.43 postgres: postgres dbEnterpriser_09_10 192.168.10.
4130 postgres 17   0 2433m 76m 72m R    5 0.5   0:03.76 postgres: postgres dbEnterpriser_09_10 192.168.10.
4179 postgres 16   0 2432m 105m 102m R    5 0.7   0:01.11 postgres: postgres dbEnterpriser_09_10 192.168.10.
4125 postgres 17   0 2436m 398m 391m R    4 2.5   0:05.62 postgres: postgres dbEnterpriser_09_10 192.168.10.
4162 postgres 16   0 2432m 125m 122m R    4 0.8   0:01.01 postgres: postgres dbEnterpriser_09_10 192.168.10.
217m S    8 1.4   0:06.83 postgres: postgres dbEnterpriser_09_10 192.168.10. dbEnterpriser_09_10 192.168.10.
4061 postgres 16   0 2446m 491m 473m R    8 3.1   0:17.32 postgres: postgres dbEnterpriser_09_10 192.168.10.
4113 postgres 16   0 2432m 68m 65m R    8 0.4   0:11.03 postgres: postgres dbEnterpriser_09_10 192.168.10.
4071 postgres 16   0 2435m 200m 194m R    7 1.3   0:13.69 postgres: postgres dbEnterpriser_09_10 192.168.10.
4169 postgres 15   0 2436m 122m 117m R    7 0.8   0:00.93 postgres: postgres dbEnterpriser_09_10 192.168.10.
4178 postgres 16   0 2432m 77m 75m R    7 0.5   0:00.56 postgres: postgres dbEnterpriser_09_10 192.168.10.
4108 postgres 16   0 2437m 301m 293m R    6 1.9   0:11.94 postgres: postgres dbEnterpriser_09_10 192.168.10.
4155 postgres 16   0 2438m 252m 244m S    5 1.6   0:02.80 postgres: postgres dbEnterpriser_09_10 192.168.10.
4190 postgres 15   0 2432m 10m 8432 R    5 0.1   0:00.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
3906 postgres 16   0 2433m 124m 119m R    5 0.8   0:57.28 postgres: postgres dbEnterpriser_09_10 192.168.10.
3970 postgres 16   0 2442m 314m 304m R    5 2.0   0:16.43 postgres: postgres dbEnterpriser_09_10 192.168.10.
4130 postgres 17   0 2433m 76m 72m R    5 0.5   0:03.76 postgres: postgres dbEnterpriser_09_10 192.168.10.
4179 postgres 16   0 2432m 105m 102m R    5 0.7   0:01.11 postgres: postgres dbEnterpriser_09_10 192.168.10.
4125 postgres 17   0 2436m 398m 391m R    4 2.5   0:05.62 postgres: postgres dbEnterpriser_09_10 192.168.10.
4162 postgres 16   0 2432m 125m 122m R    4 0.8   0:01.01 postgres: postgres dbEnterpriser_09_10 192.168.10.
4185 postgres 1

OUTPUT OF IOSTAT 1 5 (is SAN becoming a bottleneck,shows 50% CPU usage?)

clusternode2:~ # iostat 1 5
Linux 2.6.16.46-0.12-ppc64 (clusternode2)       09/24/2009      _ppc64_ (4 CPU)

avg-cpu: %user   %nice %system %iowait %steal   %idle
          16.00    0.00    0.68    0.61   10.72   71.99

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.08         1.96        22.54   13505448 155494808
sdb               0.00         0.20         0.45    1410179    3099920
sdc               0.00         0.05         0.01     357404      78840
scd0              0.00         0.00         0.00        136          0
sdd              12.20        77.69       343.49 535925176 2369551848
sde               0.00         0.00         0.00       1120          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          29.46    0.00    0.25    0.00    7.43   62.87

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          46.17    0.00    0.99    0.00   38.52   14.32

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               3.96         0.00       118.81          0        120
sde               0.00         0.00         0.00          0          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          48.88    0.00    0.99    0.00   49.88    0.25

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0

avg-cpu: %user   %nice %system %iowait %steal   %idle
          47.86    0.00    2.14    0.00   50.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
scd0              0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0

Andy Colson Wrote : ,
Eww. I think that's bad. A connection that has a transaction open will cause lots of row versions, which use up ram, and make it slower to step through the table (even with an index). You really need to fix up your code and make sure you commit transactions. (any statement (select, insert, update) will start a new transaction that you need to explicitly commit).

With reference to this suggestion by Andy Colson, we checked the application code and found that onlyINSERT, UPDATE has COMMIT and SELECT has no commit, We are using a lot of "Ajax Suggest" in the all the forms accessed for fetching the data using SELECT statements which are not explicitly commited. We have started updating the code on this.

Thanks for this suggestion.

Again thanks to suggestion of Scott Marlowe in reducing the number of connections. This was now reducted to 500 .

As i mentioned in the mail, i am restarting the database every 30 minutes. I found a shell script in the wiki which could the idle in transaction pids. This is the code. The code will kill all old pids in the server.

This is the script
/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \


  /usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'
and this is the link where the script was provided.

http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem

I tried it run it as test in the server, but the script is not executing. Even i see many of the "Idle in transaction " PIDs are showing R (RUnning status) , but most of them are showing S(Sleep ) status. Please suggest anyway i can resolve this idle transaction issue.

Regards

Shiva Raman

Re: High CPU load on Postgres Server during Peak times!!!!

От

Andy Colson

Дата:

24 сентября 2009 г., 15:32:32

Shiva Raman wrote:
> Hi
>
> Today the load observed very high load . I am pasting the top.
>
> _*TOP *_
> top - 12:45:23 up 79 days, 14:42,  1 user,  load average: 45.84, 33.13,
> 25.84
> Tasks: 394 total,  48 running, 346 sleeping,   0 stopped,   0 zombie
> Cpu(s): 49.2%us,  0.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.1%si,
> 50.0%st
> Mem:  16133676k total, 14870736k used,  1262940k free,   475484k buffers
> Swap: 14466492k total,      124k used, 14466368k free, 11423616k cached
>
>
> _*OUTPUT OF IOSTAT 1 5 (is SAN becoming a bottleneck,shows 50% CPU
> usage?) *_
>
> clusternode2:~ # iostat 1 5
> Linux 2.6.16.46-0.12-ppc64 (clusternode2)       09/24/2009      _ppc64_
> (4 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           16.00    0.00    0.68    0.61   10.72   71.99
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           29.46    0.00    0.25    0.00    7.43   62.87
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           46.17    0.00    0.99    0.00   38.52   14.32
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           48.88    0.00    0.99    0.00   49.88    0.25
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           47.86    0.00    2.14    0.00   50.00    0.00
>

Both top and iostat show no wait time for io.  However, they both show
wait time on the vm.  You have 50% user and 50% steal, and zero% io.

you said: "SAN becoming a bottleneck,shows 50% CPU usage?"

I'm not sure what you are looking at.  SAN is like HD right?  I assume
waiting on the SAN would show up as %iowait... yes?


>
> Andy Colson Wrote :  ,
> /Eww.  I think that's bad.  A connection that has a transaction open
> will cause lots of row versions, which use up ram, and make it slower to
> step through the table (even with an index).  You really need to fix up
> your code and make sure you commit transactions.  (any statement
> (select, insert, update) will start a new transaction that you need to
> explicitly commit).
>
> /With reference to this suggestion by Andy Colson, we checked the
> application code and found that onlyINSERT, UPDATE  has COMMIT  and
> SELECT has no commit, We are using a lot of  "Ajax Suggest" in the all
> the forms accessed for fetching the data using SELECT statements which
> are not explicitly commited. We have started updating the code on this.
>
> Thanks for this  suggestion.
>
>
> Again thanks to suggestion of Scott Marlowe in reducing the number of
> connections. This was now reducted to 500 .
>
>
> As i mentioned in the mail, i am restarting the database every 30
> minutes. I found a shell script in the wiki which could the idle in
> transaction pids. This is the code. The code will kill all old pids in
> the server.
>
> This is the script
>
> /usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \
>
>
>   /usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'
>
> and this is the link where the script was provided.
>
> http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem
>
> I tried it run it as test in the server, but the script is not
> executing. Even i see many of the "Idle in transaction " PIDs are
> showing R (RUnning status) , but most of them are showing S(Sleep )
> status. Please suggest anyway i can resolve this idle transaction issue.

fixing up the code to commit selects will make the "idle in trans." go
away.  I'm with Praveen, fix the code, avoid the scripts.

Is there anything else running on this box?  You said previously "The
PowerPC cpu is having some virtual layer that is shown in the Steal
value.".  I'm not sure what that means.  Are you in a virtual machine?
Or running other vm's?  Based on the top you posted (this one and the
very first one) you are loosing half your cpu to the vm.  (unless I'm
totally reading this wrong... I don't have experience with vm's so
please someone jump in here and correct me if I'm wrong)


-Andy

Re: High CPU load on Postgres Server during Peak times!!!!

От

Karl Denninger

Дата:

24 сентября 2009 г., 15:49:37

Andy Colson wrote:
> Shiva Raman wrote:
>> Hi
>>
>> Today the load observed very high load . I am pasting the top.
>>
>> _*TOP *_
>> top - 12:45:23 up 79 days, 14:42,  1 user,  load average: 45.84,
>> 33.13, 25.84
>> Tasks: 394 total,  48 running, 346 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 49.2%us,  0.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,
>> 0.1%si, 50.0%st
>> Mem:  16133676k total, 14870736k used,  1262940k free,   475484k buffers
>> Swap: 14466492k total,      124k used, 14466368k free, 11423616k cached
>>
>>   /usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'
>>
>> and this is the link where the script was provided.
>>
>> http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem
>>
>> I tried it run it as test in the server, but the script is not
>> executing. Even i see many of the "Idle in transaction " PIDs are
>> showing R (RUnning status) , but most of them are showing S(Sleep )
>> status. Please suggest anyway i can resolve this idle transaction issue.
>
> fixing up the code to commit selects will make the "idle in trans." go
> away.  I'm with Praveen, fix the code, avoid the scripts.
>
> Is there anything else running on this box?  You said previously "The
> PowerPC cpu is having some virtual layer that is shown in the Steal
> value.".  I'm not sure what that means.  Are you in a virtual machine?
> Or running other vm's?  Based on the top you posted (this one and the
> very first one) you are loosing half your cpu to the vm.  (unless I'm
> totally reading this wrong... I don't have experience with vm's so
> please someone jump in here and correct me if I'm wrong)
>
"idle in transaction" processes will DESTROY throughput over time.

Don't kill them - find out how they're happening.  They should NOT happen.

If you take an exception in an application it is essential that the
application NOT leave pending transactions open.  If your middleware
between application and Postgres doesn't take care of this cleanup on
exit on its own (or if it would if you left through an "approved" path
but you're doing something like SEGVing out of a compiled app or calling
exit() without closing open connections, etc) you need to figure out
where you're getting these exceptions from and fix them.

Hacks like killing "idle in transaction" processes will eventually bite
you by killing a process that is TEMPORARILY idle while waiting for some
resource but the check "catches it" at exactly the wrong time, whacking
a perfectly good change.  At best this returns an error to the user; at
worst, especially in a web-based application, it can result in a
silently-lost transaction.

-- Karl

Вложения

karl.vcf

Re: High CPU load on Postgres Server during Peak times!!!!

От

"Dave Dutcher"

Дата:

24 сентября 2009 г., 17:08:28

>From: Shiva Raman
>Subject: Re: [PERFORM] High CPU load on Postgres Server during Peak
times!!!!
>
>Andy Colson Wrote :  ,
>>Eww.  I think that's bad.  A connection that has a transaction open will
cause lots of row versions,
>>which use up ram, and make it slower to step through the table (even with
an index).  You really need
>>to fix up your code and make sure you commit transactions.  (any statement
(select, insert, update) will
>>start a new transaction that you need to explicitly commit).
>
>With reference to this suggestion by Andy Colson, we checked the
application code and found that only
>INSERT, UPDATE  has COMMIT  and SELECT has no commit, We are using a lot of
"Ajax Suggest" in the all
>the forms accessed for fetching the data using SELECT statements which are
not explicitly committed.
>We have started updating the code on this.

You need a COMMIT for every BEGIN.  If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is Idle in Transaction.  If you are beginning a transaction,
doing a select, and then not committing, then yes that is a bug.

Dave

Re: High CPU load on Postgres Server during Peak times!!!!

От

Karl Denninger

Дата:

24 сентября 2009 г., 17:56:55

Dave Dutcher wrote:

From: Shiva Raman
Subject: Re: [PERFORM] High CPU load on Postgres Server during Peak

not explicitly committed.

We have started updating the code on this.

You need a COMMIT for every BEGIN.  If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is Idle in Transaction.  If you are beginning a transaction,
doing a select, and then not committing, then yes that is a bug.

Dave

Dave is correct. A SELECT without a BEGIN in front of it will not begin a transaction. Atomic SELECTs (that is, those not intended to return rows that will then be updated or deleted, etc.) does not need and should NOT have a BEGIN in front of it.

Any block of statements that must act in an atomic fashion must have a BEGIN/COMMIT or BEGIN/ROLLBACK block around them to guarantee atomic results across statements; any time you issue a BEGIN you MUST issue either a ROLLBACK or COMMIT. Exiting SOUNDS safe (and if the connection is truly dropped it is as that will implicitly roll back any uncommitted transaction) BUT in a pooled connection environment it leads to exactly what you're seeing here.

It is a serious mistake to leave open transactions active in a session as that leaves multiple copies of rows and the support data necessary to handle them either in memory, on disk or both. When the working set of all postgresql instances reaches the physical memory limit and the system starts to page performance will go straight in the toilet.

-- Karl

Вложения

karl.vcf

Re: High CPU load on Postgres Server during Peak times!!!!

От

Craig James

Дата:

24 сентября 2009 г., 18:45:46

Dave Dutcher wrote:
> You need a COMMIT for every BEGIN.  If you just run a SELECT statement
> without first beginning a transaction, then you should not end up with a
> connection that is Idle in Transaction.  If you are beginning a transaction,
> doing a select, and then not committing, then yes that is a bug.

The BEGIN can be hidden, though.  For example, if the application is written in Perl,

  $dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});

will automatically start a transaction the first time you do anything.  Under the covers, the Perl DBI issues the BEGIN
foryou, and you have to do an explicit 

  $dbh->commit();

to commit it.

Craig

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

25 сентября 2009 г., 07:17:57

Hi Gerhard

Thanks for the mail

On Thu, Sep 24, 2009 at 7:19 PM, Gerhard Wiesinger <gerhard@wiesinger.com> wrote:

Hello Shiva,

What I see from top (0.0%wa) you don't have any I/O problem but a major CPU problem. But this is contrast to iostat where up to 50% of iowait is there (sometimes).

I think you have 2 problems:
1.) Client applications which don't close the connection. If the applications wants persistent connections (for performance reasons), then idle postgresql processes are ok. A better approach would be some kind of connection pool. What programming language do you use on the web tier?

I am using connection pooling on Tomcat Web Server . Total of 500 connections are configured to be handled in the connection pool.

2.) Find out queries which produce the high CPU load. (e.g. pg_top). I guess there are some very suboptimal queries there. (I guess some indexes are missing).

You could e.g. set
log_min_duration_statement = 50 # 50ms, all slower queries are logged

I enabled the min duration statement and i found that allmost ninety percent of queries are logged which has duration more thatn 50. Most of the queries ranges between 50 and 500.
Certain Select queuries duration are between 1000 and 2500. And for report queries with more than 3 lakh and 1 lakh rows , the queries takes more than 6000 ms.

And: Idle connection don't take any I/O and CPU, just memory resources (and very small network resources).

And IHMO killing database processes isn't a solution to your problem. Database server should nearly never be restarted.

Ciao,
Gerhard

Regards

Shiva Raman

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

25 сентября 2009 г., 08:07:07

Hi Gerhard
I also found the pg_log has 73 G of data .

clusternode2:/var/lib/pgsql/data # du -sh pg_log/
73G pg_log/

Is it necessary to keep this Log files? Can i backup the logs and delete it from the original directory ? Is this logs files necessary in case any data recovery to be done ?
I am database dumps every day .
pg_xlog and pg_clog has nearly less than 25 Mb of data only.

Regds

Shiva Raman

Re: High CPU load on Postgres Server during Peak times!!!!

От

Grzegorz Jaśkiewicz

Дата:

25 сентября 2009 г., 08:16:39

On Fri, Sep 25, 2009 at 9:06 AM, Shiva Raman <raman.shivag@gmail.com> wrote:

Hi Gerhard
I also found the pg_log has 73 G of data .

clusternode2:/var/lib/pgsql/data # du -sh pg_log/
73G pg_log/

Is it necessary to keep this Log files? Can i backup the logs and delete it from the original directory ? Is this logs files necessary in case any data recovery to be done ?
I am database dumps every day .

you're probably logging too much. Change level of logging (log_statement to ddl for instance), and do 'pg_ctl reload'

--
GJ

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

25 сентября 2009 г., 08:55:37

As suggested, i had changed the log_statement='ddl' and now it is logging only
the ddl statements . thanks for the tip.
Can i delete the old log files in pg_log after backing up as zip archive ? is it neccesary to keep those log files ?

Regards

Shiva Raman

2009/9/25 Grzegorz Jaśkiewicz <gryzman@gmail.com>

On Fri, Sep 25, 2009 at 9:06 AM, Shiva Raman <raman.shivag@gmail.com> wrote:
Hi Gerhard
I also found the pg_log has 73 G of data .

clusternode2:/var/lib/pgsql/data # du -sh pg_log/
73G pg_log/

Is it necessary to keep this Log files? Can i backup the logs and delete it from the original directory ? Is this logs files necessary in case any data recovery to be done ?
I am database dumps every day .
you're probably logging too much. Change level of logging (log_statement to ddl for instance), and do 'pg_ctl reload'

--
GJ

Re: High CPU load on Postgres Server during Peak times!!!!

От

Grzegorz Jaśkiewicz

Дата:

25 сентября 2009 г., 09:03:16

2009/9/25 Shiva Raman <raman.shivag@gmail.com>

As suggested, i had changed the log_statement='ddl' and now it is logging only
the ddl statements . thanks for the tip.
Can i delete the old log files in pg_log after backing up as zip archive ? is it neccesary to keep those log files ?

they're yours, you can do whatever you wish with em.
pg_logs are just textual log files.

pg_xlogs on the other hand, you should never touch (unless using logs storage/shipment for backups/replication).

--
GJ

Re: High CPU load on Postgres Server during Peak times!!!!

От

Gerhard Wiesinger

Дата:

25 сентября 2009 г., 13:20:02

Hello Craig,

Are you sure this is correct?

The test program (see below) with autocommit=0 counts up when an insert is
done in
another session and there is no commit done.

I think with each new select a new implicit transaction is done when no
explicit "BEGIN" has been established.

Can one confirm this behavior?

Thnx.

Ciao,
Gerhard

# Disable autocommit!
my $dbh = DBI->connect($con, $dbuser, $dbpass, {RaiseError => 1,
AutoCommit=>0}) || die "Unable to access Database '$dbname' on host
'$dbhost' as user '$dbuser'. Error returned was: ". $DBI::errstr ."";

my $sth = $dbh->prepare('SELECT COUNT(*) FROM employee;');

for (;;)
{
   $sth->execute();
   my ($count) = $sth->fetchrow();
   print "count=$count\n";
   $sth->finish();
#  $dbh->commit;
   sleep(3);
}

$dbh->disconnect;

--
http://www.wiesinger.com/

On Thu, 24 Sep 2009, Craig James wrote:

> Dave Dutcher wrote:
>> You need a COMMIT for every BEGIN.  If you just run a SELECT statement
>> without first beginning a transaction, then you should not end up with a
>> connection that is Idle in Transaction.  If you are beginning a
>> transaction,
>> doing a select, and then not committing, then yes that is a bug.
>
> The BEGIN can be hidden, though.  For example, if the application is written
> in Perl,
>
> $dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});
>
> will automatically start a transaction the first time you do anything.  Under
> the covers, the Perl DBI issues the BEGIN for you, and you have to do an
> explicit
>
> $dbh->commit();
>
> to commit it.
>
> Craig
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: High CPU load on Postgres Server during Peak times!!!!

От

Craig James

Дата:

25 сентября 2009 г., 15:29:59

Gerhard Wiesinger wrote:
> Hello Craig,
>
> Are you sure this is correct?
>
> The test program (see below) with autocommit=0 counts up when an insert
> is done in another session and there is no commit done.
>
> I think with each new select a new implicit transaction is done when no
> explicit "BEGIN" has been established.

Sorry, I should have been more specific.  A transaction starts when you do something that will alter data in the
database,such as insert, update, alter table, create sequence, and so forth.  The Perl DBI won't start a transaction
fora select. 

But my basic point is still valid: Some languages like Perl can implicitely start a transaction, so if programmers
aren'tfamiliar with this behavior, they can accidentally create long-running transactions. 

Craig


> Can one confirm this behavior?
>
> Thnx.
>
> Ciao,
> Gerhard
>
> # Disable autocommit!
> my $dbh = DBI->connect($con, $dbuser, $dbpass, {RaiseError => 1,
> AutoCommit=>0}) || die "Unable to access Database '$dbname' on host
> '$dbhost' as user '$dbuser'. Error returned was: ". $DBI::errstr ."";
>
> my $sth = $dbh->prepare('SELECT COUNT(*) FROM employee;');
>
> for (;;)
> {
>   $sth->execute();
>   my ($count) = $sth->fetchrow();
>   print "count=$count\n";
>   $sth->finish();
> #  $dbh->commit;
>   sleep(3);
> }
>
> $dbh->disconnect;
>
> --
> http://www.wiesinger.com/
>
>
> On Thu, 24 Sep 2009, Craig James wrote:
>
>> Dave Dutcher wrote:
>>> You need a COMMIT for every BEGIN.  If you just run a SELECT statement
>>> without first beginning a transaction, then you should not end up with a
>>> connection that is Idle in Transaction.  If you are beginning a
>>> transaction,
>>> doing a select, and then not committing, then yes that is a bug.
>>
>> The BEGIN can be hidden, though.  For example, if the application is
>> written in Perl,
>>
>> $dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});
>>
>> will automatically start a transaction the first time you do
>> anything.  Under the covers, the Perl DBI issues the BEGIN for you,
>> and you have to do an explicit
>>
>> $dbh->commit();
>>
>> to commit it.
>>
>> Craig
>>
>>
>>
>> --
>> Sent via pgsql-performance mailing list
>> (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>

Re: High CPU load on Postgres Server during Peak times!!!!

От

Shiva Raman

Дата:

09 октября 2009 г., 07:11:39

Dear all
with reference to the discussions and valuable suggestions i got from the list, the code has been reviewed and updated with explicit commit . There is a good improvement in performance .I am also planning to upgrade the database from 8.1 to 8.3 /8.4 .
My current OS is SLES 10 SP3 default ships with postgresql 8.1 .
The latest version of SLES 11 ships with postgresql 8.3 version.
I will be upgrading the Postgersql on my SLES 10 SP3 for PPC only.
I have not seen any prebuilt RPMS of Postgresql 8.3 or 8.4 version for SLES 10 PPC architecture .
When I tried to build the PPC RPM from Source in the PowerPC, it shows lot of dependancies.
I have decided to install from source - Postgres 8.3 / Postgresql-8.4. tar.gz .

Is there any major changes/updates in my 8.1 database i have to take care while upgrading to postgresql 8.3/ 8.4 ? Is 8.3 or 8.4 the right version to upgrade from 8.1 ?

Please let me know your suggestions.

Regards

Shiva Raman .

On Fri, Sep 25, 2009 at 8:52 PM, Craig James <craig_james@emolecules.com> wrote:

Gerhard Wiesinger wrote:
Hello Craig,

Are you sure this is correct?

The test program (see below) with autocommit=0 counts up when an insert is done in another session and there is no commit done.

I think with each new select a new implicit transaction is done when no explicit "BEGIN" has been established.

Sorry, I should have been more specific. A transaction starts when you do something that will alter data in the database, such as insert, update, alter table, create sequence, and so forth. The Perl DBI won't start a transaction for a select.

But my basic point is still valid: Some languages like Perl can implicitely start a transaction, so if programmers aren't familiar with this behavior, they can accidentally create long-running transactions.

Craig

Can one confirm this behavior?

Thnx.

Ciao,
Gerhard

# Disable autocommit!
my $dbh = DBI->connect($con, $dbuser, $dbpass, {RaiseError => 1, AutoCommit=>0}) || die "Unable to access Database '$dbname' on host '$dbhost' as user '$dbuser'. Error returned was: ". $DBI::errstr ."";

my $sth = $dbh->prepare('SELECT COUNT(*) FROM employee;');

for (;;)
{
$sth->execute();
my ($count) = $sth->fetchrow();
print "count=$count\n";
$sth->finish();
# $dbh->commit;
sleep(3);
}

$dbh->disconnect;

--
http://www.wiesinger.com/

On Thu, 24 Sep 2009, Craig James wrote:

Dave Dutcher wrote:
You need a COMMIT for every BEGIN. If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is Idle in Transaction. If you are beginning a transaction,
doing a select, and then not committing, then yes that is a bug.

The BEGIN can be hidden, though. For example, if the application is written in Perl,

$dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});

will automatically start a transaction the first time you do anything. Under the covers, the Perl DBI issues the BEGIN for you, and you have to do an explicit

$dbh->commit();

to commit it.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: High CPU load on Postgres Server during Peak times!!!!

От

Merlin Moncure

Дата:

09 октября 2009 г., 13:27:37

On Fri, Oct 9, 2009 at 3:11 AM, Shiva Raman <raman.shivag@gmail.com> wrote:
> Dear all
>   with reference to the discussions and valuable suggestions i got from the
> list, the code has been reviewed and updated with explicit commit . There is
> a good improvement in  performance .I am also planning to upgrade the
> database from 8.1 to 8.3 /8.4 .
>    My current OS is SLES 10 SP3 default ships with postgresql 8.1 .
>   The latest version of SLES 11 ships with postgresql 8.3 version.
> I will be upgrading the Postgersql on my SLES 10 SP3 for PPC only.
> I have not seen any prebuilt RPMS of Postgresql 8.3 or 8.4 version for SLES
> 10  PPC architecture .
> When I tried to build the PPC RPM from Source in the PowerPC, it shows lot
> of dependancies.
>    I have decided to install from source - Postgres 8.3 / Postgresql-8.4.
> tar.gz .
>
> Is there any major changes/updates in my 8.1 database  i have to take care
> while  upgrading to postgresql 8.3/ 8.4 ?  Is 8.3 or 8.4 the right version
> to upgrade from 8.1 ?
>
> Please let me know your suggestions.


The 'big picture' issues:
*) Test your postgresql.conf first. Some settings have changed or have
been removed (like fsm).
*) Many implicit casts to text were removed. Essentially the server is
less tolerant of sql that many would consider buggy
*) autovacuum is now on by default

and, most importantly:
*) sit back and enjoy the speed :-)

regarding 8.3/8.4, it's a tough call.  8.4 has a better chance of
being supported by in place upgrade in the future, so i'd start there.

merlin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: High CPU load on Postgres Server during Peak times!!!!

Вложения

Вложения

Вложения