Thread: Sudden drop in DB performance
Dear list,

we are encountering serious performance problems with our database. Queries which took around 100ms or less last week now take several seconds.

The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on hardware as follows:
  8-core Intel Xeon CPU with 2.83GHz
  48 GB RAM
  RAID 5 with 8 SAS disks
  PostgreSQL 8.4.8 (installed from the Ubuntu repository)

In addition to the DB, the machine also hosts a few virtual machines. In the past everything worked very well, and the described problem occurred just out of the blue. We don't know of any PostgreSQL config changes or anything else which might explain the performance reduction. We have a number of DBs running in the cluster, and the problem seems to affect all of them.

We checked the performance of the RAID, which is reasonable for e.g. "hdparm -tT". Memory is well used, but not swapping. vmstat shows that the machine isn't using the swap and that the load shouldn't be too high either:

root@host:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

Bonnie++ results are given below; I am no expert at interpreting those :-)

Activating log_min_duration shows, for instance, this query --- there are now constantly queries which take absurdly long:

2011-09-02 22:38:18 CEST LOG: Dauer: 25520.374 ms Anweisung: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: ((keyword)::text = 'diplomaten'::text)
 Total runtime: 0.087 ms
(3 Zeilen)

db=# \d keywords.table_x
                                    Tabelle »keywords.table_x«
   Spalte   |        Typ        |                                Attribute
------------+-------------------+---------------------------------------------------------------------------
 keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
 keyword    | character varying |
 so         | double precision  |
Indexe:
    "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
    "idx_table_x_keyword" btree (keyword)
Fremdschlüsselverweise von:
    TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE

Could you be so kind as to give us any advice on how to track down the problem, or comment on possible reasons?

Thank you very much in advance!
Regards,
heinz + gerhard

            name            |                                               current_setting
----------------------------+-------------------------------------------------------------------------------------------------------------
 version                    | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
 archive_command            | /usr/local/sbin/weblyzard-wal-archiver.sh %p %f
 archive_mode               | on
 checkpoint_segments        | 192
 effective_cache_size       | 25000MB
 external_pid_file          | /var/run/postgresql/8.4-main.pid
 full_page_writes           | on
 geqo                       | on
 lc_collate                 | de_AT.UTF-8
 lc_ctype                   | de_AT.UTF-8
 listen_addresses           | *
 log_line_prefix            | %t
 log_min_duration_statement | 3s
 maintenance_work_mem       | 500MB
 max_connections            | 250
 max_stack_depth            | 2MB
 port                       | 5432
 server_encoding            | UTF8
 shared_buffers             | 7000MB
 ssl                        | on
 TimeZone                   | localtime
 unix_socket_directory      | /var/run/postgresql
 work_mem                   | 256MB

Results of Bonnie++

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
Latency              7576us      233s      164s   15647us   13120ms    3302ms
Version  1.96       ------Sequential Create------ --------Random Create--------
voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   141   0 +++++ +++   146   0   157   0 +++++ +++   172   0
Latency              1020ms     128us    9148ms     598ms      37us     485ms
1.96,1.96,voyager,1,1314988752,95G,,1400,93,27804,3,16324,2,2925,96,41636,3,374.9,4,16,,,,,141,0,+++++,+++,146,0,157,0,+++++,+++,172,0,7576us,233s,164s,15647us,13120ms,3302ms,1020ms,128us,9148ms,598ms,37us,485ms
Hi. Autoexplain module allow to log plans and statistics of live queries. Try it. 2011/9/3, Gerhard Wohlgenannt <wohlg@ai.wu.ac.at>: > Dear list, > > we are encountering serious performance problems with our database. > Queries which took around 100ms or less last week now take several seconds. > > The database runs on Ubuntu Server 10.4.3 (kernel: 2.6.32-33) on > hardware as follows: > 8-core Intel Xeon CPU with 2.83GHz > 48 GB RAM > RAID 5 with 8 SAS disks > PostgreSQL 8.4.8 (installed from the Ubuntu repository). > > Additionally to the DB the machine also hosts a few virtual machines. In > the past everything worked very well and the described problem occurs > just out of the blue. We don't know of any postgresql config changes or > anything else which might explain the performance reduction. > We have a number of DBs running in the cluster, and the problem seems to > affect all of them. > > We checked the performance of the RAID .. which is reasonable for eg. > "hdparm -tT". Memory is well used, but not swapping. > vmstat shows, that the machine isn't using the swap and the load > shouldn't be also to high: > root@host:~# vmstat > procs -----------memory---------- ---swap-- -----io---- -system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us > sy id wa > 0 0 0 308024 884812 40512932 0 0 464 168 353 92 > 4 2 84 9 > > Bonnie++ results given below, I am no expert at interpreting those :-) > > > Activating log_min_duration shows for instance this query --- there are > now constantly queries which take absurdely long. > > 2011-09-02 22:38:18 CEST LOG: Dauer: 25520.374 ms Anweisung: SELECT > keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten' > > db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE > keyword=E'diplomaten'; > > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Index Scan using idx_table_x_keyword on table_x (cost=0.00..8.29 > rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1) > Index Cond: ((keyword)::text = 'diplomaten'::text) > Total runtime: 0.087 ms > (3 Zeilen) > > db=# \d keywords.table_x > Tabelle »keywords.table_x« > Spalte | Typ > | Attribute > ------------+-------------------+------------------------------------------------------------------------------------------------------ > keyword_id | integer | not null Vorgabewert > nextval('keywords.table_x_keyword_id_seq'::regclass) > keyword | character varying | > so | double precision | > Indexe: > "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER > "idx_table_x_keyword" btree (keyword) > Fremdschlüsselverweise von: > TABLE "keywords.table_x_has" CONSTRAINT > "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES > keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE > > > > > Could you be so kind and give us any advice how to track down the > problem or comment on possible reasons??? > > Thank you very much in advance!!! 
> > Regards, > heinz + gerhard > > > > > > name > | current_setting > ----------------------------+------------------------------------------------------------------------------------------------------------- > version | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, > compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit > archive_command | /usr/local/sbin/weblyzard-wal-archiver.sh > %p %f > archive_mode | on > checkpoint_segments | 192 > effective_cache_size | 25000MB > external_pid_file | /var/run/postgresql/8.4-main.pid > full_page_writes | on > geqo | on > lc_collate | de_AT.UTF-8 > lc_ctype | de_AT.UTF-8 > listen_addresses | * > log_line_prefix | %t > log_min_duration_statement | 3s > maintenance_work_mem | 500MB > max_connections | 250 > max_stack_depth | 2MB > port | 5432 > server_encoding | UTF8 > shared_buffers | 7000MB > ssl | on > TimeZone | localtime > unix_socket_directory | /var/run/postgresql > work_mem | 256MB > > > Results of Bonnie++ > > Version 1.96 ------Sequential Output------ --Sequential Input- > --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP > /sec %CP > voyager 95G 1400 93 27804 3 16324 2 2925 96 41636 3 > 374.9 4 > Latency 7576us 233s 164s 15647us 13120ms > 3302ms > Version 1.96 ------Sequential Create------ --------Random > Create-------- > voyager -Create-- --Read--- -Delete-- -Create-- --Read--- > -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > /sec %CP > 16 141 0 +++++ +++ 146 0 157 0 +++++ +++ > 172 0 > Latency 1020ms 128us 9148ms 598ms 37us > 485ms > 1.96,1.96,voyager,1,1314988752,95G,,1400,93,27804,3,16324,2,2925,96,41636,3,374.9,4,16,,,,,141,0,+++++,+++,146,0,157,0,+++++,+++,172,0,7576us,233s,164s,15647us,13120ms,3302ms,1020ms,128us,9148ms,598ms,37us,485ms > > > > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- ------------ pasman
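(For reference: a minimal sketch of enabling the auto_explain module suggested here, assuming PostgreSQL 8.4 with the contrib packages installed; the threshold values are examples only.)

# postgresql.conf -- preloading for all sessions requires a server restart
shared_preload_libraries = 'auto_explain'
custom_variable_classes  = 'auto_explain'   # needed on 8.4 to accept the settings below
auto_explain.log_min_duration = '3s'        # log the plan of any statement slower than 3 seconds
auto_explain.log_analyze = off              # 'on' gives actual row counts/times but adds overhead

Or, loaded for a single superuser session only:

db=# LOAD 'auto_explain';
db=# SET auto_explain.log_min_duration = '3s';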
On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote: > Dear list, > > we are encountering serious performance problems with our database. > Queries which took around 100ms or less last week now take several > seconds. > > The database runs on Ubuntu Server 10.4.3 (kernel: 2.6.32-33) on > hardware as follows: > 8-core Intel Xeon CPU with 2.83GHz > 48 GB RAM > RAID 5 with 8 SAS disks > PostgreSQL 8.4.8 (installed from the Ubuntu repository). > > Additionally to the DB the machine also hosts a few virtual machines. In > the past everything worked very well and the described problem occurs > just out of the blue. We don't know of any postgresql config changes or > anything else which might explain the performance reduction. > We have a number of DBs running in the cluster, and the problem seems to > affect all of them. What are the virtual machines doing? Are you sure they are not doing a lot of IO? > > We checked the performance of the RAID .. which is reasonable for eg. > "hdparm -tT". Memory is well used, but not swapping. > vmstat shows, that the machine isn't using the swap and the load > shouldn't be also to high: > root@host:~# vmstat > procs -----------memory---------- ---swap-- -----io---- -system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us > sy id wa > 0 0 0 308024 884812 40512932 0 0 464 168 353 92 > 4 2 84 9 > > Bonnie++ results given below, I am no expert at interpreting those :-) > > > Activating log_min_duration shows for instance this query --- there are > now constantly queries which take absurdely long. > > 2011-09-02 22:38:18 CEST LOG: Dauer: 25520.374 ms Anweisung: SELECT > keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten' > > db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE > keyword=E'diplomaten'; > QUERY > PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Index Scan using idx_table_x_keyword on table_x (cost=0.00..8.29 > rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1) > Index Cond: ((keyword)::text = 'diplomaten'::text) > Total runtime: 0.087 ms > (3 Zeilen) > > db=# \d keywords.table_x > Tabelle »keywords.table_x« > Spalte | Typ > | Attribute > ------------+-------------------+------------------------------------------------------------------------------------------------------ > keyword_id | integer | not null Vorgabewert > nextval('keywords.table_x_keyword_id_seq'::regclass) > keyword | character varying | > so | double precision | > Indexe: > "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER > "idx_table_x_keyword" btree (keyword) > Fremdschlüsselverweise von: > TABLE "keywords.table_x_has" CONSTRAINT > "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES > keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE But in this explain analyze, the query finished in 41 ms. Use auto-explain contrib module to see the explain plan of the slow execution. > Could you be so kind and give us any advice how to track down the > problem or comment on possible reasons??? One of the things > > Thank you very much in advance!!! 
> > Regards, > heinz + gerhard > > > > > > name > | current_setting > ----------------------------+------------------------------------------------------------------------------------------------------------- > version | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, > compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit > archive_command | /usr/local/sbin/weblyzard-wal-archiver.sh > %p %f > archive_mode | on > checkpoint_segments | 192 > effective_cache_size | 25000MB > external_pid_file | /var/run/postgresql/8.4-main.pid > full_page_writes | on > geqo | on > lc_collate | de_AT.UTF-8 > lc_ctype | de_AT.UTF-8 > listen_addresses | * > log_line_prefix | %t > log_min_duration_statement | 3s > maintenance_work_mem | 500MB > max_connections | 250 > max_stack_depth | 2MB > port | 5432 > server_encoding | UTF8 > shared_buffers | 7000MB > ssl | on > TimeZone | localtime > unix_socket_directory | /var/run/postgresql > work_mem | 256MB > > > Results of Bonnie++ > > Version 1.96 ------Sequential Output------ --Sequential Input- > --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP > /sec %CP > voyager 95G 1400 93 27804 3 16324 2 2925 96 41636 3 > 374.9 4 > Latency 7576us 233s 164s 15647us 13120ms > 3302ms > Version 1.96 ------Sequential Create------ --------Random > Create-------- > voyager -Create-- --Read--- -Delete-- -Create-- --Read--- > -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > /sec %CP > 16 141 0 +++++ +++ 146 0 157 0 +++++ +++ > 172 0 > Latency 1020ms 128us 9148ms 598ms 37us > 485ms > That seems a bit slow ... 27MB/s for writes and 41MB/s forreads is ait slow with 8 drives. Tomas
On 09/05/2011 02:48 AM, Tomas Vondra wrote: > On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote: >> Dear list, >> >> we are encountering serious performance problems with our database. >> Queries which took around 100ms or less last week now take several >> seconds. >> Results of Bonnie++ >> >> Version 1.96 ------Sequential Output------ --Sequential Input- >> --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >> --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >> /sec %CP >> voyager 95G 1400 93 27804 3 16324 2 2925 96 41636 3 >> 374.9 4 >> Latency 7576us 233s 164s 15647us 13120ms >> 3302ms >> Version 1.96 ------Sequential Create------ --------Random >> Create-------- >> voyager -Create-- --Read--- -Delete-- -Create-- --Read--- >> -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> /sec %CP >> 16 141 0 +++++ +++ 146 0 157 0 +++++ +++ >> 172 0 >> Latency 1020ms 128us 9148ms 598ms 37us >> 485ms >> > > That seems a bit slow ... 27MB/s for writes and 41MB/s forreads is ait > slow with 8 drives. > > Tomas > > Agreed, that's really slow. A single SATA drive will get 60 MB/s. Did you run Bonnie while the VM's were up and running? root@host:~# vmstat procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 308024 884812 40512932 0 0 464 168 353 92 4 2 84 9 Only one line? That does not help much. Can you run it as 'vmstat 2' and let it run while a few slow queries are performed? Then paste all the lines? -Andy
On 5 Září 2011, 15:51, Andy Colson wrote: > On 09/05/2011 02:48 AM, Tomas Vondra wrote: >> That seems a bit slow ... 27MB/s for writes and 41MB/s forreads is ait >> slow with 8 drives. >> >> Tomas >> >> > > Agreed, that's really slow. A single SATA drive will get 60 MB/s. Did > you run Bonnie while the VM's were up and running? > > root@host:~# vmstat > procs -----------memory---------- ---swap-- -----io---- -system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy > id wa > 0 0 0 308024 884812 40512932 0 0 464 168 353 92 4 > 2 84 9 > > > Only one line? That does not help much. Can you run it as 'vmstat 2' and > let it run while a few slow queries are performed? Then paste all the > lines? And maybe a few lines from "iostat -x 2" too. BTW what kind of raid is it? Is it hw or sw based? Have you checked health of the drives? Are you sure there's nothing else using the drives (e.g. one of the VMs, rebuild of the array or something like that)? Tomas
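(A rough sketch for the drive-health question; the right commands depend on whether it is Linux software RAID or a hardware controller, and smartmontools may need a controller-specific -d option to reach the individual disks.)

$ cat /proc/mdstat        # array/rebuild state, only meaningful for Linux software RAID (md)
$ smartctl -H /dev/sda    # overall SMART health verdict for one drive
$ smartctl -a /dev/sda    # full SMART attributes (reallocated sectors, pending sectors, ...)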
On 5 Září 2011, 16:08, Gerhard Wohlgenannt wrote: > Below please find the results of vmstat 2 over some periode of time .. > with normal database / system load. What does a "normal load" mean? Does that mean a time when the queries are slow? Are you sure the machine really has 48GB of RAM? Because from the vmstat output it seems like there's just 32GB. procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 1342168 336936 107636 31353512 0 0 14 183 1911 3426 2 1 93 4 1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB BTW there's 1.3GB of swap, although it's not used heavily (according to the vmstat output). Otherwise I don't see anything wrong in the output. What is the size of the database (use pg_database_size to get it)? Did it grow significantly recently? Tomas
On 09/05/2011 09:39 AM, Tomas Vondra wrote:
> On 5 September 2011, 16:08, Gerhard Wohlgenannt wrote:
>> Below please find the results of vmstat 2 over some period of time ..
>> with normal database / system load.
>
> What does a "normal load" mean? Does that mean a time when the queries are
> slow?
>
> Are you sure the machine really has 48GB of RAM? Because from the vmstat
> output it seems like there's just 32GB.
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
>  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4
>
> 1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB
>
> BTW there's 1.3GB of swap, although it's not used heavily (according to
> the vmstat output).
>
> Otherwise I don't see anything wrong in the output. What is the size of
> the database (use pg_database_size to get it)? Did it grow significantly
> recently?
>
> Tomas

Yeah, it's interesting that it swapped in memory, but never out. Looking at this vmstat, it does not look like a hardware problem. (Assuming "normal load" means slow queries.)

> Did it grow significantly recently?

That's a good thought, maybe the stats are old and you have bad plans? It could also be major updates to the data (as opposed to growth).

Gerhard, have you done an 'explain analyze' on any of your slow queries? Have you done an analyze lately?

-Andy
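(One way to check the analyze/dead-tuple questions is the statistics collector; a sketch, with the ordering and limit chosen arbitrarily.)

db=# SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze, last_autovacuum
       FROM pg_stat_user_tables
      ORDER BY n_dead_tup DESC
      LIMIT 20;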
On 09/05/2011 03:51 PM, Andy Colson wrote: > On 09/05/2011 02:48 AM, Tomas Vondra wrote: >> On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote: >>> Dear list, >>> >>> we are encountering serious performance problems with our database. >>> Queries which took around 100ms or less last week now take several >>> seconds. > >>> Results of Bonnie++ >>> >>> Version 1.96 ------Sequential Output------ --Sequential Input- >>> --Random- >>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >>> --Seeks-- >>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >>> /sec %CP >>> voyager 95G 1400 93 27804 3 16324 2 2925 96 41636 3 >>> 374.9 4 >>> Latency 7576us 233s 164s 15647us 13120ms >>> 3302ms >>> Version 1.96 ------Sequential Create------ --------Random >>> Create-------- >>> voyager -Create-- --Read--- -Delete-- -Create-- --Read--- >>> -Delete-- >>> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >>> /sec %CP >>> 16 141 0 +++++ +++ 146 0 157 0 +++++ +++ >>> 172 0 >>> Latency 1020ms 128us 9148ms 598ms 37us >>> 485ms >>> >> >> That seems a bit slow ... 27MB/s for writes and 41MB/s forreads is ait >> slow with 8 drives. >> >> Tomas >> >> > > Agreed, that's really slow. A single SATA drive will get 60 MB/s. > Did you run Bonnie while the VM's were up and running? > > root@host:~# vmstat > procs -----------memory---------- ---swap-- -----io---- -system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us > sy id wa > 0 0 0 308024 884812 40512932 0 0 464 168 353 92 > 4 2 84 9 > > > Only one line? That does not help much. Can you run it as 'vmstat 2' > and let it run while a few slow queries are performed? Then paste all > the lines? > > > -Andy Hi Andy, thanks a lot for your help. Below please find the results of vmstat 2 over some periode of time .. with normal database / system load. 
0 0 1344332 237196 104140 31468412 0 0 330 102 4322 7130 4 2 90 4 1 1 1344332 236708 104144 31469000 0 0 322 105 2096 3723 1 2 92 5 2 1 1344204 240924 104156 31462484 350 0 1906 234 3687 4512 12 3 77 9 0 0 1344200 238372 104168 31462452 0 0 8 109 4050 8376 8 3 86 3 0 0 1344200 232668 104168 31462468 0 0 12 158 2036 3633 2 2 92 3 0 3 1344196 282784 104180 31413384 4 0 1768 343 2490 4391 1 2 84 13 1 1 1344196 278188 104192 31416080 0 0 1392 341 2215 3850 1 2 82 15 0 0 1344120 276964 104608 31416904 90 0 634 304 2390 3949 4 2 86 8 1 1 1344120 277096 104628 31417752 0 0 492 378 2394 3866 2 1 87 10 0 1 1344120 274476 104628 31418620 0 0 260 233 1997 3255 2 1 91 6 1 1 1344120 276584 104628 31418808 0 0 128 208 2015 3266 2 1 91 6 0 0 1343672 272352 106288 31418788 694 0 1346 344 2170 3660 3 1 89 6 0 1 1343632 270220 107648 31419152 48 0 468 490 2356 3622 4 2 88 5 0 0 1343624 270708 107660 31419344 20 0 228 138 2086 3518 2 3 91 4 0 1 1343612 268732 107660 31419584 12 0 168 112 2100 3585 3 2 91 3 0 0 1343544 266616 107660 31420112 14 0 154 73 2059 3719 3 2 93 3 0 1 1343540 267368 107684 31420168 0 0 78 260 2256 3970 3 2 90 6 0 1 1343540 268352 107692 31420356 0 0 94 284 2239 4086 2 2 89 6 0 0 1343540 274064 107692 31423584 0 0 1622 301 2322 4258 2 3 83 13 0 2 1343440 273064 107704 31423696 96 0 106 180 2158 3795 3 2 90 5 0 0 1342184 262888 107708 31426040 840 0 2014 146 2309 3713 5 3 83 9 0 0 1342184 261904 107732 31426128 0 0 60 158 1893 3510 1 3 91 5 2 0 1342184 258680 107732 31427436 0 0 794 114 2160 3647 2 3 90 5 0 2 1342176 258184 107744 31428308 24 0 310 116 1943 3335 2 2 91 4 1 0 1342172 259068 107756 31428700 2 0 138 143 1976 3468 1 1 93 5 0 0 1342172 258084 107756 31429948 0 0 620 88 2117 3565 3 1 90 6 0 0 1342172 258456 107952 31430028 0 0 62 305 2174 3827 1 2 91 6 1 0 1342172 257480 107952 31430636 0 0 300 256 2316 3959 3 2 86 8 0 0 1342172 257720 107952 31430772 0 0 46 133 2411 4047 3 2 91 3 1 2 1342172 257844 107976 31430776 0 0 136 184 2111 3841 1 1 92 6 1 2 1342172 338376 107576 31349412 0 0 462 8615 2655 5508 5 3 79 13 1 2 1342172 340772 107580 31351080 0 0 682 377 2503 4022 2 1 87 10 1 2 1342172 335688 107596 31351992 0 0 548 306 2480 3867 4 1 86 9 0 2 1342168 337432 107608 31352704 0 0 224 188 1919 3158 1 1 93 6 0 0 1342168 337804 107608 31353020 0 0 154 249 1933 3175 1 1 92 6 0 1 1342168 335944 107636 31353464 0 0 212 173 1912 3280 4 2 89 5 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 1342168 336936 107636 31353512 0 0 14 183 1911 3426 2 1 93 4 0 1 1342168 334440 107656 31353736 0 0 264 372 2119 3400 6 2 84 8 0 0 1342164 334084 107680 31354468 0 0 302 413 2361 3613 2 1 87 10 2 0 1342160 342764 107680 31354916 8 0 184 332 2142 3117 1 1 90 7 0 1 1342160 343788 107680 31355808 0 0 360 211 2247 3249 1 2 91 5 2 1 1342156 340804 107704 31355904 0 0 88 280 2287 3448 2 2 90 6 0 1 1342156 344276 107704 31356464 0 0 316 276 2050 3298 1 2 90 7 0 0 1342156 344160 107712 31356576 0 0 4 225 1884 3194 1 3 90 6 0 0 1342152 342548 107724 31356688 0 0 52 231 1963 3232 1 3 89 6 2 1 1342152 343664 107724 31356764 0 0 104 348 2643 3614 3 2 88 8 1 1 1342144 341060 107760 31357080 16 0 120 307 2511 3474 4 3 87 7 1 0 1342140 342332 107780 31357500 8 0 206 193 2243 3448 4 2 89 5 1 0 1342136 339472 107780 31357508 0 0 32 142 4290 3799 6 3 87 4 0 0 1342136 341160 107780 31357992 0 0 216 171 2613 3995 4 2 88 5 0 0 1342136 342168 107820 31357988 0 0 26 140 2347 3753 3 4 89 4 0 0 1342136 342532 107820 31358128 0 0 36 
155 2119 3653 2 1 91 5 2 0 1342136 341564 107828 31358144 0 0 0 151 1973 3486 4 2 90 4 1 1 1342136 342076 107852 31358416 0 0 148 284 2251 3857 6 2 84 8 0 1 1342136 339944 107852 31359284 0 0 482 478 2902 5210 4 2 84 10 0 1 1342136 342184 107852 31359836 0 0 238 372 2292 4063 2 1 88 9 cheers gerhard
hi, thanks a lot for your help! >> Dear list, >> >> we are encountering serious performance problems with our database. >> Queries which took around 100ms or less last week now take several >> seconds. >> >> The database runs on Ubuntu Server 10.4.3 (kernel: 2.6.32-33) on >> hardware as follows: >> 8-core Intel Xeon CPU with 2.83GHz >> 48 GB RAM >> RAID 5 with 8 SAS disks >> PostgreSQL 8.4.8 (installed from the Ubuntu repository). >> >> Additionally to the DB the machine also hosts a few virtual machines. In >> the past everything worked very well and the described problem occurs >> just out of the blue. We don't know of any postgresql config changes or >> anything else which might explain the performance reduction. >> We have a number of DBs running in the cluster, and the problem seems to >> affect all of them. > What are the virtual machines doing? Are you sure they are not doing a lot > of IO? we also have a ssd-disk in the machine, and the virtual machines do most of their IO on that. But there sure also is some amount of I/O onto the systems raid array coming from the virtual machines. maybe we should consider having a dedicated database server. >> We checked the performance of the RAID .. which is reasonable for eg. >> "hdparm -tT". Memory is well used, but not swapping. >> vmstat shows, that the machine isn't using the swap and the load >> shouldn't be also to high: >> root@host:~# vmstat >> procs -----------memory---------- ---swap-- -----io---- -system-- >> ----cpu---- >> r b swpd free buff cache si so bi bo in cs us >> sy id wa >> 0 0 0 308024 884812 40512932 0 0 464 168 353 92 >> 4 2 84 9 >> >> Bonnie++ results given below, I am no expert at interpreting those :-) >> >> >> Activating log_min_duration shows for instance this query --- there are >> now constantly queries which take absurdely long. >> >> 2011-09-02 22:38:18 CEST LOG: Dauer: 25520.374 ms Anweisung: SELECT >> keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten' >> >> db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE >> keyword=E'diplomaten'; >> QUERY >> PLAN >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Index Scan using idx_table_x_keyword on table_x (cost=0.00..8.29 >> rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1) >> Index Cond: ((keyword)::text = 'diplomaten'::text) >> Total runtime: 0.087 ms >> (3 Zeilen) >> >> db=# \d keywords.table_x >> Tabelle »keywords.table_x« >> Spalte | Typ >> | Attribute >> ------------+-------------------+------------------------------------------------------------------------------------------------------ >> keyword_id | integer | not null Vorgabewert >> nextval('keywords.table_x_keyword_id_seq'::regclass) >> keyword | character varying | >> so | double precision | >> Indexe: >> "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER >> "idx_table_x_keyword" btree (keyword) >> Fremdschlüsselverweise von: >> TABLE "keywords.table_x_has" CONSTRAINT >> "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES >> keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE > But in this explain analyze, the query finished in 41 ms. Use auto-explain > contrib module to see the explain plan of the slow execution. thanks. we will use auto_explain as soon as some long running updates are finished (don't want to kill them) cheers gerhard
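(To keep an eye on those long-running updates without interrupting them, the activity view can be polled; a minimal sketch using the 8.4 column names.)

db=# SELECT procpid, usename, waiting, now() - query_start AS runtime, current_query
       FROM pg_stat_activity
      WHERE current_query <> '<IDLE>'
      ORDER BY query_start;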
On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt <wohlg@ai.wu.ac.at> wrote: > Below please find the results of vmstat 2 over some periode of time .. with > normal database / system load. > > 0 0 1344332 237196 104140 31468412 0 0 330 102 4322 7130 4 2 > 90 4 > 1 1 1344332 236708 104144 31469000 0 0 322 105 2096 3723 1 2 > 92 5 > 2 1 1344204 240924 104156 31462484 350 0 1906 234 3687 4512 12 3 > 77 9 > 0 0 1344200 238372 104168 31462452 0 0 8 109 4050 8376 8 3 > 86 3 > 0 0 1344200 232668 104168 31462468 0 0 12 158 2036 3633 2 2 > 92 3 > 0 3 1344196 282784 104180 31413384 4 0 1768 343 2490 4391 1 2 > 84 13 > 1 1 1344196 278188 104192 31416080 0 0 1392 341 2215 3850 1 2 > 82 15 > 0 0 1344120 276964 104608 31416904 90 0 634 304 2390 3949 4 2 > 86 8 > 1 1 1344120 277096 104628 31417752 0 0 492 378 2394 3866 2 1 > 87 10 > 0 1 1344120 274476 104628 31418620 0 0 260 233 1997 3255 2 1 > 91 6 > 1 1 1344120 276584 104628 31418808 0 0 128 208 2015 3266 2 1 > 91 6 > 0 0 1343672 272352 106288 31418788 694 0 1346 344 2170 3660 3 1 > 89 6 > 0 1 1343632 270220 107648 31419152 48 0 468 490 2356 3622 4 2 > 88 5 > 0 0 1343624 270708 107660 31419344 20 0 228 138 2086 3518 2 3 > 91 4 > 0 1 1343612 268732 107660 31419584 12 0 168 112 2100 3585 3 2 > 91 3 > 0 0 1343544 266616 107660 31420112 14 0 154 73 2059 3719 3 2 > 93 3 > 0 1 1343540 267368 107684 31420168 0 0 78 260 2256 3970 3 2 > 90 6 > 0 1 1343540 268352 107692 31420356 0 0 94 284 2239 4086 2 2 > 89 6 > 0 0 1343540 274064 107692 31423584 0 0 1622 301 2322 4258 2 3 > 83 13 > 0 2 1343440 273064 107704 31423696 96 0 106 180 2158 3795 3 2 > 90 5 > 0 0 1342184 262888 107708 31426040 840 0 2014 146 2309 3713 5 3 > 83 9 > 0 0 1342184 261904 107732 31426128 0 0 60 158 1893 3510 1 3 > 91 5 > 2 0 1342184 258680 107732 31427436 0 0 794 114 2160 3647 2 3 > 90 5 > 0 2 1342176 258184 107744 31428308 24 0 310 116 1943 3335 2 2 > 91 4 > 1 0 1342172 259068 107756 31428700 2 0 138 143 1976 3468 1 1 > 93 5 > 0 0 1342172 258084 107756 31429948 0 0 620 88 2117 3565 3 1 > 90 6 > 0 0 1342172 258456 107952 31430028 0 0 62 305 2174 3827 1 2 > 91 6 > 1 0 1342172 257480 107952 31430636 0 0 300 256 2316 3959 3 2 > 86 8 > 0 0 1342172 257720 107952 31430772 0 0 46 133 2411 4047 3 2 > 91 3 > 1 2 1342172 257844 107976 31430776 0 0 136 184 2111 3841 1 1 > 92 6 > 1 2 1342172 338376 107576 31349412 0 0 462 8615 2655 5508 5 3 > 79 13 > 1 2 1342172 340772 107580 31351080 0 0 682 377 2503 4022 2 1 > 87 10 > 1 2 1342172 335688 107596 31351992 0 0 548 306 2480 3867 4 1 > 86 9 > 0 2 1342168 337432 107608 31352704 0 0 224 188 1919 3158 1 1 > 93 6 > 0 0 1342168 337804 107608 31353020 0 0 154 249 1933 3175 1 1 > 92 6 > 0 1 1342168 335944 107636 31353464 0 0 212 173 1912 3280 4 2 > 89 5 > procs -----------memory---------- ---swap-- -----io---- -system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id > wa > 0 0 1342168 336936 107636 31353512 0 0 14 183 1911 3426 2 1 > 93 4 > 0 1 1342168 334440 107656 31353736 0 0 264 372 2119 3400 6 2 > 84 8 > 0 0 1342164 334084 107680 31354468 0 0 302 413 2361 3613 2 1 > 87 10 > 2 0 1342160 342764 107680 31354916 8 0 184 332 2142 3117 1 1 > 90 7 > 0 1 1342160 343788 107680 31355808 0 0 360 211 2247 3249 1 2 > 91 5 > 2 1 1342156 340804 107704 31355904 0 0 88 280 2287 3448 2 2 > 90 6 > 0 1 1342156 344276 107704 31356464 0 0 316 276 2050 3298 1 2 > 90 7 > 0 0 1342156 344160 107712 31356576 0 0 4 225 1884 3194 1 3 > 90 6 > 0 0 1342152 342548 107724 31356688 0 0 52 231 1963 3232 1 3 > 89 6 > 2 1 1342152 343664 107724 31356764 0 0 104 348 2643 3614 
3 2 > 88 8 > 1 1 1342144 341060 107760 31357080 16 0 120 307 2511 3474 4 3 > 87 7 > 1 0 1342140 342332 107780 31357500 8 0 206 193 2243 3448 4 2 > 89 5 > 1 0 1342136 339472 107780 31357508 0 0 32 142 4290 3799 6 3 > 87 4 > 0 0 1342136 341160 107780 31357992 0 0 216 171 2613 3995 4 2 > 88 5 > 0 0 1342136 342168 107820 31357988 0 0 26 140 2347 3753 3 4 > 89 4 > 0 0 1342136 342532 107820 31358128 0 0 36 155 2119 3653 2 1 > 91 5 > 2 0 1342136 341564 107828 31358144 0 0 0 151 1973 3486 4 2 > 90 4 > 1 1 1342136 342076 107852 31358416 0 0 148 284 2251 3857 6 2 > 84 8 > 0 1 1342136 339944 107852 31359284 0 0 482 478 2902 5210 4 2 > 84 10 > 0 1 1342136 342184 107852 31359836 0 0 238 372 2292 4063 2 1 > 88 9 Your IO Wait is actually pretty high. On an 8 core machine, 12.5% means one core is doing nothing but waiting for IO.
On 09/05/2011 01:45 PM, Scott Marlowe wrote:
> On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt <wohlg@ai.wu.ac.at> wrote:
>> Below please find the results of vmstat 2 over some period of time .. with
>> normal database / system load.
>> 2 1 1344204 240924 104156 31462484 350 0 1906 234 3687 4512 12 3 77 9
>
> Your IO Wait is actually pretty high. On an 8 core machine, 12.5%
> means one core is doing nothing but waiting for IO.

My server is 2-core, so these numbers looked fine to me. I need to remember core count when I look at these.

So the line above, for 2 cores, would not worry me a bit, but on 8 cores it pretty much means one core was pegged (with 9% wait? Or is it one core was pegged, and another was 72% io wait?).

I have always loved the vmstat output, but it's starting to get confusing when you have to take cores into account. (And my math was never strong in the first place :-) )

Good catch, thanks Scott.

-Andy
On 5 Září 2011, 21:07, Andy Colson wrote: > On 09/05/2011 01:45 PM, Scott Marlowe wrote: >> On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt<wohlg@ai.wu.ac.at> >> wrote: >>> Below please find the results of vmstat 2 over some periode of time .. >>> with >>> normal database / system load. >>> > 2 1 1344204 240924 104156 31462484 350 0 1906 234 3687 4512 12 3 > 77 9 >> >> Your IO Wait is actually pretty high. On an 8 core machine, 12.5% >> means one core is doing nothing but waiting for IO. >> > > My server is 2-core, so these numbers looked fine by me. I need to > remember core count when I look at these. > > So the line above, for 2 core's would not worry me a bit, but on 8 cores, > it pretty much means one core was pegged (with 9% wait? Or is it one core > was pegged, and another was 72% io wait?) AFAIK it's as if one core was 72% io wait. Anyway that's exactly why I was asking for "iostat -x" because the util% gives a better idea of what's going on. > I have always loved the vmstat output, but its starting to get confusing > when you have to take core's into account. (And my math was never strong > in the first place :-) ) That's why I love dstat, just do this $ dstat -C 0,1,2,3,4,5,6,7 and you know all you need. > Good catch, thanks Scott. Yes, good catch. Still, this does not explain why the queries were running fast before, and why the RAID array is so sluggish. Not to mention that we don't know what were the conditions when collecting those numbers (were the VMs off or running?). Tomas
On 6 September 2011, 10:26, Gerhard Wohlgenannt wrote:
> Thanks a lot to everybody for their helpful hints!!!
>
> I am running all these benchmarks while the VMs are up .. with the
> system under something like "typical" loads ..
>
> The RAID is hardware based. One of my colleagues will check if there is
> any hardware problem on the RAID (the disks) today, but so far no
> errors have been reported.
>
> please find below the results of
> iostat -x 2
> vmstat 2
>
> hmm, looks like we definitely do have a problem with I/O load?!
> btw: dm-19 is the logical volume where the /var (postgresql) is on ..

Well, it definitely looks like that. Something is doing a lot of writes on that drive - the drive is 100% utilized, i.e. it's a bottleneck. You need to find out what is writing the data - try iotop or something like that.

And it's probably the reason why the bonnie results were so poor.

Tomas
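(A possible way to run the suggested iotop; it needs root, and the options shown are from current iotop versions, so adjust as needed.)

~# iotop -o -P -a                        # interactive: only processes doing I/O, accumulated totals
~# iotop -o -b -d 5 -n 60 > iotop.log    # batch mode: one sample every 5 seconds for 5 minutes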
Thanks a lot to everybody for their helpful hints!!! I am running all these benchmarks while the VMs are up .. with the system under something like "typical" loads .. The RAID is hardware based. On of my colleagues will check if there is any hardware problem on the RAID (the disks) today, but nothing no errors have been reported. please find below the results of iostat -x 2 vmstat 2 hmm, looks like we definitely do have a problem with I/O load?! btw: dm-19 is the logical volume where the /var (postgresql) is on .. cheers gerhard procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 16 1370892 434996 33840 28938348 1 1 615 312 9 4 5 2 81 12 0 15 1370892 440832 33840 28938380 0 0 4 136 2086 3899 0 4 12 84 1 16 1370892 447008 33864 28938380 0 0 0 27 2442 4252 1 5 10 83 1 11 1370892 452272 33864 28938380 0 0 12 5 2106 3886 0 4 12 83 2 4 1370892 315880 33888 28941396 0 0 1522 3084 2213 4120 4 3 57 37 0 10 1370892 240900 33628 28934060 0 0 1060 17275 3396 4793 3 3 55 40 1 5 1370892 238172 33044 28905652 0 0 148 267 3943 5284 2 3 26 69 2 2 1370916 232932 31960 28694024 0 12 1170 5625 3037 6336 6 7 61 26 1 2 1370912 232788 27588 28697216 10 0 1016 3848 2780 5669 8 5 56 31 1 4 1370908 2392224 27608 28144712 0 0 936 8811 2514 5244 8 6 61 25 0 1 1370908 2265428 27612 28153188 0 0 4360 1598 2822 4784 13 3 69 15 1 2 1370908 2041260 27612 28176788 0 0 11842 474 3679 4255 12 4 78 6 0 3 1370908 2199880 27624 28272112 0 0 47638 569 7798 5495 11 4 70 14 0 3 1370908 2000752 27624 28318692 0 0 23492 275 5084 5161 10 3 71 17 1 0 1370908 1691000 27624 28365060 0 0 22920 117 4961 5426 12 5 69 15 1 0 1370908 2123512 27624 28367576 0 0 1244 145 2053 3728 12 3 83 2 2 0 1370908 1740724 27636 28403748 0 0 18272 190 2920 4188 12 4 76 8 2 0 1370908 1305856 27636 28460172 0 0 28174 493 3744 4750 11 6 68 15 1 2 1370908 973412 27644 28529640 0 0 34614 305 3419 4522 12 5 69 13 2 2 1370904 1790820 27656 28659080 2 0 64376 389 5527 5374 12 7 69 12 1 2 1370904 1384100 27656 28750336 0 0 45740 351 4898 5381 13 6 68 13 1 0 1370904 954200 27656 28864252 0 0 56544 413 4596 5470 13 7 66 14 1 0 1370904 1597264 27656 28865756 0 0 926 391 2009 3502 11 4 81 4 3 2 1370904 1219180 27668 28868244 0 0 1160 500 2180 3772 11 5 80 4 2 7 1370900 809128 27680 28869020 0 0 298 21875 2417 3936 11 5 49 35 0 9 1370900 1693360 27680 28869032 0 0 8 0 2756 4174 8 5 28 59 1 2 1370900 1531100 27688 28871104 0 0 1034 7849 2646 4571 10 3 72 15 iostat -x 2: Linux 2.6.32-33-server (voyager) 06.09.2011 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5,02 0,00 2,41 11,60 0,00 80,97 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 3,05 5,22 1,05 0,67 117,54 45,72 95,37 0,01 3,94 0,75 0,13 sdb 10,02 148,15 157,91 93,49 10019,50 5098,93 60,14 4,53 18,04 2,30 57,75 dm-0 0,00 0,00 3,03 4,87 24,21 38,96 8,00 0,45 56,83 0,06 0,05 dm-1 0,00 0,00 1,07 0,87 93,32 6,77 51,59 0,01 2,71 0,42 0,08 dm-2 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 6,30 6,30 0,00 dm-3 0,00 0,00 0,19 0,32 1,54 2,55 8,00 0,03 63,61 2,72 0,14 dm-4 0,00 0,00 0,19 0,88 1,54 7,05 8,00 0,04 33,91 12,84 1,38 dm-5 0,00 0,00 0,10 0,04 0,83 0,33 8,00 0,00 16,22 2,63 0,04 dm-6 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 4,88 4,88 0,00 dm-7 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 4,37 4,37 0,00 dm-8 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 4,69 4,69 0,00 dm-9 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 5,71 5,71 0,00 dm-10 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 4,65 4,65 0,00 dm-11 
0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 4,17 4,17 0,00 dm-12 0,00 0,00 0,11 1,34 0,90 10,73 8,00 0,12 76,31 12,61 1,83 dm-13 0,00 0,00 0,01 0,00 0,09 0,01 8,00 0,00 18,70 1,26 0,00 dm-14 0,00 0,00 1,83 1,39 14,66 11,10 8,00 0,18 55,46 2,77 0,89 dm-15 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 5,35 5,31 0,00 dm-16 0,00 0,00 0,18 0,02 4,00 0,38 21,08 0,00 21,20 5,95 0,12 dm-17 0,00 0,00 0,00 0,00 0,01 0,01 18,76 0,00 30,79 26,47 0,00 dm-18 0,00 0,00 1,19 0,02 11,05 0,19 9,24 0,00 3,57 1,20 0,15 dm-19 0,00 0,00 159,62 202,37 9949,08 5022,90 41,36 0,60 29,19 1,55 56,27 dm-20 0,00 0,00 2,39 2,31 19,13 18,48 8,00 0,18 39,23 1,29 0,61 dm-21 0,00 0,00 0,62 2,44 5,00 19,53 8,00 0,11 34,84 5,41 1,66 dm-22 0,00 0,00 0,01 0,03 0,09 0,24 8,00 0,00 21,67 0,53 0,00 dm-23 0,00 0,00 0,75 0,66 6,02 5,32 8,00 0,04 26,32 4,89 0,69 dm-24 0,00 0,00 0,00 0,00 0,00 0,00 8,00 0,00 5,67 5,67 0,00 avg-cpu: %user %nice %system %iowait %steal %idle 0,88 0,00 5,27 81,72 0,00 12,13 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,50 0,00 3,00 0,00 12,00 4,00 0,00 0,00 0,00 0,00 sdb 0,00 559,00 0,00 523,50 0,00 19148,00 36,58 143,87 278,68 1,91 100,00 dm-0 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-1 0,00 0,00 0,00 1,50 0,00 12,00 8,00 0,00 0,00 0,00 0,00 dm-2 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-3 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-4 0,00 0,00 0,00 4,00 0,00 32,00 8,00 0,53 132,50 48,75 19,50 dm-5 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-6 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-7 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-8 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-9 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-10 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-11 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-12 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-13 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-14 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-15 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-16 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-17 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-18 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-19 0,00 0,00 0,00 0,00 0,00 0,00 0,00 316,54 0,00 0,00 100,00 dm-20 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-21 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-22 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-23 0,00 0,00 0,00 1,50 0,00 12,00 8,00 0,08 53,33 36,67 5,50 dm-24 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 avg-cpu: %user %nice %system %iowait %steal %idle 1,40 0,00 5,36 53,87 0,00 39,37 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 sdb 0,00 118,50 9,00 627,00 196,00 15220,50 24,24 137,49 209,69 1,57 100,00 dm-0 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-1 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-2 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-3 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-4 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-5 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-6 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-7 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-8 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-9 0,00 
0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-10 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-11 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-12 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-13 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-14 0,00 0,00 0,00 3,00 0,00 24,00 8,00 0,36 58,33 31,67 9,50 dm-15 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-16 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-17 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-18 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-19 0,00 0,00 4,00 300,50 68,00 9562,00 31,63 226,15 730,23 3,28 100,00 dm-20 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-21 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-22 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-23 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 dm-24 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 >> Agreed, that's really slow. A single SATA drive will get 60 MB/s. Did >> you run Bonnie while the VM's were up and running? >> >> root@host:~# vmstat >> procs -----------memory---------- ---swap-- -----io---- -system-- >> ----cpu---- >> r b swpd free buff cache si so bi bo in cs us sy >> id wa >> 0 0 0 308024 884812 40512932 0 0 464 168 353 92 4 >> 2 84 9 >> >> >> Only one line? That does not help much. Can you run it as 'vmstat 2' and >> let it run while a few slow queries are performed? Then paste all the >> lines? > And maybe a few lines from "iostat -x 2" too. > > BTW what kind of raid is it? Is it hw or sw based? Have you checked health > of the drives? > > Are you sure there's nothing else using the drives (e.g. one of the VMs, > rebuild of the array or something like that)? > > Tomas >
hi,

> What does a "normal load" mean? Does that mean a time when the queries are
> slow?

yes, we have slow queries (according to postgresql.log) under such load

> Are you sure the machine really has 48GB of RAM? Because from the vmstat
> output it seems like there's just 32GB.
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
>  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4
>
> 1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB

strange. we paid for 48G :-) and top and free show 48G

/root# free
             total       used       free     shared    buffers     cached
Mem:      49564860   49310444     254416          0      30908   30329576
-/+ buffers/cache:   18949960   30614900
Swap:     20971512    1370960   19600552

> Otherwise I don't see anything wrong in the output. What is the size of
> the database (use pg_database_size to get it)? Did it grow significantly
> recently?

there are a number of databases in the cluster on that machine; in the filesystem it adds up to 271G

cheers
gerhard
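(The sizes as PostgreSQL itself sees them, rather than the filesystem total, can be listed per database; pg_size_pretty is available on 8.4.)

db=# SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
       FROM pg_database
      ORDER BY pg_database_size(datname) DESC;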
> That's why I love dstat, just do this > > $ dstat -C 0,1,2,3,4,5,6,7 > > and you know all you need. dstat looks like a very nice tool, results below .. (now the system load seems a bit lower then before when generating results for vmstat and iostat) >> Good catch, thanks Scott. > Yes, good catch. > > Still, this does not explain why the queries were running fast before, and > why the RAID array is so sluggish. Not to mention that we don't know what > were the conditions when collecting those numbers (were the VMs off or > running?). > the VMs were running. they are in something like production use, so i shouldn't just turn them off .. :-) and the processes in the VMs cause a big portion of the DB load, so turning them off would distort the results ... and thanks again for all the replies!!! :-) ~# dstat -C 0,1,2,3,4,5,6,7 -------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage--------------cpu4-usage--------------cpu5-usage--------------cpu6-usage--------------cpu7-usage------ -dsk/total- -net/total- ---paging-- ---system-- usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read writ| recv send| in out | int csw 7 1 75 17 0 0: 4 5 84 7 0 0: 5 3 80 12 0 0: 4 3 85 9 0 0: 7 2 75 16 0 0: 4 2 87 8 0 0: 7 2 75 16 0 0: 4 1 87 8 0 0|5071k 2578k| 0 0 |9760B 9431B|2468 4126 0 0 98 2 0 0: 0 0 98 2 0 0: 6 2 22 71 0 0: 5 0 76 19 0 0: 0 12 82 6 0 0: 3 7 88 2 0 0: 3 1 84 12 0 0: 2 0 94 4 0 0|5160k 1376k| 60k 225k| 0 0 |2101 3879 11 1 84 4 0 0: 2 0 93 6 0 0: 3 4 72 22 0 0: 2 2 92 3 0 1: 10 13 22 54 0 1: 6 7 75 12 0 0: 3 0 87 10 0 0: 12 0 81 7 0 0|6640k 1683k| 140k 240k| 0 0 |2860 4617 1 1 29 68 0 1: 12 0 80 8 0 0: 6 0 78 16 0 1: 3 1 80 16 0 0: 14 14 57 16 0 0: 0 11 78 12 0 0: 9 1 83 7 0 0: 0 0 96 4 0 0|4448k 1266k| 102k 336k| 0 0 |2790 4645 0 0 89 11 0 0: 1 0 98 1 0 0: 14 0 57 29 0 0: 1 1 89 9 0 0: 1 15 41 43 0 0: 3 15 75 7 0 0: 3 2 60 35 0 0: 0 0 95 5 0 0| 18M 1622k| 97k 285k| 0 0 |3303 4764 0 0 96 4 0 0: 0 0 99 0 0 1: 1 2 14 83 0 0: 1 25 17 57 0 0: 1 0 87 12 0 0: 1 0 19 80 0 0: 3 3 0 94 0 0: 0 0 48 52 0 0|1320k 19M| 40k 113k| 0 0 |2909 4709 1 0 63 36 0 0: 5 2 88 5 0 0: 34 2 0 63 1 0: 8 8 72 12 0 0: 0 9 85 6 0 0: 1 2 84 13 0 0: 2 1 60 37 0 0: 1 1 62 36 0 0|9160k 5597k| 52k 143k| 32k 0 |2659 4650 4 0 43 53 0 0: 2 0 93 5 0 0: 9 0 63 28 0 0: 3 1 89 7 0 0: 2 9 72 16 0 1: 0 13 81 6 0 0: 9 1 52 38 0 0: 3 0 84 13 0 0|4980k 1358k| 106k 239k| 0 0 |2993 5158 2 1 90 7 0 0: 2 0 95 3 0 0: 2 3 82 13 0 0: 0 0 87 13 0 0: 6 10 32 52 0 0: 2 10 82 6 0 0: 5 0 86 9 0 0: 10 5 81 4 0 0|4376k 2949k| 119k 295k| 0 0 |2729 4630 1 0 93 6 0 0: 2 0 91 6 1 0: 15 4 71 11 0 0: 7 2 90 1 0 0: 13 10 12 65 0 0: 2 13 41 45 0 0: 1 0 97 2 0 0: 1 0 94 5 0 0|3896k 15M| 87k 242k| 0 0 |2809 5514 2 0 98 0 0 0: 0 0 73 27 0 0: 0 0 100 0 0 0: 2 1 29 68 0 0: 4 5 0 92 0 0: 2 5 92 2 0 0: 0 0 100 0 0 0: 1 0 77 22 0 0| 172k 19M| 40k 127k| 0 0 |2221 4069 0 0 48 52 0 0: 0 0 97 3 0 0: 0 0 92 8 0 0: 3 0 91 6 0 0: 2 10 10 78 0 0: 4 10 81 6 0 0: 2 0 29 69 0 0: 1 0 26 73 0 0| 652k 6931k| 66k 233k| 0 0 |2416 4389 6 2 72 21 0 0: 3 1 86 10 0 0: 7 0 60 34 0 0: 2 2 91 6 0 0: 1 13 78 9 0 0: 2 8 84 6 0 0: 2 0 79 19 0 0: 0 2 87 11 0 0|2784k 1456k| 96k 206k| 0 0 |2854 5226 9 4 50 37 0 0: 3 3 84 10 0 0: 4 0 84 12 0 0: 2 3 86 9 0 0: 10 2 73 15 0 0: 3 5 84 8 0 0: 8 4 81 6 0 0: 1 2 84 13 0 0|2952k 1374k| 133k 305k| 0 0 |3249 5076 9 1 78 13 0 0: 4 4 83 9 0 0: 3 1 68 28 0 0: 3 3 82 12 0 0: 9 0 64 26 0 
1: 2 1 83 13 0 1: 9 3 63 24 0 1: 3 1 91 5 0 0|3648k 1420k| 188k 444k| 0 0 |3560 5981 3 1 63 33 0 0: 0 1 86 13 0 0: 1 0 67 32 0 0: 1 2 84 12 0 1: 4 2 49 45 0 0: 6 3 82 9 0 0: 1 1 93 5 0 0: 1 1 91 7 0 0|2980k 1385k| 181k 457k| 0 0 |3023 5230 3 5 90 2 0 0: 3 9 84 4 0 0: 2 2 55 41 0 0: 21 3 69 7 0 0: 1 3 76 20 0 0: 2 1 93 4 0 0: 1 3 67 29 0 0: 0 1 93 6 0 0|2796k 1359k| 104k 237k| 0 0 |2339 4598 6 5 74 15 0 0: 0 5 87 8 0 0: 4 0 91 5 0 0: 0 0 98 2 0 0: 11 0 74 16 0 0: 3 1 87 9 0 0: 26 4 53 17 0 0: 0 1 92 7 0 0|1920k 1401k| 107k 278k| 0 0 |2352 4480 2 0 91 7 0 0: 1 10 84 5 0 0: 4 0 93 3 0 0: 3 2 93 2 0 0: 6 2 84 8 0 0: 1 0 97 2 0 0: 2 0 85 13 0 0: 3 2 89 6 0 0|1508k 1397k| 134k 313k| 0 0 |2374 4547 2 0 74 24 0 0: 0 0 97 3 0 0: 1 0 97 2 0 0: 3 2 95 0 0 0: 1 0 96 3 0 0: 4 2 91 4 0 0: 0 5 89 6 0 0: 3 1 91 6 0 0|1464k 1464k| 68k 143k| 0 0 |1839 3950 4 0 91 5 0 0: 5 4 89 3 0 0: 1 0 98 1 0 0: 0 1 97 2 0 0: 2 2 68 28 0 0: 1 0 91 8 0 0: 2 8 70 20 0 0: 1 5 91 2 0 1|1060k 1903k| 133k 142k| 0 0 |2358 4720 3 0 91 6 0 0: 4 3 88 5 0 0: 1 0 95 4 0 0: 1 2 94 3 0 0: 3 1 53 43 0 0: 3 2 93 2 0 0: 0 2 86 13 0 0: 0 4 89 7 0 0|1580k 1332k| 91k 273k| 0 0 |2249 4564 4 0 79 16 0 1: 3 6 84 8 0 0: 1 0 92 7 0 0: 2 3 88 7 0 0: 5 2 86 7 0 0: 2 4 90 3 0 1: 4 1 81 14 0 0: 7 3 81 10 0 0|1128k 1343k| 122k 263k| 0 0 |3262 5503 0 2 60 38 0 0: 3 5 90 3 0 0: 0 11 78 12 0 0: 1 5 91 3 0 0: 2 4 90 4 0 0: 4 2 89 5 0 0: 3 0 94 3 0 0: 4 2 87 8 0 0|1252k 1255k| 329k 494k| 0 0 |3223 5661 2 0 84 13 0 1: 3 6 84 7 0 0: 3 7 83 8 0 0: 5 15 72 8 0 0: 2 1 96 1 0 0: 1 2 93 4 0 0: 4 1 65 30 0 0: 1 0 89 10 0 0|1660k 2513k| 148k 460k| 0 0 |2597 5129 1 2 92 5 0 0: 0 1 41 58 0 0: 10 5 83 2 0 0: 3 18 78 2 0 0: 3 0 93 4 0 0: 3 0 97 0 0 0: 4 0 80 17 0 0: 2 0 91 7 0 0| 652k 15M| 72k 198k| 0 0 |2340 6382 0 1 88 11 0 0: 3 3 88 7 0 0: 2 1 80 17 0 0: 0 13 75 13 0 0: 1 1 73 25 0 0: 1 1 93 5 0 0: 0 1 94 5 0 0: 1 0 86 13 0 0|1568k 1155k| 68k 166k| 0 0 |2348 4738 3 0 54 43 0 0: 4 9 82 6 0 0: 2 0 88 11 0 0: 2 5 88 6 0 0: 1 0 88 11 0 0: 2 0 93 5 0 0: 3 1 80 17 0 0: 0 0 88 12 0 0|1200k 1319k| 113k 233k| 0 0 |2660 5168 0 1 91 8 0 0: 4 6 87 4 0 0: 1 0 72 27 0 0: 2 2 70 26 0 0: 2 0 85 13 0 0: 1 0 94 5 0 0: 2 0 95 3 0 0: 1 1 89 9 0 0|1032k 1346k| 119k 234k| 0 0 |2398 4808 0 1 73 26 0 0: 3 5 83 10 0 0: 1 2 96 1 0 0: 4 0 91 5 0 0: 2 2 90 6 0 0: 2 0 94 4 0 0: 3 4 70 23 0 0: 3 6 85 6 0 0|1272k 1522k| 117k 279k| 0 0 |2739 5237 1 0 56 43 0 0: 0 2 86 12 0 0: 2 1 89 8 0 0: 2 2 92 4 0 0: 7 1 82 10 0 0: 4 1 92 3 0 0: 5 2 80 13 0 0: 4 4 87 6 0 0|1080k 1637k| 141k 345k| 0 0 |2631 5142 0 0 65 35 0 0: 0 1 89 10 0 0: 2 0 84 14 0 0: 1 3 92 4 0 0: 0 0 89 11 0 0: 2 0 91 7 0 0: 2 5 87 6 0 0: 2 7 87 5 0 0|1012k 1412k| 95k 215k| 0 0 |2304 4762 1 0 93 6 0 0: 3 1 95 1 0 0: 2 1 97 0 0 0: 3 3 91 3 0 0: 2 0 72 26 0 0: 1 0 93 6 0 0: 0 7 70 23 0 0: 2 8 82 8 0 0|1152k 1548k| 101k 225k| 0 0 |2334 4706 1 0 93 6 0 0: 2 3 93 2 0 0: 5 2 91 3 0 0: 6 3 91 1 0 0: 5 2 85 8 0 0: 2 1 91 6 0 0: 1 5 60 34 0 0: 2 6 86 6 0 0|1096k 1553k| 126k 289k| 0 0 |2471 4766 0 0 63 37 0 0: 0 0 90 10 0 0: 1 0 99 0 0 0: 0 0 93 7 0 0: 1 0 86 13 0 0: 2 2 87 9 0 0: 0 7 70 24 0 0: 2 3 93 3 0 0|1256k 1262k| 110k 277k| 0 0 |2281 4534 0 0 24 76 0 0: 2 1 81 16 0 0: 1 3 44 52 0 0: 2 2 65 31 0 0: 1 2 75 22 0 0: 1 0 94 5 0 0: 0 4 17 79 0 0: 5 3 93 0 0 0| 240k 19M| 83k 200k| 0 0 |2388 4131 1 0 76 23 0 0: 1 1 94 4 0 0: 5 3 81 11 0 0: 2 4 80 14 0 0: 2 1 56 41 0 0: 0 0 77 23 0 0: 3 7 74 17 0 0: 1 5 90 5 0 0|1108k 3127k| 113k 262k| 0 0 |2584 4930 2 1 90 8 0 0: 2 0 93 5 0 0: 2 1 75 22 0 0: 2 1 83 15 0 0: 0 1 79 20 0 0: 0 0 92 7 0 1: 2 6 77 14 0 
1: 2 8 87 4 0 0|1220k 1358k| 152k 318k| 0 12k|2590 5104 3 0 81 16 0 0: 3 1 88 9 0 0: 3 1 93 3 0 0: 2 3 90 5 0 0: 1 0 93 6 0 0: 4 1 92 4 0 0: 5 13 37 46 0 0: 2 7 81 10 0 0|1256k 1437k| 167k 336k| 0 0 |2734 5356 0 0 93 7 0 0: 1 2 88 9 0 0: 2 0 92 5 0 1: 1 2 94 3 0 0: 0 0 50 50 0 0: 0 0 99 1 0 0: 3 11 81 5 0 0: 3 7 86 4 0 0|1108k 1448k| 109k 225k| 0 0 |2081 4316 3 0 42 55 0 0: 0 3 92 5 0 0: 0 0 100 0 0 0: 1 0 92 7 0 0: 3 1 69 27 0 0: 1 0 66 33 0 0: 4 7 56 33 0 0: 3 3 61 33 0 0| 720k 17M| 104k 290k| 0 0 |2300 4168 0 2 59 39 0 0: 3 7 75 15 0 0: 0 6 89 5 0 0: 12 9 75 4 0 0: 1 0 81 18 0 0: 2 1 94 3 0 0: 1 8 56 35 0 0: 3 5 57 35 0 1| 428k 8241k| 96k 259k| 0 124k|2082 5154 1 1 90 8 0 0: 2 2 46 49 0 0: 1 0 93 6 0 0: 9 4 83 4 0 0: 0 0 95 5 0 0: 1 0 98 1 0 0: 6 7 76 11 0 0: 1 4 85 10 0 0| 592k 14M| 82k 194k| 0 0 |2195 5839 4 1 82 13 0 0: 3 0 87 10 0 0: 4 3 81 12 0 0: 0 0 82 18 0 0: 1 1 88 11 0 0: 1 0 90 9 0 0: 5 7 64 24 0 0: 5 3 86 6 0 0|1876k 1273k| 145k 323k| 0 0 |2544 4949 0 1 66 33 0 0: 5 1 89 6 0 0: 0 1 88 12 0 0: 1 3 82 14 0 0: 1 0 86 12 0 1: 1 1 88 10 0 0: 8 3 82 8 0 0: 6 7 84 3 0 0|2544k 1259k| 125k 280k| 0 0 |2487 4776 0 1 85 14 0 0: 0 0 100 0 0 0: 3 0 95 2 0 0: 5 3 91 2 0 0: 0 0 94 6 0 0: 6 1 84 10 0 0: 1 10 50 39 0 0: 4 5 82 9 0 0| 896k 1378k| 92k 181k| 0 0 |2452 4823 0 0 94 6 0 0: 2 1 97 0 0 0: 3 0 94 3 0 0: 0 0 89 11 0 0: 2 3 83 12 0 0: 2 2 83 13 0 0: 2 9 50 40 0 0: 5 10 66 19 0 0| 848k 1237k| 98k 204k| 0 0 |2309 4733 3 0 93 3 0 1: 2 9 83 6 0 0: 1 0 98 1 0 0: 1 3 96 0 0 0: 3 2 84 12 0 0: 1 0 88 11 0 0: 1 0 39 60 0 0: 0 4 88 8 0 0| 992k 1113k| 99k 212k| 0 0 |2318 4696 10 0 87 3 0 0: 1 8 90 2 0 0: 2 0 91 7 0 0: 4 2 90 4 0 0: 4 2 78 16 0 0: 3 0 89 9 0 0: 4 0 90 7 0 0: 0 0 99 1 0 0| 964k 1165k| 119k 291k| 0 0 |2505 4842 8 3 83 6 0 0: 7 0 80 13 0 0: 2 6 67 25 0 0: 4 6 80 10 0 0: 2 1 77 20 0 0: 2 1 87 8 0 2: 2 1 82 16 0 0: 1 0 95 4 0 0| 888k 1154k| 158k 331k| 0 0 |2759 5495 6 0 91 3 0 0: 4 1 93 2 0 0: 0 10 85 5 0 0: 1 10 85 5 0 0: 2 1 37 60 0 0: 0 0 88 12 0 0: 5 0 83 12 0 0: 2 1 88 9 0 0| 992k 1143k| 149k 300k| 0 0 |2606 5054 5 0 75 19 0 1: 7 1 81 11 0 0: 4 11 73 12 0 0: 6 16 62 16 0 0: 5 1 88 6 0 0: 6 1 84 8 0 0: 1 4 71 24 0 1: 1 2 91 6 0 0|1372k 1109k| 193k 434k| 0 0 |3216 5887 3 1 90 6 0 0: 10 3 80 7 0 0: 0 2 92 6 0 0: 2 0 94 4 0 0: 0 2 65 33 0 0: 2 0 95 3 0 0: 0 7 76 18 0 0: 5 6 79 10 0 0|1044k 1077k| 191k 409k| 0 0 |2815 5311 1 0 90 9 0 0: 6 4 89 2 0 0: 3 0 92 5 0 0: 6 1 93 0 0 0: 3 3 45 50 0 0: 1 1 88 10 0 0: 3 7 88 3 0 0: 2 7 85 7 0 0| 800k 1214k| 135k 282k| 0 0 |2570 5140 7 1 89 3 0 0: 0 3 91 6 0 0: 1 2 93 4 0 0: 3 4 90 3 0 0: 3 1 39 57 0 0: 1 0 88 11 0 0: 1 4 79 17 0 0: 3 2 85 9 0 1|1060k 1076k| 108k 225k| 0 0 |2681 5304 1 1 85 13 0 0: 4 5 90 2 0 0: 2 5 91 2 0 0: 4 8 83 5 0 0: 2 0 41 58 0 0: 2 2 92 4 0 0: 0 2 62 36 0 0: 0 1 90 9 0 0|1956k 948k| 153k 360k| 0 0 |2802 5189 2 0 92 6 0 0: 4 2 85 9 0 0: 0 2 95 3 0 0: 1 2 93 4 0 0: 2 3 34 61 0 0: 1 3 85 11 0 0: 3 0 55 42 0 0: 1 0 91 8 0 0|1768k 921k| 170k 367k| 0 0 |2511 4701
> That's a good thought, maybe the stats are old and you have bad
> plans? It could also be major updates to the data too (as opposed to
> growth).

We have checked the number of dead tuples etc. recently, and it looks ok. And as "everything" in the database seems to be very slow at the moment, I guess the problem is not caused by bad plans for specific tables/queries.

> Gerhard, have you done an 'explain analyze' on any of your slow
> queries? Have you done an analyze lately?

Yes, we added the 'auto_explain' module to log/analyze queries >= 5000 ms. A sample result from the logs (there is lots of stuff in the logs; I selected this query because it is very simple):

2011-09-06 04:00:35 CEST ANWEISUNG: INSERT into keywords.table_x_site_impact (content_id, site_impact_id, site_impact) VALUES (199083087, 1, 1.000000)
2011-09-06 04:00:35 CEST LOG: Dauer: 15159.723 ms  Anweisung: INSERT into keywords.table_x_site_impact (content_id, site_impact_id, site_impact) VALUES (199083087, 1, 1.000000)
2011-09-06 04:00:35 CEST LOG: duration: 15159.161 ms  plan:
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.017..0.019 rows=1 loops=1)
    Output: nextval('keywords.table_x_site_impact_internal_id_seq'::regclass), 199083087::bigint, 1::smallint, 1::double precision
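For reference, on 8.4 auto_explain is typically loaded along these lines; the 5 s threshold below merely mirrors the cut-off mentioned above, and the exact values used on the server may differ:

# postgresql.conf (loading the module requires a restart)
shared_preload_libraries = 'auto_explain'
custom_variable_classes = 'auto_explain'   # needed on 8.4 for the settings below
auto_explain.log_min_duration = '5s'       # log plans of statements running at least this long
auto_explain.log_analyze = on              # include actual times/row counts, not just estimates

# or, for a single (superuser) session only:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s';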
On 6 September 2011, 10:55, Gerhard Wohlgenannt wrote:
>> That's why I love dstat, just do this
>>
>> $ dstat -C 0,1,2,3,4,5,6,7
>>
>> and you know all you need.
>
> dstat looks like a very nice tool, results below ..
> (now the system load seems a bit lower than before, when generating
> results for vmstat and iostat)
>>> Good catch, thanks Scott.
>> Yes, good catch.
>>
>> Still, this does not explain why the queries were running fast before, and
>> why the RAID array is so sluggish. Not to mention that we don't know what
>> the conditions were when collecting those numbers (were the VMs off or
>> running?).
>>
> the VMs were running. they are in something like production use, so i
> shouldn't just turn them off .. :-)
> and the processes in the VMs cause a big portion of the DB load, so
> turning them off would distort the results ...

Distort the results? If you want to measure the RAID performance, you have to do that when there are no other processes using it.

> and thanks again for all the replies!!! :-)

Please use something like pastebin.com to post these results. It was bearable for the vmstat output, but this is almost unreadable due to the wrapping.

> ~# dstat -C 0,1,2,3,4,5,6,7
> -------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage--------------cpu4-usage--------------cpu5-usage--------------cpu6-usage--------------cpu7-usage------ -dsk/total- -net/total- ---paging-- ---system--
> usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
>   7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0   0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0   0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0     0 |9760B 9431B|2468  4126
...

But if I read that correctly, the wait for the cores is 17%, 7%, 12%, 9%, 16%, 8%, 16% and 8%, and the cores are mostly idle (idl is about 85%). So it seems there is a small number of processes being switched between the CPUs, and most of the time they are waiting for I/O.

Given the low values for disk I/O and the iostat output we've seen before, it's obvious there's a lot of random I/O (mostly writes).

Let's speculate for a while about what could cause this (in arbitrary order):

1) Checkpoints. Something is doing a lot of writes, and with a DB that often means a checkpoint is in progress. I'm not sure about your checkpoint_timeout, but you do have 192 segments and about 7 GB of shared buffers, so there may be a lot of dirty buffers (possibly even 100% of the buffers). You're also using RAID 5, and that really is not a write-friendly RAID level. We don't know the actual performance, as bonnie++ was run with the VMs accessing the volume, but RAID 10 usually performs much better. Enable log_checkpoints in the config and see what's going on. You can also use iotop to see which processes are doing the writes (it might be the background writer, ...); a rough sketch of these checks follows below.

2) The RAID is broken and can't handle the load it handled fine before. This is not very likely, as you've mentioned that there were no warnings etc.

3) There are some new jobs that do a lot of I/O. Is there anything new that wasn't running before? I guess you'd have mentioned that.

4) The database grew significantly in a short period of time, and the active part no longer fits into RAM (the page cache), so the data has to be retrieved from disk.
And it's not just about the database size, it's about the active part of the database - if you're suddenly accessing more data, the cache may not be large enough. This is usually a gradual process (the cache hit ratio slowly decreases as the database grows), but if the database grew rapidly ... This could also be caused by MVCC, i.e. there may be a lot of dead tuples - have you done a big UPDATE / DELETE or something like that recently?

regards
Tomas
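For illustration, the checkpoint logging and a first dead-tuple/size check suggested above could look roughly like this; the LIMIT, the ordering and the path are arbitrary examples, not values taken from the thread:

# postgresql.conf -- log the start/end of every checkpoint (a reload is enough)
log_checkpoints = on

# on the OS side, show only processes that are currently doing I/O
iotop -o

-- in psql: tables with the most dead tuples, plus last (auto)vacuum/analyze times
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 20;

-- per-database size, to see whether the cluster has grown recently
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 ORDER BY pg_database_size(datname) DESC;

Comparing these sizes against an older backup or monitoring data would show whether point 4 applies.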
Hi,

On 03.09.2011 09:26, Gerhard Wohlgenannt wrote:
> Activating log_min_duration shows for instance this query --- there are
> now constantly queries which take absurdely long.

Two things you should check:

- If your /var/lib/postgresql is on an ext3 filesystem, I've seen such things before due to the changes around Linux 2.6.2X that caused long stalls during fsync() (see https://lwn.net/Articles/328363/ etc.). But if all your queries take longer than they should, and not only a few, that probably isn't the reason.

- Regarding your hardware RAID 5: I've seen some controllers switch the write cache off quietly for various reasons (battery too old, issues with disks). Check whether it is still enabled; without write caching, most RAID controllers will just suck.

Regards,
Marinos
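As an illustration of those two checks, assuming an LSI MegaRAID controller managed with MegaCli (other controllers ship their own CLIs, and the binary name and adapter selection below are just examples), something along these lines shows the cache policy, the battery state and the filesystem type:

# current cache policy of all logical drives (look for "WriteBack" vs "WriteThrough")
MegaCli64 -LDInfo -Lall -aALL | grep -i 'cache policy'

# battery/BBU status -- many controllers silently fall back to WriteThrough when the BBU is bad
MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL

# which filesystem backs the data directory (relevant for the ext3/fsync issue)
df -T /var/lib/postgresql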