Discussion: Sudden drop in DBb performance

Sudden drop in DBb performance

From:
Gerhard Wohlgenannt
Date:
Dear list,

we are encountering serious performance problems with our database.
Queries which took around 100ms or less last week now take several seconds.

The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on
hardware as follows:
8-core Intel Xeon CPU at 2.83 GHz
48 GB RAM
RAID 5 with 8 SAS disks
PostgreSQL 8.4.8 (installed from the Ubuntu repository).

In addition to the DB, the machine also hosts a few virtual machines. In
the past everything worked very well, and the described problem appeared
just out of the blue. We don't know of any PostgreSQL config changes or
anything else which might explain the performance reduction.
We have a number of DBs running in the cluster, and the problem seems to
affect all of them.

We checked the performance of the RAID, which is reasonable, e.g. for
"hdparm -tT". Memory is well used, but the machine is not swapping.
vmstat shows that the machine isn't using swap and that the load
shouldn't be too high either:
   root@host:~# vmstat
   procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
    0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

Bonnie++ results are given below; I am no expert at interpreting those :-)


Activating log_min_duration_statement shows, for instance, this query --- there
are now constantly queries which take absurdly long.

2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT
keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE
keyword=E'diplomaten';
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
  Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
    Index Cond: ((keyword)::text = 'diplomaten'::text)
  Total runtime: 0.087 ms
(3 Zeilen)

db=# \d keywords.table_x
                                  Tabelle »keywords.table_x«
   Spalte    |        Typ        |                             Attribute
 ------------+-------------------+------------------------------------------------------------------
  keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
  keyword    | character varying |
  so         | double precision  |
Indexe:
     "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
     "idx_table_x_keyword" btree (keyword)
Fremdschlüsselverweise von:
     TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE




Could you be so kind as to give us any advice on how to track down the
problem, or comment on possible reasons?

Thank you very much in advance!

Regards,
  heinz + gerhard





             name               |                                               current_setting
 ----------------------------+-------------------------------------------------------------------------------------------------------------
  version                    | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
  archive_command            | /usr/local/sbin/weblyzard-wal-archiver.sh %p %f
  archive_mode               | on
  checkpoint_segments        | 192
  effective_cache_size       | 25000MB
  external_pid_file          | /var/run/postgresql/8.4-main.pid
  full_page_writes           | on
  geqo                       | on
  lc_collate                 | de_AT.UTF-8
  lc_ctype                   | de_AT.UTF-8
  listen_addresses           | *
  log_line_prefix            | %t
  log_min_duration_statement | 3s
  maintenance_work_mem       | 500MB
  max_connections            | 250
  max_stack_depth            | 2MB
  port                       | 5432
  server_encoding            | UTF8
  shared_buffers             | 7000MB
  ssl                        | on
  TimeZone                   | localtime
  unix_socket_directory      | /var/run/postgresql
  work_mem                   | 256MB


Results of Bonnie++

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
Latency              7576us      233s      164s   15647us   13120ms    3302ms
Version  1.96       ------Sequential Create------ --------Random Create--------
voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16   141   0 +++++ +++   146   0   157   0 +++++ +++ 172   0
Latency              1020ms     128us    9148ms     598ms      37us     485ms

1.96,1.96,voyager,1,1314988752,95G,,1400,93,27804,3,16324,2,2925,96,41636,3,374.9,4,16,,,,,141,0,+++++,+++,146,0,157,0,+++++,+++,172,0,7576us,233s,164s,15647us,13120ms,3302ms,1020ms,128us,9148ms,598ms,37us,485ms






Re: Sudden drop in DBb performance

From:
pasman pasmański
Date:
Hi.
The auto_explain contrib module allows logging the plans and statistics of live queries. Try it.
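
For reference, a minimal sketch of trying it in a single psql session on 8.4
(requires the contrib package and superuser; the 3000 ms threshold is only an
example, chosen to match the log_min_duration_statement = 3s already in the
poster's config):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 3000;   -- in milliseconds
    SET auto_explain.log_analyze = true;
    -- now run the suspect statement; its plan and timings go to the server log
    SELECT keyword_id FROM keywords.table_x WHERE keyword = E'diplomaten';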

2011/9/3, Gerhard Wohlgenannt <wohlg@ai.wu.ac.at>:
> Dear list,
>
> we are encountering serious performance problems with our database.
> Queries which took around 100ms or less last week now take several seconds.
>
> The database runs on Ubuntu Server 10.4.3 (kernel: 2.6.32-33) on
> hardware as follows:
> 8-core Intel Xeon CPU with 2.83GHz
> 48 GB RAM
> RAID 5 with 8 SAS disks
> PostgreSQL 8.4.8 (installed from the Ubuntu repository).
>
> Additionally to the DB the machine also hosts a few virtual machines. In
> the past everything worked very well and the described problem occurs
> just out of the blue. We don't know of any postgresql config changes or
> anything else which might explain the performance reduction.
> We have a number of DBs running in the cluster, and the problem seems to
> affect all of them.
>
> We checked the performance of the RAID .. which is reasonable for eg.
> "hdparm -tT". Memory is well used, but not swapping.
> vmstat shows, that the machine isn't using the swap and the load
> shouldn't be also to high:
>    root@host:~# vmstat
>    procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
> sy id wa
>     0  0      0 308024 884812 40512932    0    0   464   168  353   92
> 4  2 84  9
>
> Bonnie++ results given below, I am no expert at interpreting those :-)
>
>
> Activating log_min_duration shows for instance this query --- there are
> now constantly queries which take absurdely long.
>
> 2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT
> keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'
>
> db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE
> keyword=E'diplomaten';
>
>         QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29
> rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
>     Index Cond: ((keyword)::text = 'diplomaten'::text)
>   Total runtime: 0.087 ms
> (3 Zeilen)
>
> db=# \d keywords.table_x
>                                           Tabelle »keywords.table_x«
>     Spalte   |        Typ
> |                                              Attribute
>
------------+-------------------+------------------------------------------------------------------------------------------------------
>   keyword_id | integer           | not null Vorgabewert
> nextval('keywords.table_x_keyword_id_seq'::regclass)
>   keyword    | character varying |
>   so         | double precision  |
> Indexe:
>      "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
>      "idx_table_x_keyword" btree (keyword)
> Fremdschlüsselverweise von:
>      TABLE "keywords.table_x_has" CONSTRAINT
> "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES
> keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE
>
>
>
>
> Could you be so kind and give us any advice how to track down the
> problem or comment on possible reasons???
>
> Thank you very much in advance!!!
>
> Regards,
>   heinz + gerhard
>
>
>
>
>
>              name
> |                                               current_setting
>
----------------------------+-------------------------------------------------------------------------------------------------------------
>   version                    | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu,
> compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
>   archive_command            | /usr/local/sbin/weblyzard-wal-archiver.sh
> %p %f
>   archive_mode               | on
>   checkpoint_segments        | 192
>   effective_cache_size       | 25000MB
>   external_pid_file          | /var/run/postgresql/8.4-main.pid
>   full_page_writes           | on
>   geqo                       | on
>   lc_collate                 | de_AT.UTF-8
>   lc_ctype                   | de_AT.UTF-8
>   listen_addresses           | *
>   log_line_prefix            | %t
>   log_min_duration_statement | 3s
>   maintenance_work_mem       | 500MB
>   max_connections            | 250
>   max_stack_depth            | 2MB
>   port                       | 5432
>   server_encoding            | UTF8
>   shared_buffers             | 7000MB
>   ssl                        | on
>   TimeZone                   | localtime
>   unix_socket_directory      | /var/run/postgresql
>   work_mem                   | 256MB
>
>
> Results of Bonnie++
>
> Version  1.96       ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3
> 374.9   4
> Latency              7576us      233s      164s   15647us   13120ms
> 3302ms
> Version  1.96       ------Sequential Create------ --------Random
> Create--------
> voyager             -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> /sec %CP
>                   16   141   0 +++++ +++   146   0   157   0 +++++ +++
> 172   0
> Latency              1020ms     128us    9148ms     598ms      37us
> 485ms
>
1.96,1.96,voyager,1,1314988752,95G,,1400,93,27804,3,16324,2,2925,96,41636,3,374.9,4,16,,,,,141,0,+++++,+++,146,0,157,0,+++++,+++,172,0,7576us,233s,164s,15647us,13120ms,3302ms,1020ms,128us,9148ms,598ms,37us,485ms
>
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


--
------------
pasman

Re: Sudden drop in DBb performance

From:
"Tomas Vondra"
Date:
On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote:
> Dear list,
>
> we are encountering serious performance problems with our database.
> Queries which took around 100ms or less last week now take several
> seconds.
>
> The database runs on Ubuntu Server 10.4.3 (kernel: 2.6.32-33) on
> hardware as follows:
> 8-core Intel Xeon CPU with 2.83GHz
> 48 GB RAM
> RAID 5 with 8 SAS disks
> PostgreSQL 8.4.8 (installed from the Ubuntu repository).
>
> Additionally to the DB the machine also hosts a few virtual machines. In
> the past everything worked very well and the described problem occurs
> just out of the blue. We don't know of any postgresql config changes or
> anything else which might explain the performance reduction.
> We have a number of DBs running in the cluster, and the problem seems to
> affect all of them.

What are the virtual machines doing? Are you sure they are not doing a lot
of IO?

>
> We checked the performance of the RAID .. which is reasonable for eg.
> "hdparm -tT". Memory is well used, but not swapping.
> vmstat shows, that the machine isn't using the swap and the load
> shouldn't be also to high:
>    root@host:~# vmstat
>    procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
> sy id wa
>     0  0      0 308024 884812 40512932    0    0   464   168  353   92
> 4  2 84  9
>
> Bonnie++ results given below, I am no expert at interpreting those :-)
>
>
> Activating log_min_duration shows for instance this query --- there are
> now constantly queries which take absurdely long.
>
> 2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT
> keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'
>
> db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE
> keyword=E'diplomaten';
>                                                                                      QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29
> rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
>     Index Cond: ((keyword)::text = 'diplomaten'::text)
>   Total runtime: 0.087 ms
> (3 Zeilen)
>
> db=# \d keywords.table_x
>                                           Tabelle »keywords.table_x«
>     Spalte   |        Typ
> |                                              Attribute
>
------------+-------------------+------------------------------------------------------------------------------------------------------
>   keyword_id | integer           | not null Vorgabewert
> nextval('keywords.table_x_keyword_id_seq'::regclass)
>   keyword    | character varying |
>   so         | double precision  |
> Indexe:
>      "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
>      "idx_table_x_keyword" btree (keyword)
> Fremdschlüsselverweise von:
>      TABLE "keywords.table_x_has" CONSTRAINT
> "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES
> keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE

But in this explain analyze, the query finished in a fraction of a millisecond.
Use the auto_explain contrib module to see the explain plan of the slow execution.

> Could you be so kind and give us any advice how to track down the
> problem or comment on possible reasons???

One of the things

>
> Thank you very much in advance!!!
>
> Regards,
>   heinz + gerhard
>
>
>
>
>
>              name
> |                                               current_setting
>
----------------------------+-------------------------------------------------------------------------------------------------------------
>   version                    | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu,
> compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
>   archive_command            | /usr/local/sbin/weblyzard-wal-archiver.sh
> %p %f
>   archive_mode               | on
>   checkpoint_segments        | 192
>   effective_cache_size       | 25000MB
>   external_pid_file          | /var/run/postgresql/8.4-main.pid
>   full_page_writes           | on
>   geqo                       | on
>   lc_collate                 | de_AT.UTF-8
>   lc_ctype                   | de_AT.UTF-8
>   listen_addresses           | *
>   log_line_prefix            | %t
>   log_min_duration_statement | 3s
>   maintenance_work_mem       | 500MB
>   max_connections            | 250
>   max_stack_depth            | 2MB
>   port                       | 5432
>   server_encoding            | UTF8
>   shared_buffers             | 7000MB
>   ssl                        | on
>   TimeZone                   | localtime
>   unix_socket_directory      | /var/run/postgresql
>   work_mem                   | 256MB
>
>
> Results of Bonnie++
>
> Version  1.96       ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3
> 374.9   4
> Latency              7576us      233s      164s   15647us   13120ms
> 3302ms
> Version  1.96       ------Sequential Create------ --------Random
> Create--------
> voyager             -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> /sec %CP
>                   16   141   0 +++++ +++   146   0   157   0 +++++ +++
> 172   0
> Latency              1020ms     128us    9148ms     598ms      37us
> 485ms
>

That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
slow with 8 drives.

Tomas


Re: Sudden drop in DBb performance

From:
Andy Colson
Date:
On 09/05/2011 02:48 AM, Tomas Vondra wrote:
> On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote:
>> Dear list,
>>
>> we are encountering serious performance problems with our database.
>> Queries which took around 100ms or less last week now take several
>> seconds.

>> Results of Bonnie++
>>
>> Version  1.96       ------Sequential Output------ --Sequential Input-
>> --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>> --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>> /sec %CP
>> voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3
>> 374.9   4
>> Latency              7576us      233s      164s   15647us   13120ms
>> 3302ms
>> Version  1.96       ------Sequential Create------ --------Random
>> Create--------
>> voyager             -Create-- --Read--- -Delete-- -Create-- --Read---
>> -Delete--
>>                 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> /sec %CP
>>                    16   141   0 +++++ +++   146   0   157   0 +++++ +++
>> 172   0
>> Latency              1020ms     128us    9148ms     598ms      37us
>> 485ms
>>
>
> That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
> slow with 8 drives.
>
> Tomas
>
>

Agreed, that's really slow.  A single SATA drive will get 60 MB/s.  Did you run Bonnie while the VMs were up and
running?

   root@host:~# vmstat
   procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
    0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9


Only one line?  That does not help much.  Can you run it as 'vmstat 2' and let it run while a few slow queries are
performed? Then paste all the lines? 


-Andy

Re: Sudden drop in DBb performance

From:
"Tomas Vondra"
Date:
On 5 Září 2011, 15:51, Andy Colson wrote:
> On 09/05/2011 02:48 AM, Tomas Vondra wrote:
>> That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
>> slow with 8 drives.
>>
>> Tomas
>>
>>
>
> Agreed, that's really slow.  A single SATA drive will get 60 MB/s.  Did
> you run Bonnie while the VM's were up and running?
>
>    root@host:~# vmstat
>    procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
> id wa
>     0  0      0 308024 884812 40512932    0    0   464   168  353   92  4
> 2 84  9
>
>
> Only one line?  That does not help much.  Can you run it as 'vmstat 2' and
> let it run while a few slow queries are performed?  Then paste all the
> lines?

And maybe a few lines from "iostat -x 2" too.

BTW what kind of RAID is it? Is it HW or SW based? Have you checked the health
of the drives?

Are you sure there's nothing else using the drives (e.g. one of the VMs,
rebuild of the array or something like that)?
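
A quick per-disk health check with smartmontools could be a starting point; the
device name below is hypothetical, and behind a hardware RAID controller you
usually need the controller-specific -d option (the megaraid variant is just an
example for LSI controllers):

    smartctl -H /dev/sda
    smartctl -H -d megaraid,0 /dev/sda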

Tomas


Re: Sudden drop in DBb performance

From:
"Tomas Vondra"
Date:
On 5 Září 2011, 16:08, Gerhard Wohlgenannt wrote:
> Below please find the results of vmstat 2 over some periode of time ..
> with normal database / system load.

What does a "normal load" mean? Does that mean a time when the queries are
slow?

Are you sure the machine really has 48GB of RAM? Because from the vmstat
output it seems like there's just 32GB.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4


1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB

BTW there's 1.3GB of swap in use, although it's not being used heavily
(according to the vmstat output).

Otherwise I don't see anything wrong in the output. What is the size of
the database (use pg_database_size to get it)? Did it grow significantly
recently?
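
For example, from psql (standard functions available in 8.4):

    SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
      FROM pg_database
     ORDER BY pg_database_size(datname) DESC;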

Tomas


Re: Sudden drop in DBb performance

From:
Andy Colson
Date:
On 09/05/2011 09:39 AM, Tomas Vondra wrote:
> On 5 Září 2011, 16:08, Gerhard Wohlgenannt wrote:
>> Below please find the results of vmstat 2 over some periode of time ..
>> with normal database / system load.
>
> What does a "normal load" mean? Does that mean a time when the queries are
> slow?
>
> Are you sure the machine really has 48GB of RAM? Because from the vmstat
> output it seems like there's just 32GB.
>
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
> id wa
>    0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426
> 2  1 93  4
>
>
> 1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB
>
> BTW there's 1.3GB of swap, although it's not used heavily (according to
> the vmstat output).
>
> Otherwise I don't see anything wrong in the output. What is the size of
> the database (use pg_database_size to get it)? Did it grow significantly
> recently?
>
> Tomas
>

Yeah, it's interesting that it swapped in memory, but never out.  Looking at this vmstat, it does not look like a
hardware problem. (Assuming "normal load" means slow queries.)

> Did it grow significantly recently?

That's a good thought, maybe the stats are old and you have bad plans?  It could also be major updates to the data
(as opposed to growth).

Gerhard, have you done an 'explain analyze' on any of your slow queries?   Have you done an analyze lately?
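
If unsure, the last (auto)analyze times can be checked per table, and the
statistics refreshed manually; the table name is just an example taken from the
earlier output:

    SELECT schemaname, relname, last_analyze, last_autoanalyze
      FROM pg_stat_user_tables
     ORDER BY last_autoanalyze NULLS FIRST;

    ANALYZE VERBOSE keywords.table_x;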

-Andy

Re: Sudden drop in DBb performance

From:
Gerhard Wohlgenannt
Date:
On 09/05/2011 03:51 PM, Andy Colson wrote:
> On 09/05/2011 02:48 AM, Tomas Vondra wrote:
>> On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote:
>>> Dear list,
>>>
>>> we are encountering serious performance problems with our database.
>>> Queries which took around 100ms or less last week now take several
>>> seconds.
>
>>> Results of Bonnie++
>>>
>>> Version  1.96       ------Sequential Output------ --Sequential Input-
>>> --Random-
>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>>> --Seeks--
>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>>> /sec %CP
>>> voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3
>>> 374.9   4
>>> Latency              7576us      233s      164s   15647us   13120ms
>>> 3302ms
>>> Version  1.96       ------Sequential Create------ --------Random
>>> Create--------
>>> voyager             -Create-- --Read--- -Delete-- -Create-- --Read---
>>> -Delete--
>>>                 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> /sec %CP
>>>                    16   141   0 +++++ +++   146   0   157   0 +++++ +++
>>> 172   0
>>> Latency              1020ms     128us    9148ms     598ms      37us
>>> 485ms
>>>
>>
>> That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
>> slow with 8 drives.
>>
>> Tomas
>>
>>
>
> Agreed, that's really slow.  A single SATA drive will get 60 MB/s.
> Did you run Bonnie while the VM's were up and running?
>
>   root@host:~# vmstat
>   procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
> sy id wa
>    0  0      0 308024 884812 40512932    0    0   464   168  353   92
> 4  2 84  9
>
>
> Only one line?  That does not help much.  Can you run it as 'vmstat 2'
> and let it run while a few slow queries are performed?  Then paste all
> the lines?
>
>
> -Andy

Hi Andy,

thanks a lot for your help.

Below please find the results of vmstat 2 over some period of time,
with normal database / system load.

  0  0 1344332 237196 104140 31468412    0    0   330   102 4322 7130
4  2 90  4
  1  1 1344332 236708 104144 31469000    0    0   322   105 2096 3723
1  2 92  5
  2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512
12  3 77  9
  0  0 1344200 238372 104168 31462452    0    0     8   109 4050 8376
8  3 86  3
  0  0 1344200 232668 104168 31462468    0    0    12   158 2036 3633
2  2 92  3
  0  3 1344196 282784 104180 31413384    4    0  1768   343 2490 4391
1  2 84 13
  1  1 1344196 278188 104192 31416080    0    0  1392   341 2215 3850
1  2 82 15
  0  0 1344120 276964 104608 31416904   90    0   634   304 2390 3949
4  2 86  8
  1  1 1344120 277096 104628 31417752    0    0   492   378 2394 3866
2  1 87 10
  0  1 1344120 274476 104628 31418620    0    0   260   233 1997 3255
2  1 91  6
  1  1 1344120 276584 104628 31418808    0    0   128   208 2015 3266
2  1 91  6
  0  0 1343672 272352 106288 31418788  694    0  1346   344 2170 3660
3  1 89  6
  0  1 1343632 270220 107648 31419152   48    0   468   490 2356 3622
4  2 88  5
  0  0 1343624 270708 107660 31419344   20    0   228   138 2086 3518
2  3 91  4
  0  1 1343612 268732 107660 31419584   12    0   168   112 2100 3585
3  2 91  3
  0  0 1343544 266616 107660 31420112   14    0   154    73 2059 3719
3  2 93  3
  0  1 1343540 267368 107684 31420168    0    0    78   260 2256 3970
3  2 90  6
  0  1 1343540 268352 107692 31420356    0    0    94   284 2239 4086
2  2 89  6
  0  0 1343540 274064 107692 31423584    0    0  1622   301 2322 4258
2  3 83 13
  0  2 1343440 273064 107704 31423696   96    0   106   180 2158 3795
3  2 90  5
  0  0 1342184 262888 107708 31426040  840    0  2014   146 2309 3713
5  3 83  9
  0  0 1342184 261904 107732 31426128    0    0    60   158 1893 3510
1  3 91  5
  2  0 1342184 258680 107732 31427436    0    0   794   114 2160 3647
2  3 90  5
  0  2 1342176 258184 107744 31428308   24    0   310   116 1943 3335
2  2 91  4
  1  0 1342172 259068 107756 31428700    2    0   138   143 1976 3468
1  1 93  5
  0  0 1342172 258084 107756 31429948    0    0   620    88 2117 3565
3  1 90  6
  0  0 1342172 258456 107952 31430028    0    0    62   305 2174 3827
1  2 91  6
  1  0 1342172 257480 107952 31430636    0    0   300   256 2316 3959
3  2 86  8
  0  0 1342172 257720 107952 31430772    0    0    46   133 2411 4047
3  2 91  3
  1  2 1342172 257844 107976 31430776    0    0   136   184 2111 3841
1  1 92  6
  1  2 1342172 338376 107576 31349412    0    0   462  8615 2655 5508
5  3 79 13
  1  2 1342172 340772 107580 31351080    0    0   682   377 2503 4022
2  1 87 10
  1  2 1342172 335688 107596 31351992    0    0   548   306 2480 3867
4  1 86  9
  0  2 1342168 337432 107608 31352704    0    0   224   188 1919 3158
1  1 93  6
  0  0 1342168 337804 107608 31353020    0    0   154   249 1933 3175
1  1 92  6
  0  1 1342168 335944 107636 31353464    0    0   212   173 1912 3280
4  2 89  5
procs -----------memory---------- ---swap-- -----io---- -system--
----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
id wa
  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426
2  1 93  4
  0  1 1342168 334440 107656 31353736    0    0   264   372 2119 3400
6  2 84  8
  0  0 1342164 334084 107680 31354468    0    0   302   413 2361 3613
2  1 87 10
  2  0 1342160 342764 107680 31354916    8    0   184   332 2142 3117
1  1 90  7
  0  1 1342160 343788 107680 31355808    0    0   360   211 2247 3249
1  2 91  5
  2  1 1342156 340804 107704 31355904    0    0    88   280 2287 3448
2  2 90  6
  0  1 1342156 344276 107704 31356464    0    0   316   276 2050 3298
1  2 90  7
  0  0 1342156 344160 107712 31356576    0    0     4   225 1884 3194
1  3 90  6
  0  0 1342152 342548 107724 31356688    0    0    52   231 1963 3232
1  3 89  6
  2  1 1342152 343664 107724 31356764    0    0   104   348 2643 3614
3  2 88  8
  1  1 1342144 341060 107760 31357080   16    0   120   307 2511 3474
4  3 87  7
  1  0 1342140 342332 107780 31357500    8    0   206   193 2243 3448
4  2 89  5
  1  0 1342136 339472 107780 31357508    0    0    32   142 4290 3799
6  3 87  4
  0  0 1342136 341160 107780 31357992    0    0   216   171 2613 3995
4  2 88  5
  0  0 1342136 342168 107820 31357988    0    0    26   140 2347 3753
3  4 89  4
  0  0 1342136 342532 107820 31358128    0    0    36   155 2119 3653
2  1 91  5
  2  0 1342136 341564 107828 31358144    0    0     0   151 1973 3486
4  2 90  4
  1  1 1342136 342076 107852 31358416    0    0   148   284 2251 3857
6  2 84  8
  0  1 1342136 339944 107852 31359284    0    0   482   478 2902 5210
4  2 84 10
  0  1 1342136 342184 107852 31359836    0    0   238   372 2292 4063
2  1 88  9

cheers gerhard


Re: Sudden drop in DBb performance

From:
Gerhard Wohlgenannt
Date:
hi,

thanks a lot for your help!

>> Dear list,
>>
>> we are encountering serious performance problems with our database.
>> Queries which took around 100ms or less last week now take several
>> seconds.
>>
>> The database runs on Ubuntu Server 10.4.3 (kernel: 2.6.32-33) on
>> hardware as follows:
>> 8-core Intel Xeon CPU with 2.83GHz
>> 48 GB RAM
>> RAID 5 with 8 SAS disks
>> PostgreSQL 8.4.8 (installed from the Ubuntu repository).
>>
>> Additionally to the DB the machine also hosts a few virtual machines. In
>> the past everything worked very well and the described problem occurs
>> just out of the blue. We don't know of any postgresql config changes or
>> anything else which might explain the performance reduction.
>> We have a number of DBs running in the cluster, and the problem seems to
>> affect all of them.
> What are the virtual machines doing? Are you sure they are not doing a lot
> of IO?

We also have an SSD disk in the machine, and the virtual machines do most
of their I/O on that. But there surely also is some amount of I/O onto the
system's RAID array coming from the virtual machines. Maybe we should
consider having a dedicated database server.

>> We checked the performance of the RAID .. which is reasonable for eg.
>> "hdparm -tT". Memory is well used, but not swapping.
>> vmstat shows, that the machine isn't using the swap and the load
>> shouldn't be also to high:
>>     root@host:~# vmstat
>>     procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>      r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
>> sy id wa
>>      0  0      0 308024 884812 40512932    0    0   464   168  353   92
>> 4  2 84  9
>>
>> Bonnie++ results given below, I am no expert at interpreting those :-)
>>
>>
>> Activating log_min_duration shows for instance this query --- there are
>> now constantly queries which take absurdely long.
>>
>> 2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT
>> keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'
>>
>> db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE
>> keyword=E'diplomaten';
>>                                                                                       QUERY
>> PLAN
>>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>    Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29
>> rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
>>      Index Cond: ((keyword)::text = 'diplomaten'::text)
>>    Total runtime: 0.087 ms
>> (3 Zeilen)
>>
>> db=# \d keywords.table_x
>>                                            Tabelle »keywords.table_x«
>>      Spalte   |        Typ
>> |                                              Attribute
>>
------------+-------------------+------------------------------------------------------------------------------------------------------
>>    keyword_id | integer           | not null Vorgabewert
>> nextval('keywords.table_x_keyword_id_seq'::regclass)
>>    keyword    | character varying |
>>    so         | double precision  |
>> Indexe:
>>       "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
>>       "idx_table_x_keyword" btree (keyword)
>> Fremdschlüsselverweise von:
>>       TABLE "keywords.table_x_has" CONSTRAINT
>> "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES
>> keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE
> But in this explain analyze, the query finished in 41 ms. Use auto-explain
> contrib module to see the explain plan of the slow execution.

Thanks. We will use auto_explain as soon as some long-running updates
are finished (we don't want to kill them).

cheers gerhard

Re: Sudden drop in DBb performance

From:
Scott Marlowe
Date:
On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt <wohlg@ai.wu.ac.at> wrote:
> Below please find the results of vmstat 2 over some periode of time .. with
> normal database / system load.
>
>  0  0 1344332 237196 104140 31468412    0    0   330   102 4322 7130  4  2
> 90  4
>  1  1 1344332 236708 104144 31469000    0    0   322   105 2096 3723  1  2
> 92  5
>  2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3
> 77  9
>  0  0 1344200 238372 104168 31462452    0    0     8   109 4050 8376  8  3
> 86  3
>  0  0 1344200 232668 104168 31462468    0    0    12   158 2036 3633  2  2
> 92  3
>  0  3 1344196 282784 104180 31413384    4    0  1768   343 2490 4391  1  2
> 84 13
>  1  1 1344196 278188 104192 31416080    0    0  1392   341 2215 3850  1  2
> 82 15
>  0  0 1344120 276964 104608 31416904   90    0   634   304 2390 3949  4  2
> 86  8
>  1  1 1344120 277096 104628 31417752    0    0   492   378 2394 3866  2  1
> 87 10
>  0  1 1344120 274476 104628 31418620    0    0   260   233 1997 3255  2  1
> 91  6
>  1  1 1344120 276584 104628 31418808    0    0   128   208 2015 3266  2  1
> 91  6
>  0  0 1343672 272352 106288 31418788  694    0  1346   344 2170 3660  3  1
> 89  6
>  0  1 1343632 270220 107648 31419152   48    0   468   490 2356 3622  4  2
> 88  5
>  0  0 1343624 270708 107660 31419344   20    0   228   138 2086 3518  2  3
> 91  4
>  0  1 1343612 268732 107660 31419584   12    0   168   112 2100 3585  3  2
> 91  3
>  0  0 1343544 266616 107660 31420112   14    0   154    73 2059 3719  3  2
> 93  3
>  0  1 1343540 267368 107684 31420168    0    0    78   260 2256 3970  3  2
> 90  6
>  0  1 1343540 268352 107692 31420356    0    0    94   284 2239 4086  2  2
> 89  6
>  0  0 1343540 274064 107692 31423584    0    0  1622   301 2322 4258  2  3
> 83 13
>  0  2 1343440 273064 107704 31423696   96    0   106   180 2158 3795  3  2
> 90  5
>  0  0 1342184 262888 107708 31426040  840    0  2014   146 2309 3713  5  3
> 83  9
>  0  0 1342184 261904 107732 31426128    0    0    60   158 1893 3510  1  3
> 91  5
>  2  0 1342184 258680 107732 31427436    0    0   794   114 2160 3647  2  3
> 90  5
>  0  2 1342176 258184 107744 31428308   24    0   310   116 1943 3335  2  2
> 91  4
>  1  0 1342172 259068 107756 31428700    2    0   138   143 1976 3468  1  1
> 93  5
>  0  0 1342172 258084 107756 31429948    0    0   620    88 2117 3565  3  1
> 90  6
>  0  0 1342172 258456 107952 31430028    0    0    62   305 2174 3827  1  2
> 91  6
>  1  0 1342172 257480 107952 31430636    0    0   300   256 2316 3959  3  2
> 86  8
>  0  0 1342172 257720 107952 31430772    0    0    46   133 2411 4047  3  2
> 91  3
>  1  2 1342172 257844 107976 31430776    0    0   136   184 2111 3841  1  1
> 92  6
>  1  2 1342172 338376 107576 31349412    0    0   462  8615 2655 5508  5  3
> 79 13
>  1  2 1342172 340772 107580 31351080    0    0   682   377 2503 4022  2  1
> 87 10
>  1  2 1342172 335688 107596 31351992    0    0   548   306 2480 3867  4  1
> 86  9
>  0  2 1342168 337432 107608 31352704    0    0   224   188 1919 3158  1  1
> 93  6
>  0  0 1342168 337804 107608 31353020    0    0   154   249 1933 3175  1  1
> 92  6
>  0  1 1342168 335944 107636 31353464    0    0   212   173 1912 3280  4  2
> 89  5
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
> wa
>  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1
> 93  4
>  0  1 1342168 334440 107656 31353736    0    0   264   372 2119 3400  6  2
> 84  8
>  0  0 1342164 334084 107680 31354468    0    0   302   413 2361 3613  2  1
> 87 10
>  2  0 1342160 342764 107680 31354916    8    0   184   332 2142 3117  1  1
> 90  7
>  0  1 1342160 343788 107680 31355808    0    0   360   211 2247 3249  1  2
> 91  5
>  2  1 1342156 340804 107704 31355904    0    0    88   280 2287 3448  2  2
> 90  6
>  0  1 1342156 344276 107704 31356464    0    0   316   276 2050 3298  1  2
> 90  7
>  0  0 1342156 344160 107712 31356576    0    0     4   225 1884 3194  1  3
> 90  6
>  0  0 1342152 342548 107724 31356688    0    0    52   231 1963 3232  1  3
> 89  6
>  2  1 1342152 343664 107724 31356764    0    0   104   348 2643 3614  3  2
> 88  8
>  1  1 1342144 341060 107760 31357080   16    0   120   307 2511 3474  4  3
> 87  7
>  1  0 1342140 342332 107780 31357500    8    0   206   193 2243 3448  4  2
> 89  5
>  1  0 1342136 339472 107780 31357508    0    0    32   142 4290 3799  6  3
> 87  4
>  0  0 1342136 341160 107780 31357992    0    0   216   171 2613 3995  4  2
> 88  5
>  0  0 1342136 342168 107820 31357988    0    0    26   140 2347 3753  3  4
> 89  4
>  0  0 1342136 342532 107820 31358128    0    0    36   155 2119 3653  2  1
> 91  5
>  2  0 1342136 341564 107828 31358144    0    0     0   151 1973 3486  4  2
> 90  4
>  1  1 1342136 342076 107852 31358416    0    0   148   284 2251 3857  6  2
> 84  8
>  0  1 1342136 339944 107852 31359284    0    0   482   478 2902 5210  4  2
> 84 10
>  0  1 1342136 342184 107852 31359836    0    0   238   372 2292 4063  2  1
> 88  9

Your IO Wait is actually pretty high.  On an 8 core machine, 12.5%
means one core is doing nothing but waiting for IO.

Re: Sudden drop in DBb performance

From:
Andy Colson
Date:
On 09/05/2011 01:45 PM, Scott Marlowe wrote:
> On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt<wohlg@ai.wu.ac.at>  wrote:
>> Below please find the results of vmstat 2 over some periode of time .. with
>> normal database / system load.
>>
2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9
>
> Your IO Wait is actually pretty high.  On an 8 core machine, 12.5%
> means one core is doing nothing but waiting for IO.
>

My server is 2-core, so these numbers looked fine to me.  I need to remember the core count when I look at these.

So the line above, for 2 cores, would not worry me a bit, but on 8 cores it pretty much means one core was pegged
(with 9% wait?  Or is it one core was pegged, and another was 72% io wait?)

I have always loved the vmstat output, but it's starting to get confusing when you have to take cores into account.
(And my math was never strong in the first place :-) )

Good catch, thanks Scott.

-Andy

Re: Sudden drop in DBb performance

From:
"Tomas Vondra"
Date:
On 5 Září 2011, 21:07, Andy Colson wrote:
> On 09/05/2011 01:45 PM, Scott Marlowe wrote:
>> On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt<wohlg@ai.wu.ac.at>
>> wrote:
>>> Below please find the results of vmstat 2 over some periode of time ..
>>> with
>>> normal database / system load.
>>>
> 2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3
> 77  9
>>
>> Your IO Wait is actually pretty high.  On an 8 core machine, 12.5%
>> means one core is doing nothing but waiting for IO.
>>
>
> My server is 2-core, so these numbers looked fine by me.  I need to
> remember core count when I look at these.
>
> So the line above, for 2 core's would not worry me a bit, but on 8 cores,
> it pretty much means one core was pegged (with 9% wait?  Or is it one core
> was pegged, and another was 72% io wait?)

AFAIK it's as if one core was 72% io wait. Anyway that's exactly why I was
asking for "iostat -x" because the util% gives a better idea of what's
going on.

> I have always loved the vmstat output, but its starting to get confusing
> when you have to take core's into account.  (And my math was never strong
> in the first place :-) )

That's why I love dstat, just do this

$ dstat -C 0,1,2,3,4,5,6,7

and you know all you need.

> Good catch, thanks Scott.

Yes, good catch.

Still, this does not explain why the queries were running fast before, and
why the RAID array is so sluggish. Not to mention that we don't know what
the conditions were when those numbers were collected (were the VMs off or
running?).

Tomas


Re: Sudden drop in DBb performance

From:
"Tomas Vondra"
Date:
On 6 Září 2011, 10:26, Gerhard Wohlgenannt wrote:
> Thanks a lot to everybody for their helpful hints!!!
>
> I am running all these benchmarks while the VMs are up .. with the
> system under something like "typical" loads ..
>
> The RAID is hardware based. One of my colleagues will check if there is
> any hardware problem with the RAID (the disks) today, but so far no
> errors have been reported.
>
> please find below the results of
> iostat -x 2
> vmstat 2
>
> hmm, looks like we definitely do have a problem with I/O load?!
> btw: dm-19 is the logical volume where the /var (postgresql) is on ..

Well, it definitely looks like that. Something is doing a lot of writes on
that drive - the drive is 100% utilized, i.e. it's a bottleneck. You need
to find out what is writing the data - try iotop or something like that.
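
For example, as root (plain iotop options; kernel 2.6.32 has the required I/O
accounting support):

    iotop -o -b -n 5    # only processes actually doing I/O, batch mode, 5 samples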

And it's probably the reason why the bonnie results were so poor.

Tomas


Re: Sudden drop in DBb performance

From:
Gerhard Wohlgenannt
Date:
Thanks a lot to everybody for their helpful hints!

I am running all these benchmarks while the VMs are up, with the
system under something like "typical" load.

The RAID is hardware based. One of my colleagues will check if there is
any hardware problem with the RAID (the disks) today, but so far no
errors have been reported.

Please find below the results of
iostat -x 2
vmstat 2

Hmm, looks like we definitely do have a problem with I/O load?!
Btw: dm-19 is the logical volume where /var (postgresql) is on.
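
To double-check which logical volume hides behind a dm-N name, something like
the following should work (exact output depends on the LVM setup):

    ls -l /dev/mapper    # device nodes / symlinks carry the dm minor numbers
    dmsetup ls           # maps device-mapper names to (major, minor) pairs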

cheers gerhard

procs -----------memory---------- ---swap-- -----io---- -system--
----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
id wa
  1 16 1370892 434996  33840 28938348    1    1   615   312    9    4
5  2 81 12
  0 15 1370892 440832  33840 28938380    0    0     4   136 2086 3899
0  4 12 84
  1 16 1370892 447008  33864 28938380    0    0     0    27 2442 4252
1  5 10 83
  1 11 1370892 452272  33864 28938380    0    0    12     5 2106 3886
0  4 12 83
  2  4 1370892 315880  33888 28941396    0    0  1522  3084 2213 4120
4  3 57 37
  0 10 1370892 240900  33628 28934060    0    0  1060 17275 3396 4793
3  3 55 40
  1  5 1370892 238172  33044 28905652    0    0   148   267 3943 5284
2  3 26 69
  2  2 1370916 232932  31960 28694024    0   12  1170  5625 3037 6336
6  7 61 26
  1  2 1370912 232788  27588 28697216   10    0  1016  3848 2780 5669
8  5 56 31
  1  4 1370908 2392224  27608 28144712    0    0   936  8811 2514 5244
8  6 61 25
  0  1 1370908 2265428  27612 28153188    0    0  4360  1598 2822 4784
13  3 69 15
  1  2 1370908 2041260  27612 28176788    0    0 11842   474 3679 4255
12  4 78  6
  0  3 1370908 2199880  27624 28272112    0    0 47638   569 7798 5495
11  4 70 14
  0  3 1370908 2000752  27624 28318692    0    0 23492   275 5084 5161
10  3 71 17
  1  0 1370908 1691000  27624 28365060    0    0 22920   117 4961 5426
12  5 69 15
  1  0 1370908 2123512  27624 28367576    0    0  1244   145 2053 3728
12  3 83  2
  2  0 1370908 1740724  27636 28403748    0    0 18272   190 2920 4188
12  4 76  8
  2  0 1370908 1305856  27636 28460172    0    0 28174   493 3744 4750
11  6 68 15
  1  2 1370908 973412  27644 28529640    0    0 34614   305 3419 4522
12  5 69 13
  2  2 1370904 1790820  27656 28659080    2    0 64376   389 5527 5374
12  7 69 12
  1  2 1370904 1384100  27656 28750336    0    0 45740   351 4898 5381
13  6 68 13
  1  0 1370904 954200  27656 28864252    0    0 56544   413 4596 5470
13  7 66 14
  1  0 1370904 1597264  27656 28865756    0    0   926   391 2009 3502
11  4 81  4
  3  2 1370904 1219180  27668 28868244    0    0  1160   500 2180 3772
11  5 80  4
  2  7 1370900 809128  27680 28869020    0    0   298 21875 2417 3936
11  5 49 35
  0  9 1370900 1693360  27680 28869032    0    0     8     0 2756 4174
8  5 28 59
  1  2 1370900 1531100  27688 28871104    0    0  1034  7849 2646 4571
10  3 72 15



iostat -x 2:

Linux 2.6.32-33-server (voyager)        06.09.2011      _x86_64_
(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            5,02    0,00    2,41   11,60    0,00   80,97

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               3,05     5,22    1,05    0,67   117,54    45,72
95,37     0,01    3,94   0,75   0,13
sdb              10,02   148,15  157,91   93,49 10019,50  5098,93
60,14     4,53   18,04   2,30  57,75
dm-0              0,00     0,00    3,03    4,87    24,21    38,96
8,00     0,45   56,83   0,06   0,05
dm-1              0,00     0,00    1,07    0,87    93,32     6,77
51,59     0,01    2,71   0,42   0,08
dm-2              0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    6,30   6,30   0,00
dm-3              0,00     0,00    0,19    0,32     1,54     2,55
8,00     0,03   63,61   2,72   0,14
dm-4              0,00     0,00    0,19    0,88     1,54     7,05
8,00     0,04   33,91  12,84   1,38
dm-5              0,00     0,00    0,10    0,04     0,83     0,33
8,00     0,00   16,22   2,63   0,04
dm-6              0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    4,88   4,88   0,00
dm-7              0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    4,37   4,37   0,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    4,69   4,69   0,00
dm-9              0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    5,71   5,71   0,00
dm-10             0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    4,65   4,65   0,00
dm-11             0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    4,17   4,17   0,00
dm-12             0,00     0,00    0,11    1,34     0,90    10,73
8,00     0,12   76,31  12,61   1,83
dm-13             0,00     0,00    0,01    0,00     0,09     0,01
8,00     0,00   18,70   1,26   0,00
dm-14             0,00     0,00    1,83    1,39    14,66    11,10
8,00     0,18   55,46   2,77   0,89
dm-15             0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    5,35   5,31   0,00
dm-16             0,00     0,00    0,18    0,02     4,00     0,38
21,08     0,00   21,20   5,95   0,12
dm-17             0,00     0,00    0,00    0,00     0,01     0,01
18,76     0,00   30,79  26,47   0,00
dm-18             0,00     0,00    1,19    0,02    11,05     0,19
9,24     0,00    3,57   1,20   0,15
dm-19             0,00     0,00  159,62  202,37  9949,08  5022,90
41,36     0,60   29,19   1,55  56,27
dm-20             0,00     0,00    2,39    2,31    19,13    18,48
8,00     0,18   39,23   1,29   0,61
dm-21             0,00     0,00    0,62    2,44     5,00    19,53
8,00     0,11   34,84   5,41   1,66
dm-22             0,00     0,00    0,01    0,03     0,09     0,24
8,00     0,00   21,67   0,53   0,00
dm-23             0,00     0,00    0,75    0,66     6,02     5,32
8,00     0,04   26,32   4,89   0,69
dm-24             0,00     0,00    0,00    0,00     0,00     0,00
8,00     0,00    5,67   5,67   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,88    0,00    5,27   81,72    0,00   12,13

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,50    0,00    3,00     0,00    12,00
4,00     0,00    0,00   0,00   0,00
sdb               0,00   559,00    0,00  523,50     0,00 19148,00
36,58   143,87  278,68   1,91 100,00
dm-0              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-1              0,00     0,00    0,00    1,50     0,00    12,00
8,00     0,00    0,00   0,00   0,00
dm-2              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-3              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-4              0,00     0,00    0,00    4,00     0,00    32,00
8,00     0,53  132,50  48,75  19,50
dm-5              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-6              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-7              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-9              0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-10             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-11             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-12             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-13             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-14             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-15             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-16             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-17             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-18             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-19             0,00     0,00    0,00    0,00     0,00     0,00
0,00   316,54    0,00   0,00 100,00
dm-20             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-21             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-22             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00
dm-23             0,00     0,00    0,00    1,50     0,00    12,00
8,00     0,08   53,33  36,67   5,50
dm-24             0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1,40    0,00    5,36   53,87    0,00   39,37

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
sdb               0,00   118,50    9,00  627,00   196,00 15220,50 24,24   137,49  209,69   1,57 100,00
dm-0              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-1              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-2              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-3              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-4              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-5              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-6              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-7              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-9              0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-10             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-11             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-12             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-13             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-14             0,00     0,00    0,00    3,00     0,00    24,00 8,00     0,36   58,33  31,67   9,50
dm-15             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-16             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-17             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-18             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-19             0,00     0,00    4,00  300,50    68,00  9562,00 31,63   226,15  730,23   3,28 100,00
dm-20             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-21             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-22             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-23             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00
dm-24             0,00     0,00    0,00    0,00     0,00     0,00 0,00     0,00    0,00   0,00   0,00






>> Agreed, that's really slow.  A single SATA drive will get 60 MB/s.  Did
>> you run Bonnie while the VM's were up and running?
>>
>>     root@host:~# vmstat
>>     procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>      r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id wa
>>      0  0      0 308024 884812 40512932    0    0   464   168  353   92  4
>> 2 84  9
>>
>>
>> Only one line?  That does not help much.  Can you run it as 'vmstat 2' and
>> let it run while a few slow queries are performed?  Then paste all the
>> lines?
> And maybe a few lines from "iostat -x 2" too.
>
> BTW what kind of raid is it? Is it hw or sw based? Have you checked health
> of the drives?
>
> Are you sure there's nothing else using the drives (e.g. one of the VMs,
> rebuild of the array or something like that)?
>
> Tomas
>


Re: Sudden drop in DBb performance

От
Gerhard Wohlgenannt
Дата:
hi,

> What does a "normal load" mean? Does that mean a time when the queries are
> slow?

yes, we have slow queries (according to postgresql.log) under such load
> Are you sure the machine really has 48GB of RAM? Because from the vmstat
> output it seems like there's just 32GB.
>
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
> id wa
>    0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426
> 2  1 93  4
>
>
> 1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB

strange.
we paid for 48G :-) and top and free show 48G
/root# free
              total       used       free     shared    buffers     cached
Mem:      49564860   49310444     254416          0      30908   30329576
-/+ buffers/cache:   18949960   30614900
Swap:     20971512    1370960   19600552


> Otherwise I don't see anything wrong in the output. What is the size of
> the database (use pg_database_size to get it)? Did it grow significantly
> recently?
>

there are a number of databases in the cluster on that machine;
in the filesystem they add up to 271G
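
for reference, the per-database sizes can also be pulled with
pg_database_size as suggested; a minimal sketch (not necessarily the
exact query we use):

db=# -- list databases, largest first
db=# SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database ORDER BY pg_database_size(datname) DESC;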

cheers gerhard

Re: Sudden drop in DBb performance

От
Gerhard Wohlgenannt
Дата:
> That's why I love dstat, just do this
>
> $ dstat -C 0,1,2,3,4,5,6,7
>
> and you know all you need.

dstat looks like a very nice tool, results below ..
(now the system load seems a bit lower than before when generating
results for vmstat and iostat)
>> Good catch, thanks Scott.
> Yes, good catch.
>
> Still, this does not explain why the queries were running fast before, and
> why the RAID array is so sluggish. Not to mention that we don't know what
> were the conditions when collecting those numbers (were the VMs off or
> running?).
>
the VMs were running. they are in something like production use, so i
shouldn't just turn them off ..  :-)
and the processes in the VMs cause a big portion of the DB load, so
turning them off would distort the results ...

and thanks again for all the replies!!! :-)


~# dstat -C 0,1,2,3,4,5,6,7

-------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage--------------cpu4-usage--------------cpu5-usage--------------cpu6-usage--------------cpu7-usage------

-dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq
siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq
siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv
send|  in   out | int   csw
   7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0
0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0
0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0     0
|9760B 9431B|2468  4126
   0   0  98   2   0   0:  0   0  98   2   0   0:  6   2  22  71   0
0:  5   0  76  19   0   0:  0  12  82   6   0   0:  3   7  88   2   0
0:  3   1  84  12   0   0:  2   0  94   4   0   0|5160k 1376k|  60k
225k|   0     0 |2101  3879
  11   1  84   4   0   0:  2   0  93   6   0   0:  3   4  72  22   0
0:  2   2  92   3   0   1: 10  13  22  54   0   1:  6   7  75  12   0
0:  3   0  87  10   0   0: 12   0  81   7   0   0|6640k 1683k| 140k
240k|   0     0 |2860  4617
   1   1  29  68   0   1: 12   0  80   8   0   0:  6   0  78  16   0
1:  3   1  80  16   0   0: 14  14  57  16   0   0:  0  11  78  12   0
0:  9   1  83   7   0   0:  0   0  96   4   0   0|4448k 1266k| 102k
336k|   0     0 |2790  4645
   0   0  89  11   0   0:  1   0  98   1   0   0: 14   0  57  29   0
0:  1   1  89   9   0   0:  1  15  41  43   0   0:  3  15  75   7   0
0:  3   2  60  35   0   0:  0   0  95   5   0   0|  18M 1622k|  97k
285k|   0     0 |3303  4764
   0   0  96   4   0   0:  0   0  99   0   0   1:  1   2  14  83   0
0:  1  25  17  57   0   0:  1   0  87  12   0   0:  1   0  19  80   0
0:  3   3   0  94   0   0:  0   0  48  52   0   0|1320k   19M|  40k
113k|   0     0 |2909  4709
   1   0  63  36   0   0:  5   2  88   5   0   0: 34   2   0  63   1
0:  8   8  72  12   0   0:  0   9  85   6   0   0:  1   2  84  13   0
0:  2   1  60  37   0   0:  1   1  62  36   0   0|9160k 5597k|  52k
143k|  32k    0 |2659  4650
   4   0  43  53   0   0:  2   0  93   5   0   0:  9   0  63  28   0
0:  3   1  89   7   0   0:  2   9  72  16   0   1:  0  13  81   6   0
0:  9   1  52  38   0   0:  3   0  84  13   0   0|4980k 1358k| 106k
239k|   0     0 |2993  5158
   2   1  90   7   0   0:  2   0  95   3   0   0:  2   3  82  13   0
0:  0   0  87  13   0   0:  6  10  32  52   0   0:  2  10  82   6   0
0:  5   0  86   9   0   0: 10   5  81   4   0   0|4376k 2949k| 119k
295k|   0     0 |2729  4630
   1   0  93   6   0   0:  2   0  91   6   1   0: 15   4  71  11   0
0:  7   2  90   1   0   0: 13  10  12  65   0   0:  2  13  41  45   0
0:  1   0  97   2   0   0:  1   0  94   5   0   0|3896k   15M|  87k
242k|   0     0 |2809  5514
   2   0  98   0   0   0:  0   0  73  27   0   0:  0   0 100   0   0
0:  2   1  29  68   0   0:  4   5   0  92   0   0:  2   5  92   2   0
0:  0   0 100   0   0   0:  1   0  77  22   0   0| 172k   19M|  40k
127k|   0     0 |2221  4069
   0   0  48  52   0   0:  0   0  97   3   0   0:  0   0  92   8   0
0:  3   0  91   6   0   0:  2  10  10  78   0   0:  4  10  81   6   0
0:  2   0  29  69   0   0:  1   0  26  73   0   0| 652k 6931k|  66k
233k|   0     0 |2416  4389
   6   2  72  21   0   0:  3   1  86  10   0   0:  7   0  60  34   0
0:  2   2  91   6   0   0:  1  13  78   9   0   0:  2   8  84   6   0
0:  2   0  79  19   0   0:  0   2  87  11   0   0|2784k 1456k|  96k
206k|   0     0 |2854  5226
   9   4  50  37   0   0:  3   3  84  10   0   0:  4   0  84  12   0
0:  2   3  86   9   0   0: 10   2  73  15   0   0:  3   5  84   8   0
0:  8   4  81   6   0   0:  1   2  84  13   0   0|2952k 1374k| 133k
305k|   0     0 |3249  5076
   9   1  78  13   0   0:  4   4  83   9   0   0:  3   1  68  28   0
0:  3   3  82  12   0   0:  9   0  64  26   0   1:  2   1  83  13   0
1:  9   3  63  24   0   1:  3   1  91   5   0   0|3648k 1420k| 188k
444k|   0     0 |3560  5981
   3   1  63  33   0   0:  0   1  86  13   0   0:  1   0  67  32   0
0:  1   2  84  12   0   1:  4   2  49  45   0   0:  6   3  82   9   0
0:  1   1  93   5   0   0:  1   1  91   7   0   0|2980k 1385k| 181k
457k|   0     0 |3023  5230
   3   5  90   2   0   0:  3   9  84   4   0   0:  2   2  55  41   0
0: 21   3  69   7   0   0:  1   3  76  20   0   0:  2   1  93   4   0
0:  1   3  67  29   0   0:  0   1  93   6   0   0|2796k 1359k| 104k
237k|   0     0 |2339  4598
   6   5  74  15   0   0:  0   5  87   8   0   0:  4   0  91   5   0
0:  0   0  98   2   0   0: 11   0  74  16   0   0:  3   1  87   9   0
0: 26   4  53  17   0   0:  0   1  92   7   0   0|1920k 1401k| 107k
278k|   0     0 |2352  4480
   2   0  91   7   0   0:  1  10  84   5   0   0:  4   0  93   3   0
0:  3   2  93   2   0   0:  6   2  84   8   0   0:  1   0  97   2   0
0:  2   0  85  13   0   0:  3   2  89   6   0   0|1508k 1397k| 134k
313k|   0     0 |2374  4547
   2   0  74  24   0   0:  0   0  97   3   0   0:  1   0  97   2   0
0:  3   2  95   0   0   0:  1   0  96   3   0   0:  4   2  91   4   0
0:  0   5  89   6   0   0:  3   1  91   6   0   0|1464k 1464k|  68k
143k|   0     0 |1839  3950
   4   0  91   5   0   0:  5   4  89   3   0   0:  1   0  98   1   0
0:  0   1  97   2   0   0:  2   2  68  28   0   0:  1   0  91   8   0
0:  2   8  70  20   0   0:  1   5  91   2   0   1|1060k 1903k| 133k
142k|   0     0 |2358  4720
   3   0  91   6   0   0:  4   3  88   5   0   0:  1   0  95   4   0
0:  1   2  94   3   0   0:  3   1  53  43   0   0:  3   2  93   2   0
0:  0   2  86  13   0   0:  0   4  89   7   0   0|1580k 1332k|  91k
273k|   0     0 |2249  4564
   4   0  79  16   0   1:  3   6  84   8   0   0:  1   0  92   7   0
0:  2   3  88   7   0   0:  5   2  86   7   0   0:  2   4  90   3   0
1:  4   1  81  14   0   0:  7   3  81  10   0   0|1128k 1343k| 122k
263k|   0     0 |3262  5503
   0   2  60  38   0   0:  3   5  90   3   0   0:  0  11  78  12   0
0:  1   5  91   3   0   0:  2   4  90   4   0   0:  4   2  89   5   0
0:  3   0  94   3   0   0:  4   2  87   8   0   0|1252k 1255k| 329k
494k|   0     0 |3223  5661
   2   0  84  13   0   1:  3   6  84   7   0   0:  3   7  83   8   0
0:  5  15  72   8   0   0:  2   1  96   1   0   0:  1   2  93   4   0
0:  4   1  65  30   0   0:  1   0  89  10   0   0|1660k 2513k| 148k
460k|   0     0 |2597  5129
   1   2  92   5   0   0:  0   1  41  58   0   0: 10   5  83   2   0
0:  3  18  78   2   0   0:  3   0  93   4   0   0:  3   0  97   0   0
0:  4   0  80  17   0   0:  2   0  91   7   0   0| 652k   15M|  72k
198k|   0     0 |2340  6382
   0   1  88  11   0   0:  3   3  88   7   0   0:  2   1  80  17   0
0:  0  13  75  13   0   0:  1   1  73  25   0   0:  1   1  93   5   0
0:  0   1  94   5   0   0:  1   0  86  13   0   0|1568k 1155k|  68k
166k|   0     0 |2348  4738
   3   0  54  43   0   0:  4   9  82   6   0   0:  2   0  88  11   0
0:  2   5  88   6   0   0:  1   0  88  11   0   0:  2   0  93   5   0
0:  3   1  80  17   0   0:  0   0  88  12   0   0|1200k 1319k| 113k
233k|   0     0 |2660  5168
   0   1  91   8   0   0:  4   6  87   4   0   0:  1   0  72  27   0
0:  2   2  70  26   0   0:  2   0  85  13   0   0:  1   0  94   5   0
0:  2   0  95   3   0   0:  1   1  89   9   0   0|1032k 1346k| 119k
234k|   0     0 |2398  4808
   0   1  73  26   0   0:  3   5  83  10   0   0:  1   2  96   1   0
0:  4   0  91   5   0   0:  2   2  90   6   0   0:  2   0  94   4   0
0:  3   4  70  23   0   0:  3   6  85   6   0   0|1272k 1522k| 117k
279k|   0     0 |2739  5237
   1   0  56  43   0   0:  0   2  86  12   0   0:  2   1  89   8   0
0:  2   2  92   4   0   0:  7   1  82  10   0   0:  4   1  92   3   0
0:  5   2  80  13   0   0:  4   4  87   6   0   0|1080k 1637k| 141k
345k|   0     0 |2631  5142
   0   0  65  35   0   0:  0   1  89  10   0   0:  2   0  84  14   0
0:  1   3  92   4   0   0:  0   0  89  11   0   0:  2   0  91   7   0
0:  2   5  87   6   0   0:  2   7  87   5   0   0|1012k 1412k|  95k
215k|   0     0 |2304  4762
   1   0  93   6   0   0:  3   1  95   1   0   0:  2   1  97   0   0
0:  3   3  91   3   0   0:  2   0  72  26   0   0:  1   0  93   6   0
0:  0   7  70  23   0   0:  2   8  82   8   0   0|1152k 1548k| 101k
225k|   0     0 |2334  4706
   1   0  93   6   0   0:  2   3  93   2   0   0:  5   2  91   3   0
0:  6   3  91   1   0   0:  5   2  85   8   0   0:  2   1  91   6   0
0:  1   5  60  34   0   0:  2   6  86   6   0   0|1096k 1553k| 126k
289k|   0     0 |2471  4766
   0   0  63  37   0   0:  0   0  90  10   0   0:  1   0  99   0   0
0:  0   0  93   7   0   0:  1   0  86  13   0   0:  2   2  87   9   0
0:  0   7  70  24   0   0:  2   3  93   3   0   0|1256k 1262k| 110k
277k|   0     0 |2281  4534
   0   0  24  76   0   0:  2   1  81  16   0   0:  1   3  44  52   0
0:  2   2  65  31   0   0:  1   2  75  22   0   0:  1   0  94   5   0
0:  0   4  17  79   0   0:  5   3  93   0   0   0| 240k   19M|  83k
200k|   0     0 |2388  4131
   1   0  76  23   0   0:  1   1  94   4   0   0:  5   3  81  11   0
0:  2   4  80  14   0   0:  2   1  56  41   0   0:  0   0  77  23   0
0:  3   7  74  17   0   0:  1   5  90   5   0   0|1108k 3127k| 113k
262k|   0     0 |2584  4930
   2   1  90   8   0   0:  2   0  93   5   0   0:  2   1  75  22   0
0:  2   1  83  15   0   0:  0   1  79  20   0   0:  0   0  92   7   0
1:  2   6  77  14   0   1:  2   8  87   4   0   0|1220k 1358k| 152k
318k|   0    12k|2590  5104
   3   0  81  16   0   0:  3   1  88   9   0   0:  3   1  93   3   0
0:  2   3  90   5   0   0:  1   0  93   6   0   0:  4   1  92   4   0
0:  5  13  37  46   0   0:  2   7  81  10   0   0|1256k 1437k| 167k
336k|   0     0 |2734  5356
   0   0  93   7   0   0:  1   2  88   9   0   0:  2   0  92   5   0
1:  1   2  94   3   0   0:  0   0  50  50   0   0:  0   0  99   1   0
0:  3  11  81   5   0   0:  3   7  86   4   0   0|1108k 1448k| 109k
225k|   0     0 |2081  4316
   3   0  42  55   0   0:  0   3  92   5   0   0:  0   0 100   0   0
0:  1   0  92   7   0   0:  3   1  69  27   0   0:  1   0  66  33   0
0:  4   7  56  33   0   0:  3   3  61  33   0   0| 720k   17M| 104k
290k|   0     0 |2300  4168
   0   2  59  39   0   0:  3   7  75  15   0   0:  0   6  89   5   0
0: 12   9  75   4   0   0:  1   0  81  18   0   0:  2   1  94   3   0
0:  1   8  56  35   0   0:  3   5  57  35   0   1| 428k 8241k|  96k
259k|   0   124k|2082  5154
   1   1  90   8   0   0:  2   2  46  49   0   0:  1   0  93   6   0
0:  9   4  83   4   0   0:  0   0  95   5   0   0:  1   0  98   1   0
0:  6   7  76  11   0   0:  1   4  85  10   0   0| 592k   14M|  82k
194k|   0     0 |2195  5839
   4   1  82  13   0   0:  3   0  87  10   0   0:  4   3  81  12   0
0:  0   0  82  18   0   0:  1   1  88  11   0   0:  1   0  90   9   0
0:  5   7  64  24   0   0:  5   3  86   6   0   0|1876k 1273k| 145k
323k|   0     0 |2544  4949
   0   1  66  33   0   0:  5   1  89   6   0   0:  0   1  88  12   0
0:  1   3  82  14   0   0:  1   0  86  12   0   1:  1   1  88  10   0
0:  8   3  82   8   0   0:  6   7  84   3   0   0|2544k 1259k| 125k
280k|   0     0 |2487  4776
   0   1  85  14   0   0:  0   0 100   0   0   0:  3   0  95   2   0
0:  5   3  91   2   0   0:  0   0  94   6   0   0:  6   1  84  10   0
0:  1  10  50  39   0   0:  4   5  82   9   0   0| 896k 1378k|  92k
181k|   0     0 |2452  4823
   0   0  94   6   0   0:  2   1  97   0   0   0:  3   0  94   3   0
0:  0   0  89  11   0   0:  2   3  83  12   0   0:  2   2  83  13   0
0:  2   9  50  40   0   0:  5  10  66  19   0   0| 848k 1237k|  98k
204k|   0     0 |2309  4733
   3   0  93   3   0   1:  2   9  83   6   0   0:  1   0  98   1   0
0:  1   3  96   0   0   0:  3   2  84  12   0   0:  1   0  88  11   0
0:  1   0  39  60   0   0:  0   4  88   8   0   0| 992k 1113k|  99k
212k|   0     0 |2318  4696
  10   0  87   3   0   0:  1   8  90   2   0   0:  2   0  91   7   0
0:  4   2  90   4   0   0:  4   2  78  16   0   0:  3   0  89   9   0
0:  4   0  90   7   0   0:  0   0  99   1   0   0| 964k 1165k| 119k
291k|   0     0 |2505  4842
   8   3  83   6   0   0:  7   0  80  13   0   0:  2   6  67  25   0
0:  4   6  80  10   0   0:  2   1  77  20   0   0:  2   1  87   8   0
2:  2   1  82  16   0   0:  1   0  95   4   0   0| 888k 1154k| 158k
331k|   0     0 |2759  5495
   6   0  91   3   0   0:  4   1  93   2   0   0:  0  10  85   5   0
0:  1  10  85   5   0   0:  2   1  37  60   0   0:  0   0  88  12   0
0:  5   0  83  12   0   0:  2   1  88   9   0   0| 992k 1143k| 149k
300k|   0     0 |2606  5054
   5   0  75  19   0   1:  7   1  81  11   0   0:  4  11  73  12   0
0:  6  16  62  16   0   0:  5   1  88   6   0   0:  6   1  84   8   0
0:  1   4  71  24   0   1:  1   2  91   6   0   0|1372k 1109k| 193k
434k|   0     0 |3216  5887
   3   1  90   6   0   0: 10   3  80   7   0   0:  0   2  92   6   0
0:  2   0  94   4   0   0:  0   2  65  33   0   0:  2   0  95   3   0
0:  0   7  76  18   0   0:  5   6  79  10   0   0|1044k 1077k| 191k
409k|   0     0 |2815  5311
   1   0  90   9   0   0:  6   4  89   2   0   0:  3   0  92   5   0
0:  6   1  93   0   0   0:  3   3  45  50   0   0:  1   1  88  10   0
0:  3   7  88   3   0   0:  2   7  85   7   0   0| 800k 1214k| 135k
282k|   0     0 |2570  5140
   7   1  89   3   0   0:  0   3  91   6   0   0:  1   2  93   4   0
0:  3   4  90   3   0   0:  3   1  39  57   0   0:  1   0  88  11   0
0:  1   4  79  17   0   0:  3   2  85   9   0   1|1060k 1076k| 108k
225k|   0     0 |2681  5304
   1   1  85  13   0   0:  4   5  90   2   0   0:  2   5  91   2   0
0:  4   8  83   5   0   0:  2   0  41  58   0   0:  2   2  92   4   0
0:  0   2  62  36   0   0:  0   1  90   9   0   0|1956k  948k| 153k
360k|   0     0 |2802  5189
   2   0  92   6   0   0:  4   2  85   9   0   0:  0   2  95   3   0
0:  1   2  93   4   0   0:  2   3  34  61   0   0:  1   3  85  11   0
0:  3   0  55  42   0   0:  1   0  91   8   0   0|1768k  921k| 170k
367k|   0     0 |2511  4701



Re: Sudden drop in DBb performance

От
Gerhard Wohlgenannt
Дата:
> That's a good thought,  maybe the stats are old and you have bad
> plans?  It could also be major updates to the data too (as opposed to
> growth).

we have made checks for the number of dead tuples etc. recently, but it
looks ok. And as "everything" in the database seems to be very slow atm, I
guess the problem is not caused by bad plans for specific tables/queries.
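
For reference, that kind of dead-tuple check can be done via
pg_stat_user_tables; a minimal sketch (run per database, what counts as
"a lot" depends on table size):

db=# -- tables with the most dead tuples, plus last autovacuum time
db=# SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;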
>
> Gerhard, have you done an 'explain analyze' on any of your slow
> queries?   Have you done an analyze lately?
>

yes, we added the 'auto_explain' module to log/analyze queries >= 5000ms.
A sample result from the logs (there is a lot of stuff in the logs, I
selected this query because it is very simple):

2011-09-06 04:00:35 CEST ANWEISUNG:  INSERT into
keywords.table_x_site_impact (content_id, site_impact_id, site_impact)
VALUES (199083087, 1, 1.000000)
2011-09-06 04:00:35 CEST LOG:  Dauer: 15159.723 ms  Anweisung: INSERT
into keywords.table_x_site_impact (content_id, site_impact_id,
site_impact) VALUES (199083087, 1 , 1.000000)
2011-09-06 04:00:35 CEST LOG:  duration: 15159.161 ms  plan:
         Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.017..0.019 rows=1 loops=1)
           Output:
nextval('keywords.table_x_site_impact_internal_id_seq'::regclass),
199083087::bigint, 1::smallint, 1::double precision
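
For completeness, enabling auto_explain that way comes down to
postgresql.conf settings roughly like the following (a sketch for 8.4;
the values are illustrative, not necessarily our exact config):

# postgresql.conf (shared_preload_libraries needs a server restart)
shared_preload_libraries = 'auto_explain'
custom_variable_classes  = 'auto_explain'   # needed on 8.4 so the auto_explain.* settings are accepted
auto_explain.log_min_duration = '5s'        # log plans of statements running >= 5000ms
auto_explain.log_analyze = on               # include actual times/rows (adds some overhead)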




Re: Sudden drop in DBb performance

От
"Tomas Vondra"
Дата:
On 6 September 2011, 10:55, Gerhard Wohlgenannt wrote:
>
>> That's why I love dstat, just do this
>>
>> $ dstat -C 0,1,2,3,4,5,6,7
>>
>> and you know all you need.
>
> dstat looks like a very nice tool, results below ..
> (now the system load seems a bit lower than before when generating
> results for vmstat and iostat)
>>> Good catch, thanks Scott.
>> Yes, good catch.
>>
>> Still, this does not explain why the queries were running fast before,
>> and
>> why the RAID array is so sluggish. Not to mention that we don't know
>> what
>> were the conditions when collecting those numbers (were the VMs off or
>> running?).
>>
> the VMs were running. they are in something like production use, so i
> shouldn't just turn them off ..  :-)
> and the processes in the VMs cause a big portion of the DB load, so
> turning them off would distort the results ...

Distort the results? If you want to measure the RAID performance, you have
to do that when there are no other processes using it.

> and thanks again for all the replies!!! :-)

Please use something like pastebin.com to post these results. It was
bearable for the vmstat output, but this is almost unreadable due to the
wrapping.

> ~# dstat -C 0,1,2,3,4,5,6,7
>
-------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage--------------cpu4-usage--------------cpu5-usage--------------cpu6-usage--------------cpu7-usage------
> -dsk/total- -net/total- ---paging-- ---system--
> usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq
> siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq
> siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv
> send|  in   out | int   csw
>    7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0
> 0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0
> 0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0     0
> |9760B 9431B|2468  4126
...

But if I read that correctly, the wait for the cores is 17%, 7%, 12%, 9%,
16%, 8%, 16% and 8%, and the cores are mostly idle (idl is about 85%). So
it seems there's a small number of processes being switched between the
CPUs, and most of the time they're waiting for I/O.

Given the low values for disk I/O and the iostat output we've seen before,
it's obvious there's a lot of random I/O (mostly writes).

Let's speculate for a while what could cause this (in arbitrary order):

1) Checkpoints. Something is doing a lot of writes, and with a DB that
often means a checkpoint is in progress. I'm not sure about your
checkpoint_timeout, but you do have 192 segments and about 7GB of shared
buffers. That means there may be a lot of dirty buffers (even 100% of the
buffers).

You're using RAID5, and that really is not a write-friendly RAID level.
We don't know the actual performance, as bonnie++ was run with the VMs
accessing the volume, but RAID10 usually performs much better.

Enable log_checkpoints in the config and see what's going on (see the
sketch after this list). You can also use iotop to see which processes
are doing the writes (it might be the background writer, ...).

2) The RAID is broken and can't handle the load it handled fine before.
This is not very likely, as you've mentioned that there were no warnings
etc.

3) There are some new jobs that do a lot of I/O. Is there anything new
that wasn't running before? I guess you would have mentioned that.

4) The database grew significantly in a short period of time, and the
active part no longer fits into RAM (the page cache), so the data has to
be retrieved from disk. And it's not just about the database size,
it's about the active part of the database - if you're suddenly accessing
more data, the cache may not be large enough.

This is usually a gradual process (the cache hit ratio slowly decreases as
the database grows), but if the database grew rapidly ... This could be
caused by MVCC, i.e. there may be a lot of dead tuples - have you done a
big UPDATE / DELETE or something like that recently?
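
For point 1, a sketch of what to enable and watch (log_checkpoints can be
changed with a plain reload, no restart):

# postgresql.conf, then reload (pg_ctl reload, or SELECT pg_reload_conf();)
log_checkpoints = on    # logs each checkpoint start/end, buffers written, sync time

# while the slowdown is happening, see which processes actually write:
~# iotop -o             # -o limits the output to processes currently doing I/O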

regards
Tomas


Re: Sudden drop in DBb performance

От
Marinos Yannikos
Дата:
Hi,

On 03.09.2011 09:26, Gerhard Wohlgenannt wrote:
> Activating log_min_duration shows for instance this query --- there are
> now constantly queries which take absurdely long.

2 things you should check:

- if your /var/lib/postgresql is on an ext3 fs, I've seen such things
before due to the changes around Linux 2.6.2X that caused long stalls
during fsync() (see https://lwn.net/Articles/328363/ etc.). But if all
your queries take longer than they should and not only a few, that
probably isn't the reason. (A quick way to check the filesystem and its
mount options is sketched below.)

- regarding your hardware RAID-5, I've seen some controllers switch the
write cache off quietly for various reasons (battery too old, issues
with disks). Check if it is still enabled; without write caching
enabled, most RAID controllers will just suck.
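
To check the filesystem point, something along these lines is enough
(sdb is the device that showed up saturated in your iostat output;
adjust the path if the data directory lives elsewhere):

~# df -T /var/lib/postgresql        # prints the filesystem type (ext3/ext4/xfs/...)
~# mount | grep -E 'postgres|sdb'   # shows the mount options actually in effect

For the write cache, you have to use the controller's own CLI (MegaCli,
arcconf, hpacucli, ... depending on the vendor), so no generic command
fits here.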

Regards,
  Marinos