Thread: H800 + md1200 Performance problem


H800 + md1200 Performance problem

From: Cesar Martin
Date:
Hello there,

I am having a performance problem with a new Dell server. Currently I have these two servers:

Server A (old - production)
-----------------
2xCPU Six-Core AMD Opteron 2439 SE
64GB RAM
Raid controller Perc6 512MB cache NV
  - 2 HD 146GB SAS 15Krpm RAID1 (OS CentOS 5.4 and pg_xlog) (XFS, no barriers)
  - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS, no barriers)

Server B (new)
------------------
2xCPU 16 Core AMD Opteron 6282 SE
64GB RAM
Raid controller H700 1GB cache NV
  - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (OS CentOS 6.2)
  - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no barriers)
Raid controller H800 1GB cache nv
  - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18) (ext4 bs 4096, stride 64, stripe-width 384, no barriers)

The Postgres DB is the same on both servers. It is about 170GB in size, with some tables partitioned by date via a trigger. Settings such as shared_buffers, checkpoint_segments, etc. are similar on both because the amount of RAM is the same.
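
For reference, the relevant settings can be compared on the two servers with something along these lines (the database name here is only a placeholder):

   psql -d mydb -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('shared_buffers', 'checkpoint_segments', 'effective_cache_size', 'work_mem');"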

I assumed the new server would be faster than the old one, because it has more disks in the RAID10 and two RAID controllers with more cache memory, but I am not getting the expected results.

For example this query:

EXPLAIN ANALYZE SELECT c.id AS c__id, c.fk_news_id AS c__fk_news_id, c.fk_news_group_id AS c__fk_news_group_id, c.fk_company_id AS c__fk_company_id, c.import_date AS c__import_date, c.highlight AS c__highlight, c.status AS c__status, c.ord AS c__ord, c.news_date AS c__news_date, c.fk_media_id AS c__fk_media_id, c.title AS c__title, c.search_title_idx AS c__search_title_idx, c.stored AS c__stored, c.tono AS c__tono, c.media_type AS c__media_type, c.fk_editions_news_id AS c__fk_editions_news_id, c.dossier_selected AS c__dossier_selected, c.update_stats AS c__update_stats, c.url_news AS c__url_news, c.url_image AS c__url_image, m.id AS m__id, m.name AS m__name, m.media_type AS m__media_type, m.media_code AS m__media_code, m.fk_data_source_id AS m__fk_data_source_id, m.language_iso AS m__language_iso, m.country_iso AS m__country_iso, m.region_iso AS m__region_iso, m.subregion_iso AS m__subregion_iso, m.media_code_temp AS m__media_code_temp, m.url AS m__url, m.current_rank AS m__current_rank, m.typologyid AS m__typologyid, m.fk_platform_id AS m__fk_platform_id, m.page_views_per_day AS m__page_views_per_day, m.audience AS m__audience, m.last_stats_update AS m__last_stats_update, n.id AS n__id, n.fk_media_id AS n__fk_media_id, n.fk_news_media_id AS n__fk_news_media_id, n.fk_data_source_id AS n__fk_data_source_id, n.news_code AS n__news_code, n.title AS n__title, n.searchfull_idx AS n__searchfull_idx, n.news_date AS n__news_date, n.economical_value AS n__economical_value, n.audience AS n__audience, n.media_type AS n__media_type, n.url_news AS n__url_news, n.url_news_old AS n__url_news_old, n.url_image AS n__url_image, n.typologyid AS n__typologyid, n.author AS n__author, n.fk_platform_id AS n__fk_platform_id, n2.id AS n2__id, n2.name AS n2__name, n3.id AS n3__id, n3.name AS n3__name, f.id AS f__id, f.name AS f__name, n4.id AS n4__id, n4.opentext AS n4__opentext, i.id AS i__id, i.name AS i__name, i.ord AS i__ord, i2.id AS i2__id, i2.name AS i2__name FROM company_news_internet c LEFT JOIN media_internet m ON c.fk_media_id = m.id AND m.media_type = 4 LEFT JOIN news_internet n ON c.fk_news_id = n.id AND n.media_type = 4 LEFT JOIN news_media_internet n2 ON n.fk_news_media_id = n2.id AND n2.media_type = 4 LEFT JOIN news_group_internet n3 ON c.fk_news_group_id = n3.id AND n3.media_type = 4 LEFT JOIN feed_internet f ON n3.fk_feed_id = f.id LEFT JOIN news_text_internet n4 ON c.fk_news_id = n4.fk_news_id AND n4.media_type = 4 LEFT JOIN internet_typology i ON n.typologyid = i.id LEFT JOIN internet_media_platform i2 ON n.fk_platform_id = i2.id WHERE (c.fk_company_id = '16073' AND c.status <> '-3' AND n3.fk_feed_id = '30693' AND n3.status = '1' AND f.fk_company_id = '16073') AND n.typologyid IN ('6', '7', '1', '2', '3', '5', '4') AND c.id > '49764393' AND c.news_date >= '2012-04-02'::timestamp - INTERVAL '4 months' AND n.news_date >= '2012-04-02'::timestamp - INTERVAL '4 months' AND c.fk_news_group_id IN ('43475') AND (c.media_type = 4) ORDER BY c.news_date DESC, c.id DESC LIMIT 200

This takes about 20 seconds on server A, but on the new server B it takes 150 seconds... In the EXPLAIN output I noticed that on server A the sequential scan on table news_internet_201112 takes about 2 s:
      ->  Seq Scan on news_internet_201112 n  (cost=0.00..119749.12 rows=1406528 width=535) (actual time=0.046..2186.379 rows=1844831 loops=1)
          Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without time zone) AND (media_type = 4) AND (typologyid = ANY ('{6,7,1,2,3,5,4}'::integer[])))

while on server B it takes about 11 s:
      ->  Seq Scan on news_internet_201112 n  (cost=0.00..119520.12 rows=1405093 width=482) (actual time=0.177..11783.621 rows=1844831 loops=1)
          Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without time zone) AND (media_type = 4) AND (typologyid = ANY ('{6,7,1,2,3,5,4}'::integer[])))

It is noticeable that, while on server A the execution time varies by only a few seconds when I run the same query repeatedly, on server B it fluctuates between 30 and 150 seconds even though the server has no other clients.

As another example, here is a query over an entire table, run twice in a row:
Server A
------------
EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..457010.37 rows=6731337 width=318) (actual time=0.042..19665.155 rows=6731337 loops=1)
 Total runtime: 20391.555 ms
-
EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..457010.37 rows=6731337 width=318) (actual time=0.012..2171.181 rows=6731337 loops=1)
 Total runtime: 2831.028 ms

Server B
------------
EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.110..10010.443 rows=6765779 loops=1)
 Total runtime: 11552.818 ms
-
EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.023..8173.801 rows=6765779 loops=1)
 Total runtime: 12939.717 ms

It seems that server B does not cache the table?!

I'm lost. I have tested different file systems such as XFS and different stripe sizes... but without any improvement.

Any ideas about what could be happening?

Thanks a lot!!

--
César Martín Pérez
cmartinp@gmail.com


Re: H800 + md1200 Performance problem

From: Mike DelNegro
Date:
Did you check your read ahead settings (getra)?
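
For instance, something along these lines, where /dev/sdX stands for whatever device backs the data volume:

   blockdev --getra /dev/sdX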

Mike DelNegro

Sent from my iPhone


Re: H800 + md1200 Performance problem

From: Cesar Martin
Date:
Hi Mike,
Thank you for your fast response.

blockdev --getra /dev/sdc
256

What value do you recommend for this setting?

Thanks!

On 3 April 2012 at 14:37, Mike DelNegro <mdelnegro@yahoo.com> wrote:
Did you check your read ahead settings (getra)?

Mike DelNegro

Sent from my iPhone


--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

From: Merlin Moncure
Date:
On Tue, Apr 3, 2012 at 7:20 AM, Cesar Martin <cmartinp@gmail.com> wrote:
> Hello there,
>
> I am having performance problem with new DELL server. Actually I have this
> two servers
>
> Server A (old - production)
> -----------------
> 2xCPU Six-Core AMD Opteron 2439 SE
> 64GB RAM
> Raid controller Perc6 512MB cache NV
>   - 2 HD 146GB SAS 15Krpm RAID1 (SO Centos 5.4 y pg_xlog) (XFS no barriers)
>   - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS no barriers)
>
> Server B (new)
> ------------------
> 2xCPU 16 Core AMD Opteron 6282 SE
> 64GB RAM
> Raid controller H700 1GB cache NV
>   - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (SO Centos 6.2)
>   - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no
> barriers)
> Raid controller H800 1GB cache nv
>   - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18)
> (ext4 bs 4096, stride 64, stripe-width 384, no barriers)
>
> Postgres DB is the same in both servers. This DB has 170GB size with some
> tables partitioned by date with a trigger. In both shared_buffers,
> checkpoint_segments... settings are similar because RAM is similar.
>
> I supposed that, new server had to be faster than old, because have more
> disk in RAID10 and two RAID controllers with more cache memory, but really
> I'm not obtaining the expected results
>
> For example this query:
>
> EXPLAIN ANALYZE SELECT c.id AS c__id, c.fk_news_id AS c__fk_news_id,
> c.fk_news_group_id AS c__fk_news_group_id, c.fk_company_id AS
> c__fk_company_id, c.import_date AS c__import_date, c.highlight AS
> c__highlight, c.status AS c__status, c.ord AS c__ord, c.news_date AS
> c__news_date, c.fk_media_id AS c__fk_media_id, c.title AS c__title,
> c.search_title_idx AS c__search_title_idx, c.stored AS c__stored, c.tono AS
> c__tono, c.media_type AS c__media_type, c.fk_editions_news_id AS
> c__fk_editions_news_id, c.dossier_selected AS c__dossier_selected,
> c.update_stats AS c__update_stats, c.url_news AS c__url_news, c.url_image AS
> c__url_image, m.id AS m__id, m.name AS m__name, m.media_type AS
> m__media_type, m.media_code AS m__media_code, m.fk_data_source_id AS
> m__fk_data_source_id, m.language_iso AS m__language_iso, m.country_iso AS
> m__country_iso, m.region_iso AS m__region_iso, m.subregion_iso AS
> m__subregion_iso, m.media_code_temp AS m__media_code_temp, m.url AS m__url,
> m.current_rank AS m__current_rank, m.typologyid AS m__typologyid,
> m.fk_platform_id AS m__fk_platform_id, m.page_views_per_day AS
> m__page_views_per_day, m.audience AS m__audience, m.last_stats_update AS
> m__last_stats_update, n.id AS n__id, n.fk_media_id AS n__fk_media_id,
> n.fk_news_media_id AS n__fk_news_media_id, n.fk_data_source_id AS
> n__fk_data_source_id, n.news_code AS n__news_code, n.title AS n__title,
> n.searchfull_idx AS n__searchfull_idx, n.news_date AS n__news_date,
> n.economical_value AS n__economical_value, n.audience AS n__audience,
> n.media_type AS n__media_type, n.url_news AS n__url_news, n.url_news_old AS
> n__url_news_old, n.url_image AS n__url_image, n.typologyid AS n__typologyid,
> n.author AS n__author, n.fk_platform_id AS n__fk_platform_id, n2.id AS
> n2__id, n2.name AS n2__name, n3.id AS n3__id, n3.name AS n3__name, f.id AS
> f__id, f.name AS f__name, n4.id AS n4__id, n4.opentext AS n4__opentext, i.id
> AS i__id, i.name AS i__name, i.ord AS i__ord, i2.id AS i2__id, i2.name AS
> i2__name FROM company_news_internet c LEFT JOIN media_internet m ON
> c.fk_media_id = m.id AND m.media_type = 4 LEFT JOIN news_internet n ON
> c.fk_news_id = n.id AND n.media_type = 4 LEFT JOIN news_media_internet n2 ON
> n.fk_news_media_id = n2.id AND n2.media_type = 4 LEFT JOIN
> news_group_internet n3 ON c.fk_news_group_id = n3.id AND n3.media_type = 4
> LEFT JOIN feed_internet f ON n3.fk_feed_id = f.id LEFT JOIN
> news_text_internet n4 ON c.fk_news_id = n4.fk_news_id AND n4.media_type = 4
> LEFT JOIN internet_typology i ON n.typologyid = i.id LEFT JOIN
> internet_media_platform i2 ON n.fk_platform_id = i2.id WHERE
> (c.fk_company_id = '16073' AND c.status <> '-3' AND n3.fk_feed_id = '30693'
> AND n3.status = '1' AND f.fk_company_id = '16073') AND n.typologyid IN ('6',
> '7', '1', '2', '3', '5', '4') AND c.id > '49764393' AND c.news_date >=
> '2012-04-02'::timestamp - INTERVAL '4 months' AND n.news_date >=
> '2012-04-02'::timestamp - INTERVAL '4 months' AND c.fk_news_group_id IN
> ('43475') AND (c.media_type = 4) ORDER BY c.news_date DESC, c.id DESC LIMIT
> 200
>
> Takes about 20 second in server A but in new server B takes 150 seconds...
> In EXPLAIN I have noticed that sequential scan on table news_internet_201112
> takes 2s:
>       ->  Seq Scan on news_internet_201112 n  (cost=0.00..119749.12
> rows=1406528 width=535) (actual time=0.046..2186.379 rows=1844831 loops=1)
>           Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without
> time zone) AND (media_type = 4) AND (typologyid = ANY
> ('{6,7,1,2,3,5,4}'::integer[])))
>
> While in Server B, takes 11s:
>       ->  Seq Scan on news_internet_201112 n  (cost=0.00..119520.12
> rows=1405093 width=482) (actual time=0.177..11783.621 rows=1844831 loops=1)
>           Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without
> time zone) AND (media_type = 4) AND (typologyid = ANY
> ('{6,7,1,2,3,5,4}'::integer[])))
>
> Is notorious that, while in server A, execution time vary only few second
> when I execute the same query repeated times, in server B, execution time
> fluctuates between 30 and 150 second despite the server dont have any
> client.
>
> In other example, when I query entire table, running twice the same query:
> Server 1
> ------------
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..457010.37
> rows=6731337 width=318) (actual time=0.042..19665.155 rows=6731337 loops=1)
>  Total runtime: 20391.555 ms
> -
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..457010.37
> rows=6731337 width=318) (actual time=0.012..2171.181 rows=6731337 loops=1)
>  Total runtime: 2831.028 ms
>
> Server 2
> ------------
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.110..10010.443 rows=6765779 loops=1)
>  Total runtime: 11552.818 ms
> -
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.023..8173.801 rows=6765779 loops=1)
>  Total runtime: 12939.717 ms
>
> It seems that Server B don cache the table¿?¿?
>
> I'm lost, I had tested different file systems, like XFS, stripe sizes... but
> I not have had results
>
> Any ideas that could be happen?
>
> Thanks a lot!!

That's a significant regression.  Probable hardware issue -- have you
run performance tests on it such as bonnie++?  dd?  What's iowait
during the scan?
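
For example, run something like this in another terminal while the scan is going (the "wa" column is the iowait percentage):

   vmstat 5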

merlin

Re: H800 + md1200 Performance problem

From: Tomas Vondra
Date:
On 3.4.2012 14:59, Cesar Martin wrote:
> Hi Mike,
> Thank you for your fast response.
>
> blockdev --getra /dev/sdc
> 256

That's way too low. Is this setting the same on both machines?

Anyway, set it to 4096, 8192 or even 16384 and check the difference.
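
Something along these lines, reusing the device from your mail (note that the setting does not survive a reboot, so it would also need to go into rc.local or similar):

   blockdev --setra 4096 /dev/sdc
   blockdev --getra /dev/sdc    # verify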

BTW explain analyze is nice, but it's only half the info, especially
when the issue is outside PostgreSQL (hw, OS, ...). Please, provide
samples from iostat / vmstat or tools like that.
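
For example, captured while the slow query or a dd run is in progress:

   iostat -xm 5
   vmstat 5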

Tomas

Re: H800 + md1200 Performance problem

From: Scott Marlowe
Date:
On Tue, Apr 3, 2012 at 6:20 AM, Cesar Martin <cmartinp@gmail.com> wrote:
> Hello there,
>
> I am having performance problem with new DELL server. Actually I have this
> two servers
>
> Server A (old - production)
> -----------------
> 2xCPU Six-Core AMD Opteron 2439 SE
> 64GB RAM
> Raid controller Perc6 512MB cache NV
>   - 2 HD 146GB SAS 15Krpm RAID1 (SO Centos 5.4 y pg_xlog) (XFS no barriers)
>   - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS no barriers)
>
> Server B (new)
> ------------------
> 2xCPU 16 Core AMD Opteron 6282 SE
> 64GB RAM
> Raid controller H700 1GB cache NV
>   - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (SO Centos 6.2)
>   - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no
> barriers)
> Raid controller H800 1GB cache nv
>   - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18)
> (ext4 bs 4096, stride 64, stripe-width 384, no barriers)
>
> Postgres DB is the same in both servers. This DB has 170GB size with some
> tables partitioned by date with a trigger. In both shared_buffers,
> checkpoint_segments... settings are similar because RAM is similar.
>
> I supposed that, new server had to be faster than old, because have more
> disk in RAID10 and two RAID controllers with more cache memory, but really
> I'm not obtaining the expected results

What does

sysctl -n vm.zone_reclaim_mode

say?  If it says 1, change it to 0:

sysctl -w zone_reclaim_mode=0

It's an automatic setting designed to make large virtual hosting
servers etc run faster but totally screws with pg and file servers
with big numbers of cores and large memory spaces.
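
To keep the change across reboots, roughly:

   echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf
   sysctl -p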

Re: H800 + md1200 Performance problem

From: Scott Marlowe
Date:
On Tue, Apr 3, 2012 at 9:32 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Apr 3, 2012 at 6:20 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>> Hello there,
>>
>> I am having performance problem with new DELL server. Actually I have this
>> two servers
>>
>> Server A (old - production)
>> -----------------
>> 2xCPU Six-Core AMD Opteron 2439 SE
>> 64GB RAM
>> Raid controller Perc6 512MB cache NV
>>   - 2 HD 146GB SAS 15Krpm RAID1 (SO Centos 5.4 y pg_xlog) (XFS no barriers)
>>   - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS no barriers)
>>
>> Server B (new)
>> ------------------
>> 2xCPU 16 Core AMD Opteron 6282 SE
>> 64GB RAM
>> Raid controller H700 1GB cache NV
>>   - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (SO Centos 6.2)
>>   - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no
>> barriers)
>> Raid controller H800 1GB cache nv
>>   - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18)
>> (ext4 bs 4096, stride 64, stripe-width 384, no barriers)
>>
>> Postgres DB is the same in both servers. This DB has 170GB size with some
>> tables partitioned by date with a trigger. In both shared_buffers,
>> checkpoint_segments... settings are similar because RAM is similar.
>>
>> I supposed that, new server had to be faster than old, because have more
>> disk in RAID10 and two RAID controllers with more cache memory, but really
>> I'm not obtaining the expected results
>
> What does
>
> sysctl -n vm.zone_reclaim_mode
>
> say?  If it says 1, change it to 0:
>
> sysctl -w zone_reclaim_mode=0

That should be:

sysctl -w vm.zone_reclaim_mode=0

Re: H800 + md1200 Performance problem

From: Cesar Martin
Date:
Yes, the setting is the same on both machines.

The results of bonnie++, run without arguments, are:

Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cltbbdd01      126G    94  99 202873  99 208327  95  1639  91 819392  88  2131 139
Latency             88144us     228ms     338ms     171ms     147ms   20325us
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
cltbbdd01        16  8063  26 +++++ +++ 27361  96 31437  96 +++++ +++ +++++ +++
Latency              7850us    2290us    2310us     530us      11us     522us

With dd, one CPU core goes to 100% and the results are about 100-170 MB/s, which I think is a bad result for this hardware:

dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
100+0 records in
100+0 records out
838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s

dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
1000+0 records in
1000+0 records out
8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s

dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s

When I monitor I/O activity with iostat during the dd run, I have noticed that if the test takes 10 seconds, the disk only shows activity during the last 3 or 4 seconds, and iostat reports about 250-350 MB/s. Is that normal?

I set the read-ahead to different values, but the results don't differ substantially...

Thanks!

On 3 April 2012 at 15:21, Tomas Vondra <tv@fuzzy.cz> wrote:
On 3.4.2012 14:59, Cesar Martin wrote:
> Hi Mike,
> Thank you for your fast response.
>
> blockdev --getra /dev/sdc
> 256

That's way too low. Is this setting the same on both machines?

Anyway, set it to 4096, 8192 or even 16384 and check the difference.

BTW explain analyze is nice, but it's only half the info, especially
when the issue is outside PostgreSQL (hw, OS, ...). Please, provide
samples from iostat / vmstat or tools like that.

Tomas




--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

From: Cesar Martin
Date:
OK Scott, I am going to change this kernel parameter and will repeat the tests.
Thanks!

On 3 April 2012 at 17:34, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Tue, Apr 3, 2012 at 9:32 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Apr 3, 2012 at 6:20 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>> Hello there,
>>
>> I am having performance problem with new DELL server. Actually I have this
>> two servers
>>
>> Server A (old - production)
>> -----------------
>> 2xCPU Six-Core AMD Opteron 2439 SE
>> 64GB RAM
>> Raid controller Perc6 512MB cache NV
>>   - 2 HD 146GB SAS 15Krpm RAID1 (SO Centos 5.4 y pg_xlog) (XFS no barriers)
>>   - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS no barriers)
>>
>> Server B (new)
>> ------------------
>> 2xCPU 16 Core AMD Opteron 6282 SE
>> 64GB RAM
>> Raid controller H700 1GB cache NV
>>   - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (SO Centos 6.2)
>>   - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no
>> barriers)
>> Raid controller H800 1GB cache nv
>>   - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18)
>> (ext4 bs 4096, stride 64, stripe-width 384, no barriers)
>>
>> Postgres DB is the same in both servers. This DB has 170GB size with some
>> tables partitioned by date with a trigger. In both shared_buffers,
>> checkpoint_segments... settings are similar because RAM is similar.
>>
>> I supposed that, new server had to be faster than old, because have more
>> disk in RAID10 and two RAID controllers with more cache memory, but really
>> I'm not obtaining the expected results
>
> What does
>
> sysctl -n vm.zone_reclaim_mode
>
> say?  If it says 1, change it to 0:
>
> sysctl -w zone_reclaim_mode=0

That should be:

sysctl -w vm.zone_reclaim_mode=0



--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

From: Tomas Vondra
Date:
On 3.4.2012 17:42, Cesar Martin wrote:
> Yes, setting is the same in both machines.
>
> The results of bonnie++ running without arguments are:
>
> Version      1.96   ------Sequential Output------ --Sequential Input-
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>  /sec %CP
> cltbbdd01      126G    94  99 202873  99 208327  95  1639  91 819392  88
>  2131 139
> Latency             88144us     228ms     338ms     171ms     147ms
> 20325us
>                     ------Sequential Create------ --------Random
> Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>  /sec %CP
> cltbbdd01        16  8063  26 +++++ +++ 27361  96 31437  96 +++++ +++
> +++++ +++
> Latency              7850us    2290us    2310us     530us      11us
> 522us
>
> With DD, one core of CPU put at 100% and results are  about 100-170
> MBps, that I thing is bad result for this HW:
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
> 100+0 records in
> 100+0 records out
> 838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
> 1000+0 records in
> 1000+0 records out
> 8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s
>
> When monitor I/O activity with iostat, during dd, I have noticed that,
> if the test takes 10 second, the disk have activity only during last 3
> or 4 seconds and iostat report about 250-350MBps. Is it normal?

Well, you're testing writing, and the default behavior is to write the
data into page cache. And you do have 64GB of RAM so the write cache may
take a large portion of the RAM - even gigabytes. To really test the I/O
you need to (a) write about 2x the amount of RAM or (b) tune the
dirty_ratio/dirty_background_ratio accordingly.
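
For (b), purely as an illustration and not values tuned for your box, something like:

   sysctl -w vm.dirty_background_ratio=1
   sysctl -w vm.dirty_ratio=2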

BTW what are you trying to achieve with "conv=fdatasync" at the end? My
dd man page does not mention 'fdatasync' and IMHO it's a mistake on your
side. If you want to sync the data at the end, then you need to do
something like

   time sh -c "dd ... && sync"
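
For example, spelled out with the path from your earlier mail and a size of roughly twice your 64GB of RAM:

   time sh -c "dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384 && sync"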

> I set read ahead to different values, but the results don't differ
> substantially...

Because read-ahead is for reading (which is what a SELECT does most of
the time), but the tests above are writing to the device. And writing is
not influenced by read-ahead.

To test reading, do this:

   dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=1024
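
To make sure the read really hits the disks rather than the page cache, it may help to drop the cache first, e.g.:

   sync
   echo 3 > /proc/sys/vm/drop_caches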

Tomas

Re: H800 + md1200 Performance problem

From: Merlin Moncure
Date:
On Tue, Apr 3, 2012 at 1:01 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 3.4.2012 17:42, Cesar Martin wrote:
>> Yes, setting is the same in both machines.
>>
>> The results of bonnie++ running without arguments are:
>>
>> Version      1.96   ------Sequential Output------ --Sequential Input-
>> --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>> --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>>  /sec %CP
>> cltbbdd01      126G    94  99 202873  99 208327  95  1639  91 819392  88
>>  2131 139
>> Latency             88144us     228ms     338ms     171ms     147ms
>> 20325us
>>                     ------Sequential Create------ --------Random
>> Create--------
>>                     -Create-- --Read--- -Delete-- -Create-- --Read---
>> -Delete--
>> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>  /sec %CP
>> cltbbdd01        16  8063  26 +++++ +++ 27361  96 31437  96 +++++ +++
>> +++++ +++
>> Latency              7850us    2290us    2310us     530us      11us
>> 522us
>>
>> With DD, one core of CPU put at 100% and results are  about 100-170
>> MBps, that I thing is bad result for this HW:
>>
>> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
>> 100+0 records in
>> 100+0 records out
>> 838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s
>>
>> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
>> 1000+0 records in
>> 1000+0 records out
>> 8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s
>>
>> dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s
>>
>> When monitor I/O activity with iostat, during dd, I have noticed that,
>> if the test takes 10 second, the disk have activity only during last 3
>> or 4 seconds and iostat report about 250-350MBps. Is it normal?
>
> Well, you're testing writing, and the default behavior is to write the
> data into page cache. And you do have 64GB of RAM so the write cache may
> take large portion of the RAM - even gigabytes. To really test the I/O
> you need to (a) write about 2x the amount of RAM or (b) tune the
> dirty_ratio/dirty_background_ratio accordingly.
>
> BTW what are you trying to achieve with "conv=fdatasync" at the end. My
> dd man page does not mention 'fdatasync' and IMHO it's a mistake on your
> side. If you want to sync the data at the end, then you need to do
> something like
>
>   time sh -c "dd ... && sync"
>
>> I set read ahead to different values, but the results don't differ
>> substantially...
>
> Because read-ahead is for reading (which is what a SELECT does most of
> the time), but the dests above are writing to the device. And writing is
> not influenced by read-ahead.

Yeah, but I have to agree with Cesar -- those are pretty unspectacular
results for a 12-drive SAS array, to say the least (unless the way dd was
being run was throwing it off somehow).  Something is definitely not
right here.  Maybe we can see similar tests run on the production
server as a point of comparison?

merlin

Re: H800 + md1200 Performance problem

From: Cesar Martin
Date:
Hello,

Yesterday I changed the kernel setting that Scott suggested, vm.zone_reclaim_mode = 0. I have run new benchmarks and have noticed changes, at least in Postgres:

First exec:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.020..7984.707 rows=6765779 loops=1)
 Total runtime: 12699.008 ms
(2 rows)

Second:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.023..1767.440 rows=6765779 loops=1)
 Total runtime: 2696.901 ms

It seems that the data is now being cached correctly...

The large query takes 80 seconds on the first execution and around 23 seconds on the second. That is not spectacular, but it is better than yesterday.

Furthermore, the dd results are strange:

dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s

171 MB/s seems like a bad value to me for a 12-disk SAS RAID10... And when I run iostat during the dd execution I get results like:
sdc            1514,62         0,01       108,58         11     117765
sdc            3705,50         0,01       316,62          0        633
sdc               2,00         0,00         0,05          0          0
sdc             920,00         0,00        63,49          0        126
sdc            8322,50         0,03       712,00          0       1424
sdc            6662,50         0,02       568,53          0       1137
sdc               0,00         0,00         0,00          0          0
sdc               1,50         0,00         0,04          0          0
sdc            6413,00         0,01       412,28          0        824
sdc           13107,50         0,03       867,94          0       1735
sdc               0,00         0,00         0,00          0          0
sdc               1,50         0,00         0,03          0          0
sdc            9719,00         0,03       815,49          0       1630
sdc            2817,50         0,01       272,51          0        545
sdc               1,50         0,00         0,05          0          0
sdc            1181,00         0,00        71,49          0        142
sdc            7225,00         0,01       362,56          0        725
sdc            2973,50         0,01       269,97          0        539

I don't understand why MB_wrtn/s keeps jumping between 0 and nearly 800 MB/s during the run.

Read results:

dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 257,626 s, 533 MB/s

sdc            3157,00       392,69         0,00        785          0
sdc            3481,00       432,75         0,00        865          0
sdc            2669,50       331,50         0,00        663          0
sdc            3725,50       463,75         0,00        927          0
sdc            2998,50       372,38         0,00        744          0
sdc            3600,50       448,00         0,00        896          0
sdc            3588,00       446,50         0,00        893          0
sdc            3494,00       434,50         0,00        869          0
sdc            3141,50       390,62         0,00        781          0
sdc            3667,50       456,62         0,00        913          0
sdc            3429,35       426,18         0,00        856          0
sdc            3043,50       378,06         0,00        756          0
sdc            3366,00       417,94         0,00        835          0
sdc            3480,50       432,62         0,00        865          0
sdc            3523,50       438,06         0,00        876          0
sdc            3554,50       441,88         0,00        883          0
sdc            3635,00       452,19         0,00        904          0
sdc            3107,00       386,20         0,00        772          0
sdc            3695,00       460,00         0,00        920          0
sdc            3475,50       432,11         0,00        864          0
sdc            3487,50       433,50         0,00        867          0
sdc            3232,50       402,39         0,00        804          0
sdc            3698,00       460,67         0,00        921          0
sdc            5059,50       632,00         0,00       1264          0
sdc            3934,00       489,56         0,00        979          0
sdc            4536,50       566,75         0,00       1133          0
sdc            5298,00       662,12         0,00       1324          0

These results seem more reasonable to me; the read speed is sustained throughout the whole test...

About the "conv=fdatasync" option that Tomas mentioned: I saw it at http://romanrm.ru/en/dd-benchmark and started using it, but it may be wrong. Before that I used time sh -c "dd if=/dev/zero of=ddfile bs=X count=Y && sync".

What do you think of these results?

I have noticed that since I changed vm.zone_reclaim_mode to 0, swap is completely full. Do you recommend disabling swap?

Thanks!!

On 3 April 2012 at 20:01, Tomas Vondra <tv@fuzzy.cz> wrote:
On 3.4.2012 17:42, Cesar Martin wrote:
> Yes, setting is the same in both machines.
>
> The results of bonnie++ running without arguments are:
>
> Version      1.96   ------Sequential Output------ --Sequential Input-
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>  /sec %CP
> cltbbdd01      126G    94  99 202873  99 208327  95  1639  91 819392  88
>  2131 139
> Latency             88144us     228ms     338ms     171ms     147ms
> 20325us
>                     ------Sequential Create------ --------Random
> Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>  /sec %CP
> cltbbdd01        16  8063  26 +++++ +++ 27361  96 31437  96 +++++ +++
> +++++ +++
> Latency              7850us    2290us    2310us     530us      11us
> 522us
>
> With DD, one core of CPU put at 100% and results are  about 100-170
> MBps, that I thing is bad result for this HW:
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
> 100+0 records in
> 100+0 records out
> 838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
> 1000+0 records in
> 1000+0 records out
> 8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s
>
> When monitor I/O activity with iostat, during dd, I have noticed that,
> if the test takes 10 second, the disk have activity only during last 3
> or 4 seconds and iostat report about 250-350MBps. Is it normal?

Well, you're testing writing, and the default behavior is to write the
data into page cache. And you do have 64GB of RAM so the write cache may
take large portion of the RAM - even gigabytes. To really test the I/O
you need to (a) write about 2x the amount of RAM or (b) tune the
dirty_ratio/dirty_background_ratio accordingly.

BTW what are you trying to achieve with "conv=fdatasync" at the end. My
dd man page does not mention 'fdatasync' and IMHO it's a mistake on your
side. If you want to sync the data at the end, then you need to do
something like

  time sh -c "dd ... && sync"

> I set read ahead to different values, but the results don't differ
> substantially...

Because read-ahead is for reading (which is what a SELECT does most of
the time), but the dests above are writing to the device. And writing is
not influenced by read-ahead.

To test reading, do this:

  dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=1024

Tomas




--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

From: Scott Marlowe
Date:
On Wed, Apr 4, 2012 at 3:42 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>
> I have noticed that since I changed the setting  vm.zone_reclaim_mode = 0,
> swap is totally full. Do you recommend me disable swap?

Yes

Re: H800 + md1200 Performance problem

От
Tomas Vondra
Дата:
On 4.4.2012 15:15, Scott Marlowe wrote:
> On Wed, Apr 4, 2012 at 3:42 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>>
>> I have noticed that since I changed the setting  vm.zone_reclaim_mode = 0,
>> swap is totally full. Do you recommend me disable swap?
>
> Yes

Careful about that - it depends on how you disable it.

Setting 'vm.swappiness = 0' is a good idea, don't remove the swap (I've
been bitten by the vm.overcommit=2 without a swap repeatedly).
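
A minimal sketch of that combination (using the actual sysctl names; the "vm.overcommit=2" above corresponds to vm.overcommit_memory=2) would be:

  sysctl -w vm.swappiness=0          # keep the swap around but avoid using it
  sysctl -w vm.overcommit_memory=2   # strict overcommit accounting needs some swap to fall back on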

T.

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Wed, Apr 4, 2012 at 7:20 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 4.4.2012 15:15, Scott Marlowe wrote:
>> On Wed, Apr 4, 2012 at 3:42 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>>>
>>> I have noticed that since I changed the setting  vm.zone_reclaim_mode = 0,
>>> swap is totally full. Do you recommend me disable swap?
>>
>> Yes
>
> Careful about that - it depends on how you disable it.
>
> Setting 'vm.swappiness = 0' is a good idea, don't remove the swap (I've
> been bitten by the vm.overcommit=2 without a swap repeatedly).

I've had far more problems with swap on and swappiness set to 0 than
with swap off.  But this has always been on large memory machines with
64 to 256G memory.  Even with fairly late model linux kernels (i.e.
10.04 LTS through 11.04) I've watched the kswapd start up swapping
hard on a machine with zero memory pressure and no need for swap.
Took about 2 weeks of hard running before kswapd decided to act
pathologically.

Seen it with swap on, with swappiness to 0, and overcommit to either 0
or 2 on big machines.  Once we just took the swap partitions away, the
machines ran fine.
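
For reference, taking swap out of the picture on a running box is just (a sketch; assumes the swap devices are listed in /etc/fstab):

  swapoff -a     # stop using all active swap areas immediately
  # then comment out the swap line(s) in /etc/fstab so it stays off after a reboot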

Re: H800 + md1200 Performance problem

От
Claudio Freire
Дата:
On Wed, Apr 4, 2012 at 1:22 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Even with fairly late model linux kernels (i.e.
> 10.04 LTS through 11.04) I've watched the kswapd start up swapping
> hard on a machine with zero memory pressure and no need for swap.
> Took about 2 weeks of hard running before kswapd decided to act
> pathological.

Perhaps you had some overfull partitions in tmpfs?

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Wed, Apr 4, 2012 at 10:28 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 1:22 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Even with fairly late model linux kernels (i.e.
>> 10.04 LTS through 11.04) I've watched the kswapd start up swapping
>> hard on a machine with zero memory pressure and no need for swap.
>> Took about 2 weeks of hard running before kswapd decided to act
>> pathological.
>
> Perhaps you had some overfull partitions in tmpfs?

Nope.  Didn't use tmpfs for anything on that machine.  Stock Ubuntu
10.04 with Postgres just doing simple but high traffic postgres stuff.

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Wed, Apr 4, 2012 at 10:31 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 10:28 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Wed, Apr 4, 2012 at 1:22 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>> Even with fairly late model linux kernels (i.e.
>>> 10.04 LTS through 11.04) I've watched the kswapd start up swapping
>>> hard on a machine with zero memory pressure and no need for swap.
>>> Took about 2 weeks of hard running before kswapd decided to act
>>> pathological.
>>
>> Perhaps you had some overfull partitions in tmpfs?
>
> Nope.  Didn't use tmpfs for anything on that machine.  Stock Ubuntu
> 10.04 with Postgres just doing simple but high traffic postgres stuff.

Just to clarify, the machine had 128G RAM and about 95G of it was
kernel cache, the rest used by shared memory (set to 4G) and
postgresql.

Re: H800 + md1200 Performance problem

От
Tomas Vondra
Дата:
On 4.4.2012 18:22, Scott Marlowe wrote:
> On Wed, Apr 4, 2012 at 7:20 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> On 4.4.2012 15:15, Scott Marlowe wrote:
>>> On Wed, Apr 4, 2012 at 3:42 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>>>>
>>>> I have noticed that since I changed the setting  vm.zone_reclaim_mode = 0,
>>>> swap is totally full. Do you recommend me disable swap?
>>>
>>> Yes
>>
>> Careful about that - it depends on how you disable it.
>>
>> Setting 'vm.swappiness = 0' is a good idea, don't remove the swap (I've
>> been bitten by the vm.overcommit=2 without a swap repeatedly).
>
> I've had far more problems with swap on and swappiness set to 0 than
> with swap off.  But this has always been on large memory machines with
> 64 to 256G memory.  Even with fairly late model linux kernels (i.e.
> 10.04 LTS through 11.04) I've watched the kswapd start up swapping
> hard on a machine with zero memory pressure and no need for swap.
> Took about 2 weeks of hard running before kswapd decided to act
> pathological.
>
> Seen it with swap on, with swappiness to 0, and overcommit to either 0
> or 2 on big machines.  Once we just took the swap partitions away it
> the machines ran fine.

I've experienced the issues in exactly the opposite case - machines with
very little memory (like a VPS with 512MB of RAM). I did want to operate
that machine without a swap yet it kept failing because of OOM errors or
panicking (depending on the overcommit ratio value).

Turns out it's quite difficult (~ almost impossible) to tune the VM for a
swap-less case. In the end I just added 256MB of swap and everything
started to work fine - the funny thing is the swap is not used at all
(according to sar).
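
For anyone wanting to reproduce that small-swap setup, a minimal sketch (a 256MB swap file; the path is only an example) is:

  dd if=/dev/zero of=/swapfile bs=1M count=256
  mkswap /swapfile
  swapon /swapfile
  sysctl -w vm.swappiness=0    # keep it as a safety net, not as working memory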

T.

Re: H800 + md1200 Performance problem

От
Merlin Moncure
Дата:
On Wed, Apr 4, 2012 at 4:42 AM, Cesar Martin <cmartinp@gmail.com> wrote:
> Hello,
>
> Yesterday I changed the kernel setting, that said
> Scott, vm.zone_reclaim_mode = 0. I have done new benchmarks and I have
> noticed changes at least in Postgres:
>
> First exec:
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
>                                                                  QUERY PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.020..7984.707 rows=6765779 loops=1)
>  Total runtime: 12699.008 ms
> (2 filas)
>
> Second:
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
>                                                                  QUERY PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.023..1767.440 rows=6765779 loops=1)
>  Total runtime: 2696.901 ms
>
> It seems that now data is being cached right...
>
> The large query in first exec takes 80 seconds and in second exec takes
> around 23 seconds. This is not spectacular but is better than yesterday.
>
> Furthermore the results of dd are strange:
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
> 16384+0 records in
> 16384+0 records out
> 137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s
>
> 171 MB/s I think is bad value for 12 SAS RAID10... And when I execute iostat
> during the dd execution i obtain results like:
> sdc            1514,62         0,01       108,58         11     117765
> sdc            3705,50         0,01       316,62          0        633
> sdc               2,00         0,00         0,05          0          0
> sdc             920,00         0,00        63,49          0        126
> sdc            8322,50         0,03       712,00          0       1424
> sdc            6662,50         0,02       568,53          0       1137
> sdc               0,00         0,00         0,00          0          0
> sdc               1,50         0,00         0,04          0          0
> sdc            6413,00         0,01       412,28          0        824
> sdc           13107,50         0,03       867,94          0       1735
> sdc               0,00         0,00         0,00          0          0
> sdc               1,50         0,00         0,03          0          0
> sdc            9719,00         0,03       815,49          0       1630
> sdc            2817,50         0,01       272,51          0        545
> sdc               1,50         0,00         0,05          0          0
> sdc            1181,00         0,00        71,49          0        142
> sdc            7225,00         0,01       362,56          0        725
> sdc            2973,50         0,01       269,97          0        539
>
> I don't understand why MB_wrtn/s go from 0 to near 800MB/s constantly during
> execution.

This is looking more and more like a raid controller issue. ISTM
it's bucking the cache, filling it up and flushing it synchronously.
Your read results are ok but not what they should be IMO.  Maybe it's
an environmental issue or the card is just a straight up lemon (no
surprise in the dell line).  Are you using standard drivers, and have
you checked for updates?  Have you considered contacting dell support?
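
A quick way to see which driver and firmware are actually in use (a sketch; the names assume the stock megaraid_sas module and LSI's MegaCli tool) is:

  modinfo megaraid_sas | grep -i version
  MegaCli -AdpAllInfo -aALL | grep -i 'FW Package'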

merlin

Re: H800 + md1200 Performance problem

От
Cesar Martin
Дата:
A RAID controller issue or driver problem was the first thing that I looked into.
I installed Centos 5.4 at the beginning, but I had performance problems, and I contacted Dell support... but Centos is not supported by Dell... Then I installed Redhat 6 and we contacted Dell with the same problem.
Dell says that everything is fine and that this is a software problem.
I have installed Centos 5.4, 6.2 and Redhat 6 with similar results, so I think it is not a driver problem (megasas-raid kernel module).
I will check for kernel updates...
Thanks!

PS. Lately I'm pretty disappointed with the quality of the DELL components; this is not the first problem we have had with hardware in new machines.

On 4 April 2012 at 19:16, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Apr 4, 2012 at 4:42 AM, Cesar Martin <cmartinp@gmail.com> wrote:
> Hello,
>
> Yesterday I changed the kernel setting, that said
> Scott, vm.zone_reclaim_mode = 0. I have done new benchmarks and I have
> noticed changes at least in Postgres:
>
> First exec:
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
>                                                                  QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.020..7984.707 rows=6765779 loops=1)
>  Total runtime: 12699.008 ms
> (2 filas)
>
> Second:
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
>                                                                  QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.023..1767.440 rows=6765779 loops=1)
>  Total runtime: 2696.901 ms
>
> It seems that now data is being cached right...
>
> The large query in first exec takes 80 seconds and in second exec takes
> around 23 seconds. This is not spectacular but is better than yesterday.
>
> Furthermore the results of dd are strange:
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
> 16384+0 records in
> 16384+0 records out
> 137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s
>
> 171 MB/s I think is bad value for 12 SAS RAID10... And when I execute iostat
> during the dd execution i obtain results like:
> sdc            1514,62         0,01       108,58         11     117765
> sdc            3705,50         0,01       316,62          0        633
> sdc               2,00         0,00         0,05          0          0
> sdc             920,00         0,00        63,49          0        126
> sdc            8322,50         0,03       712,00          0       1424
> sdc            6662,50         0,02       568,53          0       1137
> sdc               0,00         0,00         0,00          0          0
> sdc               1,50         0,00         0,04          0          0
> sdc            6413,00         0,01       412,28          0        824
> sdc           13107,50         0,03       867,94          0       1735
> sdc               0,00         0,00         0,00          0          0
> sdc               1,50         0,00         0,03          0          0
> sdc            9719,00         0,03       815,49          0       1630
> sdc            2817,50         0,01       272,51          0        545
> sdc               1,50         0,00         0,05          0          0
> sdc            1181,00         0,00        71,49          0        142
> sdc            7225,00         0,01       362,56          0        725
> sdc            2973,50         0,01       269,97          0        539
>
> I don't understand why MB_wrtn/s go from 0 to near 800MB/s constantly during
> execution.

This is looking more and more like a a raid controller issue. ISTM
it's bucking the cache, filling it up and flushing it synchronously.
your read results are ok but not what they should be IMO.  Maybe it's
an environmental issue or the card is just a straight up lemon (no
surprise in the dell line).  Are you using standard drivers, and have
you checked for updates?  Have you considered contacting dell support?

merlin



--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Wed, Apr 4, 2012 at 12:46 PM, Cesar Martin <cmartinp@gmail.com> wrote:
> Raid controller issue or driver problem was the first problem that I
> studied.
> I installed Centos 5.4 al the beginning, but I had performance problems, and
> I contacted Dell support... but Centos is not support by Dell... Then I
> installed Redhat 6 and we contact Dell with same problem.
> Dell say that all is right and that this is a software problem.
> I have installed Centos 5.4, 6.2 and Redhat 6 with similar result, I think
> that not is driver problem (megasas-raid kernel module).
> I will check kernel updates...
> Thanks!

Look for firmware updates to your RAID card.

Re: H800 + md1200 Performance problem

От
Merlin Moncure
Дата:
On Wed, Apr 4, 2012 at 1:55 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 12:46 PM, Cesar Martin <cmartinp@gmail.com> wrote:
>> Raid controller issue or driver problem was the first problem that I
>> studied.
>> I installed Centos 5.4 al the beginning, but I had performance problems, and
>> I contacted Dell support... but Centos is not support by Dell... Then I
>> installed Redhat 6 and we contact Dell with same problem.
>> Dell say that all is right and that this is a software problem.
>> I have installed Centos 5.4, 6.2 and Redhat 6 with similar result, I think
>> that not is driver problem (megasas-raid kernel module).
>> I will check kernel updates...
>> Thanks!
>
> Look for firmware updates to your RAID card.

already checked that; look here:

http://www.dell.com/support/drivers/us/en/04/DriverDetails?DriverId=R269683&FileId=2731095787&DriverName=Dell%20PERC%20H800%20Adapter%2C%20v.12.3.0-0032%2C%20A02&urlProductCode=False

latest update is july 2010.  i've been down this road with dell many
times and I would advise RMAing the whole server -- that will at least
get their attention. dell performance/software support is worthless
and it's a crying shame blowing 10 grand on a server only to have it
underperform your 3 year old workhorse.

merlin

Re: H800 + md1200 Performance problem

От
Tomas Vondra
Дата:
On 4.4.2012 20:46, Cesar Martin wrote:
> Raid controller issue or driver problem was the first problem that I
> studied.
> I installed Centos 5.4 al the beginning, but I had performance problems,
> and I contacted Dell support... but Centos is not support by Dell...
> Then I installed Redhat 6 and we contact Dell with same problem.
> Dell say that all is right and that this is a software problem.
> I have installed Centos 5.4, 6.2 and Redhat 6 with similar result, I
> think that not is driver problem (megasas-raid kernel module).
> I will check kernel updates...
> Thanks!

Well, there are different meanings of 'working'. Obviously you mean
'gives reasonable performance' while Dell understands 'is not on fire'.

IIRC H800 is just a 926x controller from LSI, so it's probably based on
LSI 2108. Can you post basic info about the setting, i.e.

  MegaCli -AdpAllInfo -aALL

or something like that? I'm especially interested in the access/cache
policies, cache drop interval etc., i.e.

  MegaCli -LDGetProp (-Cache | -Access | -Name | -DskCache)

What I'd do next is testing a much smaller array (even a single drive)
to see if the issue exists. If it works, try to add another drive etc.
It's much easier to show them something's wrong. The simpler the test
case, the better.
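
Independent of the array size, dd with direct I/O takes the page cache out of the picture, which makes the numbers much easier to argue about (a sketch, assuming GNU coreutils dd and the same test path):

  dd if=/vol02/bonnie/DD of=/dev/null bs=1M count=4096 iflag=direct
  dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=4096 oflag=direct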

I've found this (it's about a 2108-based controller from LSI):

http://www.xbitlabs.com/articles/storage/display/lsi-megaraid-sas9260-8i_3.html#sect0

The paragraphs below the diagram are interesting. Not sure if they
describe the same issue you have, but maybe it's related.

Anyway, it's quite usual that a RAID controller has about 50% write
performance compared to read performance, usually due to on-board CPU
bottleneck. You do have ~ 530 MB/s and 170 MB/s, so it's not exactly 50%
but it's not very far.

But the fluctuation, that surely is strange. What are the page cache
dirty limits, i.e.

cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio

That's probably #1 source I've seen responsible for such issues (on
machines with a lot of RAM).

Tomas


Re: H800 + md1200 Performance problem

От
Glyn Astill
Дата:
> From: Tomas Vondra <tv@fuzzy.cz>
> But the fluctuation, that surely is strange. What are the page cache
> dirty limits, i.e.
>
> cat /proc/sys/vm/dirty_background_ratio
> cat /proc/sys/vm/dirty_ratio
>
> That's probably #1 source I've seen responsible for such issues (on
> machines with a lot of RAM).
>

+1 on that.

We're running similar 32 core dell servers with H700s and 128Gb RAM.

With those at the defaults (I don't recall if it's 5 and 10 respectively) you're looking at 3.2GB of dirty pages before pdflush flushes them and 6.4GB before the process is forced to flush itself.

Re: H800 + md1200 Performance problem

От
Tomas Vondra
Дата:
On 5.4.2012 17:17, Cesar Martin wrote:
> Well, I have installed megacli on the server and attach the results in the
> file megacli.txt. Also we have "Dell Open Manage" installed on the server,
> which can generate a log of the H800. I attach it to the mail with the name lsi_0403.
>
> About dirty limits, I have default values:
> vm.dirty_background_ratio = 10
> vm.dirty_ratio = 20
>
> I have compared with other server and values are the same, except in
> centos 5.4 database production server that have vm.dirty_ratio = 40

Do the other machines have the same amount of RAM? The point is that the
values that work with less memory don't work that well with large
amounts of memory (and the amount of RAM did grow a lot recently).

For example a few years ago the average amount of RAM was ~8GB. In that
case the

  vm.dirty_background_ratio = 10  => 800MB
  vm.dirty_ratio = 20 => 1600MB

which is all peachy if you have a decent controller with a write cache.
But turn that to 64GB and suddenly

  vm.dirty_background_ratio = 10  => 6.4GB
  vm.dirty_ratio = 20 => 12.8GB

The problem is that there'll be a lot of data waiting (for 30 seconds by
default), and then suddenly it all starts being written to the
controller. Such systems behave just like yours - short bursts of
writes interleaved with 'no activity'.

Greg Smith wrote a nice howto about this - it's from 2007 but all the
recommendations are still valid:

  http://www.westnet.com/~gsmith/content/linux-pdflush.htm

TL;DR:

  - decrease the dirty_background_ratio/dirty_ratio (or use *_bytes)

  - consider decreasing the dirty_expire_centiseconds
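
For illustration only (not a recommendation - just a plausible starting point for a 64GB box with a battery-backed write cache; the *_bytes knobs need a 2.6.29+ kernel):

  # /etc/sysctl.conf - apply with: sysctl -p
  vm.dirty_background_bytes = 268435456   # start background writeback at 256MB
  vm.dirty_bytes = 1073741824             # force synchronous writeback at 1GB
  vm.dirty_expire_centisecs = 1000        # treat dirty data as old after 10 seconds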


T.

Re: H800 + md1200 Performance problem

От
Merlin Moncure
Дата:
On Thu, Apr 5, 2012 at 10:49 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 5.4.2012 17:17, Cesar Martin wrote:
>> Well, I have installed megacli on server and attach the results in file
>> megacli.txt. Also we have "Dell Open Manage" install in server, that can
>> generate a log of H800. I attach to mail with name lsi_0403.
>>
>> About dirty limits, I have default values:
>> vm.dirty_background_ratio = 10
>> vm.dirty_ratio = 20
>>
>> I have compared with other server and values are the same, except in
>> centos 5.4 database production server that have vm.dirty_ratio = 40
>
> Do the other machines have the same amount of RAM? The point is that the
> values that work with less memory don't work that well with large
> amounts of memory (and the amount of RAM did grow a lot recently).
>
> For example a few years ago the average amount of RAM was ~8GB. In that
> case the
>
>  vm.dirty_background_ratio = 10  => 800MB
>  vm.dirty_ratio = 20 => 1600MB
>
> which is all peachy if you have a decent controller with a write cache.
> But turn that to 64GB and suddenly
>
>  vm.dirty_background_ratio = 10  => 6.4GB
>  vm.dirty_ratio = 20 => 12.8GB
>
> The problem is that there'll be a lot of data waiting (for 30 seconds by
> default), and then suddenly it starts writing all of them to the
> controller. Such systems behave just as your system - short strokes of
> writes interleaved with 'no activity'.
>
> Greg Smith wrote a nice howto about this - it's from 2007 but all the
> recommendations are still valid:
>
>  http://www.westnet.com/~gsmith/content/linux-pdflush.htm
>
> TL;DR:
>
>  - decrease the dirty_background_ratio/dirty_ratio (or use *_bytes)
>
>  - consider decreasing the dirty_expire_centiseconds

The original problem is a read-based performance issue though, and this
will not have any effect on that whatsoever (although it's still
excellent advice).  Also dd should bypass the o/s buffer cache.  I'm
still pretty much convinced that there is a fundamental performance
issue with the raid card that dell needs to explain.

merlin

Re: H800 + md1200 Performance problem

От
Tomas Vondra
Дата:
On 5.4.2012 20:43, Merlin Moncure wrote:
> The original problem is read based performance issue though and this
> will not have any affect on that whatsoever (although it's still
> excellent advice).  Also dd should bypass the o/s buffer cache.  I
> still pretty much convinced that there is a fundamental performance
> issue with the raid card dell needs to explain.

Well, there are two issues IMHO.

1) Read performance that's not exactly as good as one'd expect from a
   12 x 15k SAS RAID10 array. Given that the 15k Cheetah drives usually
   give like 170 MB/s for sequential reads/writes. I'd definitely
   expect more than 533 MB/s when reading the data. At least something
   near 1GB/s (equal to 6 drives).

   Hmm, the dd read performance seems to grow over time - I wonder if
   this is the issue with adaptive read policy, as mentioned in the
   xbitlabs report.

   Cesar, can you set the read policy to a 'read ahead'

     megacli -LDSetProp RA -LALL -aALL

   or maybe 'no read-ahead'

     megacli -LDSetProp NORA -LALL -aALL

   It's worth a try, maybe it somehow conflicts with the way kernel
   handles read-ahead or something. I find these adaptive heuristics
   a bit unpredictable ...

   Another thing - I see the patrol reads are enabled. Can you disable
   that and try how that affects the performance?

2) Write performance behaviour, that's much more suspicious ...

   Not sure if it's related to the read performance issues.
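
For the patrol-read suggestion in (1), a sketch of the MegaCli invocations (assuming the same MegaCli binary as above) would be:

  MegaCli -AdpPR -Info -aALL    # show the current patrol read state
  MegaCli -AdpPR -Dsbl -aALL    # disable patrol read for the duration of the test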

Tomas

Re: H800 + md1200 Performance problem

От
Cesar Martin
Дата:
Hi,

Today I'm doing new benchmarks with RA, NORA, WB and WT in the controller:

With NORA
-----------------
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 318,306 s, 432 MB/s

With RA
------------
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 179,712 s, 765 MB/s
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 202,948 s, 677 MB/s
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 213,157 s, 645 MB/s

With Adaptive RA
-----------------
[root@cltbbdd01 ~]# dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 169,533 s, 811 MB/s
[root@cltbbdd01 ~]# dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 207,223 s, 663 MB/s

The differences between runs of the same test under the same conditions are very strange... It seems that adaptive read-ahead is the best option.

For the write tests, I applied tuned-adm throughput-performance, which changes the IO elevator to deadline and raises vm.dirty_ratio to 40... ?¿?¿?

With WB
-------------
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 539,041 s, 255 MB/s
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 505,695 s, 272 MB/s

Enforce WB
-----------------
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 662,538 s, 207 MB/s

With WT
--------------
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 750,615 s, 183 MB/s

I think these results are more logical... WT gives bad performance, and the differences within the same test are minimal.

Later I ran a pair of dd processes at the same time:

dd if=/dev/zero of=/vol02/bonnie/DD2 bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 633,613 s, 217 MB/s
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 732,759 s, 188 MB/s

It is very strange that with parallel dd I get about 400 MB/s combined. It's as if CentOS had a limit on the IO throughput of a single process...
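
One way to double-check what tuned-adm actually changed (a sketch; the device name sdc is taken from the iostat output earlier in the thread):

  tuned-adm active                     # confirm the throughput-performance profile is applied
  cat /sys/block/sdc/queue/scheduler   # the active elevator is shown in brackets
  sysctl vm.dirty_ratio vm.dirty_background_ratio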


On 5 April 2012 at 22:06, Tomas Vondra <tv@fuzzy.cz> wrote:
On 5.4.2012 20:43, Merlin Moncure wrote:
> The original problem is read based performance issue though and this
> will not have any affect on that whatsoever (although it's still
> excellent advice).  Also dd should bypass the o/s buffer cache.  I
> still pretty much convinced that there is a fundamental performance
> issue with the raid card dell needs to explain.

Well, there are two issues IMHO.

1) Read performance that's not exactly as good as one'd expect from a
  12 x 15k SAS RAID10 array. Given that the 15k Cheetah drives usually
  give like 170 MB/s for sequential reads/writes. I'd definitely
  expect more than 533 MB/s when reading the data. At least something
  near 1GB/s (equal to 6 drives).

  Hmm, the dd read performance seems to grow over time - I wonder if
  this is the issue with adaptive read policy, as mentioned in the
  xbitlabs report.

  Cesar, can you set the read policy to a 'read ahead'

    megacli -LDSetProp RA -LALL -aALL

  or maybe 'no read-ahead'

    megacli -LDSetProp NORA -LALL -aALL

  It's worth a try, maybe it somehow conflicts with the way kernel
  handles read-ahead or something. I find these adaptive heuristics
  a bit unpredictable ...

  Another thing - I see the patrol reads are enabled. Can you disable
  that and try how that affects the performance?

2) Write performance behaviour, that's much more suspicious ...

  Not sure if it's related to the read performance issues.

Tomas




--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

От
Cesar Martin
Дата:
Hi,

Finally, the problem was the BIOS configuration. DBPM was set to "Active Power Controller"; I changed this to "Max Performance". http://en.community.dell.com/techcenter/power-cooling/w/wiki/best-practices-in-power-management.aspx
Now write speed is 550 MB/s and read is 1.1 GB/s.
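
For anyone hitting the same thing, a quick sanity check that the CPUs are no longer being throttled by the power profile (a sketch, assuming the cpufreq sysfs interface is exposed):

  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # should not be a power-saving governor
  grep MHz /proc/cpuinfo | sort -u                            # cores should sit near their rated clock under load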

Thank you all for your advice.

On 9 April 2012 at 18:24, Cesar Martin <cmartinp@gmail.com> wrote:
Hi,

Today I'm doing new benchmarks with RA, NORA, WB and WT in the controller:

With NORA
-----------------
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 318,306 s, 432 MB/s

With RA
------------
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 179,712 s, 765 MB/s
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 202,948 s, 677 MB/s
dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 213,157 s, 645 MB/s

With Adaptative RA
-----------------
[root@cltbbdd01 ~]# dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 169,533 s, 811 MB/s
[root@cltbbdd01 ~]# dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 207,223 s, 663 MB/s

It's very strange the differences between the same test under same conditions... It seems thah adaptative read ahead is the best solution.

For write test, I apply tuned-adm throughput-performance, that change IO elevator to deadline and grow up vm.dirty_ratio to 40.... ?¿?¿?

With WB
-------------
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 539,041 s, 255 MB/s
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 505,695 s, 272 MB/s

Enforce WB
-----------------
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 662,538 s, 207 MB/s

With WT
--------------
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 750,615 s, 183 MB/s

I think that this results are more logical... WT results in bad performance and differences, inside the same test, are minimum.

Later I have put pair of dd at same time: 

dd if=/dev/zero of=/vol02/bonnie/DD2 bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 633,613 s, 217 MB/s
dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 732,759 s, 188 MB/s

Is very strange, that with parallel DD I take 400MBps. It's like if Centos have limit in IO throughput of a process...


On 5 April 2012 at 22:06, Tomas Vondra <tv@fuzzy.cz> wrote:

On 5.4.2012 20:43, Merlin Moncure wrote:
> The original problem is read based performance issue though and this
> will not have any affect on that whatsoever (although it's still
> excellent advice).  Also dd should bypass the o/s buffer cache.  I
> still pretty much convinced that there is a fundamental performance
> issue with the raid card dell needs to explain.

Well, there are two issues IMHO.

1) Read performance that's not exactly as good as one'd expect from a
  12 x 15k SAS RAID10 array. Given that the 15k Cheetah drives usually
  give like 170 MB/s for sequential reads/writes. I'd definitely
  expect more than 533 MB/s when reading the data. At least something
  near 1GB/s (equal to 6 drives).

  Hmm, the dd read performance seems to grow over time - I wonder if
  this is the issue with adaptive read policy, as mentioned in the
  xbitlabs report.

  Cesar, can you set the read policy to a 'read ahead'

    megacli -LDSetProp RA -LALL -aALL

  or maybe 'no read-ahead'

    megacli -LDSetProp NORA -LALL -aALL

  It's worth a try, maybe it somehow conflicts with the way kernel
  handles read-ahead or something. I find these adaptive heuristics
  a bit unpredictable ...

  Another thing - I see the patrol reads are enabled. Can you disable
  that and try how that affects the performance?

2) Write performance behaviour, that's much more suspicious ...

  Not sure if it's related to the read performance issues.

Tomas




--
César Martín Pérez
cmartinp@gmail.com




--
César Martín Pérez
cmartinp@gmail.com

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Mon, Apr 16, 2012 at 8:13 AM, Cesar Martin <cmartinp@gmail.com> wrote:
> Hi,
>
> Finally the problem was BIOS configuration. DBPM had was set to "Active
> Power Controller" I changed this to "Max
> Performance". http://en.community.dell.com/techcenter/power-cooling/w/wiki/best-practices-in-power-management.aspx
> Now wirite speed are 550MB/s and read 1,1GB/s.

Why in the world would a server be delivered to a customer with such a
setting turned on?  ugh.

Re: H800 + md1200 Performance problem

От
Merlin Moncure
Дата:
On Mon, Apr 16, 2012 at 10:45 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Apr 16, 2012 at 8:13 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>> Hi,
>>
>> Finally the problem was BIOS configuration. DBPM had was set to "Active
>> Power Controller" I changed this to "Max
>> Performance". http://en.community.dell.com/techcenter/power-cooling/w/wiki/best-practices-in-power-management.aspx
>> Now wirite speed are 550MB/s and read 1,1GB/s.
>
> Why in the world would a server be delivered to a customer with such a
> setting turned on?  ugh.

likely informal pressure to reduce power consumption.  anyways, this
verifies my suspicion that it was a dell problem. in my dealings with
them, you truly have to threaten to send the server back then the
solution magically appears.  don't spend time and money playing their
'qualified environment' game -- it never works...just tell them to
shove it.

there are a number of second tier vendors that give good value and
allow you to do things like install your own disk drives without
getting your support terminated.  of course, you lose the 'enterprise
support', to which I give a value of approximately zero.

merlin

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Mon, Apr 16, 2012 at 10:08 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Apr 16, 2012 at 10:45 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Mon, Apr 16, 2012 at 8:13 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>>> Hi,
>>>
>>> Finally the problem was BIOS configuration. DBPM had was set to "Active
>>> Power Controller" I changed this to "Max
>>> Performance". http://en.community.dell.com/techcenter/power-cooling/w/wiki/best-practices-in-power-management.aspx
>>> Now wirite speed are 550MB/s and read 1,1GB/s.
>>
>> Why in the world would a server be delivered to a customer with such a
>> setting turned on?  ugh.
>
> likely informal pressure to reduce power consumption.  anyways, this
> verifies my suspicion that it was a dell problem. in my dealings with
> them, you truly have to threaten to send the server back then the
> solution magically appears.  don't spend time and money playing their
> 'qualified environment' game -- it never works...just tell them to
> shove it.
>
> there are a number of second tier vendors that give good value and
> allow you to to things like install your own disk drives without
> getting your support terminated.  of course, you lose the 'enterprise
> support', to which I give a value of approximately zero.

Dell's support never even came close to what I used to get from Aberdeen.

Re: H800 + md1200 Performance problem

От
Scott Marlowe
Дата:
On Mon, Apr 16, 2012 at 10:31 AM, Glyn Astill <glynastill@yahoo.co.uk> wrote:
>> From: Scott Marlowe <scott.marlowe@gmail.com>
>>On Mon, Apr 16, 2012 at 8:13 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>>> Hi,
>>>
>>> Finally the problem was BIOS configuration. DBPM had was set to "Active
>>> Power Controller" I changed this to "Max
>>> Performance". http://en.community.dell.com/techcenter/power-cooling/w/wiki/best-practices-in-power-management.aspx
>>> Now wirite speed are 550MB/s and read 1,1GB/s.
>>
>>Why in the world would a server be delivered to a customer with such a
>>setting turned on?  ugh.
>
>
> Because it's Dell and that's what they do.
>
>
> When our R910s arrived, despite them knowing what we were using them for, they'd installed the memory to use only one
> channel per cpu. Buried deep in their manual I discovered that they called this "power optimised" mode and I had to buy
> a whole extra bunch of risers to be able to use all of the channels properly.
>
> If it wasn't for proper load testing, and Greg Smith's stream scaling tests, I don't think I'd even have spotted it.

See and that's where a small technically knowledgeable supplier is so
great.  "No you don't want 8 8G dimms, you want 16 4G dimms." etc.

Re: H800 + md1200 Performance problem

От
Glyn Astill
Дата:
> From: Scott Marlowe <scott.marlowe@gmail.com>
>On Mon, Apr 16, 2012 at 8:13 AM, Cesar Martin <cmartinp@gmail.com> wrote:
>> Hi,
>>
>> Finally the problem was BIOS configuration. DBPM had was set to "Active
>> Power Controller" I changed this to "Max
>> Performance". http://en.community.dell.com/techcenter/power-cooling/w/wiki/best-practices-in-power-management.aspx
>> Now wirite speed are 550MB/s and read 1,1GB/s.
>
>Why in the world would a server be delivered to a customer with such a
>setting turned on?  ugh.


Because it's Dell and that's what they do. 


When our R910s arrived, despite them knowing what we were using them for, they'd installed the memory to use only one
channel per cpu. Buried deep in their manual I discovered that they called this "power optimised" mode and I had to buy
a whole extra bunch of risers to be able to use all of the channels properly.

If it wasn't for proper load testing, and Greg Smith's stream scaling tests, I don't think I'd even have spotted it.