Re: Two identical systems, radically different performance

From: Evgeny Shishkin
Subject: Re: Two identical systems, radically different performance
Date:
Msg-id: 4CB0E3B6-CF1E-4B54-9900-E8356744F0AA@gmail.com
In reply to: Two identical systems, radically different performance  (Craig James <cjames@emolecules.com>)
Responses: Re: Two identical systems, radically different performance  (Craig James <cjames@emolecules.com>)
List: pgsql-performance

On Oct 9, 2012, at 1:45 AM, Craig James <cjames@emolecules.com> wrote:

This is driving me crazy.  A new server, virtually identical to an old one, has 50% of the performance with pgbench.  I've checked everything I can think of.

The setups (call the servers "old" and "new"):

old: 2 x 4-core Intel Xeon E5620
new: 4 x 4-core Intel Xeon E5606

both:

  memory: 12 GB DDR ECC
  Disks: 12x500GB disks (Western Digital 7200RPM SATA)
    2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
    8 disks, RAID10: $PGDATA

  3WARE 9650SE-12ML with battery-backed cache.  The admin tool (tw_cli)
  indicates that the battery is charged and the cache is working on both units.

  Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
  actually cloned from old server).


  Postgres: 8.4.4 (yes, I should update.  But both are identical.)

The postgresql.conf files are identical; diffs from the defaults are:

    max_connections = 500
    shared_buffers = 1000MB
    work_mem = 128MB
    synchronous_commit = off
    full_page_writes = off
    wal_buffers = 256kB

wal_buffers seems very small. Simon suggests setting it to at least 16MB.
    checkpoint_segments = 30
    effective_cache_size = 4GB

You have 12GB of RAM, so effective_cache_size could be set much higher.
    track_activities = on
    track_counts = on
    track_functions = none
    autovacuum = on
    autovacuum_naptime = 5min
    escape_string_warning = off
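
A hedged sketch of the configuration changes these comments point at. The exact values are suggestions for a 12GB machine, not settings taken from the original mail, and should be verified against the actual workload:

```
# postgresql.conf deltas implied by the comments above (assumptions):
wal_buffers = 16MB              # 256kB is very small; Simon suggests >= 16MB
effective_cache_size = 8GB      # roughly 2/3 of the 12GB RAM; 4GB undersells it
```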

Note that the old server is in production and was serving a light load while this test was running, so in theory it should be slower, not faster, than the new server.

pgbench: Old server

    pgbench -i -s 100 -U test
    pgbench -U test -c ... -t ...
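
The -c/-t pairs in the tables below keep the total transaction count near 100,000 per run. A small Python sketch that reproduces the invocations:

```python
# Each run executes roughly 100,000 transactions in total,
# split across c clients (t = 100000 // c transactions per client).
for c in (5, 10, 20, 30, 40, 50):
    print(f"pgbench -U test -c {c} -t {100000 // c}")
```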

    -c  -t      TPS
     5  20000  3777
    10  10000  2622
    20  5000   3759
    30  3333   5712
    40  2500   5953
    50  2000   6141

New server
    -c  -t      TPS
    5   20000  2733
    10  10000  2783
    20  5000   3241
    30  3333   2987
    40  2500   2739
    50  2000   2119
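
Putting the two tables side by side makes the regression easier to see. A short Python sketch, with the TPS numbers copied from the runs above:

```python
# Ratio of new-server TPS to old-server TPS at each client count,
# using the pgbench results quoted in this mail.
old = {5: 3777, 10: 2622, 20: 3759, 30: 5712, 40: 5953, 50: 6141}
new = {5: 2733, 10: 2783, 20: 3241, 30: 2987, 40: 2739, 50: 2119}

for c in sorted(old):
    print(f"-c {c:2d}: new/old = {new[c] / old[c]:.2f}")
```

The ratio worsens as concurrency rises, from roughly 0.72 at 5 clients to about 0.35 at 50, which suggests something that degrades under load rather than a flat hardware deficit.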

As you can see, the new server is dramatically slower than the old one.

I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.  The xlog disks were almost identical in performance.  The RAID10 pg-data disks looked like this:

Old server:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xenon        24064M   687  99 203098  26 81904  16  3889  96 403747  31 737.6  31
Latency             20512us     469ms     394ms   21402us     396ms     112ms
Version  1.96       ------Sequential Create------ --------Random Create--------
xenon               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 15953  27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             43291us     857us     519us    1588us      37us     178us
1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,++\
+,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us


New server:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zinc         24064M   862  99 212143  54 96008  14  4921  99 279239  17 752.0  23
Latency             15613us     598ms     597ms    2764us     398ms     215ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zinc                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20380  26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               487us     627us     407us     972us      29us     262us
1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++\
,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us

Sequential Input on the new one is 279MB/s, on the old 400MB/s. 

I don't know enough about bonnie++ to know if these differences are interesting.

One dramatic difference I noted via vmstat.  On the old server, the I/O load during the bonnie++ run was steady, like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  2  71800 2117612  17940 9375660    0    0 82948 81944 1992 1341  1  3 86 10
 0  2  71800 2113328  17948 9383896    0    0 76288 75806 1751 1167  0  2 86 11
 0  1  71800 2111004  17948 9386540   92    0 93324 94232 2230 1510  0  4 86 10
 0  1  71800 2106796  17948 9387436  114    0 67698 67588 1572 1088  0  2 87 11
 0  1  71800 2106724  17956 9387968   50    0 81970 85710 1918 1287  0  3 86 10
 1  1  71800 2103304  17956 9390700    0    0 92096 92160 1970 1194  0  4 86 10
 0  2  71800 2103196  17976 9389204    0    0 70722 69680 1655 1116  1  3 86 10
 1  1  71800 2099064  17980 9390824    0    0 57346 57348 1357  949  0  2 87 11
 0  1  71800 2095596  17980 9392720    0    0 57344 57348 1379  987  0  2 86 12

But the new server varied wildly during bonnie++:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 4518352  12004 7167000    0    0 118894 120838 2613 1539  0  2 93  5
 0  1      0 4517252  12004 7167824    0    0  52116  53248 1179  793  0  1 94  5
 0  1      0 4515864  12004 7169088    0    0  46764  49152 1104  733  0  1 91  7
 0  1      0 4515180  12012 7169764    0    0  32924  30724  750  542  0  1 93  6
 0  1      0 4514328  12016 7170780    0    0  42188  45056 1019  664  0  1 90  9
 0  1      0 4513072  12016 7171856    0    0  67528  65540 1487  993  0  1 96  4
 0  1      0 4510852  12016 7173160    0    0  56876  57344 1358  942  0  1 94  5
 0  1      0 4500280  12044 7179924    0    0  91564  94220 2505 2504  1  2 91  6
 0  1      0 4495564  12052 7183492    0    0 102660 104452 2289 1473  0  2 92  6
 0  1      0 4492092  12052 7187720    0    0  98498  96274 2140 1385  0  2 93  5
 0  1      0 4488608  12060 7190772    0    0  97628 100358 2176 1398  0  1 94  4
 1  0      0 4485880  12052 7192600    0    0 112406 114686 2461 1509  0  3 90  7
 1  0      0 4483424  12052 7195612    0    0  64678  65536 1449  948  0  1 91  8
 0  1      0 4480252  12052 7199404    0    0  99608 100356 2217 1452  0  1 96  3

Any ideas where to look next would be greatly appreciated.

Craig


In the pgsql-performance list, by date of posting:

Previous
From: Craig James
Date:
Message: Two identical systems, radically different performance
Next
From: Craig James
Date:
Message: Re: Two identical systems, radically different performance