Discussion: Performance Problem - pgsql on MD3000 DAS

Performance Problem - pgsql on MD3000 DAS

From
Marc
Date:
Hi All,

We're migrating to new hardware and to pgsql 8.3.1 from pgsql 8.2.4.  We were scheduled to go live yesterday morning but elected late Friday not to, after observing this issue:

Our new hardware includes an external Dell MD3000 RAID array of 15 15k SAS disks.  We have a 2 disk RAID1 array for txnlog and a 12 disk RAID10 array for pgsql data.  
Our host is a Dell PowerEdge 2950 with 2 Quad-Core Xeon 2.5GHz CPUs and 16GB of RAM running 64-bit CentOS on an internal RAID1 using standard 7200 RPM SATA drives and Dell's PERC 6i controller.  The host connects to the MD3000 via 2 SAS HBA cards and Dell's multipath RDAC driver.

I ran pgbench against two database instances - one using the disks from the MD3000 and the other using local internal SATA storage.  The results I got showed roughly twice the throughput on local storage vs the external direct-attached-storage array.  My procedure to run the benchmark was:
- Create new DB on MD3000 mount using initdb.  Edit postgresql.conf and change only the port # to 5462
- Create new DB on local mount using initdb.  Edit postgresql.conf and change only the port # to 5452
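A minimal sketch of those steps, assuming hypothetical mount points /mnt/md3000 and /mnt/local for the two volumes (appending the port setting works because the default port line ships commented out; pgbench's own -i initialization step is omitted here):

$ initdb -D /mnt/md3000/pgdata
$ echo "port = 5462" >> /mnt/md3000/pgdata/postgresql.conf
$ pg_ctl -D /mnt/md3000/pgdata -l /tmp/md3000.log start
$ createdb -p 5462 pgbench-md3000

$ initdb -D /mnt/local/pgdata
$ echo "port = 5452" >> /mnt/local/pgdata/postgresql.conf
$ pg_ctl -D /mnt/local/pgdata -l /tmp/local.log start
$ createdb -p 5452 pgbenchloc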

Here are the results from pgbench:

[postgres@dbnya1 ~]$ pgbench -p 5462  -c 20 -t 100 pgbench-md3000
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 833.874657 (including connections establishing)
tps = 846.126412 (excluding connections establishing)

[postgres@dbnya1 ~]$ pgbench -p 5452  -c 20 -t 100 pgbenchloc
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 2047.808129 (including connections establishing)
tps = 2125.310428 (excluding connections establishing)

I subsequently ran bonnie++, an I/O benchmark, on the local storage and the MD3000 and found that the raw I/O throughput on the MD3000 is roughly twice what I get on local storage.  Here are the results from bonnie++:
Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dbnya1.local 32088M 48475  67 50154  12 24146   3 55312  70 54822   4 197.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dbnya1.md3000 32088M 70672  97 137568  35 71718  17 74395  95 241676  21  1049   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
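
For completeness, an invocation roughly like the following produces the bonnie++ tables above (the test directories are placeholders; bonnie++ defaults to a file size of twice RAM, which matches the 32088M column):

$ bonnie++ -d /mnt/local/bonnie -u postgres -m dbnya1.local
$ bonnie++ -d /mnt/md3000/bonnie -u postgres -m dbnya1.md3000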

So, for some strange reason, pgsql is struggling with performance when being run off this external disk.  Has anyone seen behavior similar to this before?  Any suggestions on how to proceed?  It seems to me that this isn't a hardware issue based on the bonnie++ benchmark as we're getting more raw throughput on all of the tests.  

---Marc

Re: Performance Problem - pgsql on MD3000 DAS

From
Tomasz Ostrowski
Date:
On 2008-05-25 15:19, Marc wrote:

> I ran pgbench against two database instances - one using the disks from
> the MD3000 and the other using local internal SATA storage.  The results
> I got showed roughly twice the throughput on local storage vs the
> external direct-attached-storage array.

Local storage often lies to the OS about what it has actually saved to
physical media, if the write cache is enabled.

Check this:
    hdparm -I /dev/sda
If before "Write Cache" line in "Commands/features" list is a star (*)
then it is enabled and a disk can lie. It then can not be compared to
anything that does not lie.

If it lies, it is not safe for Postgres to store data on it: a power
failure or a hang can corrupt the data, because the server cannot know
what has actually been written to disk and what has not.
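
For reference, hdparm's -W flag (used again below) both reports and toggles the write cache; a quick sketch:

$ hdparm -W /dev/sda      # report the current write-caching setting
$ hdparm -W 0 /dev/sda    # disable the write cache
$ hdparm -W 1 /dev/sda    # re-enable it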

Try this:
$ wget -P /tmp https://ftp.postgresql.org/pub/source/v8.3.1/postgresql-8.3.1.tar.bz2
$ cd /tmp
$ tar xjf postgresql-8.3.1.tar.bz2
$ cd postgresql-8.3.1
$ ./configure
$ cd src/tools/fsync/
$ make
$ ./test_fsync -f [filename]    (give -f a path on the filesystem you want to test)

On my 7200 SATA drive with write cache enabled I get times (in seconds)
like this:
    Simple write timing:
            write                    0.009288

    Compare fsync times on write() and non-write() descriptor:
    (If the times are similar, fsync() can sync data written
     on a different descriptor.)
            write, fsync, close      0.247575
            write, close, fsync      0.236484

    Compare one o_sync write to two:
            (o_sync unavailable)
            (o_dsync unavailable)
            write, fdatasync         0.249111
            write, fsync,            0.254667

    Compare file sync methods with 2 8k writes:
            (o_dsync unavailable)
            write, fdatasync         0.356484
            write, fsync,            0.372457

But when I disable write cache with hdparm -W 0 /dev/sda:
    Simple write timing:
            write                    0.009002

    Compare fsync times on write() and non-write() descriptor:
    (If the times are similar, fsync() can sync data written
     on a different descriptor.)
            write, fsync, close      9.027829
            write, close, fsync      8.879246

    Compare one o_sync write to two:
            (o_sync unavailable)
            (o_dsync unavailable)
            write, fdatasync         8.807287
            write, fsync,            9.345699

    Compare file sync methods with 2 8k writes:
            (o_dsync unavailable)
            write, fdatasync         9.071120
            write, fsync,            9.237469

With write cache enabled it is clearly lying: a 7200 rpm drive spins
7200 / 60 = 120 times per second, so 1000 writes to the same place must
take at least 1000 writes / (120 rotations/s) ≈ 8.33 s, yet the fsync
tests above finish in well under a second.


I assume that you do not have an expensive SATA drive controller with a
battery-backed cache.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh


Re: Performance Problem - pgsql on MD3000 DAS

From
Lincoln Yeoh
Date:
At 09:19 PM 5/25/2008, Marc wrote:
>Hi All,
>
>We're migrating to new hardware and to pgsql 8.3.1 from pgsql
>8.2.4.  We were scheduled to go live yesterday
>morning but elected not to late Friday after observing this issue:
>
>Our new hardware includes an external Dell MD3000 RAID array of 15
>15k SAS disks.  We have a 2 disk RAID1 array for txnlog and a 12
>disk RAID10 array for pgsql data.

Try the bonnie++ benchmark on the RAID1 txn log array.

Maybe you should try a 4 disk RAID10 array for txn log (or even more
disks :) ).

Link.




Re: Performance Problem - pgsql on MD3000 DAS

From
Greg Smith
Date:
On Sun, 25 May 2008, Marc wrote:

> [postgres@dbnya1 ~]$ pgbench -p 5462  -c 20 -t 100 pgbench-md3000
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 20
> number of transactions per client: 100
> number of transactions actually processed: 2000/2000

pgbench run like this is pretty much worthless.  In order to get useful
results from it, you need:

1) The database scale, as specified by "pgbench -i -s <scale>", to be
larger than the number of clients

2) Run the benchmark for a fairly long time.  2000 transactions is barely
doing anything.  You want to aim at close to 100,000 for a quick test and
ten times that for a serious one.
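
For example (a sketch only; the scale and transaction counts here are illustrative), a more meaningful run against the MD3000 instance might look like:

$ pgbench -i -s 100 -p 5462 pgbench-md3000        # scale 100 > 20 clients
$ pgbench -p 5462 -c 20 -t 5000 pgbench-md3000    # 20 x 5000 = 100,000 transactions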

What you're seeing right now is how long it takes to sync 2000
transactions to disk, which is an interesting number but probably not what
you intended to measure.  It's not enough data to even write anything to
the main database disk; it will all just get cached in memory and written
out after the test is over.

Increasing the scale can be tricky, as then you need to consider how much
RAM and caching are involved.  I started putting some articles on this
topic at http://www.westnet.com/~gsmith/content/postgresql/ that you
should find useful.  I hope you know to do things like increase
shared_buffers to take advantage of the RAM in your server.
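
Purely as an illustration (common starting points, not tuned recommendations), a 16GB server on 8.3 might begin with something like this in postgresql.conf:

shared_buffers = 4GB           # ~25% of RAM is a common starting point
effective_cache_size = 8GB     # rough estimate of memory left for OS caching
checkpoint_segments = 32       # spread checkpoint I/O over more WAL segments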

> So, for some strange reason, pgsql is struggling with performance when being
> run off this external disk.

Your internal disk is probably caching writes and isn't safe to run a
database from, so it's cheating.  If you run a much longer test with a
much larger database scale, the array may pull ahead anyway.  See
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm for
notes on that topic.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD