Thread: Disk Benchmarking Question


Disk Benchmarking Question

From: Dave Stibrany
Date: March 17, 2016
I'm pretty new to benchmarking hard disks and I'm looking for some advice on interpreting the results of some basic tests.

The server is:
- Dell PowerEdge R430
- 1 x Intel Xeon E5-2620 2.4GHz
- 32 GB RAM
- 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10
- PERC H730P RAID controller with 2GB cache in write-back mode.

The OS is Ubuntu 14.04. I'm using LVM, with an ext4 volume for / and an XFS volume for PGDATA.

I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I ran 'bonnie++ -n0 -f' on the root volume.
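
(For reference, a typical set of commands for this kind of sequential test looks roughly like the following; the file paths and sizes here are illustrative, not necessarily what was actually run.)

    # Sequential write: 16 GB of zeros, flushed to disk at the end so the result
    # is not just the speed of the OS page cache
    dd if=/dev/zero of=/pgdata/ddtest bs=1M count=16384 conv=fdatasync

    # Drop the page cache (as root) so the read test hits the disks, not RAM
    sync && echo 3 > /proc/sys/vm/drop_caches

    # Sequential read of the same file
    dd if=/pgdata/ddtest of=/dev/null bs=1M

    # bonnie++: -f skips the slow per-character tests, -n0 skips the file-creation
    # tests; by default bonnie++ sizes its data set at twice RAM (64 GB here) to defeat caching
    bonnie++ -n0 -f -d /pgdata/bonnie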

Here's a link to the bonnie++ test results: https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0

The vendor stats say a sustained throughput of 215 MBps (outer tracks) down to 108 MBps (inner tracks), so I guess I'd expect around 400-800 MBps read and 200-400 MBps write. In any case, I'm pretty confused as to why the sequential read and write speeds are almost identical. Does this look wrong?

Thanks,

Dave



Re: Disk Benchmarking Question

From: "Mike Sofen"
Date: March 17, 2016

Hi Dave,

 

Database disk performance has to take IOPS into account, and IMO IOPS matter more than MBps, since what usually counts is the disk subsystem's ability to write lots of little bits rather than giant globs, especially with direct-attached storage (like yours) versus a SAN.  Most database disk benchmarks revolve around IOPS, and this is where SSDs utterly crush spinning disks.

 

You can get maybe 200 IOPS out of each disk; you have 4 in RAID 10, so you get a whopping ~400 IOPS.  A single quality SSD (like the Samsung 850 Pro) will support a minimum of 40k IOPS on reads and 80k IOPS on writes.  That's why SSDs are eliminating spinning disks when performance is critical and the budget allows.

 

Back to your question: the MBps figure is the capacity of the interface, so it makes sense that it's the same for both reads and writes.  The PERC RAID controller will be saving your bacon on writes, with its 2GB cache (assuming it's caching writes), so it behaves like an SSD up to the capacity limit of the write cache.  But with only ~400 IOPS of sustained write speed, a busy server can easily saturate the cache, and then your system will drop to a crawl.
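
(A rough back-of-the-envelope sketch of that arithmetic, assuming ~200 IOPS per 10k drive and 8 kB random writes; every number here is an illustrative assumption, not a measurement from this hardware.)

    # Illustrative only: IOPS and write-cache arithmetic for 4 x 10k SAS drives in RAID 10
    DISKS=4
    PER_DISK_IOPS=200                            # optimistic figure for a 10k SAS drive
    WRITE_IOPS=$(( PER_DISK_IOPS * DISKS / 2 ))  # each write lands on both members of a mirrored pair
    echo "sustainable random write IOPS: ~$WRITE_IOPS"

    # How long a 2 GB write-back cache can absorb a burst that arrives faster than
    # the disks can drain it (assuming 8 kB writes arriving at 1000 IOPS)
    awk -v cache=$((2 * 1024 * 1024 * 1024)) -v in_iops=1000 -v drain="$WRITE_IOPS" -v io=8192 \
        'BEGIN { printf "seconds until the cache fills: %.0f\n", cache / ((in_iops - drain) * io) }'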

 

If I didn’t answer the intent of your question, feel free to clarify for me.

 

Mike

 


Re: Disk Benchmarking Question

From: Dave Stibrany
Date: March 18, 2016
Hey Mike,

Thanks for the response. I think where I'm confused is that I thought the vendor-specified MBps was an estimate of sequential read/write speed. Therefore, if you're in RAID 10, you'd have 4x the sequential read speed and 2x the sequential write speed. Am I misunderstanding something?

Also, when you mention that MBps is the capacity of the interface, what do you mean exactly? I've been taking interface speed to be the electronic transfer speed, not the speed from the actual physical medium, and more in the 6-12 gigabit range.

Please let me know if I'm way off on any of this; I'm hoping to have my mental model updated.

Thanks!

Dave


Re: Disk Benchmarking Question

From: "Mike Sofen"
Date:

Sorry for the delay, long work day!

 

Ok, I THINK I understand where you’re going.  Do it this way:

4 drives in RAID 10 = 2 pairs of mirrored drives, i.e. still 2 active drives (the other 2 are failover).  They are sharing the 12 Gbps SAS interface, but that speed is quite irrelevant; it's just a giant pipe for filling lots of drives.

 

Each of your 2 active drives has a max sequential read/write spec of around 200 MBps (and that is WAY max: under ideal laboratory conditions, writing to the outer few tracks with purely sequential data, which never happens in the real world).  With 2 drives running perfectly in RAID 10, the theoretical max would be 400 MBps.  Real world, less than half of that, on sequential.
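
(As a sketch of that arithmetic, using the vendor's 108-215 MBps sustained range for the ST600MM0088 and assuming 2 mirrored pairs striped together; whether reads can also be spread across both members of each pair depends on the controller.)

    # Illustrative only: theoretical sequential throughput for 4 drives in RAID 10
    awk 'BEGIN {
        inner = 108; outer = 215   # vendor sustained MB/s, inner vs outer tracks
        pairs = 2                  # 4 drives = 2 mirrored pairs, striped
        printf "seq write: %d-%d MB/s (each pair writes the same data twice)\n", pairs * inner, pairs * outer
        printf "seq read : %d-%d MB/s, up to %d MB/s if the controller reads from both mirror halves\n",
               pairs * inner, pairs * outer, 2 * pairs * outer
    }'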

 

But random writes rule most activity in the database world (think of writing a single row to a table: a few thousand bytes that might be plopped anywhere on the disk and then randomly retrieved).  So the MBps throughput number becomes mostly meaningless (because the data chunks are small and random), and IOPS and drive seek times become king (thus my earlier comments).

 

So – if you’re having disk performance issues with a database, you either add more spinning disks (to increase IOPs/distribute them) or switch to SSDs and forget about almost everything…

 

Mike

 


Re: Disk Benchmarking Question

From: Scott Marlowe
Date: March 19, 2016
On Thu, Mar 17, 2016 at 2:45 PM, Dave Stibrany <dstibrany@gmail.com> wrote:
> I'm pretty new to benchmarking hard disks and I'm looking for some advice on
> interpreting the results of some basic tests.
>
> The server is:
> - Dell PowerEdge R430
> - 1 x Intel Xeon E5-2620 2.4GHz
> - 32 GB RAM
> - 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10
> - PERC H730P Raid Controller with 2GB cache in write back mode.
>
> The OS is Ubuntu 14.04, I'm using LVM and I have an ext4 volume for /, and
> an xfs volume for PGDATA.
>
> I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I
> ran 'bonnie++ -n0 -f' on the root volume.
>
> Here's a link to the bonnie test results
> https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0
>
> The vendor stats say sustained throughput of 215 to 108 MBps, so I guess I'd
> expect around 400-800 MBps read and 200-400 MBps write. In any case, I'm
> pretty confused as to why the read and write sequential speeds are almost
> identical. Does this look wrong?

For future reference, it's good to include the data you linked to in
your post, as in 2, 5 or 10 years the postgresql discussion archives
will still be here but your dropbox may or may not, and then people
won't know what numbers you are referring to.

Given the size of your bonnie test set and the fact that you're using
RAID-10, the cache should make little or no difference. The RAID
controller may or may not interleave reads between all four drives.
Some do, some don't. It looks to me like yours doesn't. I.e. when
reading it's not reading all 4 disks at once, but just 2, 1 from each
pair.

But the important question here is what kind of workload you are
looking at throwing at this server. If it's going to be a reporting
database you may get as good as or better read performance from RAID-5
than from RAID-10, especially if you add more drives. If you're looking
at transactional use then, as Mike suggested, SSDs might be your best
choice.

We run some big transactional dbs at work that are 4 to 6 TB, and for
those we use 10 x 800GB SSDs in RAID-5 with the RAID controller cache
turned off. We can hit ~18k tps in pgbench on ~100GB test sets. With
the cache on we drop to 3 to 5k tps; with a 512MB cache we overwrite
the cache every couple of seconds and it just gets in the way.
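
(For anyone who wants to reproduce that kind of test, a pgbench run along these lines is the usual approach; the scale factor, client count, and database name below are illustrative, not the exact settings used above.)

    # Create the test database and initialize a roughly 100 GB data set;
    # each pgbench scale unit adds roughly 15 MB of data
    createdb testdb
    pgbench -i -s 6500 testdb

    # 10-minute read/write run with 32 clients over 4 worker threads;
    # pgbench reports TPS at the end
    pgbench -c 32 -j 4 -T 600 testdb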

SSDs win hands down if you need random access speed. It's like a
Stanley Steamer (spinners) versus a Bugatti Veyron (SSDs).

For sequential throughput, as on a reporting server, spinners often do
all right, as long as there are only one or two processes accessing your
data at a time. As soon as you get more concurrent accesses going than
you have RAID-10 pairs, your performance will drop off noticeably.


Re: Disk Benchmarking Question

From: Scott Marlowe
Date: March 19, 2016
On Sat, Mar 19, 2016 at 4:29 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> Given the size of your bonnie test set and the fact that you're using
> RAID-10, the cache should make little or no difference. The RAID
> controller may or may not interleave reads between all four drives.
> Some do, some don't. It looks to me like yours doesn't. I.e. when
> reading it's not reading all 4 disks at once, but just 2, 1 from each
> pair.

Point of clarification. It may be that if two processes are reading
the data set at once you'd get a sustained individual throughput that
matches what a single read can get.
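
(One way to check this on a given controller is to compare a single sequential reader against two concurrent readers on different large files; O_DIRECT keeps the page cache out of the measurement. The file paths are illustrative.)

    # Baseline: one sequential reader
    dd if=/data/bigfile1 of=/dev/null bs=1M iflag=direct

    # Two concurrent readers; if the controller spreads reads across both mirror
    # halves, each process should still see close to the single-reader rate
    dd if=/data/bigfile1 of=/dev/null bs=1M iflag=direct &
    dd if=/data/bigfile2 of=/dev/null bs=1M iflag=direct &
    wait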


Re: Disk Benchmarking Question

From: Dave Stibrany
Date:
Thanks for the feedback guys. I'm looking forward to the day when we upgrade to SSDs.

For future reference, the bonnie++ numbers I was referring to are: 

Size: 63G
Sequential Output: 396505 K/sec (21% CPU)
Sequential Input:  401117 K/sec (21% CPU)
Random Seeks:      650.7 /sec   (25% CPU)

I think a lot of my confusion resulted from expecting sequential reads to be 4x the speed of a single disk because the disks are in RAID 10. I'm thinking now that the 4x only applies to random reads.



