Discussion: Performance advice for a new low(er)-power server

Performance advice for a new low(er)-power server

From: Haestan
Date:
Hi,

I am evaluating hardware for a new PostgreSQL server. For reasons
concerning power consumption and available space it should not have
more than 4 disks (in a 1U case), if possible. Now, I am not sure what
disks to use and how to lay them out to get the best performance.

The cheaper option would be to buy 15k Seagate SAS disks with a 3ware
9750SA (battery backed) controller. Does it matter whether to use a
4-disk RAID10 or a 2x 2-disk RAID1 (system+pg_xlog, pg_data) setup? Am
I right that both would be faster than just using a single 2-disk
RAID1 for everything?

A higher end option would be to use 2x 64G Intel X-25E SSD's with a
LSI MegaRAID 9261 controller for pg_data and/or pg_xlog and 2x SAS
disks for the rest. Unfortunately, these SSDs are the only ones offered
by our supplier and, AFAIK, they don't use a supercapacitor. Therefore
I would have to disable the write cache on the SSDs somehow and rely
on the cache on the controller only. Does anyone know whether this
works, or does anyone run such a setup?
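For reference, this is roughly what I had in mind for disabling the drive cache on Linux. It is only a sketch with assumed device names; the exact method depends on the controller, and whether the setting survives a power cycle would need to be verified:

```shell
# Assumed device names -- adjust for the actual system.
hdparm -W0 /dev/sda    # disable the on-drive volatile write cache (SATA)
hdparm -W /dev/sda     # query the current write-cache setting afterwards
# On an LSI MegaRAID, the equivalent per-logical-drive switch is
# reported to be: MegaCli -LDSetProp -DisDskCache -LAll -aAll
```

The idea being that with the drive cache off, only the controller's battery-backed cache would hold unflushed data.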

Furthermore, the LSI MegaRAID 9261 offers CacheCade, which uses SSD
disks as a secondary tier of cache for the SAS disks. Would this
feature make sense for a PostgreSQL server, performance wise?

Thank you for any hints and inputs.

Regards,

Tom.

Re: Performance advice for a new low(er)-power server

From: Merlin Moncure
Date:
On Thu, Jun 16, 2011 at 10:09 AM, Haestan <haestan@gmail.com> wrote:
> Hi,
>
> I am evaluating hardware for a new PostgreSQL server. For reasons
> concerning power consumption and available space it should not have
> more than 4 disks (in a 1U case), if possible. Now, I am not sure what
> disks to use and how to layout them to get the best performance.
>
> The cheaper option would be to buy 15k Seagate SAS disks with a 3ware
> 9750SA (battery backed) controller. Does it matter whether to use a
> 4-disk RAID10 or 2x 2-disk RAID1 (system+pg_xlog , pg_data) setup? Am
> I right that both would be faster than just using a single 2-disk
> RAID1 for everything?

With 4 drives I think your best bet is a single-volume RAID 10 (SSD or
standard disk).

> A higher end option would be to use 2x 64G Intel X-25E SSD's with a
> LSI MegaRAID 9261 controller for pg_data and/or pg_xlog and 2x SAS
> disks for the rest. Unfortunately, these SSD are the only ones offered
> by our supplier and they don't use a supercapacitor, AFAIK. Therefore
> I would have to disable the write cache on the SSD's somehow and just
> use the cache on the controller only. Does anyone know if this will
> work or even uses such a setup?

I am not a big fan of vendors that do not allow hooking in your own
drives.  How well this setup works is going to depend on how well the
controller works with the SSD.  Still, as of today, it's probably
going to be the best performance you can get for four drives...the
x25-e remains the only SLC drive from a major vendor.

> Furthermore, the LSI MegaRAID 9261 offers CacheCade which uses SSD
> disks a as secondary tier of cache for the SAS disks. Would this
> feature make sense for a PostgreSQL server, performance wise?

I'm really skeptical about this feature.

> Thank you for any hints and inputs.

The SSD space is going to see a lot more options from Intel later this
year.  See: http://www.maximumpc.com/article/news/leaked_roadmap_points_upcoming_intel_ssds.
 If you have time, I'd consider waiting a month or so to see what
options become available.

merlin

Re: Performance advice for a new low(er)-power server

From: Jesper Krogh
Date:
On 2011-06-16 17:09, Haestan wrote:
> I am evaluating hardware for a new PostgreSQL server. For reasons
> concerning power consumption and available space it should not have
> more than 4 disks (in a 1U case), if possible. Now, I am not sure what
> disks to use and how to layout them to get the best performance.
What is your data:memory-size ratio? Can you afford to have everything
in memory and only have the disks to be able to sustain writes?

> The cheaper option would be to buy 15k Seagate SAS disks with a 3ware
> 9750SA (battery backed) controller. Does it matter whether to use a
> 4-disk RAID10 or 2x 2-disk RAID1 (system+pg_xlog , pg_data) setup? Am
> I right that both would be faster than just using a single 2-disk
> RAID1 for everything?
>
> A higher end option would be to use 2x 64G Intel X-25E SSD's with a
> LSI MegaRAID 9261 controller for pg_data and/or pg_xlog and 2x SAS
> disks for the rest. Unfortunately, these SSD are the only ones offered
> by our supplier and they don't use a supercapacitor, AFAIK. Therefore
> I would have to disable the write cache on the SSD's somehow and just
> use the cache on the controller only. Does anyone know if this will
> work or even uses such a setup.
Any SSD is orders of magnitude better than any rotating drive
in terms of random reads. Whether you will benefit depends on your
data:memory ratio.

> Furthermore, the LSI MegaRAID 9261 offers CacheCade which uses SSD
> disks a as secondary tier of cache for the SAS disks. Would this
> feature make sense for a PostgreSQL server, performance wise?
I have one CacheCade setup... not a huge benefit, but it seems
measurable (though really hard to test). Compared to a full
SSD setup I wouldn't consider it at all.

--
Jesper

Re: Performance advice for a new low(er)-power server

From: Greg Smith
Date:
On 06/16/2011 11:09 AM, Haestan wrote:
> The cheaper option would be to buy 15k Seagate SAS disks with a 3ware
> 9750SA (battery backed) controller. Does it matter whether to use a
> 4-disk RAID10 or 2x 2-disk RAID1 (system+pg_xlog , pg_data) setup? Am
> I right that both would be faster than just using a single 2-disk
> RAID1 for everything?
>

The layout you proposed (OS+WAL , data) might be effective, but if your
write volume is low it may not be much of an improvement at all over a
simple RAID1 of two drives.  The odds that you are going to correctly
lay out individual sections of a disk array with only two pairs to
spread the data across aren't good.  If this is all you have to work
with, a 4-disk RAID10 would at least guarantee you're taking advantage
of all four drives.  With that controller, it should be almost twice as
fast in all cases as hooking up only two drives.

> A higher end option would be to use 2x 64G Intel X-25E SSD's with a
> LSI MegaRAID 9261 controller for pg_data and/or pg_xlog and 2x SAS
> disks for the rest. Unfortunately, these SSD are the only ones offered
> by our supplier and they don't use a supercapacitor, AFAIK. Therefore
> I would have to disable the write cache on the SSD's somehow and just
> use the cache on the controller only. Does anyone know if this will
> work or even uses such a setup?
>

These drives are one of the worst choices on the market for PostgreSQL
storage.  They're unusably slow if you disable the caches, and even that
isn't guaranteed to work.  There is no way to make them safe.  See
http://wiki.postgresql.org/wiki/Reliable_Writes for more details.  The
3rd generation SSDs from Intel are much, much better; see
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html
for details.

There is another possibility I would suggest you consider.  You could
buy the server with a single pair of drives now, then wait to see what
performance is like before filling the other two slots.  It is far
easier to figure out what drive technology makes sense if you have
measurements from an existing system to guide that decision.  And you
may be able to get newer drives from your vendor that slide into the
empty slots.  You may not ever even need more than a single RAID-1
pair.  I see lots of people waste money on drives that would be better
spent on RAM.
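As a rough sketch of that kind of measurement (the mount point and sizes here are assumptions, and bonnie++ with a file size of at least twice RAM gives far more complete numbers for random I/O):

```shell
#!/bin/sh
# Crude sequential-write check for an array before deciding on more drives.
# conv=fdatasync makes dd flush to stable storage before reporting a rate,
# so the controller's write-back cache can't inflate the result.
bench_seq_write() {
    dir=$1                      # directory on the array under test
    count=${2:-131072}          # number of 8kB blocks; default ~1GB written
    dd if=/dev/zero of="$dir/pg_ddtest" bs=8k count="$count" conv=fdatasync 2>&1
    rm -f "$dir/pg_ddtest"
}
```

Run it a few times at different points in the day on the existing servers; if the numbers are nowhere near the array's limits, the money is better spent on RAM, as above.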

> Furthermore, the LSI MegaRAID 9261 offers CacheCade which uses SSD
> disks a as secondary tier of cache for the SAS disks. Would this
> feature make sense for a PostgreSQL server, performance wise?
>

There are already three layers involved here:

-Database shared_buffers cache
-Operating system read/write cache
-RAID controller cache

I would be skeptical that adding a fourth one near the bottom of this
stack is likely to help a lot.  And you're adding a whole new layer of
difficult to test reliability issues, too.  Overly complicated storage
solutions tend to introduce complicated failures that corrupt your data
in unexpected ways.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Performance advice for a new low(er)-power server

From: Scott Marlowe
Date:
On Thu, Jun 16, 2011 at 12:43 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> There are already three layers involved here:
>
> -Database shared_buffers cache
> -Operating system read/write cache
> -RAID controller cache
>
> I would be skeptical that adding a fourth one near the bottom of this stack
> is likely to help a lot.  And you're adding a whole new layer of difficult
> to test reliability issues, too.  Overly complicated storage solutions tend
> to introduce complicated failures that corrupt your data in unexpected ways.

Plus each layer is from a different provider.  The drive manufacturers
point to the RAID controller maker, the RAID controller people point
at the SSDs you're using, and so on...

Re: Performance advice for a new low(er)-power server

From: Merlin Moncure
Date:
On Thu, Jun 16, 2011 at 1:43 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> These drives are one of the worst choices on the market for PostgreSQL
> storage.  They're unusably slow if you disable the caches, and even that
> isn't guaranteed to work.  There is no way to make them safe.  See
> http://wiki.postgresql.org/wiki/Reliable_Writes for more details.  The 3rd
> generation SSDs from Intel are much, much better; see
> http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html
> for details.

I don't necessarily agree. The drives are SLC and have the potential
to have a much longer lifespan than any MLC drive, although this is
going to depend a lot on the raid controller if write caching is
disabled.  Also, reading the post that got all this started
(http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/),
the OP was able to configure them to run durably with 1200 write iops.
While not great, that's still much better than any spinning disk.

So, if drive lifespan is a big deal, I think they still technically
have a place *today*, although the drives that are just about to come
out (the 710 and 720) will make them obsolete, because the built in
caching (particularly for the SLC 720) will make the drive superior in
every respect.

merlin

Re: Performance advice for a new low(er)-power server

From: Greg Smith
Date:
On 06/16/2011 03:04 PM, Merlin Moncure wrote:
> I don't necessarily agree. the drives are SLC and have the potential
> to have a much longer lifespan than any MLC drive, although this is
> going to depend a lot on the raid controller if write caching is
> disabled.  Also, reading the post that got all this started
> (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/),
> the OP was able to configure them to run durably with 1200 write iops.
>

We've also seen
http://petereisentraut.blogspot.com/2009/07/solid-state-drive-benchmarks-and-write.html
where Peter was only able to get 441 seeks/second on the bonnie++ mixed
read/write test that way.  And no one has measured the longevity of the
drive when it's running in this mode.  A large portion of the lifespan
advantage SLC would normally have over MLC goes away if the drive can't
cache writes anymore.  Worst-case, if the drive is always hit with 8K writes
and the erase size is 128KB, you might get only 1/16 of the specified
lifetime running it cacheless.

I just can't recommend that people consider running one of these in a
mode it was never intended to.  The fact that the consumer drives from
this generation still lose data even with the write cache turned off
should make you ask what other, more difficult to trigger failure modes
are still hanging around the enterprise drives, too.  Everyone I've seen
suffer through problems with these gave up on them before even trying
really in-depth reliability tests, so I don't consider that territory
even very well explored.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Performance advice for a new low(er)-power server

From: Merlin Moncure
Date:
On Thu, Jun 16, 2011 at 5:12 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 06/16/2011 03:04 PM, Merlin Moncure wrote:
>>
>> I don't necessarily agree. the drives are SLC and have the potential
>> to have a much longer lifespan than any MLC drive, although this is
>> going to depend a lot on the raid controller if write caching is
>> disabled.  Also, reading the post that got all this started
>>
>> (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/),
>> the OP was able to configure them to run durably with 1200 write iops.
>>
>
> We've also seen
> http://petereisentraut.blogspot.com/2009/07/solid-state-drive-benchmarks-and-write.html
> where Peter was only able to get 441 seeks/second on the bonnie++ mixed
> read/write test that way.  And no one has measured the longevity of the
> drive when it's running in this mode.  A large portion of the lifespan
> advantage MLC would normally have over SLC goes away if it can't cache
> writes anymore.  Worst-case, if the drive is always hit with 8K writes and
> the erase size is 128KB, you might get only 1/16 of the specified lifetime
> running it cacheless.
>
> I just can't recommend that people consider running one of these in a mode
> it was never intended to.  The fact that the consumer drives from this
> generation still lose data even with the write cache turned off should make
> you ask what other, more difficult to trigger failure modes are still
> hanging around the enterprise drives, too.  Everyone I've seen suffer
> through problems with these gave up on them before even trying really
> in-depth reliability tests, so I don't consider that territory even very
> well explored.

I've always been more than a little suspicious about Peter's results --
he was using lvm and luks, and neither the filesystem nor the
write-barrier setting was specified.  Also, his numbers are much slower
than any other results I've seen, and no numbers are given using a more
standard setup.  Other people are showing 1200 write iops vs 441 mostly
read iops.  Not that I think his results aren't correct, but there
simply has to be an explanation of why they are so much slower than all
others on the 'net...I think 1200 write iops is a realistic expectation.

On the cache/lifespan issue, you might be correct -- it's going to
depend on the raid controller. The drive has a 10x longer lifespan and
there is not going to be a 1:1 correspondence between sync requests
and actual syncs to the drive.  But the drive is denied the ability to
do its own reordering, so unless the raid controller is really
optimized for flash longevity (doubtful), you probably do have to give
back a lot of the savings you get from being SLC (possibly more).  So
it's complex -- but I think the whole issue becomes moot soon, because
non-consumer flash drives from here on out are going to have
capacitors in them (the 720 ramsdale will immediately knock out the
x25-e). So the prudent course of action is to wait, or to just go with
the current crop of capacitor-based drives and deal with the lifespan
issues.

merlin

Re: Performance advice for a new low(er)-power server

From: Haestan
Date:
On Jun 16, 2011, at 20:29, Jesper Krogh wrote:

> On 2011-06-16 17:09, Haestan wrote:
>> I am evaluating hardware for a new PostgreSQL server. For reasons
>> concerning power consumption and available space it should not have
>> more than 4 disks (in a 1U case), if possible. Now, I am not sure what
>> disks to use and how to layout them to get the best performance.
> What is your data:memory-size ratio? Can you afford to have everything
> in memory and only have the disks to be able to sustain writes?

Yes, I can definitely afford to have everything in memory. Right now,
the main database is about 10GB in size including bloat (around 4GB
without). And there are some more db's of about 4GB in size. So in
total around 14GB at the moment and slowly rising.

I was planning to put in at least 16GB RAM or probably even 24GB to be
safe. The problem is that the data of the main db is more or less
constantly updated or deleted/reinserted throughout the day. It seems
to me that the resulting bloat and the constant new data is really
hurting the cache hit rate (which is now around 90% in the main appl.).
It's those remaining queries that read from the disk that I really
would like to speed up as best as possible.
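(For anyone wanting to reproduce the number: the hit rate I quote is taken per database from the statistics views, along these lines, with the database name being specific to my setup. Note this only counts shared_buffers hits, so blocks served from the OS page cache still show up as "reads".)

```shell
psql -d maindb -c "
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 1)
         AS buffer_hit_pct
FROM pg_stat_database
WHERE datname = 'maindb';"
```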

>> Furthermore, the LSI MegaRAID 9261 offers CacheCade which uses SSD
>> disks a as secondary tier of cache for the SAS disks. Would this
>> feature make sense for a PostgreSQL server, performance wise?
> I have one CacheCade setup...  not a huge benefit but it seems
> measurable. (but really hard to test).  .. compared to a full
> SSD-setup I wouldn't consider it at all.

Thanks for that input. What I've read from you and others, the SSD
cache doesn't seem a viable option for me.


Re: Performance advice for a new low(er)-power server

From: Haestan
Date:
On Jun 16, 2011, at 20:43, Greg Smith wrote:
> The layout you proposed (OS+WAL , data) might be effective, but if your write volume is low it may not be much of an
> improvement at all over a simple RAID1 of two drives.  The odds that you are going to correctly lay out individual
> sections of a disk array with only two pairs to spread the data across aren't good.  If this is all you have to work
> with, a 4-disk RAID10 would at least guarantee you're taking advantage of all four drives.  With that controller, it
> should be almost twice as fast in all cases as hooking up only two drives.

The data is more or less constantly rewritten (it contains hourly updated travel-related data). Therefore, I really
tend to buy 4 disks from the start.

> There is another possibility I would suggest you consider.  You could buy the server with a single pair of drives
> now, then wait to see what performance is like before filling the other two slots.  It is far easier to figure out
> what drive technology makes sense if you have measurements from an existing system to guide that decision.  And you
> may be able to get newer drives from your vendor that slide into the empty slots.  You may not ever even need more
> than a single RAID-1 pair.  I see lots of people waste money on drives that would be better spent on RAM.

Actually, there are already two older servers in place right now. The data is about 14GB in size and slowly rising.
Considering the price for RAM, I can easily afford to install more RAM than the db data is in size. I was aiming for
24GB. But even then, I cannot be sure that no queries will read from the disk. AFAIK, there is no way to force all the
data to stay in cache (shared_buffers for example).

Thank you for your input so far.

Regards,

Tom.

Re: Performance advice for a new low(er)-power server

From: jesper@krogh.cc
Date:
> On Jun 16, 2011, at 20:43, Greg Smith wrote:
>> The layout you proposed (OS+WAL , data) might be effective, but if your
>> write volume is low it may not be much of an improvement at all over a
>> simple RAID1 of two drives.  The odds that you are going to correctly
>> lay out individual sections of a disk array with only two pairs to
>> spread the data across aren't good.  If this is all you have to work
>> with, a 4-disk RAID10 would at least guarantee you're taking advantage
>> of all four drives.  With that controller, it should be almost twice as
>> fast in all cases as hooking up only two drives.
>
> The data is more or less constantly rewritten (it contains hourly updated
> travel related data). Therefore, I really tend to buy 4 disks from the
> start on.

Just make sure you have a battery-backed or flash-backed write cache, so
you can safely set it to "write-back".

>> There is another possibility I would suggest you consider.  You could
>> buy the server with a single pair of drives now, then wait to see what
>> performance is like before filling the other two slots.  It is far
>> easier to figure out what drive technology makes sense if you have
>> measurements from an existing system to guide that decision.  And you
>> may be able to get newer drives from your vendor that slide into the
>> empty slots.  You may not ever even need more than a single RAID-1 pair.
>>  I see lots of people waste money on drives that would be better spent
>> on RAM.
>
> Actually, there are already two older servers in place right now. The data
> is about 14GB in size and slowly rising. Considering the price for RAM I
> can easily afford to install more RAM than the db data is in size. I was
> aiming for 24GB. But even then, I cannot be sure that no queries will read
> from the disk. AFAIK, there is no way to force all the data to stay in
> cache (shared_buffers for example).

tar cf - $PGDATA > /dev/null in cron every now and then (every 15-30
minutes) would certainly do the trick.  Not pretty, but we're solving
problems, not talking about beauty.  If you can ensure sufficient memory
then it shouldn't cause any problems.  If you graph the timing of the
above command you should even be able to see when you need to add more
memory.
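Sketched out (the data-directory path and schedule in the cron line are just examples):

```shell
#!/bin/sh
# Warm the OS page cache: stream every file under the given directory
# through tar and discard the output; the reads fault each block into memory.
warm_cache() {
    tar cf - "$1" > /dev/null
}

# Example cron entry, re-warming every 20 minutes and appending the elapsed
# time to a log so growth of the working set shows up over time:
# */20 * * * * postgres /usr/bin/time -a -o /var/log/pg_warm.log \
#     tar cf - /var/lib/pgsql/data > /dev/null
```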


--
Jesper


Re: Performance advice for a new low(er)-power server

From: Merlin Moncure
Date:
On Thu, Jun 16, 2011 at 5:44 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> it's complex -- but I think the whole issue becomes moot soon because
> non consumer flash drives from here on out are going to have
> capacitors in them (the 720 ramsdale will immediately knock out the
> x25-e). So the prudent course of action is to wait, or to just go with
> the current crop capacitor based drives and deal with the lifespan
> issues.

I may have spoken too soon on this -- the 720 is a pci-e device...spec
details are just now leaking out (see:
http://www.engadget.com/2011/06/16/intels-710-lyndonville-and-720-ramsdale-ssds-see-full-spec/).
512mb onboard dram and "56k" write iops, and probably stupid
expensive.  The 710 is a standard sata device with 2.4k claimed write
iops, using "HET" MLC, about which there is zero information available.

 merlin