Discussion: RAID for the DB filesystem

RAID for the DB filesystem

From:
Brian Modra
Date:
Hi,
my database is hit with constant inserts to 6 main tables (200 inserts
per minute to one of the tables, less to the others), some updates,
but then the selects:
- large retrievals of randomly different sections of the database
(indexed maps by postgis). This data is static.
- medium sized retrievals of the same tables that are receiving the
inserts. By medium-sized, I mean typically 200 rows at once. These
retrievals are also randomly different to each other, and typically
retrieving the newly inserted data rather than the more historical.
The database size is about 300GB and growing.

What sort of hardware config would you advise?
I'm thinking of 2x300GB SATA RAID 0 for the OS and application files,
and 6x300GB SAS RAID 10 for the database... but some experts have said
RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
Your advice will be much appreciated.
Thanks
Brian
--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

Re: RAID for the DB filesystem

From:
Scott Marlowe
Date:
On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> Hi,
> my database is hit with constant inserts to 6 main tables (200 inserts
> per minute to one of the tables, less to the others), some updates,
> but then the selects:
> - large retrievals of randomly different sections of the database
> (indexed maps by postgis). This data is static.
> - medium sized retrievals of the same tables that are receiving the
> inserts. By medium-sized, I mean typically 200 rows at once. These
> retrievals are also randomly different to each other, and typically
> retrieving the newly inserted data rather than the more historical.
> The database size is about 300GB and growing.
>
> What sort of hardware config would you advise?
> I'm thinking of 2x300GB SATA RAID 0 for the OS and application files,

Is there a valid reason you're NOT considering RAID-1 here?  I hope
RAID-0 is a typo.

> and 6x300GB SAS RAID 10 for the database... but some experts have said
> RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
> Your advice will be much appreciated.

Then I question the expertise of your experts.  RAID5 is not fine.
It's slow, more prone to data loss when drives fail, and generally not a
good choice for databases.

I would gladly have more SATA drives in a RAID-10 than fewer SAS
drives in a RAID-5.

If someone is worried about "wasting" disk space tell them to worry
about something else, like losing data.
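Scott's preference for more SATA drives in RAID-10 over fewer SAS drives in RAID-5 comes down mostly to the cost of writes. A rough back-of-the-envelope model divides the array's raw random IOPS by the number of physical I/Os each small logical write costs. The per-drive IOPS figures below (about 80 for a 7,200 rpm SATA drive, about 180 for a 15k SAS drive) are assumptions for illustration, and the model ignores controller write-back caches, so treat it as a sketch rather than a benchmark.

```python
# Physical I/Os per small random logical write (no write-back cache assumed).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(drives: int, per_drive_iops: float, level: str) -> float:
    """Rough ceiling on small random writes per second for a whole array.

    RAID-10 writes both copies of a mirror (2 I/Os); RAID-5 reads old data and
    old parity, then writes new data and new parity (4); RAID-6 does the same
    with two parity blocks (6).
    """
    return drives * per_drive_iops / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Assumed per-drive figures: ~80 IOPS for 7,200 rpm SATA, ~180 for 15k SAS.
    print(f"8 x SATA, RAID-10: {random_write_iops(8, 80, 'raid10'):.0f} writes/s")
    print(f"4 x SAS,  RAID-5:  {random_write_iops(4, 180, 'raid5'):.0f} writes/s")
```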

Re: RAID for the DB filesystem

From:
Rodrigo E. De León Plicet
Date:
On Mon, Aug 3, 2009 at 11:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> (...) and 6x300GB SAS RAID 10 for the database... but some experts have said
> RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.

These guys:

http://www.baarf.com/

... kinda dislike RAID5 with a passion; dunno if it's related to stuff like:

http://weblogs.sqlteam.com/billg/archive/2007/06/18/RAID-10-vs.-RAID-5-Performance.aspx

Good luck.

Re: RAID for the DB filesystem

От
Scott Marlowe
Дата:
On Mon, Aug 3, 2009 at 4:02 PM, Rodrigo E. De León
Plicet<rdeleonp@gmail.com> wrote:
> On Mon, Aug 3, 2009 at 11:15 AM, Brian Modra<brian@zwartberg.com> wrote:
>> (...) and 6x300GB SAS RAID 10 for the database... but some experts have said
>> RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
>
> These guys:
>
> http://www.baarf.com/
>
> ... kinda dislike RAID5 with a passion; dunno if it's related to stuff like:

And understandably so.  RAID5 was invented back when hard drives were
measured in megabytes, and not necessarily hundreds of them either.
Nowadays, you've got two very different uses for arrays of drives.
One is to provide a LOT of storage for a reasonable price, and the
other is to provide maximum throughput by aggregating large numbers of
drives together.

Where I work we have both.  We have media servers that are running 8
2TB drives in a RAID-6 array to provide 12 TB of redundant storage.
We also have primary db servers running 12 140GB SAS drives in a
RAID-10 for fast access storage, providing only 850GB or so of storage.
This is for a db that uses just under 100GB of drive space.  It's nowhere
near the maximum of that array, and long before we run out of
space we'll be adding more drives/controllers to keep up with the
performance needs of the db.  RAID-5/6 would make no sense there
whatsoever.
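The capacity figures above follow from simple arithmetic: RAID-10 keeps half of the raw space, RAID-5 gives one drive's worth to parity, and RAID-6 gives two. The sketch below (the function name is illustrative) reproduces Scott's numbers, 12 TB from 8 x 2 TB in RAID-6 and 6 x 140 GB = 840 GB (his "850GB or so") from 12 drives in RAID-10, and, for comparison, Brian's proposed 6 x 300 GB database array.

```python
def usable_capacity(drives: int, drive_size: float, level: str) -> float:
    """Usable space in the same unit as drive_size (filesystem overhead ignored)."""
    if level == "raid10":
        # Striped mirror pairs: half of the raw space holds the mirror copies.
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return drives // 2 * drive_size
    if level == "raid5":
        # One drive's worth of space goes to parity.
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * drive_size
    if level == "raid6":
        # Two drives' worth of space goes to the two parity blocks.
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        return (drives - 2) * drive_size
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    # Scott's two arrays.
    print("8 x 2 TB RAID-6:    ", usable_capacity(8, 2, "raid6"), "TB")
    print("12 x 140 GB RAID-10:", usable_capacity(12, 140, "raid10"), "GB")
    # Brian's proposed database array, for comparison.
    for level in ("raid10", "raid5"):
        print(f"6 x 300 GB {level}:", usable_capacity(6, 300, level), "GB")
```

Even the RAID-10 layout of Brian's six drives gives 900 GB, three times the current 300 GB database.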

If you HAVE to go with a parity-striping solution, RAID-6 is generally
better than RAID-5.  It's like RAID-5 with the spare drive already
built in and generated, and it behaves much better with a failed drive
than RAID-5, and is much less likely to suffer from data loss,
requiring three drives to fail in order to lose coherency.

RAID-5 is the worst of all compromises.

Re: RAID for the DB filesystem

From:
Brian Modra
Date:
2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
>
> On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> > Hi,
> > my database is hit with constant inserts to 6 main tables (200 inserts
> > per minute to one of the tables, less to the others), some updates,
> > but then the selects:
> > - large retrievals of randomly different sections of the database
> > (indexed maps by postgis). This data is static.
> > - medium sized retrievals of the same tables that are receiving the
> > inserts. By medium-sized, I mean typically 200 rows at once. These
> > retrievals are also randomly different to each other, and typically
> > retrieving the newly inserted data rather than the more historical.
> > The database size is about 300GB and growing.
> >
> > What sort of hardware config would you advise?
> > I'm thinking of 2x300GB SATA RAID 0 for the OS and application files,
>
> Is there a valid reason you're NOT considering RAID-1 here?  I hope
> RAID-0 is a typo.

It was an error. I wanted mirroring. But... on second thoughts, is
there really a good reason for using a second set of disks for the OS?
Once the database is running, it's surely not going to be using the OS
disk much, so why not just make a big RAID 10 array and use that for
both OS and DB... partition it as usual I mean - boot, root. Should I
use another disk for swap... for that matter, do I need swap at all,
given that RAM will be at least 16GB?

>
> > and 6x300GB SAS RAID 10 for the database... but some experts have said
> > RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
> > Your advice will be much appreciated.
>
> Then I question the expertise of your experts.  RAID5 is not fine.
> It's slow, more prone to data loss when drives fail, and generally not a
> good choice for databases.
>
> I would gladly have more SATA drives in a RAID-10 than fewer SAS
> drives in a RAID-5.
>
> If someone is worried about "wasting" disk space tell them to worry
> about something else, like losing data.




Re: RAID for the DB filesystem

From:
Greg Spiegelberg
Date:
On Tue, Aug 4, 2009 at 2:17 AM, Brian Modra <epailty@googlemail.com> wrote:
> 2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
> >
> > On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> >
> > Is there a valid reason you're NOT considering RAID-1 here?  I hope
> > RAID-0 is a typo.
>
> It was an error. I wanted mirroring. But... on second thoughts, is
> there really a good reason for using a second set of disks for the OS?
> Once the database is running, it's surely not going to be using the OS
> disk much, so why not just make a big RAID 10 array and use that for
> both OS and DB... partition it as usual I mean - boot, root. Should I
> use another disk for swap... for that matter, do I need swap at all,
> given that RAM will be at least 16GB?

Initially, I would agree with you that placing the OS and database on the same RAID config sounds logical, but once you go through some "disasters" you'll realize it's a putting-all-your-eggs-in-one-basket kind of thing.

With the OS, and presumably backup software, on its own RAID config you can recover the database, assuming you lost it due to hardware failure, without having to recover the OS.  This is a nice thing especially if you're remote from your servers, like I am, and do not have the luxury of being able to pop a CD in the server's drive to load the OS again.  That's just one case and not a database-admin one but I'm sure there are others.


> >
> > Then I question the expertise of your experts.  RAID5 is not fine.
> > It's slow, more prone to data loss when drives fail, and generally not a
> > good choice for databases.
> >
> > I would gladly have more SATA drives in a RAID-10 than fewer SAS
> > drives in a RAID-5.
> >
> > If someone is worried about "wasting" disk space tell them to worry
> > about something else, like losing data.


On the performance argument, I wholeheartedly agree that RAID-5 is not where it's at.  Sequential I/O is on par with other RAID types, but when it comes to random I/O it's one of, if not the, worst of the bunch.

From a recoverability angle, losing a disk in a RAID-5 isn't the end of the world, but your world will spin much, much slower than it did while it's recalculating all those parity blocks, and while doing so you're at risk of data loss if a second drive goes.

There are units out there that allow for mirrored RAID-5, RAID-5+1, to protect from multiple disk failures; however, at that point RAID-10 is the route to go.

There are units that 'format' the RAID group only where the disk has been allocated.  In other words, if you have a 4-disk RAID-6 and 25% of it has been allocated to LUN(s), then the controller will have the parity calculated for only that 25% in use.  This makes recovery quicker in under-allocated situations, but there is still a window during a RAID-5 recovery where a second disk failure kills the whole operation.  RAID-6, however, is better in this case because it takes a third disk failure before data loss, but you had better have a second spare waiting in the wings.

I don't believe RAID-10's are perfect either.  If your RAID-10 is really 2 RAID-0's mirrored, i.e. RAID-0+1, and you have 2 disk failures, one in each RAID-0, then that's a go-to-tape situation.  If your RAID-10 is really multiple mirrors striped, i.e. a true RAID-10 or RAID-1+0, you're still susceptible to data loss, except that you must lose both sides of a single mirror.  Not as likely, but still possible.

Recovery in either RAID-10 setup is simpler than in the parity RAIDs, in that a disk only has to be copied and no parity calculated.  There is still a window where a second disk failure could result in data loss.

I believe that regardless of your selection you must, must, must look at things as a 3-year solution, 5 years at the most.  As those disks spin and age, the likelihood of multiple failures increases.  You may not have to replace a single disk in the first 3 years, but should you lose power and those drives spin down and cool, the odds of one or more not spinning back up are pretty good.  Trust me, I've experienced that many times.  Plan to replace aging units.
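Greg's point that multiple failures become more likely as drives age can be put in rough numbers with a binomial model. The annualized failure rates (AFR) per year of service below are invented for illustration, not measured data, and the model assumes drives fail independently; correlated events such as the power-loss spin-down Greg describes break that assumption, so real risk is likely higher.

```python
from math import comb


def p_at_least(k: int, drives: int, afr: float) -> float:
    """P(at least k of `drives` fail within one year), assuming independent failures."""
    return sum(
        comb(drives, i) * afr**i * (1 - afr) ** (drives - i)
        for i in range(k, drives + 1)
    )


if __name__ == "__main__":
    # Hypothetical AFR rising with age; substitute figures for your own drives.
    assumed_afr_by_year = [0.02, 0.03, 0.05, 0.07, 0.09]
    for year, afr in enumerate(assumed_afr_by_year, start=1):
        risk = p_at_least(2, 12, afr)
        print(f"year {year}: AFR {afr:.0%}, P(>=2 of 12 drives fail) = {risk:.1%}")
```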

In the end, people will do what people will do and most likely the largest factor won't be performance, protection or recoverability but instead it will be money.  If you're lucky, money isn't an issue.

Greg

Re: RAID for the DB filesystem

From:
Scott Marlowe
Date:
On Tue, Aug 4, 2009 at 7:11 AM, Greg Spiegelberg<gspiegelberg@gmail.com> wrote:
> On Tue, Aug 4, 2009 at 2:17 AM, Brian Modra <epailty@googlemail.com> wrote:
>>
>> 2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
>> >
>> > On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
>> >
>> > Is there a valid reason you're NOT considering RAID-1 here?  I hope
>> > RAID-0 is a typo.
>>
>> It was an error. I wanted mirroring. But... on second thoughts, is
>> there really a good reason for using a second set of disks for the OS?
>> Once the database is running, it's surely not going to be using the OS
>> disk much, so why not just make a big RAID 10 array and use that for
>> both OS and DB... partition it as usual I mean - boot, root. Should I
>> use another disk for swap... for that matter, do I need swap at all,
>> given that RAM will be at least 16GB?
>
> Initially, I would agree with you that placing the OS and database on the
> same RAID config sounds logical, but once you go through some "disasters"
> you'll realize it's a putting-all-your-eggs-in-one-basket kind of
> thing.
>
> With the OS, and presumably backup software, on its own RAID config you can
> recover the database, assuming you lost it due to hardware failure, without
> having to recover the OS.  This is a nice thing especially if you're remote
> from your servers, like I am, and do not have the luxury of being able to
> pop a CD in the server's drive to load the OS again.  That's just one case
> and not a database-admin one but I'm sure there are others.

There are other reasons as well.  If your OS goes crazy and starts
filling up the /var/log partition quickly, then it won't make your db
run out of space.  If your DB fills up its partition, then the
/var/log of the OS can keep on writing out logs to tell you what
happened.  Also, I usually put the OS AND the pg_xlog directory on a
mirror set, to keep the sequential writes of pg_xlog out of the way of
the random access to the data directory.
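Putting pg_xlog on the mirror set is commonly done by moving the directory out of the data directory and leaving a symbolic link in its place while the server is shut down; initdb's -X option does the equivalent when a cluster is created. The Python sketch below shows only the move-and-symlink step. The helper name and paths are hypothetical, the server must be stopped first, and on PostgreSQL 10 and later the directory is named pg_wal instead of pg_xlog.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relocate_wal_dir(data_dir: str, target_parent: str, name: str = "pg_xlog") -> Path:
    """Move <data_dir>/<name> to <target_parent>/<name> and symlink it back.

    Hypothetical helper. Stop the PostgreSQL server first, and run this as the
    database owner so the moved files keep that owner. Pass name="pg_wal" for
    PostgreSQL 10 and later.
    """
    src = Path(data_dir) / name
    dst = Path(target_parent) / name
    if src.is_symlink():
        raise RuntimeError(f"{src} is already a symlink")
    if dst.exists():
        raise FileExistsError(str(dst))
    shutil.move(str(src), str(dst))  # falls back to copy + delete across filesystems
    os.symlink(dst, src, target_is_directory=True)
    return dst


if __name__ == "__main__":
    # Demonstration on throwaway directories, not a real cluster.
    with tempfile.TemporaryDirectory() as data, tempfile.TemporaryDirectory() as mirror:
        (Path(data) / "pg_xlog").mkdir()
        (Path(data) / "pg_xlog" / "000000010000000000000001").write_bytes(b"")
        new_home = relocate_wal_dir(data, mirror)
        print("pg_xlog now lives in", new_home)
        print("symlink in data dir:", (Path(data) / "pg_xlog").is_symlink())
```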

>> > If someone is worried about "wasting" disk space tell them to worry
>> > about something else, like losing data.
>>
>
> On the performance argument, I wholeheartedly agree that RAID-5 is not where
> it's at.  Sequential I/O is on-par with other RAID types but when it comes
> to random I/O it's one of, if not the, worst of the bunch.

RAID-5's (and RAID-6's) big failure isn't sequential versus random,
but read versus write.  RAID-5 can do well on both sequential and
random reads, but small writes are expensive because of the
read-modify-write cycle of RAID-5/6: read the old data and old parity,
then write the new data and new parity.

For a reporting database (i.e. mostly read) the striped sets can do ok.
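The read-modify-write cycle follows from how RAID-5 parity works: the parity block is the XOR of the data blocks in a stripe, so updating one block means reading the old data and old parity and writing the new data and new parity, four physical I/Os where RAID-10 needs two. The toy Python model below uses in-memory byte strings as stand-ins for disk blocks (the function names are illustrative) and shows both the parity update and how a lost block is rebuilt from the survivors.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (this is RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(stripe: list, parity: bytes, index: int, new_data: bytes):
    """Update one data block with RAID-5 read-modify-write.

    Physical I/O: read old data + read old parity, then write new data +
    write new parity. Returns the new stripe and the new parity block.
    """
    old_data = stripe[index]          # read 1
    old_parity = parity               # read 2
    # XOR out the old contribution and XOR in the new one; the other data
    # blocks in the stripe never have to be read.
    new_parity = xor_blocks(old_parity, old_data, new_data)
    new_stripe = list(stripe)
    new_stripe[index] = new_data      # write 1
    return new_stripe, new_parity     # write 2: new_parity goes to the parity disk


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(*stripe)
    stripe, parity = small_write(stripe, parity, 1, b"\xaa\xbb")
    print("parity still consistent:", parity == xor_blocks(*stripe))
    # Degraded read: rebuild block 1 from the other data blocks plus parity.
    print("rebuilt block 1 matches:", xor_blocks(stripe[0], stripe[2], parity) == stripe[1])
```

RAID-6 keeps a second, differently computed parity block, so the same small write costs six I/Os (three reads, three writes).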

> From a recoverability angle, losing a disk in a RAID-5 isn't the end of the
> world but your world will spin much, much slower than it did while it's
> recalculating all those parity blocks and while doing so you're at risk of
> data loss if a second drive goes.

If RAID-5 was barely keeping up to begin with, the system is now going
to slow to a crawl while running in degraded mode AND recovering.
Plus with large drives and all the parity calculation, recovery can
take upwards of a day.  That's a long time to be flying without a net,
so to speak.
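The "upwards of a day" estimate is easy to sanity-check: rebuild time is roughly the amount of data to reconstruct divided by the sustained rebuild rate, and on a busy array the controller usually throttles that rate so it can keep serving queries. The rates below are assumptions, not measurements, and the function name is hypothetical. The optional fraction parameter models controllers like the ones Greg mentions, which only rebuild the allocated part of the RAID group.

```python
def rebuild_hours(drive_gb: float, rebuild_mb_per_s: float, fraction: float = 1.0) -> float:
    """Approximate hours to reconstruct one failed drive.

    drive_gb: capacity of the replaced drive (decimal GB).
    rebuild_mb_per_s: sustained rebuild rate the controller achieves (MB/s).
    fraction: share of the drive that actually has to be rebuilt, e.g. 0.25
    for a controller that only rebuilds allocated space on a 25%-used group.
    """
    megabytes = drive_gb * 1000 * fraction
    return megabytes / rebuild_mb_per_s / 3600


if __name__ == "__main__":
    # Assumed rates: 100 MB/s on an idle array, 25 MB/s while also serving load.
    for rate in (100, 25):
        print(f"2 TB drive at {rate} MB/s: {rebuild_hours(2000, rate):.1f} h")
    print(f"same drive, 25% allocated, 25 MB/s: {rebuild_hours(2000, 25, 0.25):.1f} h")
```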

> There are units out there that allow for mirrored RAID-5, RAID-5+1, to
> protect from multiple disk failures however at that point RAID-10 is the
> route to go.

Well, RAID-6 is basically RAID5 with the spare already in the loop, so to speak.

> I don't believe RAID-10's are perfect either.

Pretty sure no one claimed they were. :)

> If your RAID-10 is really 2
> RAID-0's mirrored, i.e. RAID-0+1, and you have 2 disk failures, one in each
> RAID-0, then that's a go-to-tape situation.

Almost no one really runs RAID-0+1.  It's no faster than RAID-10 and more
prone to failure.  As you add disks, you get a mirrored set of two
very large RAID0 arrays, where two drive failures are quite likely to
kill the whole thing.
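The claim that RAID-0+1 gets more fragile as it grows, while RAID-1+0 does not, can be checked with a short count. Assume n drives (n even), two failures that hit a uniformly random pair of distinct drives, RAID-0+1 built as two n/2-drive stripes mirrored, and RAID-1+0 built as n/2 mirrored pairs striped. RAID-0+1 loses data when the failures land in different halves, probability (n/2)^2 / C(n,2) = n / (2(n - 1)); RAID-1+0 loses data only when both hit the same pair, probability (n/2) / C(n,2) = 1 / (n - 1). The function names below are illustrative.

```python
from fractions import Fraction


def _drive_pairs(n: int) -> int:
    """Number of equally likely two-drive failure combinations, C(n, 2)."""
    if n < 4 or n % 2:
        raise ValueError("need an even number of drives, at least 4")
    return n * (n - 1) // 2


def fatal_raid01(n: int) -> Fraction:
    """P(two random failures destroy RAID-0+1): one failure in each stripe half."""
    return Fraction((n // 2) ** 2, _drive_pairs(n))


def fatal_raid10(n: int) -> Fraction:
    """P(two random failures destroy RAID-1+0): both failures in one mirror pair."""
    return Fraction(n // 2, _drive_pairs(n))


if __name__ == "__main__":
    for n in (4, 8, 12, 24):
        print(
            f"{n:2d} drives: RAID-0+1 {float(fatal_raid01(n)):.0%} fatal, "
            f"RAID-1+0 {float(fatal_raid10(n)):.0%} fatal"
        )
```

As n grows, the RAID-0+1 figure tends toward 1/2 while the RAID-1+0 figure falls toward zero.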

> If your RAID-10 is really
> multiple mirrors striped, i.e. a true RAID-10 or RAID-1+0, you're still
> susceptible to data loss, except that you must lose both sides of a single
> mirror.  Not as likely, but still possible.

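Now, imagine you've got 20 disks in a RAID-5 and 20 disks in a RAID-10.
Two disks fail in each.  What are the odds the RAID-5 survived? 0%.
What are the odds that both failed disks in the RAID-10 were in the
same mirror set? 1 in 19: once the first disk is gone, only its mirror
partner among the 19 survivors can take the array down.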
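The 20-disk comparison can also be checked by brute force: enumerate every way two (or three) of the 20 drives can fail and count the combinations each layout survives. The sketch assumes the RAID-10 pairs drives 0 and 1, 2 and 3, and so on; that numbering is arbitrary and does not change the result. Counted exactly, the RAID-10 loses data in 10 of the 190 possible two-drive failures, which is 1 in 19. RAID-6 is included for comparison.

```python
from fractions import Fraction
from itertools import combinations


def survives(level: str, failed: frozenset, n: int) -> bool:
    """True if an n-drive array of the given level still has all its data."""
    if level == "raid5":
        return len(failed) <= 1          # single parity: one failure tolerated
    if level == "raid6":
        return len(failed) <= 2          # double parity: two failures tolerated
    if level == "raid10":
        # Assumed layout: drives 2k and 2k+1 mirror each other.
        return not any({2 * k, 2 * k + 1} <= failed for k in range(n // 2))
    raise ValueError(f"unknown RAID level: {level}")


def survival_odds(level: str, n: int = 20, failures: int = 2) -> Fraction:
    """Fraction of all equally likely `failures`-drive combinations the array survives."""
    combos = list(combinations(range(n), failures))
    ok = sum(survives(level, frozenset(c), n) for c in combos)
    return Fraction(ok, len(combos))


if __name__ == "__main__":
    for failures in (2, 3):
        for level in ("raid5", "raid6", "raid10"):
            odds = survival_odds(level, 20, failures)
            print(f"{failures} failures, 20-drive {level}: survives {odds} ({float(odds):.1%})")
```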

> Recovery in either RAID-10 setup is simpler than in the parity RAIDs, in that
> a disk only has to be copied and no parity calculated.  There is still a window
> where a second disk failure could result in data loss.

But a much smaller window.  And the chance of that being the right
disk to scram the RAID-10 is much lower.
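The "much smaller window" has two parts: a RAID-10 rebuild is a straight copy, so it tends to finish sooner, and during it only the failed drive's mirror partner is critical, whereas in a degraded RAID-5 every remaining drive is. A rough estimate under a constant-failure-rate model follows. The MTTF and rebuild times are assumed for illustration, the function name is hypothetical, and the model ignores correlated failures and read errors during rebuild, both of which make real windows riskier.

```python
import math


def p_fatal_during_rebuild(critical_drives: int, rebuild_hours: float, mttf_hours: float) -> float:
    """P(at least one critical drive fails before the rebuild completes).

    Assumes independent drives with a constant (exponential) failure rate.
    """
    return 1.0 - math.exp(-critical_drives * rebuild_hours / mttf_hours)


if __name__ == "__main__":
    MTTF = 500_000  # assumed hours per drive
    # 20-drive arrays: in a degraded RAID-5 all 19 survivors are critical;
    # in a RAID-10 only the failed drive's mirror partner is.
    raid5 = p_fatal_during_rebuild(19, rebuild_hours=24, mttf_hours=MTTF)
    raid10 = p_fatal_during_rebuild(1, rebuild_hours=6, mttf_hours=MTTF)
    print(f"RAID-5 : {raid5:.4%}")
    print(f"RAID-10: {raid10:.4%}")
    print(f"ratio  : {raid5 / raid10:.0f}x")
```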

> I believe that regardless of your selection you must, must, must look at
> things as a 3-year solution, 5 years at the most.  As those disks spin and
> age, the likelihood of multiple failures increases.  You may not have to
> replace a single disk in the first 3 years, but should you lose power and
> those drives spin down and cool, the odds of one or more not spinning back up
> are pretty good.  Trust me, I've experienced that many times.  Plan to
> replace aging units.

Been there, done that, got the T-shirt and the decoder ring.  :)

> In the end, people will do what people will do and most likely the largest
> factor won't be performance, protection or recoverability but instead it
> will be money.  If you're lucky, money isn't an issue.

Data costs money.  Don't forget to value your data and time.  If
you're not doing that, then your calculations will be worthless, much
like the data you once had on the now dead array(s).
