Discussion: RAID for the DB filesystem
Hi,
my database is hit with constant inserts to 6 main tables (200 inserts per minute to one of the tables, fewer to the others) and some updates, but then the selects:
- large retrievals of randomly different sections of the database (maps indexed by PostGIS). This data is static.
- medium-sized retrievals of the same tables that are receiving the inserts. By medium-sized, I mean typically 200 rows at once. These retrievals also differ randomly from each other, and typically retrieve the newly inserted data rather than the more historical.
The database size is about 300GB and growing.
What sort of hardware config would you advise? I'm thinking of 2x300GB SATA RAID 0 for the OS and application files, and 6x300GB SAS RAID 10 for the database... but some experts have said RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert. Your advice will be much appreciated.
Thanks
Brian
-- Brian Modra Land line: +27 23 5411 462 Mobile: +27 79 69 77 082 5 Jan Louw Str, Prince Albert, 6930 Postal: P.O. Box 2, Prince Albert 6930 South Africa http://www.zwartberg.com/
On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> Hi,
> my database is hit with constant inserts to 6 main tables (200 inserts
> per minute to one of the tables, less to the others), some updates,
> but then the selects:
> - large retrievals of randomly different sections of the database
> (indexed maps by postgis). This data is static.
> - medium sized retrievals of the same tables that are receiving the
> inserts. By medium sized, I mean typically 200 rows at once. These
> retrievals are also randomly different to each other, and typically
> retrieving the newly inserted data rather than the more historical.
> The database size is about 300GB and growing.
>
> What sort of hardware config would you advise?
> I'm thinking of 2x300GB SATA RAID 0 for the OS and application files,
Is there a valid reason you're NOT considering RAID-1 here? I hope RAID-0 is a typo.
> and 6x300GB SAS RAID 10 for the database... but some experts have said
> RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
> Your advice will be much appreciated.
Then I question the expertise of your experts. RAID5 is not fine. It's slow, more prone to data loss when a drive fails, and generally not a good choice for databases.
I would gladly have more SATA drives in a RAID-10 than fewer SAS drives in a RAID-5.
If someone is worried about "wasting" disk space, tell them to worry about something else, like losing data.
On Mon, Aug 3, 2009 at 11:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> (...) and 6x300GB SAS RAID 10 for the database... but some experts have said
> RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
These guys:
http://www.baarf.com/
... kinda dislike RAID5 with a passion; dunno if it's related to stuff like:
http://weblogs.sqlteam.com/billg/archive/2007/06/18/RAID-10-vs.-RAID-5-Performance.aspx
Good luck.
On Mon, Aug 3, 2009 at 4:02 PM, Rodrigo E. De León Plicet<rdeleonp@gmail.com> wrote:
> On Mon, Aug 3, 2009 at 11:15 AM, Brian Modra<brian@zwartberg.com> wrote:
>> (...) and 6x300GB SAS RAID 10 for the database... but some experts have said
>> RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
>
> These guys:
>
> http://www.baarf.com/
>
> ... kinda dislike RAID5 with a passion; dunno if it's related to stuff like:
And understandably so. RAID5 was invented back when hard drives were measured in megabytes, and not necessarily hundreds of them either. Nowadays, you've got two very different uses for arrays of drives. One is to provide a LOT of storage for a reasonable price, and the other is to provide maximum throughput by aggregating large numbers of drives together.
Where I work we have both. We have media servers that are running eight 2TB drives in a RAID-6 array to provide 12TB of redundant storage. We also have primary db servers running twelve 140G SAS drives in a RAID-10 for fast access storage, providing only 850G or so of storage. This is for a db that uses just under 100G of drive space. It's nowhere near the maximum of that array, and long before we run out of space we'll be adding more drives/controllers to keep up with the performance needs of the db. RAID-5/6 would make no sense there whatsoever.
If you HAVE to go with a striping-with-parity type of solution, RAID-6 is generally better than RAID-5. It's like RAID-5 with the spare drive already built in and generated, it behaves much better with a failed drive than RAID-5, and it is much less likely to suffer data loss, since three drives have to fail before it loses coherency. RAID-5 is the worst of all compromises.
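The capacity figures in this message follow from simple arithmetic, which can be sketched as below. This is only an illustration: the function name `usable_capacity` and the level labels are made up for this example, and it assumes equal-sized disks and ignores formatting overhead and hot spares.

```python
def usable_capacity(level: str, disks: int, disk_size: float) -> float:
    """Usable capacity of an array, in the same unit as disk_size.

    Assumes all disks are the same size and ignores filesystem overhead.
    """
    if level == "raid0":
        data_disks = disks                 # pure striping, no redundancy
    elif level == "raid5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        data_disks = disks - 1             # one disk's worth of parity
    elif level == "raid6":
        if disks < 4:
            raise ValueError("RAID-6 needs at least 4 disks")
        data_disks = disks - 2             # two disks' worth of parity
    elif level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        data_disks = disks // 2            # every disk has a mirror partner
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return data_disks * disk_size


# The two arrays described in the message above:
print(usable_capacity("raid6", 8, 2.0))    # media server, in TB
print(usable_capacity("raid10", 12, 140))  # db server, in GB (nominal)

# The six 300GB disks from the original question, both ways:
print(usable_capacity("raid10", 6, 300))
print(usable_capacity("raid5", 6, 300))
```

The last two lines show the "wasted" space being argued about: with six 300GB disks, RAID-5 gives 1500GB against 900GB for RAID-10.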
2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
>
> On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> > Hi,
> > my database is hit with constant inserts to 6 main tables (200 inserts
> > per minute to one of the tables, less to the others), some updates,
> > but then the selects:
> > - large retrievals of randomly different sections of the database
> > (indexed maps by postgis). This data is static.
> > - medium sized retrievals of the same tables that are receiving the
> > inserts. By medium sized, I mean typically 200 rows at once. These
> > retrievals are also randomly different to each other, and typically
> > retrieving the newly inserted data rather than the more historical.
> > The database size is about 300GB and growing.
> >
> > What sort of hardware config would you advise?
> > I'm thinking of 2x300GB SATA RAID 0 for the OS and application files,
>
> Is there a valid reason you're NOT considering RAID-1 here? I hope
> RAID-0 is a typo.
It was an error. I wanted mirroring.
But... on second thoughts, is there really a good reason for using a second set of disks for the OS? Once the database is running, it's surely not going to be using the OS disk much, so why not just make a big RAID 10 array and use that for both OS and DB... partition it as usual I mean - boot, root. Should I use another disk for swap... for that matter, do I need swap at all... RAM will be at least 16GB?
>
> > and 6x300GB SAS RAID 10 for the database... but some experts have said
> > RAID 5 is fine. I'm inclined to think RAID 10, but I'm not an expert.
> > Your advice will be much appreciated.
>
> Then I question the expertise of your experts. RAID5 is not fine.
> It's slow, more prone to loss due to drive loss, and generally not a
> good choice for databases.
>
> I would gladly have more SATA drives in a RAID-10 than fewer SAS
> drives in a RAID-5.
>
> if someone is worried about "wasting" disk space tell them to worry
> about something else, like losing data.
-- Brian Modra Land line: +27 23 5411 462 Mobile: +27 79 69 77 082 5 Jan Louw Str, Prince Albert, 6930 Postal: P.O. Box 2, Prince Albert 6930 South Africa http://www.zwartberg.com/
On Tue, Aug 4, 2009 at 2:17 AM, Brian Modra <epailty@googlemail.com> wrote:
> 2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
> > On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
> >
> > Is there a valid reason you're NOT considering RAID-1 here? I hope
> > RAID-0 is a typo.
>
> It was an error. I wanted mirroring. But... on second thoughts, is
> there really a good reason for using a second set of disks for the OS?
> Once the database is running, it's surely not going to be using the OS
> disk much, so why not just make a big RAID 10 array and use that for
> both OS and DB... partition it as usual I mean - boot, root. Should I
> use another disk for swap... for that matter, do I need swap at all...
> RAM will be at least 16GB?
Initially, I would agree with you that placing the OS and database on the same RAID config sounds logical, but once you go through some "disasters" you'll realize it's a putting-all-your-eggs-in-one-basket kind of thing.
With the OS, and presumably backup software, on its own RAID config you can recover the database, assuming you lost it due to hardware failure, without having to recover the OS. This is a nice thing especially if you're remote from your servers, like I am, and do not have the luxury of being able to pop a CD in the server's drive to load the OS again. That's just one case, and not a database-admin one, but I'm sure there are others.
>
> Then I question the expertise of your experts. RAID5 is not fine.
> It's slow, more prone to loss due to drive loss, and generally not a
> good choice for databases.
>
> I would gladly have more SATA drives in a RAID-10 than fewer SAS
> drives in a RAID-5.
>
> if someone is worried about "wasting" disk space tell them to worry
> about something else, like losing data.
On the performance argument, I wholeheartedly agree that RAID-5 is not where it's at. Sequential I/O is on par with other RAID types, but when it comes to random I/O it's one of the worst of the bunch, if not the worst.
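One common back-of-the-envelope way to put numbers on this is the small-write penalty: a small random write costs roughly 2 disk I/Os on RAID-10 (one per mirror side), 4 on RAID-5 (read data, read parity, write data, write parity) and 6 on RAID-6. The sketch below applies that rule of thumb. The 150 IOPS per disk figure is an assumed placeholder, not a measurement from this thread, and the model ignores controller write caches and full-stripe writes, both of which can change the picture considerably.

```python
# Rule-of-thumb disk I/Os needed per small random write, by RAID level.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Rough ceiling on small random writes per second for the whole array."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Six disks, as in the original question; 150 IOPS per disk is an assumption.
for level in ("raid10", "raid5", "raid6"):
    print(level, random_write_iops(level, disks=6, iops_per_disk=150))
```

Under these assumptions the same six disks sustain about twice as many random writes in RAID-10 as in RAID-5.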
From a recoverability angle, losing a disk in a RAID-5 isn't the end of the world, but your world will spin much, much slower while the array recalculates all those parity blocks, and while it does so you're at risk of data loss if a second drive goes.
There are units out there that allow for mirrored RAID-5 (RAID-5+1) to protect against multiple disk failures; however, at that point RAID-10 is the route to go.
There are units that 'format' the RAID group only where the disk has been allocated. In other words, if you have a 4-disk RAID-6 and 25% of it has been allocated to LUN(s), then the controller will have parity calculated for only the 25% in use. That makes recovery quicker in an underallocated situation, but with RAID-5 there is still a window during recovery where a second disk failure kills the whole operation. RAID-6 is better in this case because it takes a third disk failure before data loss, but you had better have a second spare waiting in the wings.
I don't believe RAID-10s are perfect either. If your RAID-10 is really 2 RAID-0s mirrored, i.e. RAID-0+1, and you have 2 disk failures, one in each RAID-0, then that's a go-to-tape situation. If your RAID-10 is really multiple mirrors striped, i.e. a true RAID-10 or RAID-1+0, you're still susceptible to data loss, except that you must lose both sides of a single mirror. Not as likely but still possible.
Recovery in either RAID-10 setup is simpler than with the parity RAIDs in that a disk only has to be copied and no parity calculated. There is still a window where a second disk failure could result in data loss.
I believe that regardless of your selection you must, must, must look at things as a 3 year solution, 5 years at the most. As those disks spin and age, the likelihood of multiple failures increases. You may not have to replace a single disk in the first 3 years, but should you lose power and those drives spin down and cool, the odds of one or more not spinning back up are pretty good. Trust me, I've experienced that many times. Plan to replace aging units.
In the end, people will do what people will do and most likely the largest factor won't be performance, protection or recoverability but instead it will be money. If you're lucky, money isn't an issue.
Greg
On Tue, Aug 4, 2009 at 7:11 AM, Greg Spiegelberg<gspiegelberg@gmail.com> wrote:
> On Tue, Aug 4, 2009 at 2:17 AM, Brian Modra <epailty@googlemail.com> wrote:
>>
>> 2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
>> >
>> > On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
>> >
>> > Is there a valid reason you're NOT considering RAID-1 here? I hope
>> > RAID-0 is a typo.
>>
>> It was an error. I wanted mirroring. But... on second thoughts, is
>> there really a good reason for using a second set of disks for the OS?
>> Once the database is running, it's surely not going to be using the OS
>> disk much, so why not just make a big RAID 10 array and use that for
>> both OS and DB... partition it as usual I mean - boot, root. Should I
>> use another disk for swap... for that matter, do I need swap at all...
>> RAM will be at least 16GB?
>
> Initially, I would agree with you that placing the OS and database on the
> same RAID config sounds logical but once you go through some "disasters"
> you'll realize it's a putting all your eggs in the same basket kind of
> thing.
>
> With the OS, and presumably backup software, on its own RAID config you can
> recover the database, assuming you lost it due to hardware failure, without
> having to recover the OS. This is a nice thing especially if you're remote
> from your servers, like I am, and do not have the luxury of being able to
> pop a CD in the server's drive to load the OS again. That's just one case
> and not a database-admin one but I'm sure there are others.
There are other reasons as well. If your OS goes crazy and starts filling up the /var/log partition quickly, then it won't make your db run out of space. If your DB fills up its partition, then the /var/log of the OS can keep on writing out logs to tell you what happened.
Also, I usually put the OS AND the pg_xlog directory on a mirror set to get the random access of the data directory out of the way of the sequential writing of the pg_xlog.
>> > if someone is worried about "wasting" disk space tell them to worry
>> > about something else, like losing data.
>
> On the performance argument, I wholeheartedly agree that RAID-5 is not where
> it's at. Sequential I/O is on-par with other RAID types but when it comes
> to random I/O it's one of, if not, the worst of the bunch.
RAID-5's (and RAID-6's) big failure isn't sequential versus random, but read versus write. RAID-5 can do well reading both sequentially and randomly, but writes are expensive due to the read read / write write nature of RAID5 / 6. For a reporting database (i.e. mostly read) the striped sets can do ok.
> From a recoverability angle, losing a disk in a RAID-5 isn't the end of the
> world but your world will spin much, much slower than it did while it's
> recalculating all those parity blocks and while doing so you're at risk of
> data loss if a second drive goes.
If RAID-5 was barely keeping up to begin with, the system is now going to slow to a crawl while running in degraded mode AND recovering. Plus, with large drives and all the parity calculation, recovery can take upwards of a day. That's a long time to be flying without a net, so to speak.
> There are units out there that allow for mirrored RAID-5, RAID-5+1, to
> protect from multiple disk failures however at that point RAID-10 is the
> route to go.
Well, RAID-6 is basically RAID5 with the spare already in the loop, so to speak.
> I don't believe RAID-10's are perfect either.
Pretty sure no one claimed they were. :)
> If your RAID-10 is really 2
> RAID-0's mirrored, i.e. RAID-0+1, and you have 2 disks failure, one in each
> RAID-0, then that's a go-to-tape situation.
Almost no one runs RAID01 really. It's no faster than RAID10 and more prone to failure. As you add disks, you get a mirrored set of two very large RAID0 arrays, where two drive failures are quite likely to kill the whole thing.
> If your RAID-10 is really
> multiple mirrors striped, i.e. a true RAID-10 or RAID-1+0, you're just as
> susceptible to data loss except you must lose both sides of a single
> mirror. Not as likely but still possible.
Now, imagine you've got 20 disks in a RAID-5 and 20 disks in a RAID10. Two disks fail in each. What are the odds the RAID-5 survived? 0%. What are the odds both failed disks in the RAID-10 were in the same mirror set? (1/10).
> Recovery in either RAID-10 setups is simpler than the parity RAID's in that
> a disk must be copied only and no parity calculated. This is still a window
> where a second disk failure could result in data loss.
But a much smaller window. And the chance of that being the right disk to scram the RAID-10 is much lower.
> I believe that regardless of your selection you must, must, must look at
> things as a 3 year solution, 5 years at the most. As those disks spin and
> age the likelihood of multiple failures increase. You may not have to
> replace a single disk in the first 3 years but should you lose power and
> those drives spin down and cool the odds of one or more not spinning back up
> are pretty good. Trust me, I've experienced that many times. Plan to
> replace aging units.
Been there, done that, got the T-shirt and the decoder ring. :)
> In the end, people will do what people will do and most likely the largest
> factor won't be performance, protection or recoverability but instead it
> will be money. If you're lucky, money isn't an issue.
Data costs money. Don't forget to value your data and time. If you're not doing that, then your calculations will be worthless, much like the data you once had on the now dead array(s).
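The two-disk-failure comparison above can be checked by brute force. The sketch below enumerates every possible pair of failed disks in a 20-disk array and counts how many pairs lose data for each layout. It assumes any two disks are equally likely to be the failed pair, which real-world correlated failures do not honour. The function names and the disk numbering (disks 2k and 2k+1 form a mirror pair in RAID-10; the first half of the disks is one stripe in RAID-0+1) are made up for this illustration. Under these assumptions the exact RAID-10 figure comes out at 1/19, a little better than the rough 1/10 quoted above.

```python
from fractions import Fraction
from itertools import combinations


def loss_probability(disks, survives):
    """Fraction of two-disk failure pairs that lose data for a given layout."""
    pairs = list(combinations(range(disks), 2))
    lost = sum(1 for a, b in pairs if not survives(a, b))
    return Fraction(lost, len(pairs))


def raid5_survives(a, b):
    # RAID-5 tolerates only one failed disk, so any pair is fatal.
    return False


def raid10_survives(a, b):
    # Disks 2k and 2k+1 mirror each other; data is lost only if both die.
    return a // 2 != b // 2


def raid01_survives_for(disks):
    # Two RAID-0 stripes mirrored: fatal when the failures hit different halves.
    half = disks // 2
    return lambda a, b: (a < half) == (b < half)


n = 20
print("RAID-5:  ", loss_probability(n, raid5_survives))          # 1
print("RAID-10: ", loss_probability(n, raid10_survives))         # 1/19
print("RAID-0+1:", loss_probability(n, raid01_survives_for(n)))  # 10/19
```

The RAID-0+1 line also puts a number on the remark that two failures are quite likely to kill that layout: more than half of all possible pairs do.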