Re: RAID for the DB filesystem

From: Scott Marlowe
Subject: Re: RAID for the DB filesystem
Date:
Message-ID: dcc563d10908041213l54a7b881sf379187cf40c2ca6@mail.gmail.com
In reply to: Re: RAID for the DB filesystem  (Greg Spiegelberg <gspiegelberg@gmail.com>)
List: pgsql-admin
On Tue, Aug 4, 2009 at 7:11 AM, Greg Spiegelberg<gspiegelberg@gmail.com> wrote:
> On Tue, Aug 4, 2009 at 2:17 AM, Brian Modra <epailty@googlemail.com> wrote:
>>
>> 2009/8/3 Scott Marlowe <scott.marlowe@gmail.com>
>> >
>> > On Mon, Aug 3, 2009 at 10:15 AM, Brian Modra<brian@zwartberg.com> wrote:
>> >
>> > Is there a valid reason you're NOT considering RAID-1 here?  I hope
>> > RAID-0 is a typo.
>>
>> It was an error. I wanted mirroring. But... on second thoughts, is
>> there really a good reason for using a second set of disks for the OS?
>> Once the database is running, it's surely not going to be using the OS
>> disk much, so why not just make a big RAID 10 array and use that for
>> both OS and DB... partition it as usual I mean - boot, root. Should I
>> use another disk for swap... for that matter, do I need swap at all...
>> RAM will be at least 16GB?
>
> Initially, I would agree with you that placing the OS and database on the
> same RAID config sounds logical but once you go through some "disasters"
> you'll realize it's a putting all your eggs in the same basket kind of
> thing.
>
> With the OS, and presumably backup software, on its own RAID config you can
> recover the database, assuming you lost it due to hardware failure, without
> having to recover the OS.  This is a nice thing especially if you're remote
> from your servers, like I am, and do not have the luxury of being able to
> pop a CD in the server's drive to load the OS again.  That's just one case
> and not a database-admin one but I'm sure there are others.

There are other reasons as well.  If your OS goes crazy and starts
filling up the /var/log partition quickly, then it won't make your db
run out of space.  If your DB fills up its partition, then the
/var/log of the OS can keep on writing out logs to tell you what
happened.  Also, I usually put the OS AND the pg_xlog directory on a
mirror set to get the random access of the data directory out of the
way of the sequential writing of the pg_xlog.

>> > if someone is worried about "wasting" disk space tell them to worry
>> > about something else, like losing data.
>>
>
> On the performance argument, I wholeheartedly agree that RAID-5 is not where
> it's at.  Sequential I/O is on-par with other RAID types but when it comes
> to random I/O it's one of, if not the, worst of the bunch.

RAID-5's (and RAID-6's) big failure isn't sequential versus random,
but read versus write.  RAID-5 can do well reading both sequentially
and randomly, but writes are expensive due to the read-read /
write-write cycle of RAID-5 / 6: read the old data and old parity,
then write the new data and new parity.
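
To put rough numbers on that write cost, here is a minimal sketch using
the textbook read-modify-write model.  The per-write I/O counts are the
usual rule-of-thumb figures, and the disk count and per-disk IOPS are
made-up illustrative values, not measurements; a controller with a
write-back cache or full-stripe writes can do better.

```python
# Back-end disk I/Os needed per small random write, using the textbook
# read-modify-write model (an assumption, not a measurement).
WRITE_PENALTY = {
    "RAID-10": 2,  # write both sides of the mirror
    "RAID-5": 4,   # read data, read parity, write data, write parity
    "RAID-6": 6,   # same cycle as RAID-5, but with two parity blocks
}


def random_write_iops(disks, iops_per_disk, level):
    """Rough ceiling on small random writes per second for the array."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 150 random IOPS each.
    for level in WRITE_PENALTY:
        print(level, random_write_iops(disks=8, iops_per_disk=150, level=level))
```

Reads carry no such penalty in this model, which is why the same
array can look fine under a read-mostly load.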

For a reporting database (i.e. mostly read) the striped sets can do ok.

> From a recoverability angle, losing a disk in a RAID-5 isn't the end of the
> world but your world will spin much, much slower than it did while it's
> recalculating all those parity blocks and while doing so you're at risk of
> data loss if a second drive goes.

If RAID-5 was barely keeping up to begin with, the system is now going
to slow to a crawl while running in degraded mode AND recovering.
Plus with large drives and all the parity calculation, recovery can
take upwards of a day.  That's a long time to be flying without a net,
so to speak.
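
A back-of-the-envelope sketch of that window, assuming the rebuild has
to rewrite the whole replacement drive at some sustained rate.  The
drive size and rates below are hypothetical; the rate a busy array can
actually spare for rebuilding varies a lot with controller and load.

```python
def rebuild_hours(capacity_gb, rate_mb_per_s):
    """Hours needed to rewrite a whole drive at a sustained rebuild rate.

    Uses decimal units (1 GB = 1000 MB), as drive vendors do.
    """
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 1 TB drive: an idle array versus one still serving load.
    for rate in (100, 15):
        print(f"{rate} MB/s -> {rebuild_hours(1000, rate):.1f} hours")
```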

> There are units out there that allow for mirrored RAID-5, RAID-5+1, to
> protect from multiple disk failures however at that point RAID-10 is the
> route to go.

Well, RAID-6 is basically RAID5 with the spare already in the loop, so to speak.

> I don't believe RAID-10's are perfect either.

Pretty sure no one claimed they were. :)

> If your RAID-10 is really 2
> RAID-0's mirrored, i.e. RAID-0+1, and you have 2 disk failures, one in each
> RAID-0, then that's a go-to-tape situation.

Almost no one runs RAID01 really.  It's no faster than RAID10 and more
prone to failure.  As you add disks, you get a mirrored set of two
very large RAID0 arrays, where two drive failures are quite likely to
kill the whole thing.
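
A small sketch of why the layout matters, enumerating every possible
pair of failed disks.  It assumes two-way mirroring, that each pair of
disks is equally likely to fail, and that nothing gets rebuilt between
the two failures; the disk numbering is just a convention for the
example.

```python
from fractions import Fraction
from itertools import combinations


def fatal_fraction(n_disks, is_fatal):
    """Fraction of two-disk failure pairs that destroy the array."""
    pairs = list(combinations(range(n_disks), 2))
    fatal = sum(1 for a, b in pairs if is_fatal(a, b))
    return Fraction(fatal, len(pairs))


def raid01_fatal(n_disks):
    # Disks 0..half-1 form one RAID-0, the rest form the other.  One dead
    # disk takes out its whole stripe, so the array is lost as soon as the
    # two failures land in different stripes.
    half = n_disks // 2
    return fatal_fraction(n_disks, lambda a, b: (a < half) != (b < half))


def raid10_fatal(n_disks):
    # Disks 2k and 2k+1 form mirror k.  The array is lost only when both
    # failures hit the same mirror.
    return fatal_fraction(n_disks, lambda a, b: a // 2 == b // 2)


if __name__ == "__main__":
    for n in (4, 8, 20):
        print(n, "disks: RAID-0+1", raid01_fatal(n), "RAID-1+0", raid10_fatal(n))
```

Under these assumptions the RAID-0+1 figure stays above one half no
matter how many disks you add, while the RAID-1+0 figure keeps
shrinking.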

> If your RAID-10 is really
> multiple mirrors striped, i.e. a true RAID-10 or RAID-1+0, you're just as
> susceptible to data loss except you must lose both sides of a single
> mirror.  Not as likely but still possible.

Now, imagine you've got 20 disks in a RAID-5 and 20 disks in a RAID-10.
Two disks fail in each.  What are the odds the RAID-5 survived?  0%.
What are the odds both failed disks in the RAID-10 were in the same
mirror set?  1 in 19 (about 5%), since only one of the 19 remaining
disks is the first failed disk's mirror partner.
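
The same comparison can be worked out in closed form for any number of
simultaneous failures.  This is a sketch under stated assumptions:
two-way mirrors, every set of failed disks equally likely, and no
rebuild completing between failures.  For 20 disks and two failures it
gives a RAID-10 survival chance of 18/19, i.e. a loss chance of 1/19,
or about 5%.

```python
from fractions import Fraction
from math import comb  # Python 3.8+


def raid5_survival(n_disks, failures):
    """Single parity tolerates one failed disk and nothing more."""
    return Fraction(1 if failures <= 1 else 0)


def raid10_survival(n_disks, failures):
    """Chance that no two failed disks share a mirror (two-way mirrors)."""
    mirrors = n_disks // 2
    if failures > mirrors:
        return Fraction(0)
    # Pick which mirrors are hit, then which side of each one failed.
    safe = comb(mirrors, failures) * 2 ** failures
    return Fraction(safe, comb(n_disks, failures))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "failed:",
              "RAID-5", raid5_survival(20, k),
              "RAID-10", raid10_survival(20, k))
```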

> Recovery in either RAID-10 setup is simpler than the parity RAIDs in that
> a disk must be copied only and no parity calculated.  This is still a window
> where a second disk failure could result in data loss.

But a much smaller window.  And the chances of that being the right
disk to scram the RAID-10 are much lower.

> I believe that regardless of your selection you must, must, must look at
> things as a 3 year solution, 5 years at the most.  As those disks spin and
> age the likelihood of multiple failures increases.  You may not have to
> replace a single disk in the first 3 years but should you lose power and
> those drives spin down and cool the odds of one or more not spinning back up
> are pretty good.  Trust me, I've experienced that many times.  Plan to
> replace aging units.

Been there, done that, got the T-shirt and the decoder ring.  :)

> In the end, people will do what people will do and most likely the largest
> factor won't be performance, protection or recoverability but instead it
> will be money.  If you're lucky, money isn't an issue.

Data costs money.  Don't forget to value your data and time.  If
you're not doing that, then your calculations will be worthless, much
like the data you once had on the now dead array(s).
