Re: Asynchronous commit | Transaction loss at server crash

From: Jesper Krogh
Subject: Re: Asynchronous commit | Transaction loss at server crash
Date:
Msg-id: 4BF610F0.3070105@krogh.cc
In response to: Re: Asynchronous commit | Transaction loss at server crash  (Greg Smith <greg@2ndquadrant.com>)
List: pgsql-admin
On 2010-05-21 00:04, Greg Smith wrote:
> Jesper Krogh wrote:
>> A Battery Backed raid controller is not that expensive. (in the
>> range of 1 or 2 SSD disks). And it is (more or less) a silverbullet
>> to the task you describe.
>
> Maybe even less; in order to get a SSD that's reliable at all in
> terms of good crash recovery, you have buy a fairly expensive one.
> Also, and this is really important, you really don't want to deploy
> onto a single SSD and put critical system files there.  Their failure
> rates are not that low.  You need to put them into a RAID-1 setup and
> budget for two of them, which brings you right back to

I'm currently building an HP D2700 box with 25 x X25-M SSDs. I have added
an LSI 8888ELP RAID controller with 256MB BBWC and two separate UPSes
for the two independent PSUs on the D2700 (in the pricing numbers that
wasn't a huge part of it).

It has to do with the application. It consists of around 1TB of data that
is accessed fairly rarely and on a more or less random basis. A web application
is connected that tries to deliver, say, 200 random rows from a main table,
and for each of them traverses connected tables for information, so
an individual page can easily add up to 1000+ random reads (just
for confirming row information).
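A back-of-the-envelope calculation shows why this access pattern is what makes SSDs attractive here. The latency figures below are commonly cited ballpark numbers for 7200 rpm disks and X25-M-class SSDs, not measurements from this system:

```python
# Why 1000+ random reads per page hurts on spinning disks:
# each uncached random read costs roughly a full seek.
RANDOM_READS_PER_PAGE = 1000     # from the description above
HDD_SEEK_S = 0.008               # ~8 ms per random read, 7200 rpm disk
SSD_READ_S = 0.0001              # ~0.1 ms per random read, SSD class

hdd_page_time = RANDOM_READS_PER_PAGE * HDD_SEEK_S   # worst case, no cache
ssd_page_time = RANDOM_READS_PER_PAGE * SSD_READ_S

print(f"HDD, fully uncached page: {hdd_page_time:.1f} s")   # 8.0 s
print(f"SSD, fully uncached page: {ssd_page_time:.1f} s")   # 0.1 s
```

In practice the OS and shared_buffers cache absorb a lot of this, but a cold random-read-heavy page is seek-bound either way.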

What we have done so far is add quite a large amount of code that tries to
collapse the data structure and cache each row for the view, so the 1000+
reads get down to the order of 200, but it raises the complexity of the
application, which isn't a good thing either.
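The row-level caching described above can be sketched as plain memoization: each row's collapsed view is computed once and reused, so repeated traversals don't hit the database again. Names here are hypothetical, not from the actual application:

```python
# Minimal sketch of a per-row view cache: 1000 lookups over 200
# distinct rows result in only 200 expensive fetches.
cache = {}

def fetch_related(row_id):
    # Stand-in for the expensive traversal of connected tables.
    return {"row": row_id, "details": f"details-for-{row_id}"}

def view_for_row(row_id):
    if row_id not in cache:
        cache[row_id] = fetch_related(row_id)
    return cache[row_id]

# 1000 page-level lookups spread over 200 distinct rows.
for i in range(1000):
    view_for_row(i % 200)
print(len(cache))  # 200
```

The cost is exactly the complexity mentioned above: invalidation and memory bounds for the cache become the application's problem.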

I still haven't got the application onto it, let alone 12 months of production
usage on top, but so far I'm really looking forward to seeing it, because for
this application it seems like a very good fit.

And about the disk wear: as long as they don't all blow up at the same time,
I don't mind having to change a disk every now and then, so it'll
be really interesting to see whether the 20GB/disk/day the X25-M is spec'ed
for is going to be something that really matters in my hands.
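Spread across the array, that spec implies a sizable aggregate write budget. The arithmetic below assumes writes are distributed evenly over all 25 drives; RAID layout and write amplification will change the real numbers:

```python
# Aggregate write budget implied by the 20GB/disk/day endurance spec.
DISKS = 25
SPEC_GB_PER_DISK_PER_DAY = 20

total_gb_per_day = DISKS * SPEC_GB_PER_DISK_PER_DAY
print(total_gb_per_day)       # 500 GB/day across the array
print(total_gb_per_day / 24)  # ~20.8 GB/hour sustained
```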

I plan on putting the xlog and WAL archive on a fibre-channel slice, so they
essentially don't count towards the numbers above.
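One common way to put pg_xlog on a separate volume is the symlink approach; a sketch, with hypothetical paths, assuming a stopped server:

```shell
# Relocate the WAL directory to a separate volume (paths hypothetical).
# Stop PostgreSQL first; moving pg_xlog under a running server will corrupt it.
pg_ctl -D /var/lib/pgsql/data stop
mv /var/lib/pgsql/data/pg_xlog /mnt/fc_slice/pg_xlog
ln -s /mnt/fc_slice/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl -D /var/lib/pgsql/data start
```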

I don't know if bonnie is accurate in that range, but the last run delivered
over 500K random 4KB reads/s, and it saturated the two 3 Gbps SAS links
out of the controller in sequential reads/writes.

That held over something like ten runs.
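Saturating those links is plausible given their ceiling. The estimate below uses the standard 8b/10b line encoding of 3 Gbps SAS (10 line bits per data byte), so these are link-level ceilings, not measured numbers:

```python
# Usable bandwidth ceiling of two 3 Gbps SAS links with 8b/10b encoding.
LINKS = 2
GBPS_PER_LINK = 3
MB_PER_S_PER_LINK = GBPS_PER_LINK * 1000 / 10  # 8b/10b: 10 line bits per byte

total_mb_s = LINKS * MB_PER_S_PER_LINK
print(total_mb_s)  # 600.0 MB/s aggregate ceiling
```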

> Also, it's questionable whether a SSD is even going to be faster than
> standard disks for the sequential WAL writes anyway, once a
> non-volatile write cache is available.  Sequential writes to SSD are
> the area where the gap in performance between them and spinning disks
> is the smallest.


They are not in a totally different ballpark than spinning disks, but they
require much less "intelligent logic" in the OS/filesystem for read-ahead,
block I/O, the elevator and so on.

>> Plugging your system (SSD's) with an UPS and trusting it fully
>> could solve most of the problems (running in writeback mode).
>
> UPS batteries fail, and people accidentally knock out over server
> power cords.  It's a pretty bad server that can't survive someone
> tripping over the cord while it's busy, and that's the situation the
> "use a UPS" idea doesn't improve.

Mounted in a rack with "a lot" of cable binders. Keeping in mind that
it only needs power for a few ms before the volatile cache is
flushed.

But I totally agree with you, it is a matter of what applications you're
building on top.

... and we do a backup to tape every night, so the "worst case" is not that
the system blows up. It is more:
* The system ends up not performing any better, due to "something unknown".
or
* The system ends up taking way too much work on the system administration
side, changing worn disks, rebuilding arrays and such.

This is not the type of system where a single lost transaction matters;
it is more in the analytics/data-mining category, where last week's backup
is more or less as good as today's.
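For a workload with that tolerance, asynchronous commit (the subject of this thread) is the matching PostgreSQL knob: a crash can lose the last few hundred milliseconds of commits, but the database itself stays consistent. A minimal postgresql.conf fragment; the delay value is illustrative, not a recommendation:

```
# postgresql.conf -- asynchronous commit for workloads that tolerate
# losing the most recent transactions on a crash (data stays consistent).
synchronous_commit = off
# The WAL writer still flushes periodically; wal_writer_delay bounds the
# window of commits that can be lost.
wal_writer_delay = 200ms
```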

--
Jesper
