Re: Optimal settings for RAID controller - optimized for writes

From: Tomas Vondra
Subject: Re: Optimal settings for RAID controller - optimized for writes
Date:
Msg-id: 530548A5.7020001@fuzzy.cz
In response to: Re: Optimal settings for RAID controller - optimized for writes  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Responses: Re: Optimal settings for RAID controller - optimized for writes  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List: pgsql-performance
Hi,

On 19.2.2014 03:45, KONDO Mitsumasa wrote:
> (2014/02/19 5:41), Tomas Vondra wrote:
>> On 18.2.2014 02:23, KONDO Mitsumasa wrote:
>>> Hi,
>>>
>>> I don't have a PERC H710 RAID controller, but I think he would like
>>> to know which RAID striping/chunk size or read/write cache ratio in
>>> the writeback-cache setting is best. I'd like to know it, too :)
>>
>> The stripe size is actually a very good question. On spinning drives it
>> usually does not matter too much - unless you have a very specialized
>> workload, the 'medium size' is the right choice (AFAIK we're using 64kB
>> on H710, which is the default).
>
> I am interested that the RAID stripe size of the PERC H710 is 64kB. On
> HP RAID cards, the default chunk size is 256kB. If we use two disks in
> RAID 0, the full stripe size will be 512kB. I think that might be too
> big, but it might be optimized in the RAID card... In practice, it
> isn't bad with those settings.

With HP controllers this depends on RAID level (and maybe even
controller). Which HP controller are you talking about? I have some
basic experience with P400/P800, and those have 16kB (RAID6), 64kB
(RAID5) or 128kB (RAID10) defaults. None of them has 256kB.

See http://bit.ly/1bN3gIs (P800) and http://bit.ly/MdsEKN (P400).


> I'm interested in the RAID card's internal behavior. Fortunately, the
> Linux RAID card driver is open source, so we might have a good look at
> the source code when we have time.

What do you mean by "linux raid card driver"? AFAIK the admin tools may
be available, but the interesting stuff happens inside the controller,
and that's still proprietary.

>> With SSDs this might actually matter much more, as the SSDs work with
>> "erase blocks" (mostly 512kB), and I suspect using small stripe might
>> result in repeated writes to the same block - overwriting one block
>> repeatedly and thus increased wearout. But maybe the controller will
>> handle that just fine, e.g. by coalescing the writes and sending them to
>> the drive as a single write. Or maybe the drive can do that in local
>> write cache (all SSDs have that).
>
> I have heard that genuine RAID cards with genuine SSDs are optimized
> for those SSDs. It is important to use compatible SSDs for
> performance. In the worst case, the lifetime of the SSD will be short,
> and the performance will be bad.

Well, that's the main question here, right? Because if the "worst case"
actually happens to be true, then what's the point of SSDs? You have a
disk that does not provide the performance you expected, dies much
sooner than you expected, and maybe so suddenly that it interrupts
operations.

So instead of paying more for higher performance, you paid more for bad
performance and much shorter life of the disk.

Coincidentally we're currently trying to find the answer to this
question too. That is - how long will the SSD endure in that particular
RAID level? Does that pay off?

BTW, what do you mean by "genuine raid card" and "genuine ssds"?

> I'm wondering about the effectiveness of readahead in the OS and the
> RAID card. In general, readahead data fetched by the RAID card is
> stored in the RAID cache, and not in the OS cache. Readahead data
> fetched by the OS is stored in the OS cache. I'd like to use all of
> the RAID cache for write caching only, because fsync() becomes faster.
> But then the RAID card cannot do much readahead.. If we hope to use it
> more effectively, we have to clear it, but it seems difficult :(

I've done a lot of testing of this on an H710 in 2012 (~18 months ago),
measuring combinations of the following (a rough sketch of how each
knob is set follows the list):

   * read-ahead on controller (adaptive, enabled, disabled)
   * read-ahead in kernel (with various sizes)
   * scheduler
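
For the record, the knobs were toggled roughly like this (a sketch, not
the actual scripts - the H710 is an LSI MegaRAID-based card, so I'm
assuming MegaCli syntax here, and /dev/sdb is just an example device):

   # controller read-ahead: adaptive / always on / off
   MegaCli -LDSetProp ADRA -LAll -aAll
   MegaCli -LDSetProp RA   -LAll -aAll
   MegaCli -LDSetProp NORA -LAll -aAll

   # kernel read-ahead, in 512-byte sectors (4096 = 2MB)
   blockdev --setra 4096 /dev/sdb

   # I/O scheduler
   echo deadline > /sys/block/sdb/queue/scheduler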

The test was the simplest and most suitable workload for this - just
"dd" with 1MB block size (AFAIK, would have to check the scripts).

In short, my findings are that:

   * read-ahead in the kernel matters - tweak this
   * read-ahead on the controller sucks - it either makes no difference,
     or actually harms performance (e.g. adaptive mode combined with
     small kernel read-ahead values)
   * the scheduler made no difference (at least for this workload)

So we disable read-ahead on the controller, use 24576 for the kernel
read-ahead, and it works fine.
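
That is, assuming the 24576 refers to the sector count passed to
blockdev (512-byte sectors, i.e. 12MB of read-ahead), the resulting
setup looks like this:

   MegaCli -LDSetProp NORA -LAll -aAll   # disable controller read-ahead
   blockdev --setra 24576 /dev/sdb       # 24576 x 512B = 12MB, example device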

I've done the same test with a FusionIO ioDrive (attached via PCIe, not
through a controller) - absolutely no difference.

Tomas

