From: david@lang.hm
Subject: Re: Raid 10 chunksize
Msg-id: alpine.DEB.1.10.0904011341500.28893@asgard.lang.hm
In reply to: Re: Raid 10 chunksize
List: pgsql-performance

On Wed, 1 Apr 2009,  wrote:

> On Wed, 1 Apr 2009, Mark Kirkwood wrote:
>
>> Scott Carey wrote:
>>>
>>> A little extra info here >>  md, LVM, and some other tools do not allow the
>>> file system to use write barriers properly.... So those are on the bad list
>>> for data integrity with SAS or SATA write caches without battery back-up.
>>> However, this is NOT an issue on the postgres data partition.  Data fsync
>>> still works fine, its the file system journal that might have out-of-order
>>> writes.  For xlogs, write barriers are not important, only fsync() not
>>> lying.
>>>
>>> As an additional note, ext4 uses checksums per block in the journal, so it
>>> is resistant to out of order writes causing trouble.  The test compared to
>>> here was on ext4, and most likely the speed increase is partly due to
>>> that.
>>>
>>>
>>
>> [Looks at  Stef's  config - 2x 7200 rpm SATA RAID 0]  I'm still highly
>> suspicious of such a system being capable of outperforming one with the
>> same number of (effective) - much faster - disks *plus* a dedicated WAL
>> disk pair... unless it is being a little loose about fsync! I'm happy to
>> believe ext4 is better than ext3 - but not that much!
>
> given how _horrible_ ext3 is with fsync, I can believe it more easily with
> fsync turned on than with it off.

I realized after sending this that I needed to elaborate a little more.

Over the last week there has been a _huge_ thread on the linux-kernel list
(>400 messages), which is summarized on lwn.net at
http://lwn.net/SubscriberLink/326471/b7f5fedf0f7c545f/

There is a lot of information in this thread, but one big thing is that in
data=ordered mode (the default for most distros) ext3 can end up having to
write all pending data when you do an fsync on just one file. In addition,
reading from disk can take priority over writing the journal entry (the IO
scheduler assumes that someone is waiting for a read, but not for a
write), so if one process is trying to do an fsync while another is
reading from the disk, the one doing the fsync has to wait until the
disk is idle before the fsync completes.
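
To see this effect on a particular filesystem, one option is to time individual fsync calls while some other process is reading or writing heavily on the same disk. The following is only a minimal sketch, not something from the linux-kernel thread: the helper name time_fsyncs, the 8 kB payload, and the default of 20 iterations are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def time_fsyncs(directory, count=20, payload=b"x" * 8192):
    """Return the wall-clock seconds taken by each write+fsync pair.

    Hypothetical helper: the file is created in `directory`, so point it
    at the filesystem you want to test.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            # Blocks until the kernel reports the file's data as flushed.
            os.fsync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.unlink(path)
    return latencies
```

Calling time_fsyncs("/path/on/the/filesystem") once on an idle system and once during a large copy on the same disk, then comparing the maximum of the two result lists, should give a rough idea of how much unrelated IO delays fsync there. The numbers are only indicative, since drive write caches and the IO scheduler in use also affect them.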

ext4 does things differently enough that fsyncs are relatively cheap again
(like they are on XFS, ext2, and other filesystems). The tradeoff is that
if you _don't_ do an fsync, there is an increased window in which you can
get data corruption if you crash.

David Lang

