CRCs (was: beta testing version)

From ncm@zembu.com (Nathan Myers)
Subject CRCs (was: beta testing version)
Msg-id 20001206110800.Q30335@store.zembu.com
In reply to Re: AW: beta testing version  (Bruce Guenter <bruceg@em.ca>)
Replies Re: CRCs (was: beta testing version)  (Bruce Guenter <bruceg@em.ca>)
List pgsql-hackers
On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
> On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
> > Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
> > > Yes, but there would need to be a way to verify the last page or
> > > record from txlog when running on crap hardware.
> >
> > How exactly *do* we determine where the end of the valid log data is,
> > anyway?
> 
> I don't know how pgsql does it, but the only safe way I know of is to
> include an "end" marker after each record.  When writing to the log,
> append the records after the last end marker, ending with another end
> marker, and fdatasync the log.  Then overwrite the previous end marker
> to indicate it's not the end of the log any more and fdatasync again.
>
> To ensure that it is written atomically, the end marker must not cross a
> hardware sector boundary (typically 512 bytes).  This can be trivially
> guaranteed by making the marker a single byte.
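
For concreteness, the two-step procedure quoted above might look like the
following Python sketch.  The one-byte marker values, the length-prefixed
record layout, and the function names are all assumptions made for
illustration, and os.fsync stands in for fdatasync.  This is not how pgsql
writes its log.

```python
import os

# Hypothetical one-byte markers: END means "nothing valid follows",
# CONT means "a complete record follows".
END, CONT = b"\x01", b"\x00"

def create_log(path):
    """Create an empty log that consists of a single end marker."""
    with open(path, "wb") as f:
        f.write(END)
        f.flush()
        os.fsync(f.fileno())

def append_record(path, payload):
    """Append one record using the two-sync end-marker procedure."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        old_marker_pos = f.tell() - 1   # the last byte is the current end marker
        # Step 1: write the record followed by a new end marker, then sync.
        f.write(len(payload).to_bytes(4, "big") + payload + END)
        f.flush()
        os.fsync(f.fileno())
        # Step 2: overwrite the previous end marker, then sync again.
        f.seek(old_marker_pos)
        f.write(CONT)
        f.flush()
        os.fsync(f.fileno())

def read_records(path):
    """Return every record that precedes the end marker."""
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while data[pos:pos + 1] == CONT:
        pos += 1
        length = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        records.append(data[pos:pos + length])
        pos += length
    return records
```

The sketch relies on each fsync returning only after the data is on the
platter, and on the single marker byte being written atomically.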

An "end" marker is not sufficient, unless all writes are done in
one-sector units with an fsync between, and the drive buffering 
is turned off.  For larger writes the OS will re-order the writes.  
Most drives will re-order them too, even if the OS doesn't.
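
A toy model of that failure, as a Python sketch.  The in-memory "disk", the
three-sector record, and the chosen write order are assumptions made for
illustration; real reordering depends on the OS scheduler and the drive
firmware.

```python
SECTOR = 512

def crash_during_write(disk, start, data, order, survive):
    """Persist only the first `survive` sectors of `data`, taken in `order`.

    `order` lists sector indexes in the sequence the OS or drive happens to
    write them; everything after the crash point never reaches the disk.
    """
    sectors = [data[i:i + SECTOR] for i in range(0, len(data), SECTOR)]
    for idx in order[:survive]:
        disk[start + idx] = sectors[idx]

# The disk initially holds stale zero-filled sectors.
disk = [bytes(SECTOR)] * 4

# A three-sector record whose final byte is the end marker.
record = b"A" * SECTOR + b"B" * SECTOR + b"C" * (SECTOR - 1) + b"\x01"

# The last sector is written first, then the first; power fails before the
# middle sector is written.
crash_during_write(disk, 0, record, order=[2, 0, 1], survive=2)

marker_present = disk[2][-1:] == b"\x01"
middle_intact = disk[1] == b"B" * SECTOR
```

After the simulated crash the end marker is on disk, yet the middle sector
still holds stale data, so the marker alone says nothing about the sectors
before it.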

> Any other way I've seen discussed (here and elsewhere) either
> - Requires atomic multi-sector writes, which are possible only if all
>   the sectors are sequential on disk, the kernel issues one large write
>   for all of them, and you don't powerfail in the middle of the write.
> - Assume that a CRC is a guarantee.  

We are already assuming a CRC is a guarantee.  

The drive computes a CRC for each sector, and if the CRC is OK the 
drive is happy.  CRC errors within the drive are quite frequent, and 
the drive re-reads when a bad CRC comes up.  (If it sees errors too 
frequently on a sector, it rewrites it; if it sees persistent errors 
on a sector, it marks that one bad and relocates it.)  You can expect 
to experience, in production, about the error rate that the drive 
manufacturer specifies as "maximum".

>   ... A CRC would be a good addition to
>   help ensure the data wasn't broken by flakey drive firmware, but
>   doesn't guarantee consistency.

No, a CRC would be a good addition to compensate for sector write
reordering, which is done both by the OS and by the drive, even for 
"atomic" writes.

It is not only "flaky" or "cheap" drives that re-order writes, or
acknowledge writes as complete that are not yet on disk.  You 
can generally assume that *any* drive does it unless you have 
specifically turned that off.  The assumption is that if you care,
you have a UPS, or at least have configured the hardware yourself
to meet your needs.

It is purely wishful thinking to believe otherwise.
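
To illustrate the CRC approach argued for above, here is a minimal Python
sketch of finding the end of the valid log by checking a per-record CRC.
The record layout (4-byte length, 4-byte CRC-32, payload) and the function
names are assumptions for illustration only, not pgsql's actual log format.

```python
import struct
import zlib

def encode_record(payload):
    """Frame a payload as: length, CRC-32 over (length + payload), payload."""
    length = struct.pack(">I", len(payload))
    crc = zlib.crc32(length + payload)
    return length + struct.pack(">I", crc) + payload

def valid_prefix(log):
    """Return the records up to the first truncated or CRC-failing one."""
    records, pos = [], 0
    while pos + 8 <= len(log):
        length_bytes = log[pos:pos + 4]
        (length,) = struct.unpack(">I", length_bytes)
        (crc,) = struct.unpack(">I", log[pos + 4:pos + 8])
        payload = log[pos + 8:pos + 8 + length]
        # A short read or a CRC mismatch marks the end of the valid log.
        if len(payload) != length or zlib.crc32(length_bytes + payload) != crc:
            break
        records.append(payload)
        pos += 8 + length
    return records
```

A record whose sectors were only partly written fails its CRC on replay, so
the scan stops there regardless of the order in which the sectors reached the
disk.  The check is probabilistic: a 32-bit CRC lets a random corruption
through roughly once in 2^32 cases.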

Nathan Myers
ncm@zembu.com

