Re: Block-level CRC checks

From: Gregory Stark
Subject: Re: Block-level CRC checks
Message-ID: 871vz01b33.fsf@oxford.xeocode.com
In reply to: Re: Block-level CRC checks  (Aidan Van Dyk <aidan@highrise.ca>)
List: pgsql-hackers

Aidan Van Dyk <aidan@highrise.ca> writes:

> * Gregory Stark <stark@enterprisedb.com> [081001 11:59]:
>  
>> If setting a hint bit cleared a flag on the buffer header then the
>> checksumming process could set that flag, begin checksumming, and check that
>> the flag is still set when he's finished.
>> 
>> Actually I suppose that wouldn't actually be good enough. He would have to do
>> the i/o and check that the checksum was still valid after the i/o. If not then
>> he would have to recalculate the checksum and repeat the i/o. That might make
>> the idea a loser since I think the only way it wins is if you rarely actually
>> get someone setting the hint bits during i/o anyways.
>
> A doubled-write is essentially "free" with PostgreSQL because it's not
> doing direct IO, rather relying on the OS page cache to be efficient.

All things are relative. What we're talking about here is all CPU and
memory-bandwidth cost anyway, so yes, it'll be cheap compared to the disk
i/o, but it'll still double the memory bandwidth and CPU cost of these
routines.

That said you would only have to do it in cases where the hint bits actually
get twiddled. That might not actually happen often.
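
A rough sketch of the "checksum, write, recheck, repeat" loop discussed above.
Everything here is a toy: ToyBuffer, checksum_stable, toy_set_hint_bit and the
racy_write callback are hypothetical names, the checksum is a placeholder
rather than a real CRC, and the "concurrent" hint-bit setter is simulated in a
single thread. Real code would need proper locking or atomics around the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

typedef struct {
    uint8_t data[TOY_PAGE_SIZE];
    bool    checksum_stable;   /* cleared by anyone who sets a hint bit */
} ToyBuffer;

/* Simulated storage: remembers the last page image and checksum written. */
typedef struct {
    int      calls;
    uint32_t last_sum;
    uint8_t  last_data[TOY_PAGE_SIZE];
} ToyDisk;

typedef void (*toy_write_fn)(ToyBuffer *buf, uint32_t sum, void *arg);

/* Placeholder checksum (not a CRC), just enough to detect a changed byte. */
uint32_t toy_checksum(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum * 31u + p[i];
    return sum;
}

/* Setting a hint bit changes the page and invalidates any checksum in flight. */
void toy_set_hint_bit(ToyBuffer *buf, size_t off, uint8_t bit)
{
    buf->data[off] |= bit;
    buf->checksum_stable = false;
}

/* Simulated write that is raced by a hint-bit setter on the first call only. */
void racy_write(ToyBuffer *buf, uint32_t sum, void *arg)
{
    ToyDisk *disk = arg;
    if (disk->calls++ == 0)
        toy_set_hint_bit(buf, 0, 0x01);
    memcpy(disk->last_data, buf->data, TOY_PAGE_SIZE);
    disk->last_sum = sum;
}

/* Set the flag, checksum, write, and repeat if the flag was cleared meanwhile.
 * Returns the number of writes it took. */
int write_with_recheck(ToyBuffer *buf, toy_write_fn do_write, void *arg)
{
    int attempts = 0;

    assert(buf != NULL && do_write != NULL);
    do {
        buf->checksum_stable = true;
        uint32_t sum = toy_checksum(buf->data, TOY_PAGE_SIZE);
        do_write(buf, sum, arg);
        attempts++;
    } while (!buf->checksum_stable);
    return attempts;
}
```

The second pass only happens when a hint bit was set while the first write was
in flight, which is the case the paragraph above hopes is rare.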

> But the problem is if something crashes (or interrupts PG) between those
> two writes, you've got a block of data into the pagecache (and possibly
> to the disks) that PG will no longer read in, because the CRC/checksum
> fails despite the actual content being valid...

I don't think this is a problem because we're still doing WAL logging. The i/o
isn't allowed to happen until the page has been WAL logged and fsynced
anyways.


Incidentally I think the JUST_DIRTIED bit might actually be sufficient here.
Hint bits already cause the buffer to be marked dirty. So the only case I see
a real problem for is when we're writing a block as part of a checkpoint and
find it's JUST_DIRTIED after writing it. In that case we would have to start
over and write it again rather than leave it marked dirty.

If we're writing the block as part of normal i/o then we could just decide to
leave the possibly-bogus checksum in the table since it'll be overwritten by a
full page write anyways. It'll be overwritten in normal use when the newly
dirty buffer is eventually written out again.
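
To make the two cases concrete, here is a minimal sketch of that policy. The
flag names TOY_DIRTY and TOY_JUST_DIRTIED only stand in for the real buffer
flags, ToyBuf and toy_flush are hypothetical, and redirty_budget is purely a
simulation knob that lets a "hint-bit setter" re-dirty the page during a
write. It is not meant to reflect how the buffer manager is actually coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_DIRTY = 1u << 0, TOY_JUST_DIRTIED = 1u << 1 };

typedef struct {
    uint32_t flags;
    int      writes;          /* how many times the page was handed to the OS */
    int      redirty_budget;  /* simulation: writes that get raced by a hint bit */
} ToyBuf;

/* Stands in for "compute checksum and write the page". */
void toy_checksum_and_write(ToyBuf *buf)
{
    buf->writes++;
    if (buf->redirty_budget > 0) {
        /* Simulated hint-bit update arriving while the write is in flight. */
        buf->redirty_budget--;
        buf->flags |= TOY_DIRTY | TOY_JUST_DIRTIED;
    }
}

/* Returns true if the buffer is clean on return, false if it was left dirty. */
bool toy_flush(ToyBuf *buf, bool for_checkpoint)
{
    assert(buf != NULL);
    for (;;) {
        buf->flags &= ~(uint32_t)TOY_JUST_DIRTIED;
        toy_checksum_and_write(buf);

        if (!(buf->flags & TOY_JUST_DIRTIED)) {
            /* Nobody touched the page during the write: checksum is good. */
            buf->flags &= ~(uint32_t)TOY_DIRTY;
            return true;
        }
        if (!for_checkpoint) {
            /* Normal i/o: leave it dirty; a later write replaces the
             * possibly-bogus checksum. */
            return false;
        }
        /* Checkpoint: start over and write the page again. */
    }
}
```

With one simulated race, a normal write returns after a single write and
leaves the buffer dirty, while a checkpoint write goes around the loop a second
time and leaves it clean.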


If you're not doing full page writes then you would have to restore from
backup in cases where previously the page might actually have been valid.
That's kind of unfortunate. In theory it hasn't actually changed the risks of
running without full page writes, but it has certainly increased the
likelihood of actually having to deal with "corruption" in the form of a
gratuitously invalid checksum. (Of course without checksums you don't ever
actually know whether you have corruption, real corruption included.)

> One possibility would be to "double-buffer" the write... i.e. as you
> calculate your CRC, you're doing it on a local copy of the block, which
> you hand to the OS to write...  If you're touching the whole block of
> memory to CRC it, it isn't *ridiculously* more expensive to copy the
> memory somewhere else as you do it...

Hm. Well that might actually work. You can do the CRC at the same time as
copying to the buffer, effectively doing it for the same cost as the CRC
alone.
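
A minimal sketch of the copy-and-checksum idea, assuming a plain bitwise
CRC-32 (reflected polynomial 0xEDB88320) purely for illustration. The name
copy_with_crc is hypothetical, and a real implementation would presumably use
a table-driven CRC and whatever polynomial the project settles on. The point
is only that each source byte is read once and both stored and fed to the CRC
in the same pass, so the checksum describes the private copy that gets written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Advance a reflected CRC-32 by one byte, bit by bit (slow but simple). */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Copy len bytes from src to dst and return the CRC-32 of the bytes copied.
 * The caller then hands dst (not src) to the OS, so later changes to the
 * shared page cannot invalidate the checksum of what was written. */
uint32_t copy_with_crc(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t b = src[i];       /* read the shared byte once */
        dst[i] = b;               /* store it in the private copy */
        crc = crc32_step(crc, b); /* checksum exactly what was stored */
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The extra cost over checksumming in place is one store per byte, which is the
sense in which the copy comes close to free once the page is being read for
the CRC anyway.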

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

