Re: Block-level CRC checks

From Greg Stark
Subject Re: Block-level CRC checks
Date
Msg-id 7CE61C21-DA8B-4C7F-AC77-1E3B76E3BB0D@enterprisedb.com
In reply to Re: Block-level CRC checks  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Block-level CRC checks  (Aidan Van Dyk <aidan@highrise.ca>)
List pgsql-hackers
[sorry for top-posting - damn phone]

I thought of saying that too but it doesn't really solve the problem.  
Think of what happens if someone sets a hint bit on a dirty page.

greg

On 17 Nov 2008, at 08:26 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Martijn van Oosterhout wrote:
>> On Fri, Nov 14, 2008 at 10:51:57AM -0500, Tom Lane wrote:
>>> In fact, if the patch were to break torn-page handling, it would be
>>> 100% likely to be a net *decrease* in system reliability.  It would add
>>> detection of a situation that is not supposed to happen (ie, storage
>>> system fails to return the same data it stored) at the cost of breaking
>>> one's database when the storage system acts as it's expected and
>>> documented to in a routine power-loss situation.
>> Ok, I see it's a problem because the hint changes are not WAL logged,
>> so torn pages are expected to work in normal operation. But simply
>> skipping the hint bits during checksumming is a terrible solution,
>> since then any errors in those bits will go undetected. To not be able
>> to say in the documentation that you'll detect 100% of single-bit
>> errors is pretty darn terrible, since that's kind of the goal of the
>> exercise.
>
> Agreed, trying to explain that in the documentation would look like  
> making excuses.
>
> The requirement that all hint bit changes are WAL-logged seems like  
> a pretty big change. I don't like doing that, just for CRCing.
>
> There has been discussion before about not writing out pages to disk  
> that only have hint-bit updates on them. That means that the next  
> time the page is read, the reader needs to do the clog lookups and  
> set the hint bits again. It's a tradeoff, making the first SELECT  
> after modifying a page cheaper, I/O-wise, at the cost of making all  
> subsequent SELECTs that need to read the page from disk or kernel  
> cache more expensive, CPU-wise.
>
> I'm not sure if I like that idea or not, but it would also solve the  
> CRC problem with torn pages. FWIW, it would also solve the problem  
> suggested with IBM DTLA disks and others that might zero-out a  
> sector in case of an interrupted write. I'm not totally convinced  
> that's a problem, as there's apparently other software that makes the
> same assumption as we do, and we haven't heard of any torn-page  
> corruption in real life, but still.
>
> If we made the behavior configurable, that would be pretty hard to  
> explain in the docs. We'd have three options with dependencies
>
> - CRC on/off
> - write pages with only hint bit changes on/off
> - full_page_writes on/off
>
> If you disable full_page_writes, you're vulnerable to torn pages. If you
> enable it, you're not. Except if you also turn CRC on. Except if you
> also turn "write pages with only hint bit changes" off.
>
>> Unfortunately, there's not a lot of easy solutions here. You could do
>> two checksums, one with and one without hint bits. The overall checksum
>> tells you if there's a problem. If it doesn't match, the second checksum
>> will tell you if it's the hint bits or not (torn page problem). If it's
>> the hint bits you can reset them all and continue. The checksums need
>> not be of equal strength.
>
> Hmm, that would work I guess.
>
>> The extreme case is an ECC where you explicitly can set it so you can
>> alter N bits before you need to recalculate the checksum.
>> Computationally though, that sucks.
>
> Yep. Also, in case of a torn page, you're very likely going to have  
> several hint bits from the old image and several from the new image.  
> An error-correcting code would need to be unfeasibly long to cope  
> with that.
>
> -- 
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
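
The three-option dependency chain in the quoted message can be written out as a small truth function. This is only a sketch of the rule as Heikki words it; the parameter names are made up for illustration and are not actual PostgreSQL settings (apart from full_page_writes), and the reply at the top of this message questions whether the last exception really holds once a hint bit is set on an already-dirty page.

```python
from itertools import product


def torn_page_risk(crc: bool, write_hint_only_pages: bool,
                   full_page_writes: bool) -> bool:
    """Return True if, per the quoted description, torn pages can hurt.

    Parameter names are hypothetical labels for the three options listed
    in the message, not real configuration parameters.
    """
    if not full_page_writes:
        # Without full-page images, a torn page cannot be repaired from WAL.
        return True
    # With full_page_writes on, the only exposure described is a CRC
    # mismatch caused by a torn write of non-WAL-logged hint-bit changes,
    # which (as described) requires hint-only pages to be written at all.
    return crc and write_hint_only_pages


if __name__ == "__main__":
    # Print all eight combinations of the three options.
    for crc, hints, fpw in product([False, True], repeat=3):
        print(f"crc={crc!s:5} write_hint_only={hints!s:5} "
              f"full_page_writes={fpw!s:5} -> at risk: "
              f"{torn_page_risk(crc, hints, fpw)}")
```

Even in this compact form the rule needs two nested exceptions, which is the documentation burden the message is pointing at.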


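Martijn's two-checksum idea from the quoted text can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the hint-bit positions in `HINT_MASKS` are invented, `zlib.crc32` stands in for whatever checksum would actually be used, and the checksums are passed in separately rather than stored in the page header.

```python
import zlib

# Hypothetical layout: byte offset -> bit mask of the hint bits in that byte.
# A real page would derive these from its tuple headers.
HINT_MASKS = {8: 0x0F, 24: 0x0F}


def mask_hints(page: bytes) -> bytes:
    """Return a copy of the page with every hint bit cleared."""
    buf = bytearray(page)
    for offset, mask in HINT_MASKS.items():
        buf[offset] &= ~mask & 0xFF
    return bytes(buf)


def checksums(page: bytes) -> tuple[int, int]:
    """Compute (checksum over everything, checksum ignoring hint bits)."""
    return zlib.crc32(page), zlib.crc32(mask_hints(page))


def verify(page: bytes, full_crc: int, masked_crc: int) -> tuple[str, bytes]:
    """Classify a page read back from disk.

    "ok"          : the overall checksum matches.
    "hints_reset" : only hint bits differ (e.g. a torn hint-bit write);
                    the returned page has all hint bits cleared.
    "corrupt"     : something other than hint bits changed.
    """
    if zlib.crc32(page) == full_crc:
        return "ok", page
    cleared = mask_hints(page)
    if zlib.crc32(cleared) == masked_crc:
        return "hints_reset", cleared
    return "corrupt", page


if __name__ == "__main__":
    page = bytes(32)
    full_crc, masked_crc = checksums(page)

    torn = bytearray(page)
    torn[8] |= 0x01          # a hint bit changed without a checksum update
    print(verify(bytes(torn), full_crc, masked_crc)[0])

    bad = bytearray(page)
    bad[0] ^= 0x01           # a non-hint bit flipped
    print(verify(bytes(bad), full_crc, masked_crc)[0])
```

Clearing the hint bits on a mismatch is safe in principle because they can be recomputed from clog, at the CPU cost discussed in the quoted message. The sketch does not address the case raised at the top of this reply, where a hint bit is set on a page that is already dirty for other reasons.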