Re: Enable data checksums by default

From	Tomas Vondra
Subject	Re: Enable data checksums by default
Date	22 March 2019, 20:01:32
Msg-id	12110cc1-3729-9e5e-b6bb-62151a68af29@2ndquadrant.com
In reply to	Re: Enable data checksums by default (Andres Freund <andres@anarazel.de>)
Responses	Re: Enable data checksums by default
List	pgsql-hackers

On 3/22/19 5:41 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
>> On 3/22/19 5:10 PM, Andres Freund wrote:
>>> IDK, being able to verify in some form that backups aren't corrupted on
>>> an IO level is mighty nice. That often does allow to detect the issue
>>> while one still has older backups around.
>>>
>>
>> Yeah, I agree that's a valuable capability. I think the question is how
>> effective it actually is considering how much the storage changed over
>> the past few years (which necessarily affects the type of failures
>> people have to deal with).
> 
> I'm not sure I understand? How do the changes around storage
> meaningfully affect the need to have some trust in backups and
> benefiting from earlier detection?
> 

Having trust in backups is still desirable - nothing changes that,
obviously. The question I was posing was rather "Are checksums still
effective on current storage systems?"

I'm wondering if the storage systems people use nowadays may be failing
in ways that are not reliably detectable by checksums. I don't have any
data to either support or reject that hypothesis, though.

> 
>> It's not clear to me what can checksums do about zeroed pages (and/or
>> truncated files) though.
> 
> Well, there's nothing fundamental about needing added pages be
> zeroes. We could expand them to be initialized with actual valid
> checksums instead of
>         /* new buffers are zero-filled */
>         MemSet((char *) bufBlock, 0, BLCKSZ);
>         /* don't set checksum for all-zero page */
>         smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
> 
> the problem is that it's hard to do so safely without adding a lot of
> additional WAL logging. A lot of filesystems will journal metadata
> changes (like the size of the file), but not contents. So after a crash
> the tail end might appear zeroed out, even if we never wrote
> zeroes. That's obviously solvable by WAL logging, but that's not cheap.
> 

Hmmm. I'd say a filesystem that does not guarantee having all the data
after an fsync is outright broken, but maybe that's what checksums are
meant to protect against.

> It might still be a good idea to just write a page with an initialized
> header / checksum at that point, as that ought to still detect a number
> of problems we can't detect right now.
> 

Sounds reasonable.
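
To make the zeroed-page point concrete, here is a toy sketch in C. It is not PostgreSQL code: the ToyPageHeader layout, the 32-bit FNV-1a style checksum, and every toy_* name are assumptions made up for illustration. It only models the idea discussed above: if all-zero pages are accepted as valid new pages, a page that gets zeroed out after a crash passes verification, whereas a page written with an initialized header and checksum lets the verifier treat an unexpected all-zero page as damage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192             /* hypothetical page size */
#define TOY_PAGE_INITIALIZED 0x0001 /* hypothetical header flag */

/* Hypothetical, simplified header; not PostgreSQL's PageHeaderData. */
typedef struct ToyPageHeader
{
    uint32_t checksum;  /* stored at offset 0 of the page */
    uint32_t flags;
} ToyPageHeader;

/* FNV-1a style hash of the page, treating the checksum field as zero. */
static uint32_t
toy_checksum(const unsigned char *page, uint32_t blkno)
{
    uint32_t h = 2166136261u ^ blkno;

    for (size_t i = 0; i < TOY_BLCKSZ; i++)
    {
        unsigned char b = (i < sizeof(uint32_t)) ? 0 : page[i];

        h = (h ^ b) * 16777619u;
    }
    return h;
}

static bool
toy_page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < TOY_BLCKSZ; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* Extension as in the quoted snippet: zero-filled page, no checksum. */
static void
toy_extend_zeroed(unsigned char *page)
{
    assert(page != NULL);
    memset(page, 0, TOY_BLCKSZ);
}

/* Alternative: extend with an initialized header and a valid checksum. */
static void
toy_extend_initialized(unsigned char *page, uint32_t blkno)
{
    ToyPageHeader hdr = {0, TOY_PAGE_INITIALIZED};

    assert(page != NULL);
    memset(page, 0, TOY_BLCKSZ);
    memcpy(page, &hdr, sizeof(hdr));
    hdr.checksum = toy_checksum(page, blkno);
    memcpy(page, &hdr, sizeof(hdr));
}

/*
 * accept_zero_pages = true models a verifier that must treat all-zero
 * pages as valid (because extension writes zeroes); false models a
 * scheme where every extended page carries a header and checksum.
 */
static bool
toy_page_verify(const unsigned char *page, uint32_t blkno,
                bool accept_zero_pages)
{
    ToyPageHeader hdr;

    if (toy_page_is_zero(page))
        return accept_zero_pages;
    memcpy(&hdr, page, sizeof(hdr));
    return hdr.checksum == toy_checksum(page, blkno);
}
```

With this model, zeroing a page that was created by toy_extend_initialized() still passes toy_page_verify(page, blkno, true) but fails toy_page_verify(page, blkno, false). The sketch deliberately ignores the hard part raised above, namely making the initialized page crash-safe without extra WAL logging.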

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
