Re: Offline enabling/disabling of data checksums

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Offline enabling/disabling of data checksums
Дата
Msg-id bd6b833b-8250-0810-dfb6-4d8b3e6581b0@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Offline enabling/disabling of data checksums  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Offline enabling/disabling of data checksums  (Michael Paquier <michael@paquier.xyz>)
Re: Offline enabling/disabling of data checksums  (Magnus Hagander <magnus@hagander.net>)
Список pgsql-hackers

On 12/28/18 12:25 AM, Michael Paquier wrote:
> On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
>> On 12/27/18 11:43 AM, Magnus Hagander wrote:
>>> Should we double-check with packagers that this won't cause a problem?
>>> Though the fact that it's done in a major release should make it
>>> perfectly fine I think -- and it's a smaller change than when we did all
>>> those xlog->wal changes...
>>>
>>
>> I think it makes little sense to not rename the tool now. I'm pretty
>> sure we'd end up doing that sooner or later anyway, and we'll just live
>> with a misnamed tool until then.
> 
> Do you think that a thread Would on -packagers be more adapted then?
> 

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

>> I don't know, TBH. I agree making the on/off change cheaper makes moves
>> us closer to 'on' by default, because they may disable it if needed. But
>> it's not the whole story.
>>
>> If we enable checksums by default, 99% users will have them enabled.
>> That means more people will actually observe data corruption cases that
>> went unnoticed so far. What shall we do with that? We don't have very
>> good answers to that (tooling, docs) and I'd say "disable checksums" is
>> not a particularly amazing response in this case :-(
> 
> Enabling data checksums by default is still a couple of steps ahead,
> without a way to control them better..
> 

What do you mean by "control" here? Dealing with checksum failures, or
some additional capabilities?

>> FWIW I don't know what to do about that. We certainly can't prevent the
>> data corruption, but maybe we could help with fixing it (although that's
>> bound to be low-level work).
> 
> Yes, data checksums are extremely useful to tell people when the
> problem is *not* from Postgres, which can be really hard in a large
> organization.  Knowing about the corrupted page is also useful as you
> can look at its contents and look at its bytes before it gets zero'ed
> to spot patterns which can help other teams in charge of a lower level
> of the application layer.

I'm not sure data checksums are particularly great evidence. For example
with the recent fsync issues, we might have ended with partial writes
(and thus invalid checksums). The OS migh have even told us about the
failure, but we've gracefully ignored it. So I'm afraid data checksums
are not a particularly great proof it's not our fault.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Petr Jelinek
Дата:
Сообщение: Re: row filtering for logical replication
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: GIN predicate locking slows down valgrind isolationtests tremendously