Re: Enable data checksums by default

From Tomas Vondra
Subject Re: Enable data checksums by default
Date
Msg-id 58993856-3ce9-4223-9dbe-6b2853a80628@vondra.me
In response to Re: Enable data checksums by default  (Greg Burd <greg@burd.me>)
Responses Re: Enable data checksums by default
Re: Enable data checksums by default
List pgsql-hackers

On 7/31/25 15:39, Greg Burd wrote:
> 
> 
>> On Jul 30, 2025, at 8:09 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 30 Jul 2025, at 11:58, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>>
>>> On Tue, 2025-07-29 at 20:24 +0200, Tomas Vondra wrote:
>>>> So, what should we do with the PG18 open item? We (the RMT team) would
>>>> like to know if we shall keep the checksums enabled by default, and if
>>>> there's something that still needs to be done for PG18.
>>>
>>> I don't have a strong opinion, but I lean towards having them on
>>> by default.
>>
>> I agree with that, while there might be a lot of cases where disabling
>> checksums is the right move it's still a sane default.
>>
>> --
>> Daniel Gustafsson
> 
> I realize I’m late to the conversation, I’ve been lurking...
> 
> I agree that enabling checksums by default is the sane default.  Databases
> should always make a best effort for data integrity, checksums are a
> positive step in that direction.
> 
> I recall a conversation at the last PGConf.dev (2025) with a representative
> from Intel and Jeff Davis (CC’ed) that had to do with checksums and a vast
> performance difference between Intel and AMD the latter winning by a mile.
> I forget the details, maybe Jeff remembers more than I do.  I’m not
> suggesting that we disable Intel by default or trying to derail this
> conversation (which appears to be reaching consensus), just raising
> awareness.
> 

I don't know the Intel vs. AMD situation exactly, but e.g. [1] does not
suggest AMD wins by a mile. In fact, it suggests Intel does much better
in this particular benchmark (with AVX-512 improvements). Of course,
this is a fairly recent *kernel* improvement, maybe it wouldn't work for
our data checksums that well.

However, I don't think the cost of the checksum calculation itself is
the main concern. It's probably negligible compared to the other costs
that checksums trigger: having to WAL-log hint bits, doing more
expensive checks (that's what the btree regression was about), etc.
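
For a rough feel of the raw calculation cost alone, a minimal sketch along
these lines could be used. It is an illustration under stated assumptions,
not a PostgreSQL benchmark: zlib.crc32 is only a stand-in for the page
checksum (PostgreSQL uses a different algorithm), the helper name
checksum_cost_per_page is made up for this example, and nothing here
measures the WAL-logging of hint bits or the extra checks mentioned above.

```python
import os
import time
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size, in bytes


def checksum_cost_per_page(n_pages: int = 2000) -> float:
    """Return the average seconds spent checksumming one page-sized buffer.

    Hypothetical helper for illustration only.
    """
    # Random buffers stand in for heap/index pages.
    pages = [os.urandom(PAGE_SIZE) for _ in range(n_pages)]
    start = time.perf_counter()
    for page in pages:
        # Stand-in checksum: this is NOT PostgreSQL's page checksum.
        zlib.crc32(page)
    elapsed = time.perf_counter() - start
    return elapsed / n_pages


if __name__ == "__main__":
    per_page = checksum_cost_per_page()
    print(f"~{per_page * 1e6:.2f} us per {PAGE_SIZE}-byte page")
```

Whatever number this prints depends on the machine and on the stand-in
algorithm, so it only helps judge the order of magnitude of the
calculation, not the overall overhead of enabling checksums.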

[1] https://www.phoronix.com/news/Linux-CRC32C-VPCLMULQDQ

cheers

-- 
Tomas Vondra



