Re: Add recovery to pg_control and remove backup_label

From: David Steele
Subject: Re: Add recovery to pg_control and remove backup_label
Date:
Msg-id: c9a8b7e0-a451-4148-abcd-1ba7c2e661b7@pgmasters.net
In reply to: Re: Add recovery to pg_control and remove backup_label (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 11/21/23 16:00, Andres Freund wrote:
> Hi,
> 
> On 2023-11-21 14:48:59 -0400, David Steele wrote:
>>> I'd not call 7.06->4.77 or 6.76->4.77 "virtually free".
>>
>> OK, but how does that look with compression
> 
> With compression it's obviously somewhat different - but that part is done in
> parallel, potentially on a different machine with client side compression,
> whereas I think right now the checksumming is single-threaded, on the server
> side.

Ah, yes, that's certainly a bottleneck.

> With parallel server side compression, it's still 20% slower with the default
> checksumming than none. With client side it's 15%.

Yeah, that still seems like a lot. But to a large extent it sounds like a 
limitation of the current implementation.
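Andres notes that checksumming currently runs single-threaded on the server, on the same thread that does the network I/O. One way to lift that limitation is to hand each chunk to a separate hashing thread while the sending thread only writes. The sketch below is a minimal, hypothetical illustration. It is not pg_basebackup code, and Python is an assumed language here. `send_with_offloaded_checksum` and the `send` callback are invented names, and the sketch relies on CPython's hashlib releasing the GIL for buffers larger than 2047 bytes.

```python
import hashlib
import queue
import threading
from typing import Callable, Iterable

# Sentinel that tells the hashing worker the stream has ended.
_END = object()


def send_with_offloaded_checksum(
    chunks: Iterable[bytes],
    send: Callable[[bytes], None],
    algorithm: str = "sha256",
    max_pending: int = 8,
) -> str:
    """Send chunks on the calling thread while a worker thread hashes them.

    `send` is a hypothetical stand-in for the network write. The queue is
    bounded, so the hasher can never fall more than `max_pending` chunks behind.
    """
    pending: "queue.Queue[object]" = queue.Queue(maxsize=max_pending)
    hasher = hashlib.new(algorithm)

    def worker() -> None:
        while True:
            item = pending.get()
            if item is _END:
                return
            # hashlib drops the GIL for large buffers, so this update can
            # overlap with send() running on the caller's thread.
            hasher.update(item)

    t = threading.Thread(target=worker)
    t.start()
    try:
        for chunk in chunks:
            pending.put(chunk)  # hand the chunk to the hasher (blocks if full)
            send(chunk)         # the network write stays on this thread
    finally:
        pending.put(_END)
        t.join()
    return hasher.hexdigest()
```

For example, with a connected socket `sock`, passing `sock.sendall` as `send` keeps the write loop on the caller's thread. The bounded queue applies back-pressure, so throughput is limited by the slower of sending and hashing rather than by their sum.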

>> -- to a remote location?
> 
> I think this one unfortunately makes checksums a bigger issue, not a smaller
> one. The network interaction piece is single-threaded, so adding another
> significant CPU consumer to that same thread means you are hit harder by
> the substantial amount of CPU spent on checksumming.
> 
> Once you go beyond the small instances, you have plenty of network bandwidth in
> cloud environments. We top out well before the network on bigger instances.
> 
>> Uncompressed backup to local storage doesn't seem very realistic. With gzip
>> compression we measure SHA1 checksums at about 5% of total CPU.
> 
> IMO using gzip is basically infeasible for non-toy sized databases today. I
> think we're doing our users a disservice by defaulting to it in a bunch of
> places. Even if another default exposes them to difficulty due to potentially
> using a different compiled binary with fewer supported compression methods -
> that's gonna be very rare in practice.

Yeah, I don't use gzip anymore, but there are still some platforms that 
do not provide zstd (at least not easily), and lz4 compresses less. One 
thing people do seem to have is a lot of cores.
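If cores are plentiful, another option is to spread checksum work across them, for instance one file per worker. Here is a minimal Python sketch under that assumption. `file_checksum` and `checksum_files` are hypothetical helpers and are not part of pg_basebackup or pgBackRest. Threads suffice because file reads and large `hashlib` updates release the GIL in CPython.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

CHUNK_SIZE = 1 << 20  # 1 MiB reads: bounded memory, large enough to drop the GIL


def file_checksum(path: str, algorithm: str = "sha256") -> str:
    """Hash one file in fixed-size chunks so memory use stays bounded."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(block)
    return h.hexdigest()


def checksum_files(paths: List[str], workers: int = 0) -> Dict[str, str]:
    """Checksum many files concurrently, one file per task.

    workers=0 means "use every available core".
    """
    workers = workers or (os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its digest.
        digests = pool.map(file_checksum, paths)
        return dict(zip(paths, digests))
```

This only helps when a backup consists of many files. A single large file still hashes on one core, because a SHA-family digest over one stream is inherently sequential.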

>> I can't overstate how valuable checksums are in finding corruption,
>> especially in long-lived backups.
> 
> I agree!  But I think we need faster checksum algorithms or a faster
> implementation of the existing ones. And probably default to something faster
> once we have it.

We've been using xxHash to generate checksums for our block-level 
incremental backups, and it is seriously fast. It was written by the same 
person who did zstd and lz4.
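The trade-off under discussion is the per-byte cost of a cryptographic digest (SHA-1 at about 5% of CPU with gzip, per the figure above) versus a fast non-cryptographic checksum such as xxHash. A rough way to compare candidates on a given machine is a best-of-N throughput loop. This is a hedged sketch, not the methodology behind any numbers in this thread, and results vary by CPU. xxHash is not in the Python standard library, so `zlib.crc32` is swapped in as the fast candidate. Note that it computes the IEEE CRC-32, not the CRC-32C that pg_basebackup uses by default for manifest checksums.

```python
import os
import time
import zlib
import hashlib
from typing import Callable, Dict


def sha_digest(name: str) -> Callable[[bytes], object]:
    """Wrap a hashlib algorithm as a one-shot function of the input bytes."""
    return lambda data: hashlib.new(name, data).digest()


# zlib.crc32 is a stand-in for a fast non-cryptographic checksum (assumption:
# xxHash itself would need a third-party package).
CANDIDATES: Dict[str, Callable[[bytes], object]] = {
    "sha1": sha_digest("sha1"),
    "sha256": sha_digest("sha256"),
    "crc32": zlib.crc32,
}


def throughput_mib_s(fn: Callable[[bytes], object], data: bytes, rounds: int = 5) -> float:
    """Best-of-N throughput in MiB/s for one checksum function."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero timer reading on very small inputs.
    return (len(data) / (1 << 20)) / max(best, 1e-9)


if __name__ == "__main__":
    sample = os.urandom(16 << 20)  # 16 MiB of random input
    for name, fn in CANDIDATES.items():
        print(f"{name:>7}: {throughput_mib_s(fn, sample):8.0f} MiB/s")
```

Best-of-N timing is used because it filters out scheduler noise, which matters more than averaging when the question is raw per-byte cost.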

Regards,
-David


