Re: pg_combinebackup does not detect missing files

From David Steele
Subject Re: pg_combinebackup does not detect missing files
Date
Msg-id 2f9aeae6-c010-43a5-b456-91adeb229160@pgmasters.net
In reply to Re: pg_combinebackup does not detect missing files  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_combinebackup does not detect missing files  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 4/25/24 00:05, Robert Haas wrote:
> On Tue, Apr 23, 2024 at 7:23 PM David Steele <david@pgmasters.net> wrote:
>>> I don't understand what you mean here. I thought we were in agreement
>>> that verifying contents would cost a lot more. The verification that
>>> we can actually do without much cost can only check for missing files
>>> in the most recent backup, which is quite weak. pg_verifybackup is
>>> available if you want more comprehensive verification and you're
>>> willing to pay the cost of it.
>>
>> I simply meant that it is *possible* to verify the output of
>> pg_combinebackup without explicitly verifying all the backups. There
>> would be overhead, yes, but it would be less than verifying each backup
>> individually. For my 2c that efficiency would make it worth doing
>> verification in pg_combinebackup, with perhaps a switch to turn it off
>> if the user is confident in their sources.
> 
> Hmm, can you outline the algorithm that you have in mind? I feel we've
> misunderstood each other a time or two already on this topic, and I'd
> like to avoid more of that. Unless you just mean what the patch I
> posted does (check if anything from the final manifest is missing from
> the corresponding directory), but that doesn't seem like verifying the
> output.

Yeah, it seems you are right that it is not possible to verify the 
output in all cases.

However, I think allowing the user to optionally validate the input 
would be a good feature. Running pg_verifybackup as a separate step is 
going to be more expensive than verifying/copying at the same time. 
Even with storage tricks to copy ranges of data, pg_combinebackup is 
going to be aware of files that do not need to be verified for the current 
operation, e.g. old copies of free space maps.
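
To illustrate the "verify while copying" idea, here is a minimal sketch that hashes a file in the same pass that copies it, so the data is read only once. It is not pg_combinebackup code: the function name copy_and_verify is hypothetical, and SHA-256 is used only because it is in the Python standard library (backup manifests default to CRC32C, with SHA-256 available as an option).

```python
import hashlib


def copy_and_verify(src_path, dst_path, expected_sha256, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in one pass, hashing each chunk as it is read.

    Returns True if the SHA-256 of the bytes copied matches expected_sha256
    (a hex string, e.g. taken from a manifest entry). Hypothetical helper.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)  # verification work rides along with the copy
            dst.write(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

A caller that knows a file is irrelevant to the current combine (for example, a superseded free space map) could simply skip it, which a standalone verification pass cannot do.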

Additionally, if pg_combinebackup is updated to work against tar.gz, 
which I believe will be important going forward, then there would be 
little penalty to verification since all the required data would be in 
memory at some point anyway. Though, if the file is compressed it might 
be redundant since compression formats generally include checksums.
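
As a small illustration of the point about compression checksums: a gzip member ends with a CRC-32 of the uncompressed data, and Python's gzip module checks it on decompression. The helper name below is hypothetical, and this sketch says nothing about how pg_combinebackup would read tar.gz.

```python
import gzip
import zlib


def gzip_payload_is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses cleanly, including the CRC-32 trailer check."""
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (an OSError subclass) is raised on a CRC mismatch;
        # truncated or malformed streams raise EOFError or zlib.error.
        return False
    return True
```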

One more thing occurs to me -- if data checksums are enabled then a 
rough and ready output verification would be to test the checksums 
during combine. Data checksums aren't very good but something should be 
triggered if a bunch of pages go wrong, especially since the block 
offset is part of the checksum. This would be helpful for catching 
combine bugs.
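
A toy sketch of why mixing the block number into the checksum helps catch combine bugs: a page that is internally intact but lands at the wrong offset still fails verification. This is NOT PostgreSQL's page checksum algorithm; zlib.crc32 stands in for the real hash, and all names here are hypothetical.

```python
import zlib


def toy_page_checksum(page: bytes, block_number: int) -> int:
    """Toy 16-bit checksum that folds the block number into a CRC-32 of the page."""
    mixed = zlib.crc32(page) ^ block_number
    return (mixed % 65535) + 1  # reduce to a nonzero 16-bit value


def find_bad_blocks(pages):
    """pages: list of (data, stored_checksum) in on-disk order.

    Returns the block numbers whose stored checksum does not match the
    checksum recomputed for the position the page actually occupies.
    """
    return [
        blkno
        for blkno, (data, stored) in enumerate(pages)
        if toy_page_checksum(data, blkno) != stored
    ]


# Four distinct pages, each checksummed for its correct block number.
pages = [(bytes([i]) * 8192, toy_page_checksum(bytes([i]) * 8192, i)) for i in range(4)]

# Simulate a combine bug that writes blocks 1 and 2 in the wrong order.
misplaced = [pages[0], pages[2], pages[1], pages[3]]
```

Here find_bad_blocks(pages) reports nothing, while find_bad_blocks(misplaced) flags blocks 1 and 2 even though neither page's contents changed.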

Regards,
-David


