Re: pg_combinebackup does not detect missing files

From: David Steele
Subject: Re: pg_combinebackup does not detect missing files
Msg-id: dc10b9d7-b484-489f-b2bc-070c425151dc@pgmasters.net
In response to: Re: pg_combinebackup does not detect missing files  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: pg_combinebackup does not detect missing files
List: pgsql-hackers
On 4/19/24 00:50, Robert Haas wrote:
> On Wed, Apr 17, 2024 at 7:09 PM David Steele <david@pgmasters.net> wrote:
> 
>> Fair enough. I accept that your reasoning is not random, but I'm still
>> not very satisfied that the user needs to run a separate and rather
>> expensive process to do the verification when pg_combinebackup already
>> has the necessary information at hand. My guess is that most users will
>> elect to skip verification.
> 
> I think you're probably right that a lot of people will skip it; I'm
> just less convinced than you are that it's a bad thing. It's not a
> *great* thing if people skip it, but restore time is actually just
> about the worst time to find out that you have a problem with your
> backups. I think users would be better served by verifying stored
> backups periodically when they *don't* need to restore them. 

Agreed, running verify regularly is a good idea, but in my experience 
most users are only willing to run verify once they suspect (or know) 
there is an issue. It's a pretty expensive process depending on how many 
backups you have and where they are stored.

> Also,
> saying that we have all of the information that we need to do the
> verification is only partially true:
> 
> - we do have to parse the manifest anyway, but we don't have to
> compute checksums anyway, and I think that cost can be significant
> even for CRC-32C and much more significant for any of the SHA variants
> 
> - we don't need to read all of the files in all of the backups. if
> there's a newer full, the corresponding file in older backups, whether
> full or incremental, need not be read
> 
> - incremental files other than the most recent only need to be read to
> the extent that we need their data; if some of the same blocks have
> been changed again, we can economize
> 
> How much you save because of these effects is pretty variable. Best
> case, you have a 2-backup chain with no manifest checksums, and all
> verification will have to do that you wouldn't otherwise need to do is
> walk each older directory tree in toto and cross-check which files
> exist against the manifest. That's probably cheap enough that nobody
> would be too fussed. Worst case, you have a 10-backup (or whatever)
> chain with SHA512 checksums and, say, a 50% turnover rate. In that
> case, I think having verification happen automatically could be a
> pretty major hit, both in terms of I/O and CPU. If your database is
> 1TB, it's ~5.5TB of read I/O (because one 1TB full backup and 9 0.5TB
> incrementals) instead of ~1TB of read I/O, plus the checksumming.
> 
> Now, obviously you can still feel that it's totally worth it, or that
> someone in that situation shouldn't even be using incremental backups,
> and it's a value judgement, so fair enough. But my guess is that the
> efforts that this implementation makes to minimize the amount of I/O
> required for a restore are going to be important for a lot of people.

Sure -- pg_combinebackup would only need to verify the data that it 
uses. I'm not suggesting that it should do an exhaustive verify of every 
single backup in the chain. Though I can see how it sounded that way 
since with pg_verifybackup that would pretty much be your only choice.
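
For reference, the worst-case figure quoted above (1TB database, 10-backup chain, 50% turnover) can be reproduced with a small sketch. The function name and the simple size model (each incremental is the database size times the turnover rate) are assumptions made for illustration, not anything pg_verifybackup computes.

```python
def exhaustive_verify_read_tb(db_size_tb: float, chain_length: int, turnover: float) -> float:
    """Read I/O in TB when every file of every backup in the chain is read.

    Assumed model: one full backup of db_size_tb, followed by
    (chain_length - 1) incrementals of db_size_tb * turnover each.
    """
    incrementals = chain_length - 1
    return db_size_tb + incrementals * db_size_tb * turnover


if __name__ == "__main__":
    # One 1TB full backup plus nine 0.5TB incrementals.
    print(exhaustive_verify_read_tb(1.0, 10, 0.5))  # 5.5
```

With a 2-backup chain the same model gives 1.5TB, which is why the cost of an exhaustive verify depends so heavily on chain length.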

The beauty of doing verification in pg_combinebackup is that it can do a 
lot less than running pg_verifybackup against every backup but still get 
a valid result. All we care about is that the output is correct -- if 
there is corruption in an unused part of an earlier backup 
pg_combinebackup doesn't need to care about that.
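
To make "only the data that it uses" concrete, here is a rough sketch under a deliberately simplified model: each backup is a dict mapping a relative path to either "full" or "incremental", and the chain is ordered oldest first. The data model and the name `backups_to_read` are hypothetical and do not reflect pg_combinebackup's actual internals.

```python
def backups_to_read(chain, path):
    """Return indexes of the backups that must be read to reconstruct `path`.

    Walks from the newest backup toward the oldest and stops at the first
    full copy, so anything older than that copy is never touched.
    Raises FileNotFoundError if an incremental file has no older copy to
    build on, which is the missing-file case discussed in this thread.
    """
    newest = len(chain) - 1
    if path not in chain[newest]:
        return []  # not part of the final backup, nothing to reconstruct
    needed = []
    for idx in range(newest, -1, -1):
        kind = chain[idx].get(path)
        if kind is None:
            raise FileNotFoundError(f"{path} is missing from backup {idx}")
        needed.append(idx)
        if kind == "full":
            return needed
    raise FileNotFoundError(f"no full copy of {path} anywhere in the chain")


if __name__ == "__main__":
    chain = [
        {"base/1/100": "full", "base/1/200": "full"},
        {"base/1/100": "incremental", "base/1/200": "full"},
        {"base/1/100": "incremental", "base/1/200": "incremental"},
    ]
    print(backups_to_read(chain, "base/1/100"))  # all three backups
    print(backups_to_read(chain, "base/1/200"))  # oldest backup is skipped
```

In this model, corruption in the oldest copy of base/1/200 never matters, because the newer full copy supersedes it.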

As far as I can see, pg_combinebackup already checks most of the boxes. 
The only gap I know of is that it can't detect missing files, and that 
doesn't seem like too big a thing to handle.
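
As an illustration of what a missing-file check amounts to, the sketch below cross-checks the paths listed in a backup manifest against a backup directory. It assumes the JSON layout of backup_manifest (a top-level "Files" array whose entries carry a "Path" key) and, for brevity, skips entries that use "Encoded-Path". The helper name `missing_files` is hypothetical, and this is not how pg_combinebackup is implemented.

```python
import json
import os


def missing_files(backup_dir, manifest_text):
    """Return manifest paths that do not exist as regular files in backup_dir."""
    manifest = json.loads(manifest_text)
    missing = []
    for entry in manifest.get("Files", []):
        rel_path = entry.get("Path")
        if rel_path is None:
            continue  # entries with "Encoded-Path" are ignored in this sketch
        if not os.path.isfile(os.path.join(backup_dir, rel_path)):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_missing.py /path/to/backup
    backup_dir = sys.argv[1]
    with open(os.path.join(backup_dir, "backup_manifest")) as f:
        for path in missing_files(backup_dir, f.read()):
            print("missing:", path)
```

A check like this only needs directory lookups, not file reads or checksums, so it stays cheap even for long chains.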

Regards,
-David
