Re: Online verification of checksums
From | Michael Banck |
---|---|
Subject | Re: Online verification of checksums |
Date | |
Msg-id | 1553803762.4884.52.camel@credativ.de |
In response to | Re: Online verification of checksums (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: Online verification of checksums |
List | pgsql-hackers |
Hi,

On Thursday, 28.03.2019, at 18:19 +0100, Tomas Vondra wrote:
> On Thu, Mar 28, 2019 at 05:08:33PM +0100, Michael Banck wrote:
> > I also fixed the two issues Andres reported, namely a zeroed-out
> > pageheader and a random LSN. The first is caught by checking for an
> > all-zero-page in the way PageIsVerified() does. The second is caught by
> > comparing the upper 32 bits of the LSN as well and demanding that they
> > are equal. If the LSN is corrupted, the upper 32 bits should be wildly
> > different to the current checkpoint LSN.
> >
> > Well, at least that is a stab at a fix; there is a window where the
> > upper 32 bits could legitimately be different. In order to make that as
> > small as possible, I update the checkpoint LSN every once in a while.

I decided it makes more sense to just re-read the checkpoint LSN from the control file when we encounter a wrong checksum on re-read of a page, as that is when it counts, instead of doing it only every once in a while.

> Doesn't that mean we'll report a false positive?

A false positive would be pg_checksums claiming a block has a wrong checksum while in fact it does not (after it is correctly written out and synced to disk), right?

If pg_checksums reads a current first part and a stale second part twice in a row (we re-read the block), then the LSN of the first part would presumably(?) be higher than the latest checkpoint LSN. If there was a wraparound in the lower part of the LSN, so that the upper part is now different to the latest checkpoint LSN, then pg_checksums would report this as a false positive, I believe.

We could add some additional heuristics, like checking that the upper part of the LSN has advanced by at most one, but that does not seem to make it 100% certified robust either, does it?

If pg_checksums reads a current second part and a stale first part twice, then the pageheader LSN would presumably be lower than the checkpoint LSN and again a false positive would be reported.

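To make the two checks discussed above concrete, here is a minimal C sketch of how I read them. It is not the code from the patch: the function names, the `BLOCK_SIZE` constant and the idea of passing LSNs around as plain 64-bit integers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192         /* assumption: default PostgreSQL block size */

/*
 * Hypothetical helper: true only if every byte of the block is zero.
 * A page whose header is zeroed but whose body still holds data is
 * therefore not mistaken for a legitimately new (empty) page.
 */
static bool
page_is_all_zero(const unsigned char *page)
{
    assert(page != NULL);
    for (size_t i = 0; i < BLOCK_SIZE; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: after a checksum failure on re-read, decide
 * whether the page LSN is consistent with a concurrent write.  The
 * page LSN must be newer than the checkpoint LSN, and the upper
 * 32 bits of both must be equal; a random (corrupted) LSN will almost
 * always fail the second condition.
 */
static bool
lsn_suggests_concurrent_write(uint64_t page_lsn, uint64_t checkpoint_lsn)
{
    bool newer = page_lsn > checkpoint_lsn;
    bool same_upper =
        (uint32_t) (page_lsn >> 32) == (uint32_t) (checkpoint_lsn >> 32);

    return newer && same_upper;
}
```

With a checkpoint LSN of 5/FFFFFF00, a page LSN of 6/00000010 is newer but has a different upper half, so this sketch would not treat the failure as a concurrent write. That is exactly the wraparound window in which a false positive can be reported.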
At least in my testing I haven't seen the second case, and I have seen the first (disregarding the wraparound issue for now) only extremely rarely, if at all (usually the torn page is gone on re-read). The first case requiring a wraparound since the latest checkpoint LSN update also seems quite narrow compared to the issue of random data being written due to corruption. So I think it is more important to make sure random data won't be a false negative than to avoid this being a false positive.

Maybe we can just issue a warning in online mode that some checksum failures could be false positives and advise the user to recheck those files (using the -r switch) again? I have added this in the attached new version:

+       printf(_("%s ran against an online cluster and found some bad checksums.\n"), progname);
+       printf(_("It could be that those are false positives due to concurrently updated blocks,\n"));
+       printf(_("checking the offending files again with the -r option is advised.\n"));

It was not mentioned on this thread, but I want to stress again that you cannot run the current pg_checksums on a basebackup, because the control file claims the cluster is still online. This makes the current program pretty useless for production setups right now, in my opinion, as few people have the luxury of regular maintenance downtimes during which pg_checksums could run, and running it against base backups is quite cumbersome.

Maybe we can improve things by checking for postmaster.pid as well and going ahead (only for --check, of course) if it is missing, but that hasn't been implemented yet.

I agree that the current patch might have some corner cases where it does not guarantee 100% accuracy in online mode, but I hope the current version at least has no more false negatives.

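For the postmaster.pid idea, a rough sketch of the decision could look like the following. As said, nothing like this is implemented; `postmaster_pid_exists()` and `may_run_check()` are hypothetical names, and the flags are assumed to come from the control file and the command line respectively.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: does <datadir>/postmaster.pid exist?
 * Anything other than a clear "no such file" is treated as "exists",
 * so that we err on the side of refusing to run.
 */
static bool
postmaster_pid_exists(const char *datadir)
{
    char path[4096];
    int  n;

    assert(datadir != NULL);
    n = snprintf(path, sizeof(path), "%s/postmaster.pid", datadir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return true;            /* path too long: be conservative */

    if (access(path, F_OK) == 0)
        return true;
    return errno != ENOENT;
}

/*
 * Hypothetical policy: a cluster that was shut down cleanly is always
 * fine.  If the control file still claims "online" (as in a base
 * backup), go ahead only for a read-only check and only if no
 * postmaster.pid is present.
 */
static bool
may_run_check(bool control_file_says_online, bool check_only,
              const char *datadir)
{
    if (!control_file_says_online)
        return true;
    return check_only && !postmaster_pid_exists(datadir);
}
```

A missing postmaster.pid is of course only a hint that no postmaster is running, which is why the sketch limits this path to the read-only check.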
Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following provisions: https://www.credativ.de/datenschutz