Re: Online verification of checksums

From: Stephen Frost
Subject: Re: Online verification of checksums
Msg-id: CAOuzzgoMGsWx-_pJH6hLLs=_a91wa+POzyntsesnO3ajOm0MyA@mail.gmail.com
In response to: Re: Online verification of checksums  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers

Greetings,

On Tue, Mar 5, 2019 at 18:36 Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Mar 05, 2019 at 02:08:03PM +0100, Tomas Vondra wrote:
> > Based on quickly skimming that thread the main issue seems to be
> > deciding which files in the data directory are expected to have
> > checksums. Which is a valid issue, of course, but I was expecting
> > something about partial read/writes etc.
>
> I remember complaining about partial write handling as well for the
> base backup checks...  There should be an email about it on the list,
> cannot find it now ;p
>
> > My understanding is that:
> >
> > (a) The checksum verification should not generate false positives (same
> > as for basebackup).
> >
> > (b) The partial reads do emit warnings, which might be considered false
> > positives I guess. Which is why I'm arguing for changing it to do the
> > same thing basebackup does, i.e. ignore this.
>
> Well, at least that's consistent...  Argh, I really think that we
> ought to make the failures reported harder because that's easier to
> detect within a tool and some deployments set log_min_messages >
> WARNING so checksum failures would just be lost.  For base backups we
> don't care much about that as files are just blindly copied so they
> could have torn pages, which is fine as that's fixed at replay.  Now
> we are talking about a set of tools which could have reliable
> detection mechanisms for those problems.

I’m traveling but will try to comment more in the coming days; in general I agree with Tomas on these items. Also, pg_basebackup has to handle torn pages when it comes to checksums just like the verify tool does, and having them be consistent (along with external tools) would really be for the best, imv.

I still feel like retrying a short read (try reading more to get the whole page), then reading until we hit EOF and moving on, would be alright. I’m not sure it’s possible, but I do worry a bit that we might get a short read from a network file system or something that isn’t actually at EOF, and then we would skip a significant remaining portion of the file. Another thought might be to stat the file after we have opened it to see its length.
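
A minimal sketch of the short-read handling described above, written in Python for brevity rather than the C the actual tools use. The names `read_page` and `scan_file` are hypothetical, and the 8192-byte page size is an assumption (the PostgreSQL default). It retries a short read until the page is complete or a read returns zero bytes, and compares the final position against the length reported by `fstat` right after open, so that a short read which is not really at EOF gets flagged instead of silently skipping the rest of the file.

```python
import os

BLCKSZ = 8192  # assumed page size (PostgreSQL's default build setting)


def read_page(fd, offset, blcksz=BLCKSZ):
    """Read one page at `offset`, retrying short reads.

    Stops when the page is complete or when a read returns zero bytes,
    which is how the OS signals end of file.
    """
    buf = b""
    while len(buf) < blcksz:
        chunk = os.pread(fd, blcksz - len(buf), offset + len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def scan_file(path, blcksz=BLCKSZ):
    """Return (full_pages, trailing_bytes, premature_eof) for one file.

    premature_eof is True when reading stopped before the length that
    fstat reported just after open. On a live cluster that can also mean
    the file was truncated concurrently, so a caller should treat it as
    "look again", not as proof of corruption.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size_at_open = os.fstat(fd).st_size
        offset = 0
        full_pages = 0
        while True:
            page = read_page(fd, offset, blcksz)
            if len(page) < blcksz:
                break
            # A real tool would verify the page checksum here.
            full_pages += 1
            offset += blcksz
        premature_eof = offset + len(page) < size_at_open
        return full_pages, len(page), premature_eof
    finally:
        os.close(fd)
```

Whether a trailing partial page should be ignored (as basebackup does) or reported is the open question in this thread; the sketch only returns the numbers and leaves that policy to the caller.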

Just a few thoughts since I’m on my phone.  Will try to write up something more in a day or two. 

Thanks!

Stephen
