Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: Thomas Munro
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Msg-id: CA+hUKG+a+M6tbKJ5Ei2SFBDJxw4UjGLyRBDVrUfuSBZZ0ht0LQ@mail.gmail.com
In reply to: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On Tue, Jul 25, 2023 at 8:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
> (Yeah, I know we have code to verify checksums during a base
> backup, but as discussed elsewhere, it doesn't work.)

BTW the code you are referring to there seems to think 4KB
page-halves are atomic; not sure if that's imagining page-level
locking in ancient Linux (?), or imagining default setvbuf() buffer
size observed with some specific implementation of fread(), or
confusing power-failure-sector-based atomicity with concurrent access
atomicity, or something else, but for the record what we actually see
in this scenario on ext4 is the old/new page contents mashed together
on much smaller boundaries (maybe cache lines), caused by duelling
concurrent memcpy() to/from, independent of any buffer/page-level
implementation details we might have been thinking of with that code.
Makes me wonder if it's even technically sound to examine the LSN.
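
To make the tearing described above concrete, here is a minimal C sketch. It is a deterministic simulation of one possible interleaving, not a reproduction of the actual race on ext4. It assumes an 8KB page and 64-byte copy chunks (a typical cache line size; the real granularity is only a guess), and the function names simulate_torn_read, count_tear_points and half_is_untorn are hypothetical, not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ     8192  /* page size; same name and default as PostgreSQL's BLCKSZ */
#define CHUNK_SIZE 64    /* ASSUMPTION: copy granularity, a typical cache line */
#define NCHUNKS    (BLCKSZ / CHUNK_SIZE)

/*
 * Simulate a reader copying a page while a writer is replacing it.
 * For each chunk, writer_won[i] says whether the writer's memcpy() reached
 * that chunk before the reader's did; the reader then sees the new bytes,
 * otherwise the old ones.
 */
static void
simulate_torn_read(const uint8_t *old_page, const uint8_t *new_page,
                   const uint8_t *writer_won, uint8_t *seen)
{
    for (size_t i = 0; i < NCHUNKS; i++)
    {
        const uint8_t *src = writer_won[i] ? new_page : old_page;

        memcpy(seen + i * CHUNK_SIZE, src + i * CHUNK_SIZE, CHUNK_SIZE);
    }
}

/*
 * Count the chunk boundaries at which the page switches between old and new
 * contents.  Only meaningful if old and new differ in every chunk.
 */
static size_t
count_tear_points(const uint8_t *seen, const uint8_t *new_page)
{
    size_t tears = 0;
    int    prev_is_new = memcmp(seen, new_page, CHUNK_SIZE) == 0;

    for (size_t i = 1; i < NCHUNKS; i++)
    {
        size_t off = i * CHUNK_SIZE;
        int    is_new = memcmp(seen + off, new_page + off, CHUNK_SIZE) == 0;

        if (is_new != prev_is_new)
            tears++;
        prev_is_new = is_new;
    }
    return tears;
}

/*
 * Return 1 if the given 4KB half (0 or 1) matches either the old or the new
 * version in full, which is what a "4KB halves are atomic" assumption expects.
 */
static int
half_is_untorn(const uint8_t *seen, const uint8_t *old_page,
               const uint8_t *new_page, int half)
{
    size_t off = (size_t) half * (BLCKSZ / 2);

    return memcmp(seen + off, old_page + off, BLCKSZ / 2) == 0 ||
           memcmp(seen + off, new_page + off, BLCKSZ / 2) == 0;
}
```

If old_page is filled with one byte value, new_page with another, and writer_won alternates between 0 and 1, the simulated read switches between old and new contents at every 64-byte boundary, and neither 4KB half matches a single version. The first chunk, which is where the LSN lives in a PostgreSQL page header, may then come from a different version than most of the rest of the page.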

> It's also why we
> have to force full-page write on during a backup. But the whole thing
> is nasty because you can't really verify anything about the backup you
> just took. It may be full of gibberish blocks but don't worry because,
> if all goes well, recovery will fix it. But you won't really know
> whether recovery actually does fix it. You just kind of have to cross
> your fingers and hope.

Well, not without also scanning the WAL for FPIs, anyway...  And
conceptually, that's why I think we probably want an 'FPI' of the
control file somewhere.
