Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"

From: Thomas Munro
Subject: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Date:
Msg-id: CA+hUKGJg341Rf1zD3Rh3vXUbs_bP+LuOiT-Juj+nOWVr1QUkBg@mail.gmail.com
In reply to: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
List: pgsql-bugs
On Wed, Jun 23, 2021 at 7:46 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Let's just add the lock there.

+1, no doubt about that.
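
For the archives, the shape I have in mind is simply to take the
relation mapping lock in shared mode around the read, mirroring what
the writer already does.  Untested sketch only, not the actual
relmapper.c code -- error handling trimmed and the usual includes
assumed:

/*
 * Sketch of the load_relmap_file() read path with the lock added; the
 * writer already holds RelationMappingLock exclusively while it
 * rewrites global/pg_filenode.map in place, so a shared acquisition
 * here is enough to keep us from seeing a torn copy.
 */
fd = OpenTransientFile(mapfilename, O_RDONLY | PG_BINARY);
if (fd < 0)
    ereport(FATAL,
            (errcode_for_file_access(),
             errmsg("could not open file \"%s\": %m", mapfilename)));

LWLockAcquire(RelationMappingLock, LW_SHARED);

if (read(fd, map, sizeof(RelMapFile)) != sizeof(RelMapFile))
    ereport(FATAL,
            (errcode_for_file_access(),
             errmsg("could not read file \"%s\": %m", mapfilename)));

LWLockRelease(RelationMappingLock);

CloseTransientFile(fd);

/* CRC verification of map->crc stays where it is today */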

> Now, that leaves the question with pg_control. That's a different
> situation. It doesn't rely on read() and write() being atomic across
> processes, but on a 512-byte sector write not being torn on power failure.
> How strong is that guarantee? It used to be common wisdom with hard
> drives, and it was carried over to SSDs although I'm not sure if it was
> ever strictly speaking guaranteed. ...

Right, it's always been tacit; no standard relevant to userspace
mentions any of this, AFAIK.

> ... What about the new kid on the block:
> Persistent Memory? I found this article:
> https://lwn.net/Articles/686150/. So at hardware level, Persistent
> Memory only guarantees atomicity at cache line level (64 bytes). To
> provide the traditional 512 byte sector atomicity, there's a feature in
> Linux called BTT. Perhaps we should add a note to the docs that you
> should enable that.

Right, also called sector mode.  I don't know enough about that to
comment really, but... if my google-fu is serving me, you can't
actually use interesting sector sizes like 8KB (you have to choose 512
or 4096 bytes), so you'll have to pay for *two* synthetic atomic page
schemes: BTT and our full page writes.  That makes me wonder... if you
need to leave full page writes on anyway, maybe it would be a better
trade-off to do double writes of our special atomic files (relmapper
files and control file) so that we could safely turn BTT off and avoid
double-taxation for relation data.  Just a thought.  No pmem
experience here, I could be way off.
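
Concretely, the sort of double write I mean for those small fixed-size
files is just: flush a scratch copy first, then overwrite the real file
in place, so a torn in-place write can always be repaired from the
scratch copy at startup.  Hand-wavy sketch in plain POSIX, names
invented, error cleanup and directory fsyncs elided:

#include <fcntl.h>
#include <unistd.h>

/*
 * Double write of a small fixed-size file (relmapper file, control
 * file).  Step 1 makes a durable scratch copy; only then does step 2
 * overwrite the real file in place.  A crash during step 2 can tear
 * the real file, but startup can then restore it from the scratch
 * copy.
 */
static int
double_write(const char *path, const char *scratch_path,
             const void *buf, size_t len)
{
    int         fd;

    /* 1. scratch copy, durable before the real file is touched */
    fd = open(scratch_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
        return -1;
    close(fd);

    /* 2. in-place overwrite; a torn write here is now recoverable */
    fd = open(path, O_WRONLY, 0600);
    if (fd < 0 || write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
        return -1;
    close(fd);

    return 0;
}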

> We haven't heard of broken control files from the field, so that doesn't
> seem to be a problem in practice, at least not yet. Still, I would sleep
> better if the control file had more redundancy. For example, have two
> copies of it on disk. At startup, read both copies, and if they're both
> valid, ignore the one with older timestamp. When updating it, write over
> the older copy. That way, if you crash in the middle of updating it, the
> old copy is still intact.

+1, with a flush in between so that only one copy can be borked no matter
how the storage works.  It is interesting how few reports there are on
the mailing list of control file CRC check failures though, if I'm
searching for the right thing[1].
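
The update side I'm imagining would look roughly like this --
illustration only, slot layout and names invented, not a proposal for
the real pg_control format:

#include <unistd.h>

/*
 * Two copies of the control data at fixed offsets in one file.  Each
 * update overwrites only the older slot and fsyncs before returning,
 * so the other slot is always a complete, previously flushed copy: no
 * matter how the storage behaves, at most one of the two can be torn.
 * At startup, read both slots, discard any with a bad CRC, and prefer
 * the one with the newer timestamp.
 */
#define CONTROL_SLOT_SIZE 512

static int
write_control_slot(int fd, int older_slot, const void *data, size_t len)
{
    if (pwrite(fd, data, len, (off_t) older_slot * CONTROL_SLOT_SIZE) !=
        (ssize_t) len)
        return -1;

    /* the flush "in between": this copy must be durable before the
     * next update is allowed to touch the other slot */
    return fsync(fd);
}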

[1] https://www.postgresql.org/search/?m=1&q=calculated+CRC+checksum+does+not+match+value+stored+in+file&l=&d=-1&s=r


