Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"

From Michael Paquier
Subject Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Date
Msg-id YNKBazxEayjtyb1x@paquier.xyz
In reply to Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
List pgsql-bugs
On Tue, Jun 22, 2021 at 10:11:06AM -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Your analysis seems right to me.  We have to worry about both things:
>> atomicity of writes on power failure (assumed to be sector-level,
>> hence our 512 byte struct -- all good), and atomicity of concurrent
>> reads and writes (we can't assume anything at all, so r/w locking is
>> the simplest way to get a consistent read).  Shouldn't relmap_redo()
>> also acquire the lock exclusively?

You are implying anything calling write_relmap_file(), right?

> Shouldn't we instead file a kernel bug report?  I seem to recall that
> POSIX guarantees atomicity of these things up to some operation size.
> Or is that just for pipe I/O?

Even if this is recognized as a kernel bug, it seems to me that we'd
better add an extra lock anyway, to protect instances that may still
run into this issue in the future, no?  Just to be on the safe side.

> If we can't assume atomicity of relmapper file I/O, I wonder about
> pg_control as well.  But on the whole, what I'm smelling is a moderately
> recently introduced kernel bug.  We've been doing this this way for
> years and heard no previous reports.

True.  PG_CONTROL_MAX_SAFE_SIZE relies on that.  Now, the only things
updating the control file are the startup process and the checkpointer,
so it is less prone to conflicts than the relmapper file in the problem
reported here, and the code takes ControlFileLock where necessary.
--
Michael
