Re: Offline enabling/disabling of data checksums

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Offline enabling/disabling of data checksums
Дата
Msg-id 20190314043936.GH3493@paquier.xyz
обсуждение исходный текст
Ответ на Re: Offline enabling/disabling of data checksums  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Offline enabling/disabling of data checksums
Список pgsql-hackers
On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> > The attached patch should do the above, on top of Michael's last
> > patchset.
>
> What you are doing here looks like a good defense in itself.

More thoughts on that, and here is a short summary of the thread.

+       /* Check if control file has changed */
+       if (controlfile_last_updated != ControlFile->time)
+       {
+           fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+           exit(1);
+       }
Actually, under the conditions discussed on this thread that Postgres
is started in parallel of pg_checksums, imagine the following
scenario:
- pg_checksums starts to enable checksums, it reads a block to
calculate its checksum, then calculates it.
- Postgres has been started in parallel, writes the same block.
- pg_checksums finishes the block calculation, writes back the block
it has just read.
- Postgres stops, some data is lost.
- At the end of pg_checksums, we complain that the control file has
been updated since the start of pg_checksums.
I think that we should be way more noisy about this error message
document properly that Postgres should not be started while checksums
are enabled.  Basically, I guess that it should mention that there is
a risk of corruption because of this parallel operation.

Hence, based on what I could read on this thread, we'd like to have
the following things added to the core patch set:
1) When enabling checksums, fsync the data folder.  Then update the
control file, and finally fsync the control file (+ flush of the
parent folder to make the whole durable).  This way a host crash never
actually means that we finish in an inconsistent state.
2) When checksums are disabled, update the control file, then fsync
it + its parent folder.
3) Add tracking of the control file data, and complain loudly before
trying to update the file if there are any inconsistencies found.
4) Document with a big-fat-red warning that postgres should not be
worked on while the tool is enabling or disabling checksums.

There is a part discussed about standbys and primaries with not the
same checksum settings, but I commented on that in [1].

There is a secondary bug fix to prevent the tool to run if the data
folder has been initialized with a block size different than what
pg_checksums has been compiled with in [2].  The patch looks good,
still the error message could be better per my lookup.

[1]: https://www.postgresql.org/message-id/20190314002342.GC3493@paquier.xyz
[2]: https://www.postgresql.org/message-id/20190313224742.GA3493@paquier.xyz

Am I missing something?
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Takuma Hoshiai
Дата:
Сообщение: Proposal to suppress errors thrown by to_reg*()
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pgsql: Add support for hyperbolic functions, as well as log10().