Re: BUG #14999: pg_rewind corrupts control file global/pg_control

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Дата
Msg-id 30255.1522711675@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: BUG #14999: pg_rewind corrupts control file global/pg_control  (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы Re: BUG #14999: pg_rewind corrupts control file global/pg_control  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-bugs
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> Ok, I agree. Then yes, this patch can be probably marked as ready.

I started to look over this patch, and soon decided that the
commentary in pg_rewind is impossibly bad :-(.  You need to study
libpq_executeFileMap for a long time before you are even sure which side
of the transfer it's executing on; the fact that it's doing a copy *TO*
the remote server is incredibly misleading, and the comments are not
anywhere near good enough to de-confuse somebody coming upon the code for
the first time.  Not to mention that the function's name conveys nothing
whatever.  Maybe it's not the job of this patch to fix that, but I can't
help thinking that poor design and documentation are a big part of why
this bug exists in the first place.

But enough of that rant.  Now that I've more or less wrapped my head
around what's happening, I think this is the core of the problem:
libpq_executeFileMap truncates every non-data file in the local data
directory before it's fetched anything at all from the remote server.
This seems to me to be utterly cavalier and brittle, because ANY FAILURE
AT ALL in the fetch sequence leaves you with a data directory that's been
trashed in whole or in part --- and that sequence could span a good long
time, if there's a lot to copy.  The originally complained-of problem
(that pg_control gets nuked) is just the tip of that iceberg, and I'm
afraid that the proposed patch is only removing a single potential
failure mode.

I think that a saner design would involve not truncating any particular
target file until we first get some data to write to it.  Christian's
original patch shown at
https://www.postgresql.org/message-id/CAB7nPqTyq3cjphJmoCPc0n8r90O4j1BpZC2BMaj8dKZ7g0-mtw%40mail.gmail.com
seems to be headed in that direction, in that it forces ordering of the
data fetch (which is likely a good idea anyway to ensure locality of
reference) and then performs the truncation when we get the first chunk
of data for a file.

Michael's proposal to have separate error handling for data and config
files is not bad in itself, and it does improve the behavior for one
foreseeable trouble case.  But I think we need the other thing as well,
or some variant of it, before we can regard pg_rewind as anything except
a chainsaw with the safety catch missing.

            regards, tom lane


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: BUG #15112: Unable to run pg_upgrade with earthdistance extension
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #15141: Faulty ISO 8601 parsing