Re: Concurrency issue in pg_rewind

From: Stephen Frost
Subject: Re: Concurrency issue in pg_rewind
Msg-id 20201007191312.GB3063@tamriel.snowman.net
In reply to: Re: Concurrency issue in pg_rewind  (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers

Greetings,

* Heikki Linnakangas (hlinnaka@iki.fi) wrote:
> On 18/09/2020 10:17, Alexander Kukushkin wrote:
> >At the same time, pg_rewind due to such "fatal" error leaves PGDATA in
> >an inconsistent state with empty pg_control file, this is totally bad
> >and easily fixable. We want the specific file to be absent and it is
> >already absent, why should it be a fatal error and not warning?
>
> Whenever pg_rewind runs into something unexpected, it fails loudly, so that
> the administrator can re-initialize from a base backup. That's the general
> rule. If a file goes missing while pg_rewind is running, that is unexpected.
> It could be a sign that the server was started concurrently, or another
> pg_rewind was started against it, for example.

Agreed.

> I feel that we could make an exception of some sort here, but I'm not sure
> what exactly. I don't feel comfortable just downgrading the unexpected
> ENOENT on unlink() to warning in all cases. Besides, scary warnings that you
> routinely ignore is not good either.

I also dislike the idea of downgrading this.

> I have a hard time coming up with a general rule and justification that's
> not just "do X because WAL-G does Y". pg_rewind failing because WAL-G
> removed a file unexpectedly is one problem, but another is that the
> restore_command might get confused if a pg_rewind removes a file that
> restore_command needs. This is hard when restore_command does things in the
> background, and there's no communication between the background process and
> pg_rewind.

I would also point out that wal-g isn't the only backup/restore tool
that does pre-fetching: so does pgbackrest, but we pre-fetch into an
independent spool directory, because these tools really should *not* be
modifying the PGDATA directory during restore_command.
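
To make the spool-directory idea concrete, here is a minimal sketch in Python. It is not how pgbackrest or wal-g is actually implemented; the function names, the plain-directory "archive", and the `prefetch` parameter are all assumptions made for illustration. The point it tries to show is that prefetched segments are staged outside PGDATA, and the only write under PGDATA is the destination path that PostgreSQL itself passed to restore_command as %p.

```python
import shutil
from pathlib import Path


def is_inside(path: Path, parent: Path) -> bool:
    """Return True if path is parent itself or lies anywhere beneath it."""
    path, parent = path.resolve(), parent.resolve()
    return path == parent or parent in path.parents


def restore_segment(wal_name, dest, archive, spool, pgdata, prefetch=()):
    """Copy wal_name to dest, staging it and any prefetched segments in spool.

    Hypothetical helper: `archive` is a plain directory standing in for a
    real WAL archive, and `dest` is the path PostgreSQL passes as %p.
    """
    archive, spool, pgdata, dest = map(Path, (archive, spool, pgdata, dest))
    if is_inside(spool, pgdata):
        raise ValueError("spool directory must live outside PGDATA")
    spool.mkdir(parents=True, exist_ok=True)

    # Stage the requested segment plus any prefetch candidates in the spool.
    for name in (wal_name, *prefetch):
        staged = spool / name
        source = archive / name
        if not staged.exists() and source.exists():
            shutil.copy2(source, staged)

    staged = spool / wal_name
    if not staged.exists():
        # Segment is not in the archive; a real restore_command would
        # signal this with a non-zero exit status.
        return False

    # The only write under PGDATA is the destination PostgreSQL asked for.
    shutil.copy2(staged, dest)
    staged.unlink()
    return True
```

With this layout, nothing that pg_rewind or the server manages inside PGDATA appears or disappears behind their backs; leftover prefetched segments only ever sit in the spool directory.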

I'm really disinclined to make concessions for external tools to start
writing into directories that they shouldn't be writing into, and this
goes for removing .ready files too, imv. Yes, you can do such things and
maybe things will work, but if you run into issues with that, that's on
you for making changes to the PGDATA directory, not on PG to try and
guess at what you, or any other external tool, did and magically work
around it or keep it working.

Thanks,

Stephen
