Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery

From Thomas Crayford
Subject Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery
Msg-id CAJgZ2Z4-dPQd1V7PS04JESELCEWtykCBtvcJ6Ezpd+7xW2qqiA@mail.gmail.com
In reply to Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery  (Michael Paquier <michael@paquier.xyz>)
Responses Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
Hi there Michael,

Sorry for the slow response on this. I was on call last week, and it was quite busy and distracting.

For the restore_command, we use wal-e (https://github.com/wal-e/wal-e), specifically:

envdir DIRECTORY wal-e wal-fetch "%f" "%p"
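
As a rough sketch only, a command like this would typically be set in recovery.conf as shown below. DIRECTORY is kept as the same placeholder used above, and the file location is an assumption: PostgreSQL 12 and later read restore_command from postgresql.conf instead.

```
# recovery.conf (assumed location; see the note above)
# %f is replaced by the name of the WAL file to fetch from the archive,
# %p by the path to copy it to, relative to the data directory.
restore_command = 'envdir DIRECTORY wal-e wal-fetch "%f" "%p"'
```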

Thanks

Tom

On Fri, Sep 28, 2018 at 11:59 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 28, 2018 at 01:02:42PM +0100, Thomas Crayford wrote:
> Ok, thanks for the pointer. It seems like the race condition I talked about
> is still accurate, does that seem right?

KeepFileRestoredFromArchive() looks like a good candidate on the matter
as it removes a WAL segment before replacing it by another with the same
name.  I have a hard time understanding why the checkpointer would try
to recycle a segment just recovered though as the startup process would
immediately try to use it.  I have not spent more than one hour looking
at potential spots though, which is not much for this kind of race
conditions.

It is also why I am curious about what kind of restore_command you are
using.
--
Michael
