Re: [BUG] non archived WAL removed during production crash recovery

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [BUG] non archived WAL removed during production crash recovery
Дата
Msg-id 20200424014856.f3jvvcxxmt5p7k5z@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-bugs
Hi,

On 2020-04-24 08:54:02 +0900, Michael Paquier wrote:
> On Thu, Apr 23, 2020 at 06:59:53PM +0200, Jehan-Guillaume de Rorthais wrote:
> > Please, find in v11 for version 9.5, 9.6 and 10.
>
> I have worked more on that using your v11, tweaked few comments in the
> new test parts for 9.5~10, and applied the whole.  Thanks all for your
> work.  I am keeping now an eye on the buildfarm.

I am confused by this commit. You added shared memory to differentiate
between crash recovery and standby mode/archive recovery, correct? You
write:

>    This commit fixes the regression by tracking in shared memory if a live
>    cluster is either in crash recovery or archive recovery as the handling
>    of WAL segments ready to be archived is different in both cases (those
>    WAL segments should not be removed during crash recovery), and by using
>    this new shared memory state to decide if a segment can be recycled or
>    not.

But don't we pretty much know this already from the state of the system?
During crash recovery there's nothing running RemoveOldXLogFiles() but
the startup process. Right? And in DB_IN_ARCHIVE_RECOVERY it should only
happen as part of restartpoints (i.e. the checkpointer).

Did you add the new shared state to avoid deducing things from the
"environment"? If so, it should really be mentioned in the commit
message & code. Because:

>    Previously, it was not possible to know if a cluster was in crash
>    recovery or archive recovery as the shared state was able to track only
>    if recovery was happening or not, leading to the problem.

really doesn't make that obvious.

Greetings,

Andres Freund



В списке pgsql-bugs по дате отправления:

Предыдущее
От: M Fysh
Дата:
Сообщение: Re: BUG #16364: ICACLS error when installing under system context "NTAUTHORITY\SYSTEM" ie installing with SCCM
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [BUG] non archived WAL removed during production crash recovery