Re: [BUG] non archived WAL removed during production crash recovery

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: [BUG] non archived WAL removed during production crash recovery
Дата
Msg-id 20200421.120925.76453535156648004.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-bugs
At Tue, 21 Apr 2020 11:15:01 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> On Mon, Apr 20, 2020 at 02:22:35PM +0200, Jehan-Guillaume de Rorthais wrote:
> > The problem is that we would have to read the controldata file each time we
> > wonder if a segment should be archived/removed. Moreover, the controldata
> > file might not be in sync quickly enough with the real state for some other
> > code path or futur needs.
>
> I don't think that this is what Horiguchi-san meant here.  What I got
> from his previous message would be to be to copy the shared value from
> the control file when necessary, and have the shared state use only a
> subset of the existing values of DBState, aka:
> - DB_IN_CRASH_RECOVERY
> - DB_IN_ARCHIVE_RECOVERY
> - DB_IN_PRODUCTION

First I thought as above, but I thought that we could use
ControlFile->state itself in this case, by regarding the symbols less
than DB_IN_CRASH_RECOVERY as RECOVERY_STATE_CRASH.  I don't think
there's no problem if the update to DB_IN_ARCHIVE_RECOVERY reaches
checkpointer with some delay.

> Still, that sounds wrong to me because then somebody would be tempted
> to change the shared value thinking that things like DB_SHUTDOWNING,
> DB_SHUTDOWNED_* or DB_STARTUP are valid but we don't want that here.

That is not an issue if we just use DBState to know whether we have
started archive recovery.

> Note that there may be a case for DB_STARTUP to be used in
> XLOGShmemInit(), but I'd rather let the code use the safest default,
> DB_IN_CRASH_RECOVERY to control that we won't remove .ready files by
> default until the startup process sees fit to do the actual switch
> depending on the checkpoint record lookup, if archive recovery was
> actually requested, etc.

I'm not sure I read this correctly. But I think I agree to this.

+    if (!XLogArchivingAlways() &&
+        GetRecoveryState() == RECOVERY_STATE_ARCHIVE)

Is rewritten as

+    if (!XLogArchivingAlways() &&
+        GetDBState() > DB_IN_CRASH_RECOVERY)

FWIW, what annoyed me is there are three variables that are quite
similar but has different domains, ControlFile->state,
XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't
mind there were two, but three seems a bit too many to me.

But it may be different issue.

> > Indeed, Benoît Lobréau reported this behavior to me.
>
> Noted.  Thanks for the information.  I don't think that I have ever
> met Benoît in person, do I?  Tell him that I owe him one beer or a
> beverage of his choice when we meet IRL, and that he had better use
> this message-id to make me keep my promise :)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [BUG] non archived WAL removed during production crash recovery
Следующее
От: cbw
Дата:
Сообщение: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever