Re: [BUG] non archived WAL removed during production crash recovery

Поиск

Список

Период

Сортировка

От	Jehan-Guillaume de Rorthais
Тема	Re: [BUG] non archived WAL removed during production crash recovery
Дата	21 апреля 2020 г. 22:00:22
Msg-id	20200422000022.7aa6a62b@firost обсуждение исходный текст
Ответ на	Re: [BUG] non archived WAL removed during production crash recovery (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы	Re: [BUG] non archived WAL removed during production crash recovery
Список	pgsql-bugs

Дерево обсуждения

On Tue, 21 Apr 2020 15:08:17 +0900 (JST)
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> At Tue, 21 Apr 2020 13:57:39 +0900, Michael Paquier <michael@paquier.xyz>
> wrote in 
> > On Tue, Apr 21, 2020 at 12:09:25PM +0900, Kyotaro Horiguchi wrote:  
> > > +    if (!XLogArchivingAlways() &&
> > > +        GetRecoveryState() == RECOVERY_STATE_ARCHIVE)
> > > 
> > > Is rewritten as 
> > > 
> > > +    if (!XLogArchivingAlways() &&
> > > +        GetDBState() > DB_IN_CRASH_RECOVERY)
> > > 
> > > FWIW, what annoyed me is there are three variables that are quite
> > > similar but has different domains, ControlFile->state,
> > > XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't
> > > mind there were two, but three seems a bit too many to me.  

In fact, my original goal [1] was to avoid adding another shared boolean related
to the same topic. So I understand your feeling.

I played with this idea again, based on your argument that there's no problem
if the update to DB_IN_ARCHIVE_RECOVERY reaches checkpointer with some delay.

The other point I feel bad with is to open and check the controlfile again and
again for each segment to archive, even on running production instance.
It's not that it would be heavy, but it feels overkill to fetch this information
that should be available more easily.

That leads me an idea where we would keep the ControlFile data up-to-date in
shared memory. There's a few duplicates between ControlFile and XLogCtl, so
maybe it could make the code a little simpler at some other places than just
fixing $SUBJECT using DBState? I'm not sure of the implications and impacts
though. This seems way bigger than the current fix and with many traps on the
way. Maybe we could discuss this in another thread if you think it deserves it?

Regards,

[1]
https://www.postgresql.org/message-id/flat/20200403182625.0fccc6fd%40firost#28a756094a4b1f3dd24927fb6311927d

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [BUG] non archived WAL removed during production crash recovery