Re: using an end-of-recovery record in all cases

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: using an end-of-recovery record in all cases
Дата
Msg-id CAA4eK1LJ_0YGp8EPqXRL41cVQFC=DgGrx_7=_4v1hiDkAbMy7Q@mail.gmail.com
обсуждение исходный текст
Ответ на using an end-of-recovery record in all cases  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Tue, Aug 10, 2021 at 12:31 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jul 29, 2021 at 11:49 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Jul 28, 2021 at 3:28 PM Andres Freund <andres@anarazel.de> wrote:
> > > > Imagine if instead of
> > > > all the hairy logic we have now we just replaced this whole if
> > > > (IsInRecovery) stanza with this:
> > > >
> > > > if (InRecovery)
> > > >     CreateEndOfRecoveryRecord();
> > > >
> > > > That would be WAY easier to reason about than the rat's nest we have
> > > > here today. Now, I am not sure what it would take to get there, but I
> > > > think that is the direction we ought to be heading.
> > >
> > > What are we going to do in the single user ([1]) case in this awesome future?
> > > I guess we could just not create a checkpoint until single user mode is shut
> > > down / creates a checkpoint for other reasons?
> >
> > It probably depends on who writes this thus-far-hypothetical patch,
> > but my thought is that we'd unconditionally request a checkpoint after
> > writing the end-of-recovery record, same as we do now if (promoted).
> > If we happened to be in single-user mode, then that checkpoint request
> > would turn into performing a checkpoint on the spot in the one and
> > only process we've got, but StartupXLOG() wouldn't really need to care
> > what happens under the hood after it called RequestCheckpoint().
>
> I decided to try writing a patch to use an end-of-recovery record
> rather than a checkpoint record in all cases. The patch itself was
> pretty simple but didn't work. There are two different reasons why it
> didn't work, which I'll explain in a minute. I'm not sure whether
> there are any other problems; these are the only two things that cause
> problems with 'make check-world', but that's hardly a guarantee of
> anything. Anyway, I thought it would be useful to report these issues
> first and hopefully get some feedback.
>
> The first problem I hit was that GetRunningTransactionData() does
> Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
> While that does seem like a superficially reasonable thing to assert,
> StartupXLOG() initializes latestCompletedXid by calling
> TransactionIdRetreat on the value extracted from checkPoint.nextXid,
> and the first-ever checkpoint that is written at the beginning of WAL
> has a nextXid of 3, so when starting up from that checkpoint nextXid
> is 2, which is not a normal XID. When we try to create the next
> checkpoint, CreateCheckPoint() does LogStandbySnapshot() which calls
> GetRunningTransactionData() and the assertion fails. In the current
> code, we avoid this more or less accidentally because
> LogStandbySnapshot() is skipped when starting from a shutdown
> checkpoint. If we write an end-of-recovery record and then trigger a
> checkpoint afterwards, then we don't avoid the problem. Although I'm
> impishly tempted to suggest that we #define SecondNormalTransactionId
> 4 and then use that to create the first checkpoint instead of
> FirstNormalTransactionId -- naturally with no comments explaining why
> we're doing it -- I think the real problem is that the assertion is
> just wrong. CurrentRunningXacts->latestCompletedXid shouldn't be
> InvalidTransactionId or BootstrapTransactionId, but
> FrozenTransactionId is a legal, if corner-case, value, at least as the
> code stands today.
>
> Unfortunately we can't just relax the assertion, because the
> XLOG_RUNNING_XACTS record will eventually be handed to
> ProcArrayApplyRecoveryInfo() for processing ... and that function
> contains a matching assertion which would in turn fail. It in turn
> passes the value to MaintainLatestCompletedXidRecovery() which
> contains yet another matching assertion, so the restriction to normal
> XIDs here looks pretty deliberate. There are no comments, though, so
> the reader is left to guess why. I see one problem:
> MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
> expects a normal XID. Perhaps it's best to just dodge the entire issue
> by skipping LogStandbySnapshot() if latestCompletedXid happens to be
> 2,
>

By reading above explanation, it seems like it is better to skip
LogStandbySnapshot() for this proposal. Can't we consider skipping
LogStandbySnapshot() by passing some explicit flag-like
'recovery_checkpoint' or something like that?

I think there is some prior discussion in another thread related to
this work but it would be easier to follow if you can summarize the
same.


With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: Avoid stuck of pbgench due to skipped transactions
Следующее
От: Gilles Darold
Дата:
Сообщение: Re: [PATCH] Hooks at XactCommand level