Re: [Patch] ALTER SYSTEM READ ONLY

Поиск
Список
Период
Сортировка
От Amul Sul
Тема Re: [Patch] ALTER SYSTEM READ ONLY
Дата
Msg-id CAAJ_b973D6dJocpFEMVwDT2rdw_V0czToWViEnj4+Ydq+y8Y=w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Ответы Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Список pgsql-hackers
On Wed, Nov 17, 2021 at 11:13 AM Amul Sul <sulamul@gmail.com> wrote:
>
> On Sat, Nov 13, 2021 at 2:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Nov 8, 2021 at 8:20 AM Amul Sul <sulamul@gmail.com> wrote:
> > > Attached is the rebased version of refactoring as well as the
> > > pg_prohibit_wal feature patches for the latest master head (commit #
> > > 39a3105678a).
> >
> > I spent a lot of time today studying 0002, and specifically the
> > question of whether EndOfLog must be the same as
> > XLogCtl->replayEndRecPtr and whether EndOfLogTLI must be the same as
> > XLogCtl->replayEndTLI.
> >
> > The answer to the former question is "no" because, if we don't enter
> > redo, XLogCtl->replayEndRecPtr won't be initialized at all. If we do
> > enter redo, then I think it has to be the same unless something very
> > weird happens. EndOfLog gets set like this:
> >
> >     XLogBeginRead(xlogreader, LastRec);
> >     record = ReadRecord(xlogreader, PANIC, false, replayTLI);
> >     EndOfLog = EndRecPtr;
> >
> > In every case that exists in our regression tests, EndRecPtr is the
> > same before these three lines of code as it is afterward. However, if
> > you test with recovery_target=immediate, you can get it to be
> > different, because in that case we drop out of the redo loop after
> > calling recoveryStopsBefore() rather than after calling
> > recoveryStopsAfter(). Similarly I'm fairly sure that if you use
> > recovery_target_inclusive=off you can likewise get it to be different
> > (though I discovered the hard way that recovery_target_inclusive=off
> > is ignored when you use recovery_target_name). It seems like a really
> > bad thing that neither recovery_target=immediate nor
> > recovery_target_inclusive=off have any tests, and I think we ought to
> > add some.
> >
>
> recovery/t/003_recovery_targets.pl has test for
> recovery_target=immediate but not for recovery_target_inclusive=off, we
> can add that for recovery_target_lsn, recovery_target_time, and
> recovery_target_xid case only where it affects.
>
> > Anyway, in effect, these three lines of code have the effect of
> > backing up the xlogreader by one record when we stop before rather
> > than after a record that we're replaying. What that means is that
> > EndOfLog is going to be the end+1 of the last record that we actually
> > replayed. There might be one more record that we read but did not
> > replay, and that record won't impact the value we end up with in
> > EndOfLog. Now, XLogCtl->replayEndRecPtr is also that end+1 of the last
> > record that we actually replayed. To put that another way, there's no
> > way to exit the main redo loop after we set XLogCtl->replayEndRecPtr
> > and before we change LastRec. So in the cases where
> > XLogCtl->replayEndRecPtr gets initialized at all, it can only be
> > different from EndOfLog if something different happens when we re-read
> > the last-replayed WAL record than what happened when we read it the
> > first time. That seems unlikely, and would be unfortunate it if it did
> > happen. I am inclined to think that it might be better not to reread
> > the record at all, though.
>
> There are two reasons that the record is reread; first, one that you
> have just explained where the redo loop drops out due to
> recoveryStopsBefore() and another one is that InRecovery is false.
>
> In the formal case at the end, redo while-loop does read a new record
> which in effect updates EndRecPtr and when we breaks the loop, we do
> reach the place where we do reread record -- where we do read the
> record (i.e. LastRec) before the record that redo loop has read and
> which correctly sets EndRecPtr. In the latter case, definitely, we
> don't need any adjustment to EndRecPtr.
>
> So technically one case needs reread but that is also not needed, we
> have that value in XLogCtl->lastReplayedEndRecPtr. I do agree that we
> do not need to reread the record, but EndOfLog and EndOfLogTLI should
> be set conditionally something like:
>
> if (InRecovery)
> {
>     EndOfLog = XLogCtl->lastReplayedEndRecPtr;
>     EndOfLogTLI = XLogCtl->lastReplayedTLI;
> }
> else
> {
>     EndOfLog = EndRecPtr;
>     EndOfLogTLI = replayTLI;
> }
>
> > As far as this patch goes, I think we need
> > a solution that doesn't involve fetching EndOfLog from a variable
> > that's only sometimes initialized and then not doing anything with it
> > except in the cases where it was initialized.
> >
>
> Another reason could be EndOfLog changes further in the following case:
>
> /*
>  * Actually, if WAL ended in an incomplete record, skip the parts that
>  * made it through and start writing after the portion that persisted.
>  * (It's critical to first write an OVERWRITE_CONTRECORD message, which
>  * we'll do as soon as we're open for writing new WAL.)
>  */
> if (!XLogRecPtrIsInvalid(missingContrecPtr))
> {
>     Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
>     EndOfLog = missingContrecPtr;
> }
>
> Now only solution that I can think is to copy EndOfLog (so
> EndOfLogTLI) into shared memory.
>
> > As for EndOfLogTLI, I'm afraid I don't think that's the same thing as
> > XLogCtl->replayEndTLI. Now, it's hard to be sure, because I don't
> > think the regression tests contain any scenarios where we run recovery
> > and the values end up different. However, I think that the code sets
> > EndOfLogTLI to the TLI of the last WAL file that we read, and I think
> > XLogCtl->replayEndTLI gets set to the timeline from which that WAL
> > record originated. So imagine that we are looking for WAL that ought
> > to be in 000000010000000000000003 but we don't find it; instead we
> > find 000000020000000000000003 because our recovery target timeline is
> > 2, or something that has 2 in its history. We will read the WAL for
> > timeline 1 from this file which has timeline 2 in the file name. I
> > think if recovery ends in this file before the timeline switch, these
> > values will be different. I did not try to construct a test case for
> > this today due to not having enough time, so it's possible that I'm
> > wrong about this, but that's how it looks to me from the code.
> >
>
> I am not sure, I have understood this scenario due to lack of
> expertise in this area -- Why would the record we looking that ought
> to be in 000000010000000000000003 we don't find it? Possibly WAL
> corruption or that file is missing?
>

On further study, XLogPageRead(), WaitForWALToBecomeAvailable(), and
XLogFileReadAnyTLI(), I think I could make a sense that there could be
a case where the record belong to TLI 1 we are looking for; we might
open the file with TLI 2. But, I am wondering what's wrong if we say
that TLI 1 for that record even if we read it from the file has TLI 2 or 3 or 4
in its file name -- that statement is still true, and that record
should be still accessible from the filename with TLI 1.  Also, if we
going to consider this reading record exists before the timeline
switch point as the EndOfLog then why should be worried about the
latter timeline switch which eventually everything after the EndOfLog
going to be useless for us. We might continue switching TLI and/or
writing the WAL right after EndOfLog, correct me if I am missing
something here.

Further, I still think replayEndTLI has set to the correct value what
we looking for EndOfLogTLI when we go through the redo loop. When it
read the record and finds a change in the current replayTLI then it
updates that as:

if (newReplayTLI != replayTLI)
{
    /* Check that it's OK to switch to this TLI */
    checkTimeLineSwitch(EndRecPtr, newReplayTLI,
                        prevReplayTLI, replayTLI);

    /* Following WAL records should be run with new TLI */
    replayTLI = newReplayTLI;
    switchedTLI = true;
}

Then replayEndTLI gets updated. If we going to skip the reread of
"LastRec" that we were discussing, then I think the following code
that fetches the EndOfLogTLI is also not needed, XLogCtl->replayEndTLI
(or XLogCtl->lastReplayedTLI) or replayTLI (when InRecovery is false)
should be enough, AFAICU.

/*
 * EndOfLogTLI is the TLI in the filename of the XLOG segment containing
 * the end-of-log. It could be different from the timeline that EndOfLog
 * nominally belongs to, if there was a timeline switch in that segment,
 * and we were reading the old WAL from a segment belonging to a higher
 * timeline.
 */
EndOfLogTLI = xlogreader->seg.ws_tli;

Regards,
Amul



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: pg_get_publication_tables() output duplicate relid
Следующее
От: vignesh C
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side