Обсуждение: removing global variable ThisTimeLineID

Поиск
Список
Период
Сортировка

removing global variable ThisTimeLineID

От
Robert Haas
Дата:
Hi,

The global variable ThisTimeLineID is rather confusing. During
recovery, in the startup process, when we're reading a WAL record, it
is the timeline of the WAL record we are trying to read or have just
read, except when we're trying to read the initial checkpoint record,
when it's zero. In other processes, it's typically 0 throughout
recovery, but sometimes it's not, because for example
CreateRestartPoint temporarily sets it to the timeline from which
we're replaying WAL, since there's no other way to get
RemoveOldXlogFiles() to recycle files onto the right timeline, or
PreallocXlogFiles() to preallocate onto the right timeline. Similarly,
walreceiver and walsender find it convenient to make ThisTimeLineID
the timeline from which WAL is being replayed or at least the one from
which it was being replayed at last check. logicalfuncs.c and
slotfuncs.c set the global variable sometimes too, apparently just for
fun, as the value doesn't seem to get used for anything. In normal
running, once the startup process exits, the next call to
RecoveryInProgress() sets the value to the timeline into which new WAL
can be inserted. Note also that all of this is different from
XLogCtl->ThisTimeLineID, which is set at the end of recovery and thus,
in the startup process, generally different from ThisTImeLineID, but
in other processes, generally the same as ThisTimeLineID.

At the risk of stating the obvious, using the same variable for
different purposes at different times is not a great programming
practice. Commits 2f5c4397c39dea49c5608ba583868e26d767fc32 and
902a2c280012557b85c7e0fce3f6f0e355cb2d69 show that the possibility of
programmer error is not zero, even though neither of those issues are
serious. Moreover, the global variable itself seems to do nothing but
invite programming errors. The name itself is a disaster. What is
"this" timeline ID? Well, uh, the right one, of course! We should be
more clear about what we mean: the timeline into which we are
inserting, the timeline from which we are replaying, the timeline from
which we are performing logical decoding. I suspect that having a
global variable here isn't even really saving us anything much, as a
value that does not change can be read from shared memory without a
lock.

I would like to clean this up. Attached is a series of patches which
try to do that. For ease of review, this is separated out into quite a
few separate patches, but most likely we'd combine some of them for
commit. Patches 0001 through 0004 eliminate all use of the global
variable ThisTimeLineID outside of xlog.c, and patch 0005 makes it
"static" so that it ceases to be visible outside of xlog.c. Patches
0006 to 0010 clean up uses of ThisTimeLineID itself, passing it around
as a function parameter, or otherwise arranging to fetch the relevant
timeline ID from someplace sensible rather than using the global
variable as a scratchpad. Finally, patch 0011 deletes the global
variable.

I have not made a serious attempt to clear up all of the
terminological confusion created by the term ThisTimeLineID, which
also appears as part of some structs, so you'll still see that name in
the source code after applying these patches, just not as the name of
a global variable. I have, however, used names like insertTLI and
replayTLI in many places changed by the patch, and I think that makes
it significantly more clear which timeline is being discussed in each
case. In some places I have endeavored to improve the comments. There
is doubtless more that could be done here, but I think this is a
fairly respectable start.

Opinions welcome.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: removing global variable ThisTimeLineID

От
Michael Paquier
Дата:
On Mon, Nov 01, 2021 at 03:16:27PM -0400, Robert Haas wrote:
> At the risk of stating the obvious, using the same variable for
> different purposes at different times is not a great programming
> practice. Commits 2f5c4397c39dea49c5608ba583868e26d767fc32 and
> 902a2c280012557b85c7e0fce3f6f0e355cb2d69 show that the possibility of
> programmer error is not zero, even though neither of those issues are
> serious.

This can lead to more serious issues, see 595b9cba as recent example.
I agree that getting rid of it in the long term is appealing.

> Moreover, the global variable itself seems to do nothing but
> invite programming errors. The name itself is a disaster. What is
> "this" timeline ID? Well, uh, the right one, of course! We should be
> more clear about what we mean: the timeline into which we are
> inserting, the timeline from which we are replaying, the timeline from
> which we are performing logical decoding. I suspect that having a
> global variable here isn't even really saving us anything much, as a
> value that does not change can be read from shared memory without a
> lock.

Some callbacks related to the WAL reader to read a patch have
introduced a dependency to ThisTimelineID as a mean to be used for
logical decoding on standbys.  This is discussed a bit on this thread,
for example:
https://www.postgresql.org/message-id/ddc31aa9-8083-58b7-0b47-e371cd4c705b@oss.nttdata.com

Tracking properly the TLI in logical decoding was mentioned by Andres
multiple times on -hackers, and this has not been really done.  Using
ThisTimelineID for this purpose is something I've found confusing
myself, FWIW, so this is a step in the good direction.

> I would like to clean this up. Attached is a series of patches which
> try to do that. For ease of review, this is separated out into quite a
> few separate patches, but most likely we'd combine some of them for
> commit. Patches 0001 through 0004 eliminate all use of the global
> variable ThisTimeLineID outside of xlog.c, and patch 0005 makes it
> "static" so that it ceases to be visible outside of xlog.c. Patches
> 0006 to 0010 clean up uses of ThisTimeLineID itself, passing it around
> as a function parameter, or otherwise arranging to fetch the relevant
> timeline ID from someplace sensible rather than using the global
> variable as a scratchpad. Finally, patch 0011 deletes the global
> variable.
>
> I have not made a serious attempt to clear up all of the
> terminological confusion created by the term ThisTimeLineID, which
> also appears as part of some structs, so you'll still see that name in
> the source code after applying these patches, just not as the name of
> a global variable. I have, however, used names like insertTLI and
> replayTLI in many places changed by the patch, and I think that makes
> it significantly more clear which timeline is being discussed in each
> case. In some places I have endeavored to improve the comments. There
> is doubtless more that could be done here, but I think this is a
> fairly respectable start.

Thanks for splitting things this way.  This helps a lot.

0001 looks fair.

+   /*
+    * If we're writing and flushing WAL, the time line can't be changing,
+    * so no lock is required.
+    */
+   if (insertTLI)
+       *insertTLI = XLogCtl->ThisTimeLineID;
In 0002, there is no downside in putting that in the spinlock section
either, no?  It seems to me that we should be able to call this one
even in recovery, as flush LSNs exist while in recovery.

+TimeLineID
+GetWALInsertionTimeLine(void)
+{
+   Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
Okay, that's cheaper.

The series 0003-0006 seemed rather fine, at quick glance.

In 0007, XLogFileClose() should reset openLogTLI.  The same can be
said in BootStrapXLOG() before creating the control file.  It may be
cleaner here to introduce an InvalidTimelineId, as discussed a couple
of days ago.

Some comments at the top of recoveryTargetTLIRequested need a refresh
with their references to ThisTimelineID.

The handling of XLogCtl->ThisTimeLineID still seems a bit blurry when
it comes to timeline switches because we don't update it while in
recovery when replaying a end-of-recovery record.  This could at least
be documented better.
--
Michael

Вложения

Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Mon, Nov 1, 2021 at 11:33 PM Michael Paquier <michael@paquier.xyz> wrote:
> +   /*
> +    * If we're writing and flushing WAL, the time line can't be changing,
> +    * so no lock is required.
> +    */
> +   if (insertTLI)
> +       *insertTLI = XLogCtl->ThisTimeLineID;
> In 0002, there is no downside in putting that in the spinlock section
> either, no?  It seems to me that we should be able to call this one
> even in recovery, as flush LSNs exist while in recovery.

I don't think it matters very much from a performance point of view,
although putting a branch inside of a spinlock-protected section may
be undesirable. My bigger issues with this kind of idea is that we
don't want to use spinlocks as a "security blanket" (as in Linus from
Peanuts). Every time we use a spinlock, or any other kind of locking
mechanism, we should have a clear idea what we're protecting ourselves
against. I'm quite suspicious that there are a number of places where
we're taking spinlocks in xlog.c just to feel virtuous, and not
because it does anything useful, and I don't particularly want to
increase the number of such places.

Also, XLogCtl->ThisTimeLineID isn't set until the end of recovery, so
calling this function during recovery would be a mistake. There seem
to be a number of interface functions in xlog.c that should only be
called when not in recovery, and neither their names nor their
comments make that clear. I wasn't bold enough to go label all the
ones I think fall into that category, but maybe we should. Honestly,
the naming of things in this file is annoyingly bad in general. My
favorite example, found during this project, is:

#define ConvertToXSegs(x, segsize)  XLogMBVarToSegs((x), (segsize))

It's not good enough to have one name that's difficult to understand
or remember or capitalize properly -- let's have TWO -- for the same
thing!

> In 0007, XLogFileClose() should reset openLogTLI.  The same can be
> said in BootStrapXLOG() before creating the control file.  It may be
> cleaner here to introduce an InvalidTimelineId, as discussed a couple
> of days ago.

I thought about that, but I think it's pointless. I think we can just
say that if openLogFile == -1, openLogSegNo and openLogTLI are
meaningless. I don't think we're currently resetting openLogSegNo to
an invalid value either, so I don't think openLogTLI should be treated
differently. I'm not opposed to introducing InvalidTimeLineID on
principle, and I looked for a way of doing it in this patch set, but
it didn't seem all that valuable, and I feel that someone who cares
about it can do it as a separate effort after I commit these.
Honestly, I think these patches greatly reduce the need to be afraid
of leaving the TLI unset, because they remove the crutch of hoping
that ThisTimeLineID is initialized to the proper value. You actually
have to think about which TLI you are going to pass. Now I'm very sure
further improvement is possible, but I think this patch set is complex
enough already, and I'd like to get it committed before contemplating
any other work in this area.

> Some comments at the top of recoveryTargetTLIRequested need a refresh
> with their references to ThisTimelineID.

After thinking this over, I think we should just delete the entire
first paragraph, so that the comments start like this:

/*
 * recoveryTargetTimeLineGoal: what the user requested, if any
 *
 * recoveryTargetTLIRequested: numeric value of requested timeline, if constant

I don't see that we'd be losing anything that way. When you look at
that paragraph now, the first sentence is just confusing the issue,
because we actually don't care about XLogCtl->ThisTimeLineID during
recovery, because it's not set. We do care about plain old
ThisTimeLineID, but that's going to be demoted to a local variable by
these patches, and there just doesn't seem to be any reason to discuss
it here. The second sentence is wrong already, because the rmgr
routines do not rely on ThisTimeLineID. And the third sentence is also
wrong and/or confusing, because not all of the variables that follow
are TLIs -- only 3 of 5 are. I don't actually think there's any useful
introduction we need to provide here; we just need to tell people what
the variables we're about to declare do.

> The handling of XLogCtl->ThisTimeLineID still seems a bit blurry when
> it comes to timeline switches because we don't update it while in
> recovery when replaying a end-of-recovery record.  This could at least
> be documented better.

We don't update it during recovery at all. There's exactly one
assignment statement for that variable and it's here:

    /* Save the selected TimeLineID in shared memory, too */
    XLogCtl->ThisTimeLineID = ThisTimeLineID;
    XLogCtl->PrevTimeLineID = PrevTimeLineID;

That's after the main redo loop completes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: removing global variable ThisTimeLineID

От
Amul Sul
Дата:
On Tue, Nov 2, 2021 at 12:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> [...]
> I would like to clean this up. Attached is a series of patches which
> try to do that. For ease of review, this is separated out into quite a
> few separate patches, but most likely we'd combine some of them for
> commit. Patches 0001 through 0004 eliminate all use of the global
> variable ThisTimeLineID outside of xlog.c, and patch 0005 makes it
> "static" so that it ceases to be visible outside of xlog.c. Patches
> 0006 to 0010 clean up uses of ThisTimeLineID itself, passing it around
> as a function parameter, or otherwise arranging to fetch the relevant
> timeline ID from someplace sensible rather than using the global
> variable as a scratchpad. Finally, patch 0011 deletes the global
> variable.
>
> I have not made a serious attempt to clear up all of the
> terminological confusion created by the term ThisTimeLineID, which
> also appears as part of some structs, so you'll still see that name in
> the source code after applying these patches, just not as the name of
> a global variable. I have, however, used names like insertTLI and
> replayTLI in many places changed by the patch, and I think that makes
> it significantly more clear which timeline is being discussed in each
> case. In some places I have endeavored to improve the comments. There
> is doubtless more that could be done here, but I think this is a
> fairly respectable start.
>
> Opinions welcome.
>

I had a plan to look at these patches closely, but unfortunately,
didn't get enough time; and might not be able to spend time in the
rest of this week. Here are a few thoughts that I had in the initial
go-through, but that may or may not sound very interesting:

0002:
-GetFlushRecPtr(void)
+GetFlushRecPtr(TimeLineID *insertTLI)

Should we rename this argument to more generic as "tli", like
GetStandbyFlushRecPtr, since the caller in 003 patch trying to fetch a
TLI that means different for them, e.g. currTLI, FlushTLI, etc.


0004:
 static int
-XLogFileInitInternal(XLogSegNo logsegno, bool *added, char *path)
+XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
+                    bool *added, char *path)

 int
-XLogFileInit(XLogSegNo logsegno)
+XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
 {

How about logsegtli instead, to be inlined with logsegno ?


0007:
 static XLogSegNo openLogSegNo = 0;
+static TimeLineID openLogTLI = 0;

Similarly, openLogSegTLI ?


0008:
+   /*
+    * Given that we're not in recovery, ThisTimeLineID is set and can't
+    * change, so we can read it without a lock.
+    */
+   insertTLI = XLogCtl->ThisTimeLineID;

Can't GetWALInsertionTimeLine() called instead? I guess the reason is
to avoid the unnecessary overhead involved in the function call. (Same
at a few other places.)

Regards,
Amul



Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Wed, Nov 3, 2021 at 9:12 AM Amul Sul <sulamul@gmail.com> wrote:
> 0002:
> -GetFlushRecPtr(void)
> +GetFlushRecPtr(TimeLineID *insertTLI)
>
> Should we rename this argument to more generic as "tli", like
> GetStandbyFlushRecPtr, since the caller in 003 patch trying to fetch a
> TLI that means different for them, e.g. currTLI, FlushTLI, etc.

It's possible that more consistency would be better here, but I don't
think making this name more generic is going in the right direction.
If somebody gets the TLI using this function, it's the timeline into
which a system not in recovery is inserting WAL, which is the same
timeline to which WAL is being flushed. I think it's important for
this name to try to make that clear. It doesn't really matter if
someone treats the insertTLI as the writeTLI or the flushTLI since on
a system in production they're all the same - but they must not
confuse it with, say, the replayTLI.

> How about logsegtli instead, to be inlined with logsegno ?
> Similarly, openLogSegTLI ?

Hmm, well I guess it depends on how you think the words group. If you
logsegno means "the number of the log segment" then it would make
sense to have logsegli to mean "the tli of the log segment." But I
think of logsegno as meaning "the segment number of the log file," so
it makes more sense to have logtli to mean "the TLI of the log file".
I think it's definitely not a "log segment file" - just a "log file".

> Can't GetWALInsertionTimeLine() called instead? I guess the reason is
> to avoid the unnecessary overhead involved in the function call. (Same
> at a few other places.)

That, plus to avoid failing the assertion in that function. As we've
discussed on the ALTER SYSTEM READ ONLY thread, xlog.c itself inserts
some WAL before recovery is officially over.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: removing global variable ThisTimeLineID

От
Alvaro Herrera
Дата:
I looked at these in a very quick skim.  I agree that this is a
good step in a good direction.

I 'git rebase -x'd this series in order to compile and run the tests on
each patch individually.  There are no compiler warnings anywhere and no
test failures.


-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"



Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Fri, Nov 5, 2021 at 9:49 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I looked at these in a very quick skim.  I agree that this is a
> good step in a good direction.

Cool. I have now committed these. I grouped them into just 3 commits
rather than having 11 separate commits as I did in my earlier posting,
but the content is the same.

> I 'git rebase -x'd this series in order to compile and run the tests on
> each patch individually.  There are no compiler warnings anywhere and no
> test failures.

Thanks for checking. It's hard to tell whether the stuff that works on
my machine will work everywhere, unless someone tests it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: removing global variable ThisTimeLineID

От
Michael Paquier
Дата:
On Tue, Nov 02, 2021 at 08:45:57AM -0400, Robert Haas wrote:
> Also, XLogCtl->ThisTimeLineID isn't set until the end of recovery, so
> calling this function during recovery would be a mistake. There seem
> to be a number of interface functions in xlog.c that should only be
> called when not in recovery, and neither their names nor their
> comments make that clear. I wasn't bold enough to go label all the
> ones I think fall into that category, but maybe we should.

I got to wonder whether it would be better to add in GetFlushRecPtr()
the same assertion as GetWALInsertionTimeLine().  All the in-core
callers of this routine already assume that, and it would be buggy if
one passes down insertTLI to get the current TLI.

> I thought about that, but I think it's pointless. I think we can just
> say that if openLogFile == -1, openLogSegNo and openLogTLI are
> meaningless. I don't think we're currently resetting openLogSegNo to
> an invalid value either, so I don't think openLogTLI should be treated
> differently.

Wouldn't it be better to at least document that as a comment rather
than letting the reader guess it, then?  I agree that this is a minor
point, though.

> I'm not opposed to introducing InvalidTimeLineID on
> principle, and I looked for a way of doing it in this patch set, but
> it didn't seem all that valuable, and I feel that someone who cares
> about it can do it as a separate effort after I commit these.

No issues from me.  I have bumped into a case just at the end of last
week where I wanted a better way to tell if a TLI is invalid rather
than leaving things to 0, FWIW.

>> The handling of XLogCtl->ThisTimeLineID still seems a bit blurry when
>> it comes to timeline switches because we don't update it while in
>> recovery when replaying a end-of-recovery record.  This could at least
>> be documented better.
>
> We don't update it during recovery at all. There's exactly one
> assignment statement for that variable and it's here:

Indeed.  It looks like my reading was sloppy here.
--
Michael

Вложения

Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Sun, Nov 7, 2021 at 11:24 PM Michael Paquier <michael@paquier.xyz> wrote:
> I got to wonder whether it would be better to add in GetFlushRecPtr()
> the same assertion as GetWALInsertionTimeLine().  All the in-core
> callers of this routine already assume that, and it would be buggy if
> one passes down insertTLI to get the current TLI.

Perhaps. That's related to the point I made before, that it might be a
good idea to be more clear about which of these functions are intended
to be used in recovery and which ones are intended to be used in
normal running. I don't rule out the possibility of doing some more
research here and tightening that up, but I don't want to go start
tweaking things unless I'm sure I know what I'm talking about. This
stuff is so confusing that I don't want to take any risk of making
changes that turn out to be incorrect.

> > I thought about that, but I think it's pointless. I think we can just
> > say that if openLogFile == -1, openLogSegNo and openLogTLI are
> > meaningless. I don't think we're currently resetting openLogSegNo to
> > an invalid value either, so I don't think openLogTLI should be treated
> > differently.
>
> Wouldn't it be better to at least document that as a comment rather
> than letting the reader guess it, then?  I agree that this is a minor
> point, though.

I don't think it's very confusing, but if you want to improve the
comments, go for it!

> >> The handling of XLogCtl->ThisTimeLineID still seems a bit blurry when
> >> it comes to timeline switches because we don't update it while in
> >> recovery when replaying a end-of-recovery record.  This could at least
> >> be documented better.
> >
> > We don't update it during recovery at all. There's exactly one
> > assignment statement for that variable and it's here:
>
> Indeed.  It looks like my reading was sloppy here.

I can hardly blame you. There are comments all over the place claiming
that this thing and that thing are the same when they're only
sometimes the same. Or, as in this case, things that are named the
same when they're actually different. I'm actually tempted to make a
pass over the comments related to this topic and just try to make
things more comprehensible. It's super-confusing as things stand.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Mon, Nov 8, 2021 at 8:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Perhaps. That's related to the point I made before, that it might be a
> good idea to be more clear about which of these functions are intended
> to be used in recovery and which ones are intended to be used in
> normal running. I don't rule out the possibility of doing some more
> research here and tightening that up, but I don't want to go start
> tweaking things unless I'm sure I know what I'm talking about. This
> stuff is so confusing that I don't want to take any risk of making
> changes that turn out to be incorrect.

Well I went through and it seems to be OK: all the existing callers of
that function first verify that we're not in recovery. The patch to
make logical decoding work in standby mode might change that, but as
of now it's so. So in the attached patch, I have added the assertion
and comments that you have proposed there. I also removed the
misleading comment we discussed before. In addition to that, I renamed
the local variable ThisTimeLineID to replayTLI, and I also renamed
XLogCtl->ThisTimeLineID to InsertTimeLineID, which I think should make
things clearer: imagine if things that were used for different
purposes had different names!

Even with this patch, the name ThisTimeLineID is still used for
multiple purposes. It remains part of the CheckPoint struct, and also
part of the xl_end_of_recovery struct. But in both of those cases, the
name ThisTimeLineID actually makes sense, because what we're thinking
about is whether there's a timeline change, so we have ThisTimeLineID
and PrevTimeLineID and it seems fairly clear what that's supposed to
mean.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: removing global variable ThisTimeLineID

От
Michael Paquier
Дата:
On Mon, Nov 08, 2021 at 12:49:52PM -0500, Robert Haas wrote:
> Even with this patch, the name ThisTimeLineID is still used for
> multiple purposes. It remains part of the CheckPoint struct, and also
> part of the xl_end_of_recovery struct. But in both of those cases, the
> name ThisTimeLineID actually makes sense, because what we're thinking
> about is whether there's a timeline change, so we have ThisTimeLineID
> and PrevTimeLineID and it seems fairly clear what that's supposed to
> mean.

I got to wonder if it would be better to bite the bullet here for
these two looking at your patch, but I am fine to keep things as they
are, as well.

> Thoughts?

I think that this patch is an improvement.

@@ -6686,8 +6682,8 @@ StartupXLOG(void)
-   TimeLineID  ThisTimeLineID,
-               PrevTimeLineID;
+   TimeLineID  replayTLI,
+               newTLI;
One problem with newTLI is that this conflicts with the declaration
around 7663 in xlog.c, where we check after a TLI switch when
replaying such a record.  Perhaps this could be named newInsertTLI,
for example.
--
Michael

Вложения

Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Mon, Nov 8, 2021 at 10:31 PM Michael Paquier <michael@paquier.xyz> wrote:
> I think that this patch is an improvement.

Cool.

> @@ -6686,8 +6682,8 @@ StartupXLOG(void)
> -   TimeLineID  ThisTimeLineID,
> -               PrevTimeLineID;
> +   TimeLineID  replayTLI,
> +               newTLI;
> One problem with newTLI is that this conflicts with the declaration
> around 7663 in xlog.c, where we check after a TLI switch when
> replaying such a record.  Perhaps this could be named newInsertTLI,
> for example.

That's a good point. However, since I think newTLI is already in use
in some of the functions StartupXLOG() calls, and since I think it
would be good to use the same name in StartupXLOG() as in the
functions that it calls, what I'd prefer to do is rename the newTLI
variable in the inner scope to newReplayTLI, as in the attached v2.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: removing global variable ThisTimeLineID

От
Michael Paquier
Дата:
On Tue, Nov 09, 2021 at 02:15:51PM -0500, Robert Haas wrote:
> That's a good point. However, since I think newTLI is already in use
> in some of the functions StartupXLOG() calls, and since I think it
> would be good to use the same name in StartupXLOG() as in the
> functions that it calls, what I'd prefer to do is rename the newTLI
> variable in the inner scope to newReplayTLI, as in the attached v2.

WFM.  Thanks.
--
Michael

Вложения

Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Tue, Nov 9, 2021 at 7:41 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Nov 09, 2021 at 02:15:51PM -0500, Robert Haas wrote:
> > That's a good point. However, since I think newTLI is already in use
> > in some of the functions StartupXLOG() calls, and since I think it
> > would be good to use the same name in StartupXLOG() as in the
> > functions that it calls, what I'd prefer to do is rename the newTLI
> > variable in the inner scope to newReplayTLI, as in the attached v2.
>
> WFM.  Thanks.

Cool. Committed that way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: removing global variable ThisTimeLineID

От
"Drouvot, Bertrand"
Дата:
Hi,

On 11/8/21 6:49 PM, Robert Haas wrote:
> On Mon, Nov 8, 2021 at 8:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Well I went through and it seems to be OK: all the existing callers of
> that function first verify that we're not in recovery. The patch to
> make logical decoding work in standby mode might change that,

Indeed, I think the logical decoding on standby patch just needs to 
change the Assert in GetWALInsertionTimeLine() to check 
SharedRecoveryState is RECOVERY_STATE_DONE or RECOVERY_STATE_ARCHIVE.

Would that make sense from your point of view to add the check on 
RECOVERY_STATE_ARCHIVE or do you think it would need to be more 
"sophisticated"?

Thanks

Bertrand




Re: removing global variable ThisTimeLineID

От
Robert Haas
Дата:
On Tue, Nov 23, 2021 at 4:36 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> Indeed, I think the logical decoding on standby patch just needs to
> change the Assert in GetWALInsertionTimeLine() to check
> SharedRecoveryState is RECOVERY_STATE_DONE or RECOVERY_STATE_ARCHIVE.
>
> Would that make sense from your point of view to add the check on
> RECOVERY_STATE_ARCHIVE or do you think it would need to be more
> "sophisticated"?

Unless I'm confused, that would just be incorrect.
GetWALInsertionTimeLine() just returns InsertTimeLineID, which won't
be initialized during archive recovery. I discussed this point a bit
on some other thread with Andres. It's possible that all that you need
to do here is determine the replayTLI using e.g. GetXLogReplayRecPtr.
But I don't really see why that should be correct:
read_local_xlog_page() has some kind of wait-and-retry logic and I'm
not clear why we don't need the same thing here. After all, if we're
on the primary and we determine that sendTimeLineHistoric = false and
sendTimeLine = whatever, neither of those values can become incorrect
afterward. But on a standby that can certainly happen, if an upstream
server is promoted. Note that the comment in read_local_xlog_page()
says this:

         * If that happens after our caller determined the TLI but before
         * we actually read the xlog page, we might still try to read from the
         * old (now renamed) segment and fail. There's not much we can do
         * about this, but it can only happen when we're a leaf of a cascading
         * standby whose primary gets promoted while we're decoding, so a
         * one-off ERROR isn't too bad.

That seems to imply that the retry loop here is designed, at least in
part, to protect against exactly this kind of scenario.

-- 
Robert Haas
EDB: http://www.enterprisedb.com