Обсуждение: walsender.c fileheader comment
Hi,
I was reading the walsender.c fileheader comment while studying
another thread. I think if there is logical replication in progress
then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
a "stopping" state: e.g.,
/*
* Handle PROCSIG_WALSND_INIT_STOPPING signal.
*/
void
HandleWalSndInitStopping(void)
{
Assert(am_walsender);
/*
* If replication has not yet started, die like with SIGTERM. If
* replication is active, only set a flag and wake up the main loop. It
* will send any outstanding WAL, wait for it to be replicated to the
* standby, and then exit gracefully.
*/
if (!replication_active)
kill(MyProcPid, SIGTERM);
else
got_STOPPING = true;
}
~~~
But the walsender.c fileheader comment seems to be saying something
slightly different. IIUC, some minor rewording of the comment is
needed so it describes the code better.
HEAD
...
* shutdown, if logical replication is in progress all existing WAL records
* are processed followed by a shutdown. Otherwise this causes the walsender
* to switch to the "stopping" state. In this state, the walsender will reject
* any further replication commands. The checkpointer begins the shutdown
...
SUGGESTION
.. shutdown. If logical replication is in progress, the walsender
switches to a "stopping" state. In this state, the walsender will
reject any further replication commands - but all existing WAL records
are processed - followed by a shutdown.
~~~
I attached a patch for the above-suggested change.
Thoughts?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Вложения
On 6/11/24 04:35, Peter Smith wrote:
> Hi,
>
> I was reading the walsender.c fileheader comment while studying
> another thread. I think if there is logical replication in progress
> then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
> a "stopping" state: e.g.,
>
> /*
> * Handle PROCSIG_WALSND_INIT_STOPPING signal.
> */
> void
> HandleWalSndInitStopping(void)
> {
> Assert(am_walsender);
>
> /*
> * If replication has not yet started, die like with SIGTERM. If
> * replication is active, only set a flag and wake up the main loop. It
> * will send any outstanding WAL, wait for it to be replicated to the
> * standby, and then exit gracefully.
> */
> if (!replication_active)
> kill(MyProcPid, SIGTERM);
> else
> got_STOPPING = true;
> }
>
> ~~~
>
> But the walsender.c fileheader comment seems to be saying something
> slightly different. IIUC, some minor rewording of the comment is
> needed so it describes the code better.
>
> HEAD
> ...
> * shutdown, if logical replication is in progress all existing WAL records
> * are processed followed by a shutdown. Otherwise this causes the walsender
> * to switch to the "stopping" state. In this state, the walsender will reject
> * any further replication commands. The checkpointer begins the shutdown
> ...
>
> SUGGESTION
> .. shutdown. If logical replication is in progress, the walsender
> switches to a "stopping" state. In this state, the walsender will
> reject any further replication commands - but all existing WAL records
> are processed - followed by a shutdown.
>
> ~~~
>
> I attached a patch for the above-suggested change.
>
> Thoughts?
I did look at this, and while the explanation in the current comment may
seem a bit confusing, I'm not sure the suggested changes improve the
situation very much.
This suggests the two comments somehow disagree, but it does not say in
what exactly, so perhaps I just missed it :-(
ISTM there's a bit of confusion what is meant by "stopping" state - you
seem to be interpreting it as a general concept, where the walsender is
requested to stop (through the signal), and starts doing stuff to exit.
But the comments actually talk about WalSnd->state, where "stopping"
means it needs to be set to WALSNDSTATE_STOPPING.
And we only ever switch to that state in two places - in WalSndPhysical
and exec_replication_command. And that does not happen in regular
logical replication (which is what "logical replication is in progress"
refers to) - if you have a walsender just replicating DML, it will never
see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
still in WALSNDSTATE_STREAMING state, and then just exit.
So from this point of view, the suggestion is actually wrong.
To conclude, I think this probably makes the comments more confusing. If
we want to make it clearer, I'd probably start by clarifying what the
"stopping" state means. Also, it's a bit surprising we may not actually
go through the "stopping" state during shutdown.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi, Thankyou for taking the time to look at this and reply. > > I did look at this, and while the explanation in the current comment may > seem a bit confusing, I'm not sure the suggested changes improve the > situation very much. > > This suggests the two comments somehow disagree, but it does not say in > what exactly, so perhaps I just missed it :-( > > ISTM there's a bit of confusion what is meant by "stopping" state - you > seem to be interpreting it as a general concept, where the walsender is > requested to stop (through the signal), and starts doing stuff to exit. > But the comments actually talk about WalSnd->state, where "stopping" > means it needs to be set to WALSNDSTATE_STOPPING. Yes, I interpreted the "stopping" state meaning as when the boolean flag 'got_STOPPING' is assigned true. > > And we only ever switch to that state in two places - in WalSndPhysical > and exec_replication_command. And that does not happen in regular > logical replication (which is what "logical replication is in progress" > refers to) - if you have a walsender just replicating DML, it will never > see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while > still in WALSNDSTATE_STREAMING state, and then just exit. > > So from this point of view, the suggestion is actually wrong. OK. > > To conclude, I think this probably makes the comments more confusing. If > we want to make it clearer, I'd probably start by clarifying what the > "stopping" state means. Also, it's a bit surprising we may not actually > go through the "stopping" state during shutdown. > I agree. My interpretation of the (ambiguous) "stopping" state led me to believe the comment was quite wrong. So, this thread was only intended as a trivial comment fix in passing but clearly there is more to this than I anticipated. I would be happy if someone with more knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could disambiguate the file header comment, but that's not me, so I have withdrawn this from the Commitfest. ====== Kind Regards, Peter Smith. Fujitsu Australia
On 7/19/24 07:02, Peter Smith wrote: > ... >> >> To conclude, I think this probably makes the comments more confusing. If >> we want to make it clearer, I'd probably start by clarifying what the >> "stopping" state means. Also, it's a bit surprising we may not actually >> go through the "stopping" state during shutdown. >> > > I agree. My interpretation of the (ambiguous) "stopping" state led me > to believe the comment was quite wrong. So, this thread was only > intended as a trivial comment fix in passing but clearly there is more > to this than I anticipated. I would be happy if someone with more > knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could > disambiguate the file header comment, but that's not me, so I have > withdrawn this from the Commitfest. > Understood. Thanks for the patch anyway, I appreciate you took the time to try to improve the comments! I agree the state transitions in walsender are not very clear, and the fact that it may shutdown without ever going through STOPPING state is quite confusing. That being said, I personally don't have ambition to improve this. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company