Re: Add WALRCV_CONNECTING state to walreceiver
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date | |
| Msg-id | CABPTF7UEgDEMZRyTY5h0VGW+oo1fnEYh7hswUv_PWh1aHyMqVA@mail.gmail.com |
| In reply to | Re: Add WALRCV_CONNECTING state to walreceiver (Noah Misch <noah@leadboat.com>) |
| Responses | Re: Add WALRCV_CONNECTING state to walreceiver |
| List | pgsql-hackers |
Hi,
On Sun, Dec 14, 2025 at 1:14 PM Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Dec 14, 2025 at 12:45:46PM +0800, Xuneng Zhou wrote:
> > On Fri, Dec 12, 2025 at 9:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah@leadboat.com> wrote:
> > > > > Waiting for applyPtr to advance
> > > > > would avoid the short-lived STREAMING. What's the feasibility of that?
> > > >
> > > > I think this could work, but with complications. If replay latency is
> > > > high or replay is paused with pg_wal_replay_pause, the WalReceiver
> > > > would stay in the CONNECTING state longer than expected. Whether this
> > > > is ok depends on the definition of the 'connecting' state. For the
> > > > implementation, deciding where and when to check applyPtr against LSNs
> > > > like receiveStart is more difficult—the WalReceiver doesn't know when
> > > > applyPtr advances. While the WalReceiver can read applyPtr from shared
> > > > memory, it isn't automatically notified when that pointer advances.
> > > > This leads to latency between checking and replay if this is done in
> > > > the WalReceiver part unless we let the startup process set the state,
> > > > which would couple the two components. Am I missing something here?
> > >
> > > After some thoughts, a potential approach could be to expose a new
> > > function in the WAL receiver that transitions the state from
> > > CONNECTING to STREAMING. This function can then be invoked directly
> > > from WaitForWALToBecomeAvailable in the startup process, ensuring the
> > > state change aligns with the actual acceptance of the WAL stream.
> >
> > V2 makes the transition from WALRCV_CONNECTING to STREAMING only when
> > the first valid WAL record is processed by the startup process. A new
> > function WalRcvSetStreaming is introduced to enable the transition.
>
> The original patch set STREAMING in XLogWalRcvFlush(). XLogWalRcvFlush()
> callee XLogWalRcvSendReply() already fetches applyPtr to send a status
> message. So I would try the following before involving the startup process
> like v2 does:
>
> 1. store the applyPtr when we enter CONNECTING
> 2. force a status message as long as we remain in CONNECTING
> 3. become STREAMING when applyPtr differs from the one stored at (1)
Thanks for the suggestion. Using XLogWalRcvSendReply() for the
transition could make sense. My earlier concern was latency in a
rare case: if the first flush completes but applyPtr hasn't advanced
by the time of the check, and the flush then stalls, we
might wait up to wal_receiver_status_interval (default 10s) before the
next check, or indefinitely if wal_receiver_status_interval <= 0.
This could be mitigated by shortening the wakeup interval while in
CONNECTING (step 2), which reduces the worst-case latency to ~1 second.
Given that monitoring typically doesn't require sub-second precision,
this approach could be feasible.
    case WALRCV_WAKEUP_REPLY:
        if (WalRcv->walRcvState == WALRCV_CONNECTING)
        {
            /* Poll frequently while CONNECTING to avoid long latency */
            wakeup[reason] = TimestampTzPlusMilliseconds(now, 1000);
        }
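To make steps 1-3 concrete, here is a rough standalone model of the transition logic. It is only a sketch, not walreceiver code: RcvModel, RcvState, rcv_enter_connecting() and rcv_on_reply_wakeup() are hypothetical names, and XLogRecPtr is just a uint64_t stand-in for the real type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for PostgreSQL's LSN type */

/* Hypothetical two-state model; the real enum has more states. */
typedef enum
{
	RCV_CONNECTING,
	RCV_STREAMING
} RcvState;

typedef struct
{
	RcvState	state;
	XLogRecPtr	applyPtrAtConnect;	/* step 1: applyPtr seen on entry */
} RcvModel;

/* Step 1: remember the replay position when entering CONNECTING. */
void
rcv_enter_connecting(RcvModel *rcv, XLogRecPtr applyPtr)
{
	assert(rcv != NULL);
	rcv->state = RCV_CONNECTING;
	rcv->applyPtrAtConnect = applyPtr;
}

/*
 * Steps 2 and 3: run on each reply wakeup with the current applyPtr.
 * Returns true while a status message should still be forced, i.e. while
 * we remain in CONNECTING. Switches to STREAMING once applyPtr differs
 * from the value stored at step 1.
 */
bool
rcv_on_reply_wakeup(RcvModel *rcv, XLogRecPtr applyPtr)
{
	assert(rcv != NULL);

	if (rcv->state != RCV_CONNECTING)
		return false;

	if (applyPtr != rcv->applyPtrAtConnect)
	{
		rcv->state = RCV_STREAMING;	/* step 3 */
		return false;
	}

	return true;				/* step 2: keep forcing replies */
}
```

In this model the state can only change on a reply wakeup, which is why the wakeup interval bounds the latency described above.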
> A possible issue with all patch versions: when the primary is writing no WAL
> and the standby was caught up before this walreceiver started, CONNECTING
> could persist for an unbounded amount of time. Only actual primary WAL
> generation would move the walreceiver to STREAMING. This relates to your
> above point about high latency. If that's a concern, perhaps this change
> deserves a total of two new states, CONNECTING and a state that represents
> "connection exists, no WAL yet applied"?
Yes, this could be an issue. Using two states would help address it.
That said, when the primary is idle in this case, we might end up
repeatedly polling the apply status in the pre-streaming state if
we implement the 1s short-interval check shown above, which could be
costly. However, if we do not implement it, and
wal_receiver_status_interval is set to <= 0, and the flush stalls, the
walreceiver could stay in the pre-streaming state indefinitely even
though streaming did occur, which violates the semantics. Do you think
this is a valid concern or just an artificial edge case?
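For reference, the trade-off can be written down as a small standalone model of the reply wakeup delay. This is a sketch under my own assumptions, not a patch: SketchState, NEVER_WAKE, PRESTREAM_POLL_MS and reply_wakeup_delay_ms() are hypothetical, and the three states only mirror the two-new-states idea.

```c
#include <stdbool.h>

/* Hypothetical states: CONNECTING plus "connected, no WAL applied yet". */
typedef enum
{
	ST_CONNECTING,
	ST_CONNECTED_NO_WAL,
	ST_STREAMING
} SketchState;

#define NEVER_WAKE			(-1L)	/* models an infinite wakeup time */
#define PRESTREAM_POLL_MS	1000L	/* the 1s short interval */

/*
 * Delay in milliseconds until the next reply wakeup, or NEVER_WAKE.
 * status_interval_s models wal_receiver_status_interval in seconds.
 * poll_prestream selects whether pre-streaming states use the short poll.
 */
long
reply_wakeup_delay_ms(SketchState state, int status_interval_s,
					  bool poll_prestream)
{
	/* With short polling, pre-streaming states wake every second. */
	if (poll_prestream && state != ST_STREAMING)
		return PRESTREAM_POLL_MS;

	/* Otherwise a disabled interval means no timed wakeup at all. */
	if (status_interval_s <= 0)
		return NEVER_WAKE;

	return (long) status_interval_s * 1000L;
}
```

With poll_prestream off and the interval disabled, a pre-streaming state gets NEVER_WAKE, which is the indefinite case; with it on, an idle primary costs one wakeup per second until WAL is applied.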
--
Best,
Xuneng