Re: Add WALRCV_CONNECTING state to walreceiver
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date | |
| Msg-id | CABPTF7UkUUxy6z8a2fcOkkxG=OgG1Ae0fJxnr7syz3wX5KjO6g@mail.gmail.com |
| In reply to | Re: Add WALRCV_CONNECTING state to walreceiver (Xuneng Zhou <xunengzhou@gmail.com>) |
| Responses | Re: Add WALRCV_CONNECTING state to walreceiver |
| List | pgsql-hackers |
Hi,

On Fri, Dec 12, 2025 at 9:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Noah,
> >
> > On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah@leadboat.com> wrote:
> > >
> > > On Fri, Dec 12, 2025 at 12:51:00PM +0800, Xuneng Zhou wrote:
> > > > Bug #19093 [1] reported that pg_stat_wal_receiver.status = 'streaming'
> > > > does not accurately reflect streaming health. In that discussion,
> > > > Noah noted that even before the reported regression, status =
> > > > 'streaming' was unreliable because walreceiver sets it during early
> > > > startup, before attempting a connection. He suggested:
> > > >
> > > > "Long-term, in master only, perhaps we should introduce another status
> > > > like 'connecting'. Perhaps enact the connecting->streaming status
> > > > transition just before tendering the first byte of streamed WAL to the
> > > > startup process. Alternatively, enact that transition when the startup
> > > > process accepts the first streamed byte."
> > >
> > > > == Proposal ==
> > > >
> > > > Introduce WALRCV_CONNECTING as an intermediate state between STARTING
> > > > and STREAMING:
> > > >
> > > > - When walreceiver starts, it enters CONNECTING (instead of going
> > > >   directly to STREAMING).
> > > > - The transition to STREAMING occurs in XLogWalRcvFlush(), inside the
> > > >   existing spinlock-protected block that updates flushedUpto.
> > >
> > > I think this has the drawback that if the primary's WAL is incompatible,
> > > e.g. unacceptable timeline, the walreceiver will still briefly enter
> > > STREAMING. That could trick monitoring.
> >
> > Thanks for pointing this out.
> >
> > > Waiting for applyPtr to advance
> > > would avoid the short-lived STREAMING. What's the feasibility of that?
> >
> > I think this could work, but with complications. If replay latency is
> > high or replay is paused with pg_wal_replay_pause, the WalReceiver
> > would stay in the CONNECTING state longer than expected. Whether this
> > is ok depends on the definition of the 'connecting' state. For the
> > implementation, deciding where and when to check applyPtr against LSNs
> > like receiveStart is more difficult: the WalReceiver doesn't know when
> > applyPtr advances. While the WalReceiver can read applyPtr from shared
> > memory, it isn't automatically notified when that pointer advances.
> > This leads to latency between checking and replay if this is done in
> > the WalReceiver part unless we let the startup process set the state,
> > which would couple the two components. Am I missing something here?
> >
>
> After some thoughts, a potential approach could be to expose a new
> function in the WAL receiver that transitions the state from
> CONNECTING to STREAMING. This function can then be invoked directly
> from WaitForWALToBecomeAvailable in the startup process, ensuring the
> state change aligns with the actual acceptance of the WAL stream.
>

V2 makes the transition from WALRCV_CONNECTING to STREAMING only when
the first valid WAL record is processed by the startup process. A new
function WalRcvSetStreaming is introduced to enable the transition.

--
Best,
Xuneng
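
For readers following the thread without the patch at hand, the V2 idea can be sketched as a small standalone model. This is not the attached patch and not PostgreSQL source: every identifier below (the `Model*` enum, `model_request_start`, `model_walreceiver_begin`, `model_set_streaming`, `model_get_state`) is hypothetical, the state set is reduced, and a C11 `atomic_flag` stands in for the spinlock mentioned in the proposal. The actual signature and locking of WalRcvSetStreaming are not shown in this message, so the sketch only illustrates a guarded CONNECTING to STREAMING transition driven by the startup-process side.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Reduced, illustrative state set; NOT the real walreceiver enum. */
typedef enum
{
    MODEL_STOPPED,
    MODEL_STARTING,
    MODEL_CONNECTING,           /* models the proposed intermediate state */
    MODEL_STREAMING
} ModelWalRcvState;

/* Hypothetical stand-ins for the shared control data and its spinlock. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static ModelWalRcvState model_state = MODEL_STOPPED;

static void
model_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire))
        ;                       /* spin until the flag is released */
}

static void
model_lock_release(void)
{
    atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/* Move from 'from' to 'to' only if the current state is 'from'. */
static bool
model_transition(ModelWalRcvState from, ModelWalRcvState to)
{
    bool        changed = false;

    model_lock_acquire();
    if (model_state == from)
    {
        model_state = to;
        changed = true;
    }
    model_lock_release();
    return changed;
}

/* Startup-process side (assumed): ask for streaming to begin. */
bool
model_request_start(void)
{
    return model_transition(MODEL_STOPPED, MODEL_STARTING);
}

/* Walreceiver side: on startup, report CONNECTING rather than STREAMING. */
bool
model_walreceiver_begin(void)
{
    return model_transition(MODEL_STARTING, MODEL_CONNECTING);
}

/*
 * Startup-process side: called once the first valid streamed record has
 * been processed. Returns true only for the call that performs the change.
 */
bool
model_set_streaming(void)
{
    return model_transition(MODEL_CONNECTING, MODEL_STREAMING);
}

/* Read the state under the lock, as a monitoring view might. */
ModelWalRcvState
model_get_state(void)
{
    ModelWalRcvState s;

    model_lock_acquire();
    s = model_state;
    model_lock_release();
    return s;
}
```

Because the transition is guarded on the current state, repeated calls to `model_set_streaming()` after the first are harmless no-ops in this model, and a call made before the walreceiver has reached CONNECTING changes nothing. Whether the real function behaves the same way is a question for the patch itself.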