Re: Exit walsender before confirming remote flush in logical replication
| From | Chao Li |
|---|---|
| Subject | Re: Exit walsender before confirming remote flush in logical replication |
| Date | |
| Msg-id | DF779135-64BA-421A-B835-8E815399BEC3@gmail.com |
| In reply to | Re: Exit walsender before confirming remote flush in logical replication (Fujii Masao <masao.fujii@gmail.com>) |
| Replies | Re: Exit walsender before confirming remote flush in logical replication |
| List | pgsql-hackers |

> On Apr 23, 2026, at 12:51, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Wed, Apr 22, 2026 at 3:32 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>> Therefore, since replacing pq_flush() with pq_flush_if_writable() seems to
>> change behavior only in a limited and acceptable way, I'm thinking to create
>> the patch doing that replacement.
>
> On second thought, replacing pq_flush() with pq_flush_if_writable() is not
> sufficient. EndCommand(), which WalSndDone() calls before pq_flush(), can also
> block when the send buffer is full. That happens because EndCommand() uses
> pq_putmessage() rather than pq_putmessage_noblock().
>
> Also, replacing pq_flush() with pq_flush_if_writable() would cause walsender to
> give up sending pending messages, including CommandComplete, even before
> wal_sender_shutdown_timeout expires. That seems a bit odd. I think it is better
> for walsender to continue honoring wal_sender_shutdown_timeout while attempting
> to send the final CommandComplete message.
>
> I've attached a patch that addresses both issues. For the first, it introduces
> EndCommandExtended(), which allows CommandComplete to be queued with
> pq_putmessage_noblock(). For the second, it updates WalSndDone() to use
> ProcessPendingWrites() instead of pq_flush(), so the walsender write loop can
> continue processing replies and checking replication and shutdown timeouts
> while pending output is being flushed.
>
> Thoughts?
>
> Regards,
>
> --
> Fujii Masao
> <v1-0001-Avoid-blocking-indefinitely-while-finishing-walse.patch>

```
-    EndCommand(&qc, DestRemote, false);
-    pq_flush();
+    EndCommandExtended(&qc, DestRemote, false, true);
+    shutdown_stream_done_queued = true;
+
+    /*
+     * Don't call pq_flush() here. It can block indefinitely waiting for
+     * the socket to become writeable, which would prevent
+     * wal_sender_shutdown_timeout from being enforced. Use the regular
+     * walsender non-blocking flush path so shutdown and replication
+     * timeouts continue to be checked while waiting for the send buffer
+     * to drain.
+     */
+    ProcessPendingWrites();
```

I think adding EndCommandExtended() with a “nonblock” parameter is good. However, I have doubts about replacing pq_flush() with ProcessPendingWrites(). ProcessPendingWrites() calls ProcessRepliesIfAny() first, so is it possible that a new COPY message gets appended after the already-queued CommandComplete? That would seem to violate the protocol, though I am not sure whether it would lead to any actual trouble.

So maybe we need a new helper, say ProcessPendingWritesForShutdown(), that loops while pq_is_send_pending(), calls WalSndCheckShutdownTimeout(), waits only for WL_SOCKET_WRITEABLE, and then calls pq_flush_if_writable(); on flush failure, it could call WalSndShutdown().

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/