Discussion: pgsql: When WalSndCaughtUp, sleep only in WalSndWaitForWal().
When WalSndCaughtUp, sleep only in WalSndWaitForWal().

Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr. That is important in logical replication. When the latest
physical LSN yields no logical replication messages (a common case),
that keepalive elicits a reply, and processing the reply updates
pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when
WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed. This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.

Discussion: https://postgr.es/m/20200406063649.GA3738151@rfd.leadboat.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/421685812290406daea58b78dfab0346eb683bbb

Modified Files
--------------
src/backend/replication/walsender.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
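The check described in the commit message can be pictured with a small model. This is only a sketch under stated assumptions, not the code in walsender.c: `lsn_t`, `sender_state`, `keepalive_before_sleep()` and `process_reply()` are hypothetical names invented for illustration, and the reply handling is reduced to copying one position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not PostgreSQL's own type. */
typedef uint64_t lsn_t;

/* Hypothetical, simplified sender state; not the real walsender struct. */
typedef struct
{
    lsn_t sent_ptr;        /* last WAL position the sender has processed */
    lsn_t write_ptr;       /* last write position the receiver reported */
    int   keepalives_sent; /* number of keepalives issued so far */
} sender_state;

/*
 * Model of the check made before sleeping: if the receiver's reported
 * write position is behind what the sender has processed, send a
 * keepalive. Returns true if a keepalive was sent.
 */
bool
keepalive_before_sleep(sender_state *s)
{
    /* In this model the receiver can never be ahead of the sender. */
    assert(s->write_ptr <= s->sent_ptr);

    if (s->write_ptr < s->sent_ptr)
    {
        s->keepalives_sent++;
        return true;
    }
    return false;
}

/*
 * Model of processing the reply that the keepalive elicits: the
 * reported position is recorded, so the next check finds nothing to do.
 */
void
process_reply(sender_state *s, lsn_t reported)
{
    s->write_ptr = reported;
}
```

In this model, a sender at position 200 with a receiver last reported at 100 sends one keepalive before sleeping; once the reply reporting 200 is processed, a second call sends nothing. A sleep path that skips the check leaves the reported position stale until the receiver reports on its own schedule.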
On 2020/04/12 2:35, Noah Misch wrote:
> When WalSndCaughtUp, sleep only in WalSndWaitForWal().
>
> Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
> < sentPtr. That is important in logical replication. When the latest
> physical LSN yields no logical replication messages (a common case),
> that keepalive elicits a reply, and processing the reply updates
> pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when
> WalSndLoop() slept, replay_lsn advancement could stall until
> wal_receiver_status_interval elapsed. This sometimes stalled
> src/test/subscription/t/001_rep_changes.pl for up to 10s.

Since this commit, walsender has started consuming too much CPU in my
environment.

  wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT |
-     WL_SOCKET_READABLE;
+     WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;

I wonder if this change caused WaitLatchOrSocket() in WalSndLoop() to wake up
more frequently than necessary.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
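The intuition behind the question above is that a socket with room in its send buffer is reported as writable immediately, so a wait that always includes the writable event may return without sleeping. The sketch below shows that effect with plain POSIX poll() on a socket pair. It is an assumption-laden analogy: `wait_on_idle_socket()` is a hypothetical helper, poll() stands in for WaitLatchOrSocket(), and the example illustrates the hypothesis only; it does not establish the cause of the CPU usage.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms on one end of a fresh,
 * idle socket pair for the given poll events. Returns the revents
 * reported by poll(), 0 if the wait timed out, or -1 on error.
 */
int
wait_on_idle_socket(short events, int timeout_ms)
{
    int           sv[2];
    struct pollfd pfd;
    int           rc;

    assert(timeout_ms >= 0);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pfd.fd = sv[0];
    pfd.events = events;
    pfd.revents = 0;

    /* Nothing has been written to either end, so no data is readable. */
    rc = poll(&pfd, 1, timeout_ms);

    close(sv[0]);
    close(sv[1]);

    if (rc < 0)
        return -1;
    return rc == 0 ? 0 : pfd.revents;
}
```

Calling `wait_on_idle_socket(POLLIN, 20)` sleeps for the full timeout because nothing is readable, while `wait_on_idle_socket(POLLIN | POLLOUT, 20)` returns at once with POLLOUT set, since the empty send buffer makes the socket writable.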
On Fri, Apr 17, 2020 at 04:50:38AM +0900, Fujii Masao wrote:
> On 2020/04/12 2:35, Noah Misch wrote:
> > When WalSndCaughtUp, sleep only in WalSndWaitForWal().
> >
> > Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
> > < sentPtr. That is important in logical replication. When the latest
> > physical LSN yields no logical replication messages (a common case),
> > that keepalive elicits a reply, and processing the reply updates
> > pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when
> > WalSndLoop() slept, replay_lsn advancement could stall until
> > wal_receiver_status_interval elapsed. This sometimes stalled
> > src/test/subscription/t/001_rep_changes.pl for up to 10s.
>
> Since this commit, walsender has started consuming too much CPU in my
> environment.

Confirmed. I have shared this with the main thread and added details there.

>   wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT |
> -     WL_SOCKET_READABLE;
> +     WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;
>
> I wonder if this change caused WaitLatchOrSocket() in WalSndLoop() to wake up
> more frequently than necessary.

I collected lower wakeup counts after the commit. The problem is a shortage
of waits.