Re: Logical replication keepalive flood

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Logical replication keepalive flood
Дата
Msg-id 20210930.165629.1243423254265459564.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Logical replication keepalive flood  (Greg Nancarrow <gregn4422@gmail.com>)
Ответы Re: Logical replication keepalive flood  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Logical replication keepalive flood  (Greg Nancarrow <gregn4422@gmail.com>)
Список pgsql-hackers
At Thu, 30 Sep 2021 16:21:25 +1000, Greg Nancarrow <gregn4422@gmail.com> wrote in 
> On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > It claims to skip sending keepalive if the sleep time is shorter than
> > KEEPALIVE_TIMEOUT (a new threshold) but the above code seems to
> > suggest it sends in both cases. What am I missing?
> >
> 
> The comment does seem to be wrong.

Mmm. Indeed the comment does say something like that...  Looking the
patch name together, I might have confused something. However, the
patch looks like working for the purpose.

> The way I see it, if the calculated sleeptime is greater than
> KEEPALIVE_TIMEOUT, then it first sleeps for up to KEEPALIVE_TIMEOUT
> milliseconds and skips sending a keepalive if something happens (i.e.
> the socket becomes readable/writeable) during that time (WalSendWait
> will return non-zero in that case). Otherwise, it sends a keepalive
> and sleeps for (sleeptime - KEEPALIVE_TIMEOUT), then loops around
> again ...

The maim point of the patch is moving of the timing of sending the
before-sleep keepalive.  It seems to me that currently
WalSndWaitForWal may send "before-sleep" keepalive every run of the
loop under a certain circumstance. I suspect this happen in this case.

After the patch applied, that keepalive is sent only when the loop is
actually going to sleep some time.  In case the next WAL doesn't come
for KEEPALIVE_TIMEOUT milliseconds, it sends a keepalive. There's a
dubious behavior when sleeptime <= KEEPALIVE_TIMEOUT that it sends a
keepalive immediately.  It was (as far as I recall) intentional in
order to make the code simpler.  However, on second thought, we will
have the next chance to send keepalive in that case, and intermittent
frequent keepalives can happen with that behavior.  So I came to think
that we can omit keepalives at all that case.

(I myself haven't see the keepalive flood..)

> > Also, more to the point this special keep_alive seems to be sent for
> > synchronous replication and walsender shutdown as they can expect
> > updated locations. You haven't given any reason (theory) why those two
> > won't be impacted due to this change? I am aware that for synchronous
> > replication, we wake waiters while ProcessStandbyReplyMessage but I am
> > not sure how it helps with wal sender shutdown? I think we need to
> > know the reasons for this message and then need to see if the change
> > has any impact on the same.
> >
> 
> Yes, I'm not sure about the possible impacts, still looking at it.

If the comment describes the objective correctly, the only possible
impact would be that there may be a case where server responds a bit
slowly for a shutdown request.  But I'm not sure it is definitely
true.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..0b04b978e9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -105,6 +105,9 @@
  */
 #define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
 
+/* Minimum idle time for sending an idle-time keepalive in milliseconds */
+#define KEEPALIVE_TIMEOUT 250
+
 /* Array of WalSnds in shared memory */
 WalSndCtlData *WalSndCtl = NULL;
 
@@ -244,7 +247,7 @@ static void WalSndKeepalive(bool requestReply);
 static void WalSndKeepaliveIfNecessary(void);
 static void WalSndCheckTimeOut(void);
 static long WalSndComputeSleeptime(TimestampTz now);
-static void WalSndWait(uint32 socket_events, long timeout, uint32 wait_event);
+static int WalSndWait(uint32 socket_events, long timeout, uint32 wait_event);
 static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static void WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
@@ -1436,19 +1439,6 @@ WalSndWaitForWal(XLogRecPtr loc)
         if (got_STOPPING)
             break;
 
-        /*
-         * We only send regular messages to the client for full decoded
-         * transactions, but a synchronous replication and walsender shutdown
-         * possibly are waiting for a later location. So, before sleeping, we
-         * send a ping containing the flush location. If the receiver is
-         * otherwise idle, this keepalive will trigger a reply. Processing the
-         * reply will update these MyWalSnd locations.
-         */
-        if (MyWalSnd->flush < sentPtr &&
-            MyWalSnd->write < sentPtr &&
-            !waiting_for_ping_response)
-            WalSndKeepalive(false);
-
         /* check whether we're done */
         if (loc <= RecentFlushPtr)
             break;
@@ -1491,6 +1481,41 @@ WalSndWaitForWal(XLogRecPtr loc)
         if (pq_is_send_pending())
             wakeEvents |= WL_SOCKET_WRITEABLE;
 
+        /*
+         * We only send regular messages to the client for full decoded
+         * transactions, but a synchronous replication and walsender shutdown
+         * possibly are waiting for a later location. So, before sleeping, we
+         * send a ping containing the flush location. If the receiver is
+         * otherwise idle, this keepalive will trigger a reply. Processing the
+         * reply will update these MyWalSnd locations. To prevent too-frequent
+         * keepalives, wait KEEPALIVE_TIMEOUT milliseconds before sending a
+         * keepalive. If sleeptime is less than KEEPALIVE_TIMEOUT, instead
+         * WalSndKeepaliveIfNecessary will work if needed.
+         */
+        if (MyWalSnd->flush < sentPtr &&
+            MyWalSnd->write < sentPtr &&
+            !waiting_for_ping_response &&
+            sleeptime > KEEPALIVE_TIMEOUT)
+        {
+            int r;
+            
+            r = WalSndWait(wakeEvents, KEEPALIVE_TIMEOUT,
+                           WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+
+            /*
+             * If some event happens before the timeout expires, go ahead
+             * without sending keepalive.
+             */
+            if (r != 0)
+                continue;
+
+            sleeptime -= KEEPALIVE_TIMEOUT;
+
+            WalSndKeepalive(false);
+            if (pq_flush_if_writable() != 0)
+                WalSndShutdown();
+        }
+        
         WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
     }
 
@@ -3144,15 +3169,18 @@ WalSndWakeup(void)
  * composed of optional WL_SOCKET_WRITEABLE and WL_SOCKET_READABLE flags.  Exit
  * on postmaster death.
  */
-static void
+static int
 WalSndWait(uint32 socket_events, long timeout, uint32 wait_event)
 {
     WaitEvent    event;
+    int            ret;
 
     ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, socket_events, NULL);
-    if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 &&
-        (event.events & WL_POSTMASTER_DEATH))
+    ret = WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event);
+    if (ret == 1 && (event.events & WL_POSTMASTER_DEATH))
         proc_exit(1);
+
+    return ret;
 }
 
 /*

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: prevent immature WAL streaming
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Logical replication keepalive flood