Re: [HACKERS] Walsender timeouts and large transactions

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] Walsender timeouts and large transactions
Дата
Msg-id 20171117.163543.173137002.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] Walsender timeouts and large transactions  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Ответы Re: [HACKERS] Walsender timeouts and large transactions
Re: [HACKERS] Walsender timeouts and large transactions
Список pgsql-hackers
Ouch.. I'd doubly mistaked.

> I found that the patch is the latest one and will look this
> soon. Sorry for the ignorance.

Thats...wrong. Sorry. There's no new patch since the Reboer's
comment.

I think this is just a bug fix and needs no more argument on its
functionality.  (and might ought to be backpatched?)

At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com>
> > This change falsifies the comments.  Maybe initialize now just after
> > resetSetringInfo() is done.
> 
> Eh, right, I can do that.

It is reasonable. (Or rewrite the comment?)

At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com>
> > 
> > -    /* fast path */
> > -    /* Try to flush pending output to the client */
> > -    if (pq_flush_if_writable() != 0)
> > -        WalSndShutdown();
> > +    /* Try taking fast path unless we get too close to walsender timeout. */
> > +    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> > +                                          wal_sender_timeout / 2))
> > +    {
> > +        CHECK_FOR_INTERRUPTS();
> > 
> > -    if (!pq_is_send_pending())
> > -        return;
> > +        /* Try to flush pending output to the client */
> > +        if (pq_flush_if_writable() != 0)
> > +            WalSndShutdown();
> > +
> > +        if (!pq_is_send_pending())
> > +            return;
> > +    }
> > 
> > I think it's only the if (!pq_is_send_pending()) return; that needs to
> > be conditional here, isn't it?  The pq_flush_if_writable() can be done
> > unconditionally.
> > 
> 
> Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes.
> It just seems like it's needless call as we'll call both in for loop
> anyway if we take the "slow" path. I admit it's not exactly big win
> though. If you think it would improve readability I can move it.

Moving around the code allow us to place ps_is_send_pending() in
the while condition, which seems to be more proper place to do
that. I haven't added test for this particular case.

I tested this that

- cleanly applies on the current master HEAD and passes make check and subscription test.

- walsender properly chooses the slow-path even if pq_is_send_pending() is always false. (happens on a fast enough
network)

- walsender waits properly waits on socket and process-reply time in WaitLatchOrSocket.

- walsender exits by timeout on network stall.

So, I think the patch is functionally perfect.

I'm a reviewer of this patch but I think I'm not allowed to mark
this "Ready for Commiter" since the last change is made by me.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 508c43ef43eebee7de89094658b74f9cef8e4342 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Tue, 12 Sep 2017 17:31:28 +0900
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have fast code path for sending data in order to not
impose too high per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that transaction with large number of changes may take very long
to be processed and sent to the client. This causes walsender to ignore
interrupts for potentially long time and more importantly it will cause
walsender being killed due to timeout at the end of such transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when last keeplaive check happened less
than half of walsender timeout ago, otherwise the slower code path will
be taken.
---src/backend/replication/walsender.c | 58 ++++++++++++++++++++-----------------1 file changed, 32 insertions(+), 26
deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fa1db74..5a4c31f 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1151,6 +1151,8 @@ static voidWalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
       bool last_write){
 
+    TimestampTz    now;
+    /* output previously gathered data in a CopyData packet */    pq_putmessage_noblock('d', ctx->out->data,
ctx->out->len);
@@ -1160,23 +1162,46 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,     * several
releasesby streaming physical replication.     */    resetStringInfo(&tmpbuf);
 
-    pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+    now = GetCurrentTimestamp();
+    pq_sendint64(&tmpbuf, now);    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],           tmpbuf.data,
sizeof(int64));
-    /* fast path */
+    CHECK_FOR_INTERRUPTS();
+    /* Try to flush pending output to the client */    if (pq_flush_if_writable() != 0)        WalSndShutdown();
-    if (!pq_is_send_pending())
-        return;
+    /* Try taking fast path unless we get too close to walsender timeout. */
+    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                          wal_sender_timeout / 2))
+    {
+        if (!pq_is_send_pending())
+            return;
+    }
-    for (;;)
+    /* If we have pending write here, go to slow path */
+    while (pq_is_send_pending())    {        int            wakeEvents;        long        sleeptime;
-        TimestampTz now;
+
+        /* die if timeout was reached */
+        WalSndCheckTimeOut(now);
+
+        /* Send keepalive if the time has come */
+        WalSndKeepaliveIfNecessary(now);
+
+        sleeptime = WalSndComputeSleeptime(now);
+
+        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+            WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+        /* Sleep until something happens or we time out */
+        WaitLatchOrSocket(MyLatch, wakeEvents,
+                          MyProcPort->sock, sleeptime,
+                          WAIT_EVENT_WAL_SENDER_WRITE_DATA);        /*         * Emergency bailout if postmaster has
died. This is to avoid the
 
@@ -1205,27 +1230,8 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,        if
(pq_flush_if_writable()!= 0)            WalSndShutdown();
 
-        /* If we finished clearing the buffered data, we're done here. */
-        if (!pq_is_send_pending())
-            break;
-
+        /* Update the timestamp for the next trial */        now = GetCurrentTimestamp();
-
-        /* die if timeout was reached */
-        WalSndCheckTimeOut(now);
-
-        /* Send keepalive if the time has come */
-        WalSndKeepaliveIfNecessary(now);
-
-        sleeptime = WalSndComputeSleeptime(now);
-
-        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
-            WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
-
-        /* Sleep until something happens or we time out */
-        WaitLatchOrSocket(MyLatch, wakeEvents,
-                          MyProcPort->sock, sleeptime,
-                          WAIT_EVENT_WAL_SENDER_WRITE_DATA);    }    /* reactivate latch so WalSndLoop knows to
continue*/
 
-- 
2.9.2


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: Speed up the removal of WAL files
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Speed up the removal of WAL files