Re: [HACKERS] Walsender timeouts and large transactions

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] Walsender timeouts and large transactions
Дата
Msg-id 20170929.172635.133003609.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] Walsender timeouts and large transactions  (Sokolov Yura <funny.falcon@postgrespro.ru>)
Ответы Re: [HACKERS] Walsender timeouts and large transactions  (Sokolov Yura <funny.falcon@postgrespro.ru>)
Список pgsql-hackers
Hello,

At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in
<90bb67da7131e6186b50897c4b0f0ec3@postgrespro.ru>
> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> > Hello,
> > At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov
> > <funny.falcon@postgrespro.ru> wrote in
> > <20170906134616.18925.88390.pgcf@coridan.postgresql.org>
> > As the result, I think that the functions should be modified as
> > the following.
> > - Forcing slow-path if time elapses a half of a ping period is
> >   right. (GetCurrentTimestamp is anyway requried.)
> > - The slow-path should not sleep waiting Latch. It should just
> >   pg_usleep() for maybe 1-10ms.
> > - We should go to the fast path just after keepalive or response
> >   message has been sent. In other words, the "if (now <" block
> >   should be in the "for (;;)" loop. This avoids needless runs on
> >   the slow-path.
> > It would be refactorable as the following.
> >   prepare for the send buffer;
> >   for (;;)
> >   {
> >     now = GetCurrentTimeStamp();
> >     if (now < )...
> >     {
> >       fast-path
> >     }
> >     else
> >     {
> >       slow-path
> >     }
> >     return if finished
> >     sleep for 1ms?
> >   }
> > What do you think about this?
> > regards,
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> 
> Good day, Petr, Kyotaro
> 
> I've created failing test for issue (0001-Add-failing-test...) .
> It tests insertion of 20000 rows with 10ms wal_sender_timeout
> (it fails in WalSndWriteData on master) and then deletion of
> those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).
> 
> Both Peter's patch and my simplified suggestion didn't pass the
> test. I didn't checked Kyotaro's suggestion, though, cause I
> didn't understand it well.

Mmm. The test seems broken. wal_sender_timeout = 10ms with
wal_receiver_status_interval=10s immediately causes a
timeout. Avoiding the timeout is just breaking the sane code.

wal_sender_timeout = 3s and wal_receiver_status_interval=1s
effectively causes the problem with about 1000000 lines of (int)
insertion on UNIX socket connection, on my poor box.

The original complain here came from the fact that
WalSndWriteData skips processing of replies for a long time on a
fast network. However Petr's patch fixed the problem, I pointed
that just letting the function take the slow path leads to
another problem, that is, waiting for new WAL records can result
in a unwanted pause in the slow path.

Combining the solutions for the two problem is my proposal sited
above. The sentence seems in quite bad style but the attached
file is the concorete patch of that.

Any thoughts?

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 1151,1156 **** static void
--- 1151,1158 ---- WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,                 bool
last_write){
 
+     TimestampTz now = GetCurrentTimestamp();
+      /* output previously gathered data in a CopyData packet */     pq_putmessage_noblock('d', ctx->out->data,
ctx->out->len);
 
***************
*** 1160,1235 **** WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,      * several
releasesby streaming physical replication.      */     resetStringInfo(&tmpbuf);
 
!     pq_sendint64(&tmpbuf, GetCurrentTimestamp());     memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
     tmpbuf.data, sizeof(int64)); 
 
-     /* fast path */
-     /* Try to flush pending output to the client */
-     if (pq_flush_if_writable() != 0)
-         WalSndShutdown();
- 
-     if (!pq_is_send_pending())
-         return;
-      for (;;)     {         int            wakeEvents;         long        sleeptime;         TimestampTz now;
 /*
 
!          * Emergency bailout if postmaster has died.  This is to avoid the
!          * necessity for manual cleanup of all postmaster children.          */
!         if (!PostmasterIsAlive())
!             exit(1);
! 
!         /* Clear any already-pending wakeups */
!         ResetLatch(MyLatch);
! 
!         CHECK_FOR_INTERRUPTS();
! 
!         /* Process any requests or signals received recently */
!         if (ConfigReloadPending)
!         {
!             ConfigReloadPending = false;
!             ProcessConfigFile(PGC_SIGHUP);
!             SyncRepInitConfig();
!         }
! 
!         /* Check for input from the client */
!         ProcessRepliesIfAny();
! 
!         /* Try to flush pending output to the client */
!         if (pq_flush_if_writable() != 0)
!             WalSndShutdown();
! 
!         /* If we finished clearing the buffered data, we're done here. */
!         if (!pq_is_send_pending())
!             break;
!          now = GetCurrentTimestamp();
- 
-         /* die if timeout was reached */
-         WalSndCheckTimeOut(now);
- 
-         /* Send keepalive if the time has come */
-         WalSndKeepaliveIfNecessary(now);
- 
-         sleeptime = WalSndComputeSleeptime(now);
- 
-         wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
-             WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
- 
-         /* Sleep until something happens or we time out */
-         WaitLatchOrSocket(MyLatch, wakeEvents,
-                           MyProcPort->sock, sleeptime,
-                           WAIT_EVENT_WAL_SENDER_WRITE_DATA);     }
- 
-     /* reactivate latch so WalSndLoop knows to continue */
-     SetLatch(MyLatch); }  /*
--- 1162,1238 ----      * several releases by streaming physical replication.      */     resetStringInfo(&tmpbuf);
!     pq_sendint64(&tmpbuf, now);     memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
tmpbuf.data,sizeof(int64));      for (;;)     {         int            wakeEvents;         long        sleeptime;
 TimestampTz now; 
 
+         if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                                wal_sender_timeout / 2))
+         {
+             /* fast path */
+             /* Try to flush pending output to the client */
+             if (pq_flush_if_writable() != 0)
+                 WalSndShutdown();
+         }
+         else
+         {
+             /*
+              * Emergency bailout if postmaster has died.  This is to avoid the
+              * necessity for manual cleanup of all postmaster children.
+              */
+             if (!PostmasterIsAlive())
+                 exit(1);
+ 
+             CHECK_FOR_INTERRUPTS();
+ 
+             /* Process any requests or signals received recently */
+             if (ConfigReloadPending)
+             {
+                 ConfigReloadPending = false;
+                 ProcessConfigFile(PGC_SIGHUP);
+                 SyncRepInitConfig();
+             }
+ 
+             /* Check for input from the client */
+             ProcessRepliesIfAny();
+ 
+             /* Try to flush pending output to the client */
+             if (pq_flush_if_writable() != 0)
+                 WalSndShutdown();
+ 
+             /* If we finished clearing the buffered data, we're done here. */
+             if (!pq_is_send_pending())
+                 break;
+ 
+             /* die if timeout was reached */
+             WalSndCheckTimeOut(now);
+ 
+             /* Send keepalive if the time has come */
+             WalSndKeepaliveIfNecessary(now);
+         }
+ 
+         /* return if finished */
+         if (!pq_is_send_pending())
+             return;
+          /*
!          * Send buffer is filled up. Being different from WalSndLoop, since
!          * this function is called at a commit time for all the modifications
!          * within the commit, we should continue sending data without waiting
!          * for the next WAL record.
!          *
!          * Not sure how long we should wait for a room to send more data, but
!          * 1ms would be sufficient.          */
!         pg_usleep(1000);         now = GetCurrentTimestamp();     } }  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: [HACKERS] pg_prepared_xact_status
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: [HACKERS] pg_prepared_xact_status