BUG #17345: pg_basebackup stucked for 2 hours before timeout

From PG Bug reporting form
Subject BUG #17345: pg_basebackup stucked for 2 hours before timeout
Date
Msg-id 17345-a66a0084532b7beb@postgresql.org
Responses Re: BUG #17345: pg_basebackup stucked for 2 hours before timeout  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17345
Logged by:          Bo Chen
Email address:      bchen90@163.com
PostgreSQL version: 11.13
Operating system:   euleros v2r9 x86_64
Description:

Hello experts,
    I am facing an issue with pg_basebackup in a Docker environment. When the
primary VM is restarted while pg_basebackup is running on the standby (a Docker
container inside a VM), it takes 2 hours before pg_basebackup times out.
    After analyzing and reproducing the problem, I think the reason is that the
parent process, which fetches the data files, is blocked waiting for TCP
keepalive, and it ignores or blocks SIGCHLD while it is inside poll(). So we
added code that signals the parent when the WAL-fetching child exits with a
non-zero status.

Below is the modified code.
 #include "streamutil.h"
+#include <sys/prctl.h>
 
 #define ERRCODE_DATA_CORRUPTED    "XX001"
 
@@ -565,6 +566,8 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
     uint32        hi,
                 lo;
     char        statusdir[MAXPGPATH];
+    pid_t bgpid;
+    int ret;
 
     param = pg_malloc0(sizeof(logstreamer_param));
     param->timeline = timeline;
@@ -662,12 +665,24 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
      * a fork(). On Windows, we create a thread.
      */
 #ifndef WIN32
+    bgpid = getpid();
+
     bgchild = fork();
     if (bgchild == 0)
     {
+        (void)prctl(PR_SET_PDEATHSIG, SIGQUIT);
         /* in child process */
-        exit(LogStreamerMain(param));
+        ret = LogStreamerMain(param);
+        if (ret != 0)
+        {
+            kill(bgpid, SIGINT);
+        }
+        exit(ret);
     }
     else if (bgchild < 0)
     {
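
For readers who want to try the idea outside pg_basebackup, here is a minimal sketch of the behavior the patch above relies on. A process blocked in poll() is not woken by a child's exit while SIGCHLD has its default disposition, but it is terminated by a SIGINT with default disposition. This is an illustration under stated assumptions, not pg_basebackup code: the names blocked_parent and demo_signal_parent are hypothetical, a pipe that never becomes readable stands in for the dead connection, and a 1-second poll timeout replaces the real indefinite wait only so that the demo cannot hang.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the pg_basebackup parent: forks a "streamer"
 * child, then blocks in poll() on a descriptor that never becomes readable.
 * Never returns.
 */
static void blocked_parent(int streamer_status)
{
    int           fds[2];
    struct pollfd pfd;
    pid_t         self = getpid();   /* saved before fork(), like bgpid in the patch */
    pid_t         streamer;

    /* Make sure SIGINT terminates this process, as in a plain CLI program. */
    signal(SIGINT, SIG_DFL);

    if (pipe(fds) != 0)
        _exit(2);

    streamer = fork();
    if (streamer < 0)
        _exit(2);
    if (streamer == 0)
    {
        /* In the child: pretend the WAL streamer ran for a while, then ended. */
        usleep(100 * 1000);
        if (streamer_status != 0)
            kill(self, SIGINT);      /* the approach proposed in the patch */
        _exit(streamer_status);
    }

    /*
     * Nobody writes to the pipe, and this process keeps the write end open,
     * so poll() can only end by timeout or by a signal. The real wait uses
     * an infinite timeout; 1000 ms is an assumption to keep the demo finite.
     */
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    (void) poll(&pfd, 1, 1000);
    _exit(0);
}

/*
 * Runs the scenario in a separate process. Returns the number of the signal
 * that terminated the blocked process, 0 if it exited normally, -1 on error.
 */
int demo_signal_parent(int streamer_status)
{
    int   status;
    pid_t parent = fork();

    if (parent < 0)
        return -1;
    if (parent == 0)
        blocked_parent(streamer_status);

    if (waitpid(parent, &status, 0) != parent)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

Calling demo_signal_parent(1) returns SIGINT, because the failing child terminates the blocked process at once. Calling demo_signal_parent(0) returns 0 only after the poll timeout has expired, since the child's normal exit does not interrupt the wait.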

This is the stack when pg_basebackup is stuck:
#0  0xf7f6e039 in __kernel_vsyscall ()
#1  0xf7a1f2ea in poll () from /usr/lib/libc.so.6
#2  0xf7b25ea0 in pqSocketPoll (sock=5, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1127
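
In frame #2, end_time=-1 means the poll has no timeout, so the wait ends only when the kernel declares the connection dead. On Linux the default keepalive idle time is 7200 seconds, which matches the 2 hours observed. As a rough illustration of how that interval is controlled per socket, here is a Linux-specific sketch; set_keepalive and keepalive_idle_seconds are hypothetical helper names, and the values 60/10/3 in the usage note are arbitrary. With libpq the corresponding connection parameters are keepalives_idle, keepalives_interval and keepalives_count; whether tuning them is an acceptable workaround here is an assumption, not something tested in this report.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keepalive on fd and shorten its timers (Linux socket options).
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int set_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;
    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0)
        return -1;
    /* Number of unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) != 0)
        return -1;
    return 0;
}

/* Read back the idle time currently set on fd, or -1 on error. */
int keepalive_idle_seconds(int fd)
{
    int       idle = -1;
    socklen_t len = sizeof(idle);

    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, &len) != 0)
        return -1;
    return idle;
}
```

For example, set_keepalive(fd, 60, 10, 3) would have the kernel give up on a silent peer after roughly 60 + 3 * 10 seconds instead of about two hours.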

Below is the same issue, reported by Ninad Shah.
https://www.postgresql.org/message-id/CAOFEiBd9j620TsBZPT0%2BuvdemQqwTrCLohcLjuDfQ2ye-xdswQ%40mail.gmail.com

Regards,
Bo Chenbo

