Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

From Tom Lane
Subject Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
Date
Msg-id 31674.1496780737@sss.pgh.pa.us
In response to Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 6, 2017 at 2:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm.  With some generous assumptions it'd be possible to think that
>> aa1351f1eec4adae39be59ce9a21410f9dd42118 triggered this.  That commit was
>> present in 20 successful lorikeet runs before the first of these failures,
>> which is a bit more than the MTBF after that, but not a huge amount more.
>> That commit in itself looks innocent enough, but could it have exposed
>> some latent bug in bgworker launching?

> Hmm, that's a really interesting idea, but I can't quite put together
> a plausible theory around it.

Yeah, me either, but we're really theorizing in advance of the data here.
Andrew, could you apply the attached patch on lorikeet and run the
regression tests enough times to get a couple of failures?  Then grepping
the postmaster log for 'parallel worker' should give you results like

2017-06-06 16:20:12.393 EDT [31216] LOG:  starting PID 31216, parallel worker for PID 31215, worker number 0
2017-06-06 16:20:12.400 EDT [31216] LOG:  stopping PID 31216, parallel worker for PID 31215, worker number 0
2017-06-06 16:20:12.406 EDT [31217] LOG:  starting PID 31217, parallel worker for PID 31215, worker number 3
2017-06-06 16:20:12.406 EDT [31218] LOG:  starting PID 31218, parallel worker for PID 31215, worker number 2
2017-06-06 16:20:12.406 EDT [31219] LOG:  starting PID 31219, parallel worker for PID 31215, worker number 1
2017-06-06 16:20:12.406 EDT [31220] LOG:  starting PID 31220, parallel worker for PID 31215, worker number 0
2017-06-06 16:20:12.412 EDT [31218] LOG:  stopping PID 31218, parallel worker for PID 31215, worker number 2
2017-06-06 16:20:12.412 EDT [31219] LOG:  stopping PID 31219, parallel worker for PID 31215, worker number 1
2017-06-06 16:20:12.412 EDT [31220] LOG:  stopping PID 31220, parallel worker for PID 31215, worker number 0
2017-06-06 16:20:12.412 EDT [31217] LOG:  stopping PID 31217, parallel worker for PID 31215, worker number 3
... etc etc ...

If it looks different from that in a crash case, we'll have something
to go on.
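
One way to check the grepped lines mechanically is to pair each "starting" line with its "stopping" line by PID and see what is left over. The following is only a rough sketch, assuming the message format shown in the sample output above; OpenWorkers, classify_line, track_line, and the MAX_OPEN cap are hypothetical names made up for this illustration and are not part of PostgreSQL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPEN 1024           /* arbitrary cap chosen for this sketch */

/* Hypothetical bookkeeping: PIDs that logged "starting" but not yet "stopping". */
typedef struct
{
    int         pids[MAX_OPEN];
    int         count;
} OpenWorkers;

/*
 * Classify one log line: 1 = worker start, -1 = worker stop, 0 = anything else.
 * On a match, *pid receives the worker's PID.
 */
int
classify_line(const char *line, int *pid)
{
    static const char start_tag[] = "starting PID ";
    static const char stop_tag[] = "stopping PID ";
    const char *p;

    if (strstr(line, "parallel worker") == NULL)
        return 0;
    if ((p = strstr(line, start_tag)) != NULL &&
        sscanf(p + strlen(start_tag), "%d", pid) == 1)
        return 1;
    if ((p = strstr(line, stop_tag)) != NULL &&
        sscanf(p + strlen(stop_tag), "%d", pid) == 1)
        return -1;
    return 0;
}

/* Update the open set from one line; returns classify_line()'s result. */
int
track_line(OpenWorkers *ow, const char *line)
{
    int         pid = 0;
    int         kind;
    int         i;

    assert(ow != NULL && line != NULL);
    kind = classify_line(line, &pid);
    if (kind == 1 && ow->count < MAX_OPEN)
        ow->pids[ow->count++] = pid;
    else if (kind == -1)
    {
        for (i = 0; i < ow->count; i++)
        {
            if (ow->pids[i] == pid)
            {
                /* swap-remove: order of the open set does not matter */
                ow->pids[i] = ow->pids[--ow->count];
                break;
            }
        }
    }
    return kind;
}
```

Feeding every grepped line through track_line() (for example from an fgets() loop over stdin) and then printing whatever remains in pids[] would list workers that logged a start but never a stop. The sketch ignores PID reuse within one log, so it is only meant for short test runs.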

(I'm tempted to add something like this permanently, at DEBUG1 or DEBUG2
or so.)

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index cb22174..d3cb26c 100644
*** a/src/backend/access/transam/parallel.c
--- b/src/backend/access/transam/parallel.c
*************** ParallelWorkerMain(Datum main_arg)
*** 950,955 ****
--- 950,961 ----
      Assert(ParallelWorkerNumber == -1);
      memcpy(&ParallelWorkerNumber, MyBgworkerEntry->bgw_extra, sizeof(int));

+     /* Log parallel worker startup. */
+     ereport(LOG,
+             (errmsg("starting PID %d, %s, worker number %d",
+                     MyProcPid, MyBgworkerEntry->bgw_name,
+                     ParallelWorkerNumber)));
+
      /* Set up a memory context and resource owner. */
      Assert(CurrentResourceOwner == NULL);
      CurrentResourceOwner = ResourceOwnerCreate(NULL, "parallel toplevel");
*************** ParallelWorkerMain(Datum main_arg)
*** 1112,1117 ****
--- 1118,1129 ----

      /* Report success. */
      pq_putmessage('X', NULL, 0);
+
+     /* Log parallel worker shutdown. */
+     ereport(LOG,
+             (errmsg("stopping PID %d, %s, worker number %d",
+                     MyProcPid, MyBgworkerEntry->bgw_name,
+                     ParallelWorkerNumber)));
  }

  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
