Re: delay starting WAL receiver

From	Thomas Munro
Subject	Re: delay starting WAL receiver
Date	January 11, 2023, 07:20:38
Msg-id	CA+hUKGLWv2PfMQR3FSo=M65+MGCGp_ZiYiWfS42+4VNqrrA+ig@mail.gmail.com
In reply to	delay starting WAL receiver (Nathan Bossart <nathandbossart@gmail.com>)
Responses	Re: delay starting WAL receiver Re: delay starting WAL receiver
List	pgsql-hackers


On Wed, Jan 11, 2023 at 2:08 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I discussed this a bit in a different thread [0], but I thought it deserved
> its own thread.
>
> After setting wal_retrieve_retry_interval to 1ms in the tests, I noticed
> that the recovery tests consistently take much longer.  Upon further
> inspection, it looks like a similar race condition to the one described in
> e5d494d's commit message.  With some added debug logs, I see that all of
> the callers of MaybeStartWalReceiver() complete before SIGCHLD is
> processed, so ServerLoop() waits for a minute before starting the WAL
> receiver.
>
> The attached patch fixes this by adjusting DetermineSleepTime() to limit
> the sleep to at most 100ms when WalReceiverRequested is set, similar to how
> the sleep is limited when background workers must be restarted.

Is the problem here that SIGCHLD is processed ...

            PG_SETMASK(&UnBlockSig); <--- here?

            selres = select(nSockets, &rmask, NULL, NULL, &timeout);

Meanwhile the SIGCHLD handler code says:

         * Was it the wal receiver?  If exit status is zero (normal) or one
         * (FATAL exit), we assume everything is all right just like normal
         * backends.  (If we need a new wal receiver, we'll start one at the
         * next iteration of the postmaster's main loop.)

... which is true, but that won't be reached for a while in this case
if the timeout has already been set to 60s.  Your patch makes that
100ms, in that case, a time delay that by now attracts my attention
like a red rag to a bull (I don't know why you didn't make it 0).
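
For readers following along, the cap described in the quoted patch can be pictured with a minimal sketch. This is not the actual DetermineSleepTime() code: the function name, the constants, and the single boolean argument are all hypothetical simplifications, and the real function weighs several other conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants, chosen to match the numbers quoted in the thread. */
#define DEFAULT_SLEEP_MS        60000   /* idle main loop sleeps up to a minute */
#define PENDING_START_SLEEP_MS  100     /* cap while a WAL receiver start is pending */

/*
 * Hypothetical, simplified stand-in for the sleep-time computation:
 * start from the default and shorten it when a WAL receiver start has
 * been requested but not yet carried out.
 */
static int
sleep_time_ms(bool wal_receiver_requested)
{
    int sleep_ms = DEFAULT_SLEEP_MS;

    if (wal_receiver_requested && sleep_ms > PENDING_START_SLEEP_MS)
        sleep_ms = PENDING_START_SLEEP_MS;

    assert(sleep_ms >= 0);
    return sleep_ms;
}
```

With this shape, sleep_time_ms(false) gives the full minute and sleep_time_ms(true) gives the 100ms cap; the cap only bounds how late the restart can be, it does not remove the wait.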

I'm not sure, but if I got that right, then I think the whole problem
might automatically go away with CF #4032.  The SIGCHLD processing
code will run not when signals are unblocked before select() (that is
gone), but instead *after* the event loop wakes up with WL_LATCH_SET,
and runs the handler code in the regular user context before dropping
through to the rest of the main loop.
