Re: Timeout failure in 019_replslot_limit.pl

From: Amit Kapila
Subject: Re: Timeout failure in 019_replslot_limit.pl
Msg-id: CAA4eK1JSCFrLi7p=vxGr2_kKjv6RzjyhsTK5rxFDNrNTPdqiwA@mail.gmail.com
In response to: Timeout failure in 019_replslot_limit.pl (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Timeout failure in 019_replslot_limit.pl (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers

On Wed, Sep 22, 2021 at 12:57 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 20, 2021 at 09:38:29AM -0300, Alvaro Herrera wrote:
> > On 2021-Sep-20, Michael Paquier wrote:
> >> The test gets the right PIDs, as the logs showed:
> >> ok 17 - have walsender pid 12663
> >> ok 18 - have walreceiver pid 12662
> >
> > As I understood, Horiguchi-san's point isn't that the PIDs might be
> > wrong -- the point is to make sure that the process is in state T before
> > moving on to the next step in the test.
>
> I have spent more time on this issue, as it looks that I am the only
> one with an environment able to reproduce it (Big Sur 11.6).
>
> As far as I can see, the states of the WAL sender and receiver are
> fine, after adding some extra debugging with ps called from the test
> itself, and I have checked that they are SIGSTOP'd or SIGCONT'd when a
> failure shows up.
>
> In a sequence that passes, we have the following sequence:
> - Start checkpoint.
> - SIGSTOP sent to WAL sender and receiver.
> - Advance WAL (CREATE TABLE, DROP TABLE and pg_switch_wal)
> - Check that WAL sender is stopped
> - SIGCONT on WAL sender.
>

Am I understanding correctly that after sending SIGCONT to the WAL
sender, the checkpoint's SIGTERM signal for the WAL sender is received
and it releases the slot and terminates itself?
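
To make the ordering in question concrete, here is a minimal sketch of the generic POSIX behavior, not PostgreSQL code: a SIGTERM sent to a SIGSTOP'd process stays pending and is only acted on after SIGCONT. The sleeping child is a hypothetical stand-in for the WAL sender, and it relies on the default SIGTERM disposition, whereas the real WAL sender installs its own handler.

```python
import os
import signal
import subprocess
import sys
import time


def demo_deferred_sigterm():
    """Show that SIGTERM sent to a stopped process waits for SIGCONT."""
    # Hypothetical stand-in for the WAL sender: a child that only sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )

    # Stop the child and wait until the kernel reports it as stopped.
    os.kill(child.pid, signal.SIGSTOP)
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)

    # SIGTERM is queued but cannot be acted on while the child is stopped.
    os.kill(child.pid, signal.SIGTERM)
    time.sleep(0.5)
    alive_while_stopped = child.poll() is None

    # SIGCONT resumes the child; the pending SIGTERM then terminates it.
    os.kill(child.pid, signal.SIGCONT)
    child.wait(timeout=10)
    return stopped, alive_while_stopped, child.returncode


if __name__ == "__main__":
    print(demo_deferred_sigterm())
```

If the same ordering holds in the test, the WAL sender could only react to the termination request once the SIGCONT step has run.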

> - Invalidate the slot, checkpoint completes.

After which the checkpoint invalidates the slot and completes.

Now, in the failed run, it appears that for some reason the WAL sender
has not released the slot. Is it possible to check whether the WAL sender
is still alive while the checkpoint is stuck in ConditionVariableSleep?
And if it is, what is its call stack?
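
As a rough sketch of the liveness check, assuming the WAL sender PID is already known from the test output: signal 0 probes for a process without delivering anything. The helper name `pid_is_alive` is hypothetical. This only covers the "still alive" part; the call stack would have to come from attaching a debugger to that PID.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists (zombies included)."""
    try:
        # Signal 0 performs the existence and permission checks only.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


if __name__ == "__main__":
    print(pid_is_alive(os.getpid()))
```

Note that an exited process that has not been reaped by its parent still counts as existing for this check.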

-- 
With Regards,
Amit Kapila.


