Re: Delay of standby shutdown
От | Fujii Masao |
---|---|
Тема | Re: Delay of standby shutdown |
Дата | |
Msg-id | 87b9d893-101e-664f-270e-6029b1fe7a0c@oss.nttdata.com обсуждение исходный текст |
Ответ на | Re: Delay of standby shutdown (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>) |
Ответы |
Re: Delay of standby shutdown
|
Список | pgsql-hackers |
On 2020/11/04 9:35, Soumyadeep Chakraborty wrote: > Hello Fujii, > > On Thu, Sep 17, 2020 at 6:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > As far as I understand after debugging, the problem is as follows: Yes. > > 1. After the primary is stopped, and the standby startup process is > waiting inside: > > (void) WaitLatch(&XLogCtl->recoveryWakeupLatch, > WL_LATCH_SET | WL_TIMEOUT | > WL_EXIT_ON_PM_DEATH, > wait_time, > WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL); > > it receives SIGTERM on account of stopping the standby and it leads to > the WaitLatch call returning with WL_LATCH_SET. > > 2. WaitForWALToBecomeAvailable() then will return true after calling > XLogFileReadAnyTLI() and eventually, XLogReadRecord() will return NULL > since there is no new WAL to read, which means ReadRecord() will loop > back and perform another XLogReadRecord(). > > 3. The additional XLogReadRecord() will lead to a 5s wait inside > WaitForWALToBecomeAvailable() - a different WaitLatch() call this time: > > /* > * Wait for more WAL to arrive. Time out after 5 seconds > * to react to a trigger file promptly and to check if the > * WAL receiver is still active. > */ > (void) WaitLatch(&XLogCtl->recoveryWakeupLatch, > WL_LATCH_SET | WL_TIMEOUT | > WL_EXIT_ON_PM_DEATH, > 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM); > > 4. And then eventually, the code will handle interrupts here inside > WaitForWALToBecomeAvailable() after the 5s wait: > > /* > * This possibly-long loop needs to handle interrupts of startup > * process. > */ > HandleStartupProcInterrupts(); > >> To avoid this issue, I think that ReadRecord() should call >> HandleStartupProcInterrupts() whenever looping back to retry. > > Alternatively, perhaps we can place it inside > WaitForWALToBecomeAvailable() (which already has a similar call), > since it is more semantically aligned to checking for interrupts, rather > than ReadRecord()? Like this: Yes. Your approach looks better to me. Attached is the updated version of the patch implementing that. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
В списке pgsql-hackers по дате отправления: