Re: Add Information during standby recovery conflicts
От | Drouvot, Bertrand |
---|---|
Тема | Re: Add Information during standby recovery conflicts |
Дата | |
Msg-id | 8c1fca5b-6c5c-df6f-0b6d-4702eeb11c5c@amazon.com обсуждение исходный текст |
Ответ на | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Ответы |
Re: Add Information during standby recovery conflicts
(Fujii Masao <masao.fujii@oss.nttdata.com>)
|
Список | pgsql-hackers |
Hi, On 12/14/20 4:20 PM, Fujii Masao wrote: > CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you can confirm the sender > and know the content is safe. > > > > On 2020/12/14 21:31, Fujii Masao wrote: >> >> >> On 2020/12/05 12:38, Masahiko Sawada wrote: >>> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand >>> <bdrouvot@amazon.com> wrote: >>>> >>>> Hi, >>>> >>>> On 12/4/20 2:21 AM, Fujii Masao wrote: >>>>> >>>>> On 2020/12/04 9:28, Masahiko Sawada wrote: >>>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao >>>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote: >>>>>>>>> CAUTION: This email originated from outside of the organization. >>>>>>>>> Do not click links or open attachments unless you can confirm the >>>>>>>>> sender and know the content is safe. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera >>>>>>>>> <alvherre@alvh.no-ip.org> wrote: >>>>>>>>>> On 2020-Dec-01, Fujii Masao wrote: >>>>>>>>>> >>>>>>>>>>> + if (proc) >>>>>>>>>>> + { >>>>>>>>>>> + if (nprocs == 0) >>>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid); >>>>>>>>>>> + else >>>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid); >>>>>>>>>>> + >>>>>>>>>>> + nprocs++; >>>>>>>>>>> >>>>>>>>>>> What happens if all the backends in wait_list have gone? In >>>>>>>>>>> other words, >>>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs >>>>>>>>>>> has not been >>>>>>>>>>> incrmented at all)? This would very rarely happen, but can >>>>>>>>>>> happen. >>>>>>>>>>> In this case, since buf.data is empty, at least there seems no >>>>>>>>>>> need to log >>>>>>>>>>> the list of conflicting processes in detail message. >>>>>>>>>> Yes, I noticed this too; this can be simplified by changing the >>>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than >>>>>>>>>> wait_list being null), otherwise not print the errdetail. (You >>>>>>>>>> could >>>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.) >>>>>>>>> +1 >>>>>>>>> >>>>>>>>> Maybe we can also improve the comment of this function from: >>>>>>>>> >>>>>>>>> + * This function also reports the details about the conflicting >>>>>>>>> + * process ids if *wait_list is not NULL. >>>>>>>>> >>>>>>>>> to " This function also reports the details about the conflicting >>>>>>>>> process ids if exist" or something. >>>>>>>>> >>>>>>>> Thank you all for the review/remarks. >>>>>>>> >>>>>>>> They have been addressed in the new attached patch version. >>>>>>> >>>>>>> Thanks for updating the patch! I read through the patch again >>>>>>> and applied the following chages to it. Attached is the updated >>>>>>> version of the patch. Could you review this version? If there is >>>>>>> no issue in it, I'm thinking to commit this version. >>>>>> >>>>>> Thank you for updating the patch! I have one question. >>>>>> >>>>>>> >>>>>>> + timeouts[cnt].id = STANDBY_TIMEOUT; >>>>>>> + timeouts[cnt].type = TMPARAM_AFTER; >>>>>>> + timeouts[cnt].delay_ms = DeadlockTimeout; >>>>>>> >>>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? >>>>>>> I changed the code that way. >>>>>> >>>>>> As the comment of ResolveRecoveryConflictWithLock() says the >>>>>> following, a deadlock is detected by the ordinary backend process: >>>>>> >>>>>> * Deadlocks involving the Startup process and an ordinary backend >>>>>> proces >>>>>> * will be detected by the deadlock detector within the ordinary >>>>>> backend. >>>>>> >>>>>> If we use STANDBY_DEADLOCK_TIMEOUT, >>>>>> SendRecoveryConflictWithBufferPin() will be called after >>>>>> DeadlockTimeout passed, but I think it's not necessary for the >>>>>> startup >>>>>> process in this case. >>>>> >>>>> Thanks for pointing this! You are right. >>>>> >>>>> >>>>>> If we want to just wake up the startup process >>>>>> maybe we can use STANDBY_TIMEOUT here? >>>>> >>>> Thanks for the patch updates! Except what we are still discussing >>>> below, >>>> it looks good to me. >>>> >>>>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer >>>>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT >>>>> there? >>>> >>>> Agree >>>> >>>>> >>>>> Or, first of all, we don't need to enable the deadlock timer at all? >>>>> Since what we'd like to do is to wake up after deadlock_timeout >>>>> passes, we can do that by changing ProcWaitForSignal() so that it can >>>>> accept the timeout and giving the deadlock_timeout to it. If we do >>>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from >>>>> ResolveRecoveryConflictWithLock(). Thought? >>> >>> Where do we enable deadlock timeout in hot standby case? You meant to >>> enable it in ProcWaitForSignal() or where we set a timer for not hot >>> standby case, in ProcSleep()? >> >> No, what I tried to say is to change >> ResolveRecoveryConflictWithLock() so that it does >> >> 1. calculate the "minimum" timeout from deadlock_timeout and >> max_standby_xxx_delay >> 2. give the calculated timeout value to ProcWaitForSignal() >> 3. wait for signal and timeout on ProcWaitForSignal() >> >> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so >> difficult to make ProcWaitForSignal() handle the timeout. If we do >> this, I was thinking that we can get rid of enable_timeouts() from >> ResolveRecoveryConflictWithLock(). >> >> >>> >>>> >>>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it >>>> triggers >>>> a call to StandbyLockTimeoutHandler() which does nothing, except >>>> waking >>>> up. That's what we want, right?) >>> >>> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup >>> process can wake up and do nothing. Thank you for pointing out. >> >> Okay, understood! Firstly I was thinking that enabling the same type >> (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn't work properly, but >> as far as I read the code, it works. In that case, only the shorter >> timeout would be activated in enable_timeouts(). So I agree to use >> STANDBY_LOCK_TIMEOUT. > > So I renamed the argument "deadlock_timer" in > ResolveRecoveryConflictWithLock() > because it's not the timer for deadlock and is confusing. Attached is the > updated version of the patch. Barring any objection, I will commit > this version. Thanks for the update! Indeed the naming is more appropriate and less confusing that way, this version looks all good to me. Thanks! Bertrand
В списке pgsql-hackers по дате отправления:
Следующее
От: Bharath RupireddyДата:
Сообщение: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit