Re: Deadlock between backend and recovery may not be detected
От | Fujii Masao |
---|---|
Тема | Re: Deadlock between backend and recovery may not be detected |
Дата | |
Msg-id | 3bb4666a-1e8b-dd42-6483-db2b8b81f804@oss.nttdata.com обсуждение исходный текст |
Ответ на | Re: Deadlock between backend and recovery may not be detected ("Drouvot, Bertrand" <bdrouvot@amazon.com>) |
Список | pgsql-hackers |
On 2020/12/19 1:43, Drouvot, Bertrand wrote: > Hi, > > On 12/18/20 10:35 AM, Fujii Masao wrote: >> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you canconfirm the sender and know the content is safe. >> >> >> >> On 2020/12/17 2:15, Fujii Masao wrote: >>> >>> >>> On 2020/12/16 23:28, Drouvot, Bertrand wrote: >>>> Hi, >>>> >>>> On 12/16/20 2:36 PM, Victor Yegorov wrote: >>>>> >>>>> *CAUTION*: This email originated from outside of the organization. Do not click links or open attachments unless youcan confirm the sender and know the content is safe. >>>>> >>>>> >>>>> ср, 16 дек. 2020 г. в 13:49, Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>: >>>>> >>>>> After doing this procedure, you can see the startup process and backend >>>>> wait for the table lock each other, i.e., deadlock. But this deadlock remains >>>>> even after deadlock_timeout passes. >>>>> >>>>> This seems a bug to me. >>>>> >>>> +1 >>>> >>>>> >>>>> > * Deadlocks involving the Startup process and an ordinary backend process >>>>> > * will be detected by the deadlock detector within the ordinary backend. >>>>> >>>>> The cause of this issue seems that ResolveRecoveryConflictWithLock() that >>>>> the startup process calls when recovery conflict on lock happens doesn't >>>>> take care of deadlock case at all. You can see this fact by reading the above >>>>> source code comment for ResolveRecoveryConflictWithLock(). >>>>> >>>>> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT >>>>> timer in ResolveRecoveryConflictWithLock() so that the startup process can >>>>> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend. >>>>> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives, >>>>> the backend should check whether the deadlock actually happens or not. >>>>> Attached is the POC patch implimenting this. >>>>> >>>> good catch! >>>> >>>> I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be set in ResolveRecoveryConflictWithLock()too (it is already set in ResolveRecoveryConflictWithBufferPin()). >>>> >>>> So + 1 to consider this as a bug and for the way the patch proposes to fix it. >>> >>> Thanks Victor and Bertrand for agreeing! >>> Attached is the updated version of the patch. >> >> Attached is v3 of the patch. Could you review this version? >> >> While the startup process is waiting for recovery conflict on buffer pin, >> it repeats sending the request for deadlock check to all the backends >> every deadlock_timeout. This may increase the workload in the startup >> process and backends, but since this is the original behavior, the patch >> doesn't change that. > > Agree. > > IMHO that may need to be rethink (as we are already in a conflict situation, i am tempted to say that the less is beingdone the better it is), but i think that's outside the scope of this patch. Yes, I agree that's better. I think that we should improve that as a separate patch only for master branch, after fixing the bug and back-patching that at first. > >> Also in practice this may not be so harmful because >> the period that the buffer is kept pinned is basically not so long. > > Agree that chances are less to be in this mode for a "long" duration (as compare to the lock conflict). > >> >> On the other hand, IMO we should avoid this issue while the startup >> process is waiting for recovery conflict on locks since it can take >> a long time to release the locks. So the patch tries to fix it. > Agree >> >> Or I'm overthinking this? If this doesn't need to be handled, >> the patch can be simplified more. Thought? > > I do think that's good to handle it that way for the lock conflict: the less work is done the better it is (specially ina conflict situation). > > The patch does look good to me. Thanks for the review! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
В списке pgsql-hackers по дате отправления:
Следующее
От: Fujii MasaoДата:
Сообщение: Re: Deadlock between backend and recovery may not be detected