Re: Deadlock between backend and recovery may not be detected
От | Drouvot, Bertrand |
---|---|
Тема | Re: Deadlock between backend and recovery may not be detected |
Дата | |
Msg-id | 2626254b-0578-875c-bb62-34e0accdb5ba@amazon.com обсуждение исходный текст |
Ответ на | Re: Deadlock between backend and recovery may not be detected (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Ответы |
Re: Deadlock between backend and recovery may not be detected
(Fujii Masao <masao.fujii@oss.nttdata.com>)
|
Список | pgsql-hackers |
Hi, On 12/18/20 10:35 AM, Fujii Masao wrote: > CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you can confirm the sender > and know the content is safe. > > > > On 2020/12/17 2:15, Fujii Masao wrote: >> >> >> On 2020/12/16 23:28, Drouvot, Bertrand wrote: >>> Hi, >>> >>> On 12/16/20 2:36 PM, Victor Yegorov wrote: >>>> >>>> *CAUTION*: This email originated from outside of the organization. >>>> Do not click links or open attachments unless you can confirm the >>>> sender and know the content is safe. >>>> >>>> >>>> ср, 16 дек. 2020 г. в 13:49, Fujii Masao >>>> <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>: >>>> >>>> After doing this procedure, you can see the startup process and >>>> backend >>>> wait for the table lock each other, i.e., deadlock. But this >>>> deadlock remains >>>> even after deadlock_timeout passes. >>>> >>>> This seems a bug to me. >>>> >>> +1 >>> >>>> >>>> > * Deadlocks involving the Startup process and an ordinary >>>> backend process >>>> > * will be detected by the deadlock detector within the >>>> ordinary backend. >>>> >>>> The cause of this issue seems that >>>> ResolveRecoveryConflictWithLock() that >>>> the startup process calls when recovery conflict on lock >>>> happens doesn't >>>> take care of deadlock case at all. You can see this fact by >>>> reading the above >>>> source code comment for ResolveRecoveryConflictWithLock(). >>>> >>>> To fix this issue, I think that we should enable >>>> STANDBY_DEADLOCK_TIMEOUT >>>> timer in ResolveRecoveryConflictWithLock() so that the startup >>>> process can >>>> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the >>>> backend. >>>> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives, >>>> the backend should check whether the deadlock actually happens >>>> or not. >>>> Attached is the POC patch implimenting this. >>>> >>> good catch! >>> >>> I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT >>> shouldn't be set in ResolveRecoveryConflictWithLock() too (it is >>> already set in ResolveRecoveryConflictWithBufferPin()). >>> >>> So + 1 to consider this as a bug and for the way the patch proposes >>> to fix it. >> >> Thanks Victor and Bertrand for agreeing! >> Attached is the updated version of the patch. > > Attached is v3 of the patch. Could you review this version? > > While the startup process is waiting for recovery conflict on buffer pin, > it repeats sending the request for deadlock check to all the backends > every deadlock_timeout. This may increase the workload in the startup > process and backends, but since this is the original behavior, the patch > doesn't change that. Agree. IMHO that may need to be rethink (as we are already in a conflict situation, i am tempted to say that the less is being done the better it is), but i think that's outside the scope of this patch. > Also in practice this may not be so harmful because > the period that the buffer is kept pinned is basically not so long. Agree that chances are less to be in this mode for a "long" duration (as compare to the lock conflict). > > On the other hand, IMO we should avoid this issue while the startup > process is waiting for recovery conflict on locks since it can take > a long time to release the locks. So the patch tries to fix it. Agree > > Or I'm overthinking this? If this doesn't need to be handled, > the patch can be simplified more. Thought? I do think that's good to handle it that way for the lock conflict: the less work is done the better it is (specially in a conflict situation). The patch does look good to me. Bertrand
В списке pgsql-hackers по дате отправления: