Deadlock between backend and recovery may not be detected

Поиск

Список

Период

Сортировка

От	Fujii Masao
Тема	Deadlock between backend and recovery may not be detected
Дата	16 декабря 2020 г. 12:49:07
Msg-id	4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com обсуждение исходный текст
Ответы	Re: Deadlock between backend and recovery may not be detected
Список	pgsql-hackers

Дерево обсуждения

Hi,

While reviewing the patch proposed at [1], I found that there is the case
where deadlock that recovery conflict on lock is involved in may not be
detected. This deadlock can happen between backends and the startup
process, in the standby server. Please see the following procedure to
reproduce the deadlock.

#1. Set up streaming replication.

#2. Set max_standby_streaming_delay to -1 in the standby.

#3. Create two tables in the primary.

     [PRIMARY: SESSION1]
     CREATE TABLE t1 ();
     CREATE TABLE t2 ();

#4. Start transaction and access to the table t1.

     [STANDBY: SESSION2]
     BEGIN;
     SELECT * FROM t1;

#5. Start transaction and lock table t2 in access exclusive mode,
     in the primary. Also execute pg_switch_wal() to transfer WAL record
     for access exclusive lock to the standby.

     [PRIMARY: SESSION1]
     BEGIN;
     LOCK TABLE t2 IN ACCESS EXCLUSIVE MODE;
     SELECT pg_switch_wal();

#6. Access to the table t2 within the transaction that started at #4,
     in the standby.

     [STANDBY: SESSION2]
     SELECT * FROM t2;

#7. Lock table t1 in access exclusive mode within the transaction that
     started in #5, in the primary. Also execute pg_switch_wal() to transfer
     WAL record for access exclusive lock to the standby.

     [PRIMARY: SESSION1]
     LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;
     SELECT pg_switch_wal();

After doing this procedure, you can see the startup process and backend
wait for the table lock each other, i.e., deadlock. But this deadlock remains
even after deadlock_timeout passes.

This seems a bug to me.

> * Deadlocks involving the Startup process and an ordinary backend process
> * will be detected by the deadlock detector within the ordinary backend.

The cause of this issue seems that ResolveRecoveryConflictWithLock() that
the startup process calls when recovery conflict on lock happens doesn't
take care of deadlock case at all. You can see this fact by reading the above
source code comment for ResolveRecoveryConflictWithLock().

To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
timer in ResolveRecoveryConflictWithLock() so that the startup process can
send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
the backend should check whether the deadlock actually happens or not.
Attached is the POC patch implimenting this.

Thought?

Regards,

[1] https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

recovery_conflict_lock_deadlock_v1.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Deadlock between backend and recovery may not be detected

Вложения