Re: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes)

From: Tomas Vondra
Subject: Re: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes)
Msg-id: ea96bc84-e242-4179-a440-9d4b8a7bae9f@enterprisedb.com
In reply to: RE: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes) (<Rintaro.Ikeda@nttdata.com>)
List: pgsql-bugs

On 3/4/24 09:35, Rintaro.Ikeda@nttdata.com wrote:
> Hi,
> 
> I am correcting my previous bug report [1] to provide a more accurate
> description. The bug report demonstrated an undetected deadlock between
> a client backend and the startup process on a standby server. (The
> title of the previous bug report was "Undetected deadlock between
> primary and standby processes", but this was wrong. It should have
> been "Undetected deadlock between client backend and startup process
> on a standby server".)
> 
> After following the procedure described in my bug report [1], a
> recovery conflict arises because the tablespace that the startup
> process tries to drop is in use by a client backend process on the
> standby. The pg_stat_activity output (shown below) suggests a
> deadlock: the client backend waits for an AccessExclusiveLock to be
> released, while the startup process waits for recovery conflict
> resolution before dropping the tablespace. This deadlock is not
> resolved after deadlock_timeout passes.
>
> (Standby server)
> postgres=# select datid, datname, wait_event_type, wait_event, query, backend_type from pg_stat_activity ;
>  datid | datname  | wait_event_type |         wait_event         |      query       |  backend_type
> -------+----------+-----------------+----------------------------+------------------+----------------
>      5 | postgres | Lock            | relation                   | SELECT * FROM t; | client backend
>        |          | IPC             | RecoveryConflictTablespace |                  | startup
> 
> 
> This deadlock is similar to the previously identified and patched
> issue [2], which also involved an undetected deadlock between a
> backend process and recovery on a standby server. I think the
> deadlock described in this report should be detected and resolved.
>

Thanks for the report.

So what are the steps to reproduce this? The previous message did all
kinds of stuff on the primary and then got stuck on pg_switch_wal() on
the primary, but this updated report seems to do things on the standby
and hits the lockup there.

It seems similar in the sense that it's about the interaction between
recovery and a regular backend, but unfortunately
ResolveRecoveryConflictWithVirtualXIDs does not wait on a lock; it just
checks whether the XID is still running, so the wait is invisible to the
deadlock detector :-(
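To make the "invisible to the deadlock detector" point concrete, here is a toy model in Python, not PostgreSQL code: a wait-for graph in which only lock-manager waits become edges. The process names, the graph representation, and the helper functions are hypothetical. The client backend's wait for the startup process's AccessExclusiveLock is an edge the detector can see; the startup process's polling wait for the backend's virtual XID is not recorded, so cycle detection over the recorded edges finds nothing even though the complete graph contains a cycle.

```python
from typing import Dict, Iterable, Set

# Hypothetical model of a wait-for graph: node -> set of nodes it waits for.
Graph = Dict[str, Set[str]]


def has_cycle(edges: Graph) -> bool:
    """Return True if the directed wait-for graph contains a cycle (DFS with colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in edges}
    for targets in edges.values():
        for t in targets:
            color.setdefault(t, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY  # on the current DFS path
        for nxt in edges.get(node, ()):
            if color[nxt] == GRAY:
                return True  # back edge: a cycle, i.e. a deadlock
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, not part of a cycle via this path
        return False

    return any(color[n] == WHITE and visit(n) for n in list(color))


def merge(graphs: Iterable[Graph]) -> Graph:
    """Union of several wait-for graphs."""
    merged: Graph = {}
    for g in graphs:
        for src, dsts in g.items():
            merged.setdefault(src, set()).update(dsts)
    return merged


# Recorded by the lock manager: the client backend waits for the
# AccessExclusiveLock held by the startup process.
lock_waits: Graph = {"client_backend": {"startup"}}

# Not recorded anywhere the detector looks: the startup process just polls
# until the backend's virtual transaction is gone.
polling_waits: Graph = {"startup": {"client_backend"}}

print(has_cycle(lock_waits))                          # False
print(has_cycle(merge([lock_waits, polling_waits])))  # True
```

In this model the only way for a lock-based detector to catch the cycle would be to record the polling wait as an edge too; the sketch does not say how PostgreSQL could or should do that.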

But the wait is still checked against max_standby_streaming_delay, which
should eventually resolve the deadlock (unless it is set to -1 to allow
infinite delays), right?
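A rough sketch of why the delay limit should still break the cycle, assuming a polling waiter with a deadline. The function name, its callbacks, the fake clock, and the doubling backoff capped at one second are all assumptions for illustration, not the actual ResolveRecoveryConflictWithVirtualXIDs code. A max_delay_ms of -1 mirrors the "infinite delay" setting and never cancels; 30000 ms corresponds to the 30s default of max_standby_streaming_delay.

```python
from typing import Callable


def wait_for_conflicting_backend(
    backend_running: Callable[[], bool],
    cancel_backend: Callable[[], None],
    max_delay_ms: float,
    now_ms: Callable[[], float],
    sleep_ms: Callable[[float], None],
) -> str:
    """Hypothetical polling wait: re-check until the conflicting backend is gone.

    max_delay_ms == -1 means wait forever (never cancel the backend).
    The waiter never registers as waiting on a lock, so a lock-based
    deadlock detector cannot see it.
    """
    start = now_ms()
    backoff = 1.0  # assumed policy: start at 1 ms, double each round, cap at 1 s
    while backend_running():
        if max_delay_ms >= 0 and now_ms() - start >= max_delay_ms:
            cancel_backend()  # in this model, the deadline is what breaks the cycle
            return "cancelled"
        sleep_ms(backoff)
        backoff = min(backoff * 2, 1000.0)
    return "resolved"


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, ms: float) -> None:
        self.t += ms


# The backend never finishes on its own: it is blocked on the startup
# process's AccessExclusiveLock, so only the deadline can end the wait.
clock = FakeClock()
cancelled = []
result = wait_for_conflicting_backend(
    lambda: True, lambda: cancelled.append(True), 30_000, clock.now, clock.sleep
)
print(result, clock.now() >= 30_000)  # cancelled True
```

With max_delay_ms set to -1 and a backend that never finishes, this loop would spin forever, which matches the caveat about infinite delays.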

Also, I'm not very familiar with ResolveRecoveryConflictWithVirtualXIDs,
but it seems it's doing a busy wait. I wonder if that's a good idea, but
it's independent of this bug report.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


