Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

From: Andres Freund
Subject: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Msg-id: 20230114172022.3oy77jhzippyupgx@awork3.anarazel.de
In response to: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  (Andres Freund <andres@anarazel.de>)
Responses: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
List: pgsql-bugs
Hi,

On 2023-01-14 08:02:01 -0800, Andres Freund wrote:
> Because the logical rep code explicitly prevents interrupts:
> 
>     /*
>      * Create a new permanent logical decoding slot. This slot will be used
>      * for the catchup phase after COPY is done, so tell it to use the
>      * snapshot to make the final data consistent.
>      *
>      * Prevent cancel/die interrupts while creating slot here because it is
>      * possible that before the server finishes this command, a concurrent
>      * drop subscription happens which would complete without removing this
>      * slot leading to a dangling slot on the server.
>      */
>     HOLD_INTERRUPTS();
>     walrcv_create_slot(LogRepWorkerWalRcvConn,
>                        slotname, false /* permanent */ , false /* two_phase */ ,
>                        CRS_USE_SNAPSHOT, origin_startpos);
>     RESUME_INTERRUPTS();
> 
> Which is just completely entirely wrong. Independent of this issue even. Not
> allowing termination for the duration of command executed over network?
> 
> This is from:
> 
> commit 6b67d72b604cb913e39324b81b61ab194d94cba0
> Author: Amit Kapila <akapila@postgresql.org>
> Date:   2021-03-17 08:15:12 +0530
> 
>     Fix race condition in drop subscription's handling of tablesync slots.
> 
>     Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
>     Subscription to drop such slots. However, it is possible that before
>     tablesync worker could get the acknowledgment of slot creation, drop
>     subscription stops it and that can lead to a dangling slot on the
>     publisher. Prevent cancel/die interrupts while creating a slot in the
>     tablesync worker.
> 
>     Reported-by: Thomas Munro as per buildfarm
>     Author: Amit Kapila
>     Reviewed-by: Vignesh C, Takamichi Osumi
>     Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
> 
> 
> But this can't be the right fix.

I wonder if we ought to put an
    Assert(InterruptHoldoffCount == 0 && CritSectionCount == 0)

in some of the routines in libpqwalreceiver to protect against issues like
this. It'd be easy enough to introduce one accidentally, due to holding an
lwlock.
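
To make the idea concrete, here is a minimal standalone sketch of such a guard. It is a model, not PostgreSQL source: the two counters only mimic the backend globals of the same names, the HOLD_INTERRUPTS/RESUME_INTERRUPTS macros are simplified stand-ins, and interrupts_can_be_serviced() and model_walrcv_create_slot() are hypothetical names made up for illustration.

```c
/* Standalone model of the proposed guard; NOT PostgreSQL source code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the backend globals of the same names. */
static volatile uint32_t InterruptHoldoffCount = 0;
static volatile uint32_t CritSectionCount = 0;

/* Simplified stand-ins for the backend macros. */
#define HOLD_INTERRUPTS()   (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS() \
    do { \
        assert(InterruptHoldoffCount > 0); \
        InterruptHoldoffCount--; \
    } while (0)

/*
 * Hypothetical helper: true when nothing is blocking cancel/die
 * interrupts, i.e. the condition the proposed Assert checks.
 */
bool
interrupts_can_be_serviced(void)
{
    return InterruptHoldoffCount == 0 && CritSectionCount == 0;
}

/*
 * Hypothetical stand-in for a libpqwalreceiver routine that waits on the
 * network. The guard fires (in assert-enabled builds) if a caller enters
 * with interrupts held or inside a critical section.
 */
int
model_walrcv_create_slot(const char *slotname)
{
    assert(interrupts_can_be_serviced());

    (void) slotname;            /* the network round trip would go here */
    return 0;
}
```

In this model, a caller that wraps model_walrcv_create_slot() in HOLD_INTERRUPTS()/RESUME_INTERRUPTS(), as the tablesync code quoted above does, would trip the assertion right away instead of becoming unkillable for the duration of the remote command.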

Greetings,

Andres Freund


