Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

From: Amit Kapila
Subject: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Msg-id: CAA4eK1L5c+ZcK72evGxodq3zLje=Qv-t2Qi1GcAmKxnm5SQhYQ@mail.gmail.com
In reply to: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 (Andres Freund <andres@anarazel.de>)
Responses: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
List: pgsql-bugs
On Sat, Jan 14, 2023 at 9:32 PM Andres Freund <andres@anarazel.de> wrote:
>
> The problem is here:
>
> On 2023-01-13 20:53:49 +0530, Lakshmi Narayanan Sreethar wrote:
> > #7  0x0000559cccbe1e71 in LogicalRepSyncTableStart
> > (origin_startpos=0x7fffb26f7728) at
> > /pg15.1/src/backend/replication/logical/tablesync.c:1353
>
> Because the logical rep code explicitly prevents interrupts:
>
>         /*
>          * Create a new permanent logical decoding slot. This slot will be used
>          * for the catchup phase after COPY is done, so tell it to use the
>          * snapshot to make the final data consistent.
>          *
>          * Prevent cancel/die interrupts while creating slot here because it is
>          * possible that before the server finishes this command, a concurrent
>          * drop subscription happens which would complete without removing this
>          * slot leading to a dangling slot on the server.
>          */
>         HOLD_INTERRUPTS();
>         walrcv_create_slot(LogRepWorkerWalRcvConn,
>                                            slotname, false /* permanent */ , false /* two_phase */ ,
>                                            CRS_USE_SNAPSHOT, origin_startpos);
>         RESUME_INTERRUPTS();
>
> Which is just completely entirely wrong. Independent of this issue even. Not
> allowing termination for the duration of command executed over network?
>
> This is from:
>
> commit 6b67d72b604cb913e39324b81b61ab194d94cba0
> Author: Amit Kapila <akapila@postgresql.org>
> Date:   2021-03-17 08:15:12 +0530
>
>     Fix race condition in drop subscription's handling of tablesync slots.
>
>     Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
>     Subscription to drop such slots. However, it is possible that before
>     tablesync worker could get the acknowledgment of slot creation, drop
>     subscription stops it and that can lead to a dangling slot on the
>     publisher. Prevent cancel/die interrupts while creating a slot in the
>     tablesync worker.
>
>     Reported-by: Thomas Munro as per buildfarm
>     Author: Amit Kapila
>     Reviewed-by: Vignesh C, Takamichi Osumi
>     Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
>
>
> But this can't be the right fix.
>

I will look into this and your suggestion in a later email.

-- 
With Regards,
Amit Kapila.


