Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id CAJpy0uDCUw8P+RorkNcsC77fnKXLzkdQUGKJu+s_ewBWTnQ92Q@mail.gmail.com
обсуждение исходный текст
Ответ на RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Список pgsql-hackers
On Tue, Feb 7, 2023 at 8:18 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
>
> On Thu, Feb 2, 2023 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >
> > So to fix this, I think either we update origin and slot entries in
> > the system catalog after the creation has passed or we clean-up the
> > system catalog in case of failure. What do you think?
> >
>
> I think the first way seems better.

Yes, I agree.

>
> I reproduced the problem I reported before with latest patch (v7-0001,
> v10-0002), and looked into this problem. It is caused by a similar reason. Here
> is some analysis for the problem I reported [1].#6.
>
> First, a tablesync worker (worker-1) started for "tbl1", its originname is
> "pg_16398_1". And it exited because of unique constraint. In
> LogicalRepSyncTableStart(), originname in pg_subscription_rel is updated when
> updating table state to DATASYNC, and the origin is created when updating table
> state to FINISHEDCOPY. So when it exited with state DATASYNC , the origin is not
> created but the originname has been updated in pg_subscription_rel.
>
> Then a tablesync worker (worker-2) started for "tbl2", its originname is
> "pg_16398_2". After tablesync of "tbl2" finished, this worker moved to sync
> table "tbl1". In LogicalRepSyncTableStart(), it got the originname of "tbl1" -
> "pg_16398_1", by calling ReplicationOriginNameForLogicalRep(), and tried to drop
> the origin (although it is not actually created before). After that, it called
> replorigin_by_name to get the originid whose name is "pg_16398_1" and the result
> is InvalidOid. Origin won't be created in this case because the sync worker has
> created a replication slot (when it synced tbl2), so the originid was still
> invalid and it caused an assertion failure when calling replorigin_advance().
>
> It seems we don't need to drop previous origin in worker-2 because the previous
> origin was not created in worker-1. I think one way to fix it is to not update
> originname of pg_subscription_rel when setting state to DATASYNC, and only do
> that when setting state to FINISHEDCOPY. If so, the originname in
> pg_subscription_rel will be set at the same time the origin is created.

+1. Update of system-catalog needs to be done carefully and only when
origin is created.

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: James Coleman
Дата:
Сообщение: Re: Parallelize correlated subqueries that execute within each worker
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: Hi i am Intrested to contribute