Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id CAA4eK1Lo4A4Anyp07r+XF4zp7okfNtUiM7_=e_s7tUedUB3ejw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Список pgsql-hackers
On Wed, Jul 27, 2022 at 3:56 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> Hi Amit,
>
> I updated the patch in order to prevent the problems that might be caused by using different replication slots for
syncinga table.
 
> As suggested in previous emails, replication slot names are stored in the catalog. So slot names can be reached later
andit is ensured
 
> that same replication slot is used during tablesync step of a table.
>
> With the current version of the patch:
> -. "srrelslotname" column is introduced into pg_subscibtion_rel catalog. It stores the slot name for tablesync
>
> -. Tablesync worker logic is now as follows:
> 1. Tablesync worker is launched by apply worker for a table.
> 2. Worker generates a default replication slot name for itself. Slot name includes subid and worker pid for tracking
purposes.
> 3. If table has a slot name value in the catalog:
>
> i. If the table state is DATASYNC, drop the replication slot from the catalog and proceed tablesync with a new slot.
>
> ii. If the table state is FINISHEDCOPY, use the replicaton slot from the catalog, do not create a new slot.
>
> 4. Before worker moves to new table, drop any replication slot that are retrieved from the catalog and used.
>

Why after step 4, do you need to drop the replication slot? Won't just
clearing the required info from the catalog be sufficient?

> 5. In case of no table left to sync, drop the replication slot of that sync worker with worker pid if it exists.
(It'spossible that a sync worker do not create a replication slot for itself but uses slots read from the catalog in
eachiteration)
 
>
>
>> I think using worker_pid also has similar risks of mixing slots from
>> different workers because after restart same worker_pid could be
>> assigned to a totally different worker. Can we think of using a unique
>> 64-bit number instead? This will be allocated when each workers
>> started for the very first time and after that we can refer catalog to
>> find it as suggested in the idea below.
>
>
> I'm not sure how likely to have colliding pid's for different tablesync workers in the same subscription.
>

Hmm, I think even if there is an iota of a chance which I think is
there, we can't use worker_pid. Assume, that if the same worker_pid is
assigned to another worker once the worker using it got an error out,
the new worker will fail as soon as it will try to create a
replication slot.

> Though ,having pid in slot name makes it easier to track which slot belongs to which worker. That's why I kept using
pidin slot names.
 
> But I think it should be simple to switch to using a unique 64-bit number. So I can remove pid's from slot names, if
youthink that it would be better.
 
>

I feel it would be better or maybe we need to think of some other
identifier but one thing we need to think about before using a 64-bit
unique identifier here is how will we retrieve its last used value
after restart of server. We may need to store it in a persistent way
somewhere.
>>
>> We should use the same for the origin name as well.
>
>
> I did not really change anything related to origin names. Origin names are still the same and include relation id.
Whatdo you think would be an issue with origin names in this patch?
 
>

The problems will be similar to the slot name. The origin is used to
track the progress of replication, so, if we use the wrong origin name
after the restart, it can send the wrong start_streaming position to
the publisher.

-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Matthias van de Meent
Дата:
Сообщение: Re: Maximize page freezing
Следующее
От: Zhang Mingli
Дата:
Сообщение: Re: Patch to address creation of PgStat* contexts with null parent context