Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Amit Kapila
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date:
Msg-id: CAA4eK1Knrj73YW181CBW15vCAqEgNi7Xk7dG8Oah4gg0A_GcOA@mail.gmail.com
In reply to: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Dilip Kumar <dilipbalaut@gmail.com>)
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Melih Mutlu <m.melihmutlu@gmail.com>)
List: pgsql-hackers
On Tue, Jul 5, 2022 at 7:20 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> I created a patch to reuse tablesync workers and their replication slots for more tables that are not yet synced, so that the overhead of creating and dropping workers/replication slots can be reduced.
>
> The current version of logical replication has two steps: tablesync and apply.
> In the tablesync step, the apply worker creates a tablesync worker for each table, and those tablesync workers are killed when they're done with their associated table. (The number of tablesync workers running at the same time is limited by "max_sync_workers_per_subscription".)
> Each tablesync worker also creates a replication slot on publisher during its lifetime and drops the slot before
exiting.
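For context, the cap mentioned above is an ordinary subscriber-side setting alongside the overall worker limit; an illustrative fragment (the values shown are the defaults, used here only as an example):

```ini
# postgresql.conf on the subscriber -- illustrative values
max_logical_replication_workers = 4    # pool shared by apply + tablesync workers
max_sync_workers_per_subscription = 2  # concurrent tablesync workers per subscription
```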
>
> The purpose of this patch is to get rid of the overhead of creating/killing a new worker (and replication slot) for each table.
> It aims to reuse tablesync workers and their replication slots so that tablesync workers can copy multiple tables from the publisher to the subscriber during their lifetime.
>
> The benefits of reusing tablesync workers can be significant if tables are empty or close to empty.
> In the empty-table case, spawning tablesync workers and handling replication slots is where most of the time is spent, since the actual copy phase takes very little time.
>
>
> The changes in the behaviour of tablesync workers with this patch are as follows:
> 1- After a tablesync worker is done syncing its current table, it takes a lock and fetches the tables in init state
> 2- It looks for a table that is not already being synced by another worker among the tables in init state
> 3- If it finds one, it updates its state for the new table and loops back to the beginning to start syncing
> 4- If no table is found, it drops the replication slot and exits
>

How would you choose the slot name for the table sync? Right now it
contains the relid of the table for which it needs to perform the
sync. Say, if we fail to include the appropriate identifier in the
slot name, we won't be able to reuse/drop the slot after a restart of
the table sync worker due to an error.

>
> With those changes, I did some benchmarking to see if it improves anything.
> These results compare this patch with the latest version of the master branch. "max_sync_workers_per_subscription" is set to 2, the default.
> The timings below are averages of 5 consecutive runs for each branch.
>
> First, tested logical replication with empty tables.
> 10 tables
> ----------------
> - master:    286.964 ms
> - the patch:    116.852 ms
>
> 100 tables
> ----------------
> - master:    2785.328 ms
> - the patch:    706.817 ms
>
> 10K tables
> ----------------
> - master:    39612.349 ms
> - the patch:    12526.981 ms
>
>
> Also tried replication tables with some data
> 10 tables loaded with 10MB data
> ----------------
> - master:    1517.714 ms
> - the patch:    1399.965 ms
>
> 100 tables loaded with 10MB data
> ----------------
> - master:    16327.229 ms
> - the patch:    11963.696 ms
>
>
> Then loaded more data
> 10 tables loaded with 100MB data
> ----------------
> - master:    13910.189 ms
> - the patch:    14770.982 ms
>
> 100 tables loaded with 100MB data
> ----------------
> - master:    146281.457 ms
> - the patch:    156957.512 ms
>
>
> If tables are mostly empty, the improvement can be significant - up to 3x faster logical replication.
> With some data loaded, it can still be faster to some extent.
>

These results indicate that it is a good idea, especially for very small tables.

> When the table size increases more, the advantage of reusing workers becomes insignificant.
>

It seems from your results that performance degrades for large
relations. Did you try to investigate the reasons for the same?

--
With Regards,
Amit Kapila.


