Re: Initial Schema Sync for Logical Replication

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Initial Schema Sync for Logical Replication
Дата
Msg-id CAD21AoAeNN=ANDbUFbDjZm0HfARAhCmocfKZaU1P4omquqDcOQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы RE: Initial Schema Sync for Logical Replication
Re: Initial Schema Sync for Logical Replication
Список pgsql-hackers
On Wed, Jul 5, 2023 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jun 19, 2023 at 5:29 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi,
> >
> > Below are my review comments for the PoC patch 0001.
> >
> > In addition,  the patch needed rebasing, and, after I rebased it
> > locally in my private environment there were still test failures:
> > a) The 'make check' tests fail but only in a minor way due to changes colname
> > b) the subscription TAP test did not work at all for me -- many errors.
>
> Thank you for reviewing the patch.
>
> While updating the patch, I realized that the current approach won't
> work well or at least has the problem with partition tables. If a
> publication has a partitioned table with publish_via_root = false, the
> subscriber launches tablesync workers for its partitions so that each
> tablesync worker copies data of each partition. Similarly, if it has a
> partition table with publish_via_root = true, the subscriber launches
> a tablesync worker for the parent table. With the current design,
> since the tablesync worker is responsible for both schema and data
> synchronization for the target table, it won't be possible to
> synchronize both the parent table's schema and partitions' schema. For
> example, there is no pg_subscription_rel entry for the parent table if
> the publication has publish_via_root = false. In addition to that, we
> need to be careful about the order of synchronization of the parent
> table and its partitions. We cannot start schema synchronization for
> partitions before its parent table. So it seems to me that we need to
> consider another approach.

So I've implemented a different approach; doing schema synchronization
at a CREATE SUBSCRIPTION time. The backend executing CREATE
SUBSCRIPTION uses pg_dump and restores the table schemas including
both partitioned tables and their partitions regardless of
publish_via_partition_root option, and then creates
pg_subscription_rel entries for tables while respecting
publish_via_partition_root option.

There is a window between table creations and the tablesync workers
starting to process the tables. If DDLs are executed in this window,
the tablesync worker might fail because the table schema might have
already been changed. We need to mention this note in the
documentation. BTW, I think we will be able to get rid of this
downside if we support DDL replication. DDLs executed in the window
are applied by the apply worker and it takes over the data copy to the
tablesync worker at a certain LSN.

I've attached PoC patches. It has regression tests but doesn't have
the documentation yet.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: jian he
Дата:
Сообщение: Re: Add hint message for check_log_destination()
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum