Re: Initial Schema Sync for Logical Replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Initial Schema Sync for Logical Replication
Дата
Msg-id CAA4eK1J43qN0ixCv_mcc6mAe2PJ8_s=_UMMs-8e1FAGbB6XkBA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы RE: Initial Schema Sync for Logical Replication
RE: Initial Schema Sync for Logical Replication
Re: Initial Schema Sync for Logical Replication
Re: Initial Schema Sync for Logical Replication
Список pgsql-hackers
On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > > You should
> > > exclude them removing these objects from the TOC before running pg_restore or
> > > adding a few pg_dump options to exclude these objects. Another issue is related
> > > to different version. Let's say the publisher has a version ahead of the
> > > subscriber version, a new table syntax can easily break your logical
> > > replication setup. IMO pg_dump doesn't seem like a good solution for initial
> > > synchronization.
> > >
> > > Instead, the backend should provide infrastructure to obtain the required DDL
> > > commands for the specific (set of) tables. This can work around the issues from
> > > the previous paragraph:
> > >
> > ...
> > > * don't need to worry about different versions.
> > >
> >
> > AFAICU some of the reasons why pg_dump is not allowed to dump from the
> > newer version are as follows: (a) there could be more columns in the
> > newer version of the system catalog and then Select * type of stuff
> > won't work because the client won't have knowledge of additional
> > columns. (b) the newer version could have new features (represented by
> > say new columns in existing catalogs or new catalogs) that the older
> > version of pg_dump has no knowledge of and will fail to get that data
> > and hence an inconsistent dump. The subscriber will easily be not in
> > sync due to that.
> >
> > Now, how do we avoid these problems even if we have our own version of
> > functionality similar to pg_dump for selected objects? I guess we will
> > face similar problems.
>
> Right. I think that such functionality needs to return DDL commands
> that can be executed on the requested version.
>
> > If so, we may need to deny schema sync in any such case.
>
> Yes. Do we have any concrete use case where the subscriber is an older
> version, in the first place?
>

As per my understanding, it is mostly due to the reason that it can
work today. Today, during an off-list discussion with Jonathan on this
point, he pointed me to a similar incompatibility in MySQL
replication. See the "SQL incompatibilities" section in doc[1]. Also,
please note that this applies not only to initial sync but also to
schema sync during replication. I don't think it would be feasible to
keep such cross-version compatibility for DDL replication.

Having said above, I don't intend that we must use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this
stage is to primarily write a POC patch to see what difficulties we
may face. The other options that we could try out are (a) try to
duplicate parts of pg_dump code in some way (by extracting required
code) for the subscription's initial sync, or (b) have a common code
(probably as a library or some other way) for the required
functionality. There could be more possibilities that we may not have
thought of yet. But the main point is that for approaches other than
using pg_dump, we should consider ways to avoid duplicity of various
parts of its code. Due to this, I think before ruling out using
pg_dump, we should be clear about its risks and limitations.

Thoughts?

[1] - https://dev.mysql.com/doc/refman/8.0/en/replication-compatibility.html
[2] - https://www.postgresql.org/message-id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: CREATE DATABASE ... STRATEGY WAL_LOG issues
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Error "initial slot snapshot too large" in create replication slot