Re: Initial Schema Sync for Logical Replication

From Amit Kapila
Subject Re: Initial Schema Sync for Logical Replication
Msg-id CAA4eK1Kw8WhioNekkeffQJ4=YM=zi+OhHv_N5aXy1zGybGPKbA@mail.gmail.com
In response to Re: Initial Schema Sync for Logical Replication  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: Initial Schema Sync for Logical Replication  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
>
> > From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > On 2023-Mar-15, Kumar, Sachin wrote:
> >
> > > 1. In  CreateSubscription()  when we create replication
> > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > > can use this snapshot later in the pg_dump.
> > >
> > > 2.  Now we can call pg_dump with above snapshot from CreateSubscription.
> >
> > Overall I'm not on board with the idea that logical replication would depend on
> > pg_dump; that seems like it could run into all sorts of trouble (what if calling
> > external binaries requires additional security setup?  what about pg_hba
> > connection requirements? what about max_connections in tight
> > circumstances?).
> > what if calling external binaries requires additional security setup
> I am not sure what kind of security restriction would apply in this case, maybe pg_dump
> binary can be changed ?
>
> Using pg_dump as part of this implementation is not acceptable because we
> expect the backend to be decoupled from the client. Besides that, pg_dump
> provides all table dependencies (such as tablespaces, privileges, security
> labels, comments); not all of these dependencies should be replicated.
>

I agree that in the initial version we may not support sync of all
objects, but why shouldn't that be possible in later versions?

> You should
> exclude them removing these objects from the TOC before running pg_restore or
> adding a few pg_dump options to exclude these objects. Another issue is related
> to different version. Let's say the publisher has a version ahead of the
> subscriber version, a new table syntax can easily break your logical
> replication setup. IMO pg_dump doesn't seem like a good solution for initial
> synchronization.
>
> Instead, the backend should provide infrastructure to obtain the required DDL
> commands for the specific (set of) tables. This can work around the issues from
> the previous paragraph:
>
...
> * don't need to worry about different versions.
>

AFAICU, some of the reasons why pg_dump is not allowed to dump from a
newer server version are as follows: (a) there could be more columns in
the newer version of a system catalog, and then "SELECT *"-type queries
won't work because the client has no knowledge of the additional
columns; (b) the newer version could have new features (represented by,
say, new columns in existing catalogs or new catalogs) that the older
version of pg_dump has no knowledge of, so it will fail to get that
data and hence produce an inconsistent dump. The subscriber could then
easily end up out of sync.
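
To make (a) and (b) concrete, here is a rough sketch in Python. It is
not pg_dump code; the catalog columns, the "relnewfeature" column, and
both helper functions are made up for illustration. A client that
unpacks rows positionally breaks when a newer catalog adds a column,
whereas a client that selects only the columns it knows keeps working
but silently loses the new information.

```python
# Columns that the (hypothetical) older client knows about.
OLD_CLIENT_COLUMNS = ("oid", "relname")


def read_row_positionally(row):
    """Mimic "SELECT *" handling: the client assumes a fixed column count.

    Raises ValueError when the server returns more columns than expected,
    which corresponds to problem (a).
    """
    oid, relname = row
    return {"oid": oid, "relname": relname}


def read_row_by_name(row, server_columns):
    """Mimic an explicit column list: only known columns are kept.

    This does not fail on a newer catalog, but any column the client does
    not know about is dropped, which corresponds to problem (b).
    """
    named = dict(zip(server_columns, row))
    return {col: named[col] for col in OLD_CLIENT_COLUMNS}


# A row from a hypothetical newer catalog with one extra column.
newer_columns = ("oid", "relname", "relnewfeature")
newer_row = (16384, "t1", True)

# Succeeds, but the value of "relnewfeature" is lost.
print(read_row_by_name(newer_row, newer_columns))
```
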

Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems. If so, we may need to deny schema sync in any
such case.
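
As a minimal sketch of what "deny schema sync" could mean, assuming the
decision is made purely by comparing major versions: the function names
and the policy itself are hypothetical, not something agreed in this
thread. The only fact relied upon is that, for PostgreSQL 10 and later,
server_version_num is the major version times 10000 plus the minor
version (e.g. 150002 for 15.2).

```python
def major_version(server_version_num: int) -> int:
    """Return the major version from a server_version_num-style integer.

    Valid for PostgreSQL 10 and later, where the value is
    major * 10000 + minor.
    """
    return server_version_num // 10000


def schema_sync_allowed(publisher_version_num: int,
                        subscriber_version_num: int) -> bool:
    """Hypothetical policy: deny schema sync when the publisher is newer.

    A newer publisher may emit definitions (or expose catalog contents)
    that the older subscriber does not understand.
    """
    return (major_version(publisher_version_num)
            <= major_version(subscriber_version_num))


# Publisher 15.2, subscriber 16.0: allowed under this policy.
print(schema_sync_allowed(150002, 160000))
# Publisher 16.1, subscriber 15.4: denied under this policy.
print(schema_sync_allowed(160001, 150004))
```

Whether minor-version differences should matter as well is left open
here; the sketch ignores them.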

--
With Regards,
Amit Kapila.


