RE: Support logical replication of DDLs

Поиск
Список
Период
Сортировка
От Phil Florent
Тема RE: Support logical replication of DDLs
Дата
Msg-id PA4P191MB160007A5477AC53CBE1CAF74BA929@PA4P191MB1600.EURP191.PROD.OUTLOOK.COM
обсуждение исходный текст
Ответ на Re: Support logical replication of DDLs  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Hi,
Sorry about the list. Since it was a question about the specifications I thought I had to ask it first in the general list. I will reply in the hackers list only for new features.

Replicating from orcl to postgres was difficult. You mentionned renaming of columns, the ordinal position of a column is reused with a drop/add column in orcl and you can wrongly think it is a renaming from an external point of view. Only "advantage" with orcl is that you can drop/add columns thousands of times if you want, not with postgres.
 From PostgreSQL to PostgreSQL it's now easier of course but difficulty is that we have to separate DDL things. The "+" things have to be executed first on the replicated db (new tables, new columns, enlargement of columns). The "-" things have to be executed first on the source db (dropped tables, dropped columns, downsize of columns). DSS and OLTP teams are different, OLTP teams cannot or don't want to deal with DSS concerns etc.  If replication is delayed it's not so trivial anway to know when you can drop a table on the replicated db for example. DSS team has in fact to build a system that detects a posteriori why the subscription is KO if something goes wrong. It can also be a human mistake, e.g a "create table very_important_table_to_save as select * from very_important_table;" and the replication is KO if the _save table is created in the published schema.
I had read too fast. I read the proposals and Vignesh suggestion & syntax seem very promising. If I understand well an existing "for all tables" / "tables in schema" DML publication would have be to altered with
ALTER PUBLICATION simple_applicative_schema_replication_that_wont_be_interrupted_for_an_rdbms_reason WITH (ddl = 'table:create,alter'); to get rid of the majority of possible interruptions.

> Additionally, there could be some additional problems to deal with
> like say if the column list has been specified then we ideally
> shouldn't send those columns even in DDL. For example, if one uses
> Alter Table .. Rename Column and the new column name is not present in
> the published column list then we shouldn't send it.
Perhaps I miss something but the names are not relevant here. The column is part of the publication and the corresponding DDL has to be sent, the column is not part of the publication and the DDL should not be sent. Dependencies are not based on names, it currently works like that with DML publication but also with views for example. 
Quick test :

bas=# \dRp+

                                         Publication test_phil

Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | Tronque | Via la racine

--------------+-------------------+------------+--------------+--------------+---------+---------------

postgres     | f                 | t          | t            | t            | t       | f

Tables :

    "test_phil.t1" (c1, c2)

 

bas=# alter table test_phil.t1 rename column c2 to c4;

ALTER TABLE

 

bas=# \dRp+

                                         Publication test_phil

Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | Tronque | Via la racine

--------------+-------------------+------------+--------------+--------------+---------+---------------

postgres     | f                 | t          | t            | t            | t       | f

Tables :

    "test_phil.t1" (c1, c4)


"rename column" DDL has to be sent and the new name is not relevant in the decision to send it. If "rename column" DDL had concerned a column that is not part of the publication you wouldn't have to send the DDL, no matter the new name. Drop is not a problem. You cannot drop an existing column that is part of a publication without a "cascade". What could be problematic is a "add column" DDL and after that the column is added to the publication via "alter publication set". Such a case is difficult to deal with I guess. But the initial DDL to create a table is also not sent anyway right ?  It could be a known limitation. 


I usually only test things in beta to report but I will try to have a look earlier at this patch since it is very interesting. That and the TDE thing but TDE is an external obligation and not a real interest. I obtained a delay but hopefully we will have this encryption thing or perhaps we will be obliged to go back to the proprietary RDBMS for some needs even if the feature is in fact mostly useless...

Best regards,
Phil

De : Amit Kapila <amit.kapila16@gmail.com>
Envoyé : lundi 3 avril 2023 06:07
À : Phil Florent <philflorent@hotmail.com>
Cc : vignesh C <vignesh21@gmail.com>; houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>; Ajin Cherian <itsajin@gmail.com>; wangw.fnst@fujitsu.com <wangw.fnst@fujitsu.com>; Runqi Tian <runqidev@gmail.com>; Peter Smith <smithpb2250@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; li jie <ggysxcq@gmail.com>; Dilip Kumar <dilipbalaut@gmail.com>; Alvaro Herrera <alvherre@alvh.no-ip.org>; Masahiko Sawada <sawada.mshk@gmail.com>; Japin Li <japinli@hotmail.com>; rajesh singarapu <rajesh.rs0541@gmail.com>; Zheng Li <zhengli10@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Objet : Re: Support logical replication of DDLs
 
On Sun, Apr 2, 2023 at 3:25 PM Phil Florent <philflorent@hotmail.com> wrote:
>
> As an end-user, I am highly interested in the patch https://commitfest.postgresql.org/42/3595/ but I don't fully get its main goal in its first version.
> It's "for all tables"  that will be implemented ?
> If one needs a complete replication of a cluster, a hot standby will always be more efficient than a publication right ? I use both for different needs in public hospitals I work for (hot standby for disaster recovery & logical replication for dss)
> The main interest of a publication is to be able to filter things on the publisher and to add stuff on the replicated cluster.
> If you compare PostgreSQL with less avanced RDBMS that don't really implement schemas (typically Oracle), the huge advantage of Postgre is that many things (e.g security) can be dynamically implemented via schemas.
> Developers just have put a table in the "good" schema and that's all. Logical DML replication now fully implements this logic since PostgreSQL 15. Only remaining problem is that a "for tables in schema" publication has to be owned by a superuser (because a normal user can have tables that don't belong to him in a schema it owns ?) If DDL replication only works with "FOR ALL TABLES " and not "FOR TABLES IN SCHEMA" it reduces its interest anyway.
>

I don't see any major issue with supporting it for both "FOR ALL
TABLES" and "FOR ALL TABLES IN SCHEMA". However, if we want to support
it with the "FOR TABLE .." variant then we will probably need to apply
some restrictions as we can only support 'alter' and 'drop'.
Additionally, there could be some additional problems to deal with
like say if the column list has been specified then we ideally
shouldn't send those columns even in DDL. For example, if one uses
Alter Table .. Rename Column and the new column name is not present in
the published column list then we shouldn't send it.

BTW, we have some proposals related to the specification of this
feature in emails [1][2][3]. See, if you have any suggestions on the
same?

Note- It seems you have copied this thread to pgsql-general. Is it
because you are not subscribed to pgsql-hackers? As this is a
development project so better to keep the discussion on pgsql-hackers.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2B%2BY7a2SQq55DXT6neghZgj3j%2BpQ74%3D8zfT3k8Tkdj0ZA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1KZqvJsTt7OkS8AkxOKVvSpkQkPwsqzNmo10mFaVAKeZg%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/OS0PR01MB571646874A3E165D93999A9D94889%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: Add missing copyright for pg_upgrade/t/* files