Re: Support logical replication of DDLs

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Support logical replication of DDLs
Дата
Msg-id CAD21AoBVCoPPRKvU_5-=wEXsa92GsNJFJOcYyXzvoSEJCx5dKw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Support logical replication of DDLs  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Support logical replication of DDLs  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Wed, Jun 15, 2022 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10@gmail.com> wrote:
> >
> >
> > While I agree that the deparser is needed to handle the potential
> > syntax differences between
> > the pub/sub, I think it's only relevant for the use cases where only a
> > subset of tables in the database
> > are replicated. For other use cases where all tables, functions and
> > other objects need to be replicated,
> > (for example, creating a logical replica for major version upgrade)
> > there won't be any syntax difference to
> > handle and the schemas are supposed to match exactly between the
> > pub/sub. In other words the user seeks to create an identical replica
> > of the source database and the DDLs should be replicated
> > as is in this case.
> >
>
> I think even for database-level replication we can't assume that
> source and target will always have the same data in which case "Create
> Table As ..", "Alter Table .. " kind of statements can't be replicated
> as it is because that can lead to different results. The other point
> is tomorrow we can extend the database level option/syntax to exclude
> a few objects (something like [1]) as well in which case we again need
> to filter at the publisher level.

Good point.

Regarding the idea of using the parse-tree representation produced by
nodeToString(), I’ve not read the patch yet but I'm not sure it's a
good idea. A field name of a node could be changed in a major version.
If a publisher sends a parse-tree string representation to a newer
major version subscriber, the subscriber needs to be able to parse the
old format parse-tree string representation in order to reconstruct
the DDL, which reduces the maintainability much. On the other hand,
the format of deparsed json string would not be changed often.

>
> >
>  So I think it's an overkill to use deparser for
> > such use cases. It also costs more space and
> > time using deparsing. For example, the following simple ALTER TABLE
> > command incurs 11 times more space
> > in the WAL record if we were to use the format from the deparser,
> > there will also be time and CPU overhead from the deparser.
> >
> ...
> >
> > So I think it's better to define DDL replication levels [1] to tailor
> > for the two different use cases. We can use different logging format
> > based on the DDL replication level. For example,
> > we can simply log the DDL query string and the search_path for
> > database level DDL replication. But for table level DDL replication we
> > need to use the deparser format in order to
> > handle the potential syntax differences and schema mapping requests.
> >
>
> I think having different logging formats is worth considering but I am
> not sure we can distinguish it for database and table level
> replication because of the reasons mentioned above. One thing which
> may need a different format is the replication of global objects like
> roles, tablespace, etc. but we haven't analyzed them in detail about
> those. I feel we may also need a different syntax altogether to
> replicate such objects. Also, I think we may want to optimize the
> current format in some cases so that the WAL amount could be reduced.
>
> I feel if we think that deparsing is required for this project then
> probably at this stage it would be a good idea to explore ways to have
> independent ways to test it. One way is to do testing via the logical
> replication of DDL (aka via this patch) and the other is to write an
> independent test suite as Sawada-San seems to be speculating above
> [2]. I am not sure if there is any progress yet on the independent
> test suite front yet.

I've attached a WIP patch for adding regression tests for DDL deparse.
The patch can be applied on
v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
submitted[1]. The basic idea is to define the event trigger to deparse
DDLs, run the regression tests, load the deparsed DDLs to another
database cluster, dump both databases, and compare the dumps. Since
the patch doesn't support deparsing all DDLs and there is a bug[2],
the attached regression test does CREATE TABLE and some ALTER TABLE
instead of running regression tests.

Regards,

[1]
https://www.postgresql.org/message-id/OS0PR01MB5716B1526C2EDA66907E733B94B39%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
1000;" causes an assertion failure.

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: Replica Identity check of partition table on subscriber
Следующее
От: RKN Sai Krishna
Дата:
Сообщение: pg_page_repair: a tool/extension to repair corrupted pages in postgres with streaming/physical replication