Re: Column Filtering in Logical Replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Column Filtering in Logical Replication
Дата
Msg-id CAA4eK1+1DMkCip9SB3B0_u0Q6fGf-D3vgqQodkLfur0qkL482g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Column Filtering in Logical Replication  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы RE: Column Filtering in Logical Replication  ("tanghy.fnst@fujitsu.com" <tanghy.fnst@fujitsu.com>)
Re: Column Filtering in Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
On Wed, Jan 12, 2022 at 2:40 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Is there any coordination between the "column filter" patch and the "row
> filter" patch ?  Are they both on track for PG15 ?  Has anybody run them
> together ?
>

The few things where I think we might need to define some common
behavior are as follows:

1. Replica Identity handling: Currently the column filter patch gives
an error during create/alter subscription if the specified column list
is invalid (Replica Identity columns are missing). It also gives an
error if the user tries to change the replica identity. However, it
doesn't deal with cases where the user drops and adds a different
primary key that has a different set of columns which can lead to
failure during apply on the subscriber.

I think another issue w.r.t column filter patch is that even while
creating publication (even for 'insert' publications) it should check
that all primary key columns must be part of published columns,
otherwise, it can fail while applying on subscriber as it will try to
insert NULL for the primary key column.

2. Handling of partitioned tables vs. Replica Identity (RI): When
adding a partitioned table with a column list to the publication (with
publish_via_partition_root = false), we should check the Replica
Identity of all its leaf partition as the RI on the partition is the
one actually takes effect when publishing DML changes. We need to
check RI while attaching the partition as well, as the newly added
partitions will automatically become part of publication if the
partitioned table is part of the publication. If we don't do this the
later deletes/updates can fail.

All these cases are dealt with in row filter patch because of the
on-the-fly check which means we check the validation of columns in row
filters while actual operation update/delete via
CheckCmdReplicaIdentity and cache the result of same for future use.
This is inline with existing checks of RI vs. operations on tables.
The primary reason for this was we didn't want to handle validation of
row filters at so many places.

3. Tablesync.c handling: Ideally, it would be good if we have a single
query to fetch both row filters and column filters but even if that is
not possible in the first version, the behavior should be same for
both queries w.r.t partitioned tables, For ALL Tables and For All
Tables In Schema cases.

Currently, the column filter patch doesn't seem to respect For ALL
Tables and For All Tables In Schema cases, basically, it just copies
the columns it finds through some of the publications even if one of
the publications is defined as For All Tables. The row filter patch
ignores the row filters if one of the publications is defined as For
ALL Tables and For All Tables In Schema.

For row filter patch, if the publication contains a partitioned table,
the publication parameter publish_via_partition_root determines if it
uses the partition row filter (if the parameter is false, the default)
or the root partitioned table row filter and this is taken care of
even during the initial tablesync.

For column filter patch, if the publication contains a partitioned
table, it seems that it finds all columns that the tables in its
partition tree specified in the publications, whether
publish_via_partition_root is true or false.

We have done some testing w.r.t above cases with both patches and my
colleague will share the results.

-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Maxim Orlov
Дата:
Сообщение: Re: O(n) tasks cause lengthy startups and checkpoints
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Logical replication timeout problem