Обсуждение: Report replica identity in pg_publication_tables

Поиск
Список
Период
Сортировка

Report replica identity in pg_publication_tables

От
Ashutosh Bapat
Дата:
Hi All,

The commit message in the patch says it all, but let me repeat it here.

When debugging issues with logical replication, replica identity
property of tables in publication is often useful, for example, to
determine the amount of data logged for an UPDATE or DELETE on a given
table.

Given a set of publications that a WAL sender is using,
pg_publication_tables can be used to get the list of tables whose
changes will be replicated including the columns of those tables and
row
filters. But the replica identity of those tables needs to be
separately found out by querying pg_class or joining pg_class with
pg_publication_tables on schemaname and relname. Adding the replica
identity column to pg_publication_tables avoids this extra step.

The replica identity of a given table is not a property of
publication, per say, so it's arguable whether it should be included
in pg_publication_tables or not. But the output seems to be useful.
E.g. from the tests

SELECT * FROM pg_publication_tables WHERE pubname = 'testpub6';
 pubname  | schemaname |      tablename      | attnames | rowfilter |
replica_identity
----------+------------+---------------------+----------+-----------+------------------
 testpub6 | public     | rf_tbl_abcd_part_pk | {a,b}    | (b > 99)  | default
(1 row)

This line gives all the information related to logical replication of
table rf_tbl_abcd_part_pk together.

-- 
Best Wishes,
Ashutosh Bapat



Re: Report replica identity in pg_publication_tables

От
Amit Kapila
Дата:
On Mon, Jun 30, 2025 at 3:44 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi All,
>
> The commit message in the patch says it all, but let me repeat it here.
>

You forgot to attach the patch.

> When debugging issues with logical replication, replica identity
> property of tables in publication is often useful, for example, to
> determine the amount of data logged for an UPDATE or DELETE on a given
> table.
>

I think it can help to determine what is logged for the DELETE or
UPDATE operation, but not the exact amount of data. Can you please
explain with an example how such information can help with debugging?

> Given a set of publications that a WAL sender is using,
> pg_publication_tables can be used to get the list of tables whose
> changes will be replicated including the columns of those tables and
> row
> filters. But the replica identity of those tables needs to be
> separately found out by querying pg_class or joining pg_class with
> pg_publication_tables on schemaname and relname. Adding the replica
> identity column to pg_publication_tables avoids this extra step.
>
> The replica identity of a given table is not a property of
> publication, per say, so it's arguable whether it should be included
> in pg_publication_tables or not.
>

Right, discussing the use case a bit more might help us to find if
this is the right place to add 'replica identity' information.

--
With Regards,
Amit Kapila.



Re: Report replica identity in pg_publication_tables

От
Melih Mutlu
Дата:
Hi Ashutosh,

Thanks for the patch.

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, 30 Haz 2025 Pzt, 13:14 tarihinde şunu yazdı:
The replica identity of a given table is not a property of
publication, per say, so it's arguable whether it should be included
in pg_publication_tables or not. But the output seems to be useful.
E.g. from the tests

To my understanding, pg_publication_tables doesn’t just contain publication properties, but also information about the tables included in the publication. As long as the exposed information about the published tables is relevant in the context of publications in logical replication, I think it’s reasonable to include it in pg_publication_tables. IMHO replica identity seems like relevant information.


Regards,--

Melih Mutlu

RE: Report replica identity in pg_publication_tables

От
"Zhijie Hou (Fujitsu)"
Дата:
On Mon, Jun 30, 2025 at 8:09 PM Ashutosh Bapat wrote:

> 
> On Mon, Jun 30, 2025 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Mon, Jun 30, 2025 at 3:44 PM Ashutosh Bapat 
> > <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > > Given a set of publications that a WAL sender is using, 
> > > pg_publication_tables can be used to get the list of tables whose 
> > > changes will be replicated including the columns of those tables 
> > > and row filters. But the replica identity of those tables needs to 
> > > be separately found out by querying pg_class or joining pg_class 
> > > with pg_publication_tables on schemaname and relname. Adding the 
> > > replica identity column to pg_publication_tables avoids this extra step.
> > >
> > > The replica identity of a given table is not a property of 
> > > publication, per say, so it's arguable whether it should be 
> > > included in pg_publication_tables or not.
> > >
> >
> > Right, discussing the use case a bit more might help us to find if 
> > this is the right place to add 'replica identity' information.
> 
> In our case, we had multiple replication slots, each with a different publication.
> One of the slots was lagging because of the amount of data being sent 
> through the slot. For that we wanted to know which tables are being 
> published through the corresponding publication and what's the replica 
> identity of each of the tables.
> pg_publication_tables gave us the list of tables but in order to get 
> its replica identity we needed to join it with pg_class again.
> pg_publication_tables already joins pg_class. Exposing replica 
> identity through pg_publication_tables makes it more convenient to get 
> all the information related to a tables replication through that 
> publication in a single line without much code change or run time cost.

Thanks for explaining the use case.

I'm concerned about whether we can correctly display replica identity in the
view for partitioned tables.

In the case of partitioned tables, we display only the root table in the view
if publish_via_partition_root is enabled for the publication. However, the
logged information depends on the replica identity (RI) of each child
partition. Displaying only the root table's RI could be a bit confusing since
each partition might have different RI settings, making it challenging to
represent all of them in the entry for the root table.

Best Regards,
Hou zj

Re: Report replica identity in pg_publication_tables

От
Ashutosh Bapat
Дата:
On Wed, Jul 2, 2025 at 1:46 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

>
> I'm concerned about whether we can correctly display replica identity in the
> view for partitioned tables.
>
> In the case of partitioned tables, we display only the root table in the view
> if publish_via_partition_root is enabled for the publication. However, the
> logged information depends on the replica identity (RI) of each child
> partition. Displaying only the root table's RI could be a bit confusing since
> each partition might have different RI settings, making it challenging to
> represent all of them in the entry for the root table.
>

I see following possibilities:
1. If a publication has publish_via_partition_root true, then for all
the partitioned tables we show "partitioned" or "partition root" in
replica identity column with a note in documentation that it means
that the replica identity of the actual leaf partition would be used
instead of replica identity of the partitioned table. And then it will
be for the user to find out all the replica identities.
2. If a publication has publish_via_partition_root true, then for all
the partitioned tables we show a set of distinct replica identities of
the leaf partitions. That way, if all the leaf partitions have the
same replica identity setting, that will be reported. Otherwise, the
user will be able to know all the possible replica identity settings
that may be used. Again, documentation should clarify what this value
means for a partitioned table. But then somehow we need to indicate
whether a given table is partitioned or not.
3. We report "partitioned" or "partition root" followed by all the
distinct replica identities, thus letting the user know that the table
is partitioned and what replica identity settings its partitions have.
And document.

Since the targeted use of this column is to know the replica
identities of the tables published by a given publication so that they
can guess what might be WAL logged, these options seem ok along with
the clarifying documentation.

What do you think?
--
Best Wishes,
Ashutosh Bapat



Re: Report replica identity in pg_publication_tables

От
Ashutosh Bapat
Дата:
On Fri, Jul 4, 2025 at 8:16 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jul 2, 2025 at 1:46 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
>
> >
> > I'm concerned about whether we can correctly display replica identity in the
> > view for partitioned tables.
> >
> > In the case of partitioned tables, we display only the root table in the view
> > if publish_via_partition_root is enabled for the publication. However, the
> > logged information depends on the replica identity (RI) of each child
> > partition. Displaying only the root table's RI could be a bit confusing since
> > each partition might have different RI settings, making it challenging to
> > represent all of them in the entry for the root table.
> >
>
> I see following possibilities:
> 1. If a publication has publish_via_partition_root true, then for all
> the partitioned tables we show "partitioned" or "partition root" in
> replica identity column with a note in documentation that it means
> that the replica identity of the actual leaf partition would be used
> instead of replica identity of the partitioned table. And then it will
> be for the user to find out all the replica identities.
> 2. If a publication has publish_via_partition_root true, then for all
> the partitioned tables we show a set of distinct replica identities of
> the leaf partitions. That way, if all the leaf partitions have the
> same replica identity setting, that will be reported. Otherwise, the
> user will be able to know all the possible replica identity settings
> that may be used. Again, documentation should clarify what this value
> means for a partitioned table. But then somehow we need to indicate
> whether a given table is partitioned or not.
> 3. We report "partitioned" or "partition root" followed by all the
> distinct replica identities, thus letting the user know that the table
> is partitioned and what replica identity settings its partitions have.
> And document.
>
> Since the targeted use of this column is to know the replica
> identities of the tables published by a given publication so that they
> can guess what might be WAL logged, these options seem ok along with
> the clarifying documentation.
>
> What do you think?

I researched this a bit more and then I found some inconsistency
between the code and the documentation

At https://www.postgresql.org/docs/current/sql-createpublication.html,
we mention that changes to a partition table will use identity of a
partititioned table when publish_via_partition_root is true.

--- quote documentation
This parameter determines whether changes in a partitioned table (or
on its partitions) contained in the publication will be published
using the identity and schema of the partitioned table rather than
that of the individual partitions that are actually changed; the
latter is the default. Enabling this allows the changes to be
replicated into a non-partitioned table or a partitioned table
consisting of a different set of partitions.

There can be a case where a subscription combines multiple
publications. If a partitioned table is published by any subscribed
publications which set publish_via_partition_root = true, changes on
this partitioned table (or on its partitions) will be published using
the identity and schema of this partitioned table rather than that of
the individual partitions.
--- unquote documentation

Which identity are we talking about here? If that's replica identity,
the documentation is wrong since code says otherwise
In pub_rf_contains_invalid_column()

/*
* For a partition, if pubviaroot is true, find the topmost ancestor that
* is published via this publication as we need to use its row filter
* expression to filter the partition's changes.
*
* Note that even though the row filter used is for an ancestor, the
* REPLICA IDENTITY used will be for the actual child table.
*/

Let me revise possible solutions so that this view can report replica
identity even in case of publish_via_partition_root

1. If a publication has publish_via_partition_root true, then for all
the partitioned tables we show replica identity of the root partition
suffixed by (mixed). E.g. full (mixed). Add a note in documentation
explaining the meaning of suffix "mixed".

2. If a publication has publish_via_partition_root true, then for all
the partitioned tables we show a set of distinct replica identities of
its partitions and also the replica identity of the root. If all the
partitions have the same replica identity setting, only one replica
identity type will be reported. Otherwise, the user will be able to
know all the possible replica identity settings that may be used. Add
a note to the documentation explaining the meaning of this value for a
partitioned table.

3. Let the view report all the partitions as well as the partition
root. Against each relation, report its replica identity. Expand the
pg_publication_tables view to also report the relation via which the
changes are published (under column name "publish_via"). When
publish_via_partition_root is true, publish_via reports the top
partitioned relation. I think that gives full information about tables
being published through the publication.

What do you think?

--
Best Wishes,
Ashutosh Bapat