Обсуждение: Report replica identity in pg_publication_tables
Hi All,
The commit message in the patch says it all, but let me repeat it here.
When debugging issues with logical replication, replica identity
property of tables in publication is often useful, for example, to
determine the amount of data logged for an UPDATE or DELETE on a given
table.
Given a set of publications that a WAL sender is using,
pg_publication_tables can be used to get the list of tables whose
changes will be replicated including the columns of those tables and
row
filters. But the replica identity of those tables needs to be
separately found out by querying pg_class or joining pg_class with
pg_publication_tables on schemaname and relname. Adding the replica
identity column to pg_publication_tables avoids this extra step.
The replica identity of a given table is not a property of
publication, per say, so it's arguable whether it should be included
in pg_publication_tables or not. But the output seems to be useful.
E.g. from the tests
SELECT * FROM pg_publication_tables WHERE pubname = 'testpub6';
pubname | schemaname | tablename | attnames | rowfilter |
replica_identity
----------+------------+---------------------+----------+-----------+------------------
testpub6 | public | rf_tbl_abcd_part_pk | {a,b} | (b > 99) | default
(1 row)
This line gives all the information related to logical replication of
table rf_tbl_abcd_part_pk together.
--
Best Wishes,
Ashutosh Bapat
On Mon, Jun 30, 2025 at 3:44 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Hi All, > > The commit message in the patch says it all, but let me repeat it here. > You forgot to attach the patch. > When debugging issues with logical replication, replica identity > property of tables in publication is often useful, for example, to > determine the amount of data logged for an UPDATE or DELETE on a given > table. > I think it can help to determine what is logged for the DELETE or UPDATE operation, but not the exact amount of data. Can you please explain with an example how such information can help with debugging? > Given a set of publications that a WAL sender is using, > pg_publication_tables can be used to get the list of tables whose > changes will be replicated including the columns of those tables and > row > filters. But the replica identity of those tables needs to be > separately found out by querying pg_class or joining pg_class with > pg_publication_tables on schemaname and relname. Adding the replica > identity column to pg_publication_tables avoids this extra step. > > The replica identity of a given table is not a property of > publication, per say, so it's arguable whether it should be included > in pg_publication_tables or not. > Right, discussing the use case a bit more might help us to find if this is the right place to add 'replica identity' information. -- With Regards, Amit Kapila.
Hi Ashutosh,
Thanks for the patch.
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, 30 Haz 2025 Pzt, 13:14 tarihinde şunu yazdı:
The replica identity of a given table is not a property of
publication, per say, so it's arguable whether it should be included
in pg_publication_tables or not. But the output seems to be useful.
E.g. from the tests
To my understanding, pg_publication_tables doesn’t just contain publication properties, but also information about the tables included in the publication. As long as the exposed information about the published tables is relevant in the context of publications in logical replication, I think it’s reasonable to include it in pg_publication_tables. IMHO replica identity seems like relevant information.
Regards,--
Melih Mutlu
On Mon, Jun 30, 2025 at 8:09 PM Ashutosh Bapat wrote: > > On Mon, Jun 30, 2025 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Mon, Jun 30, 2025 at 3:44 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > Given a set of publications that a WAL sender is using, > > > pg_publication_tables can be used to get the list of tables whose > > > changes will be replicated including the columns of those tables > > > and row filters. But the replica identity of those tables needs to > > > be separately found out by querying pg_class or joining pg_class > > > with pg_publication_tables on schemaname and relname. Adding the > > > replica identity column to pg_publication_tables avoids this extra step. > > > > > > The replica identity of a given table is not a property of > > > publication, per say, so it's arguable whether it should be > > > included in pg_publication_tables or not. > > > > > > > Right, discussing the use case a bit more might help us to find if > > this is the right place to add 'replica identity' information. > > In our case, we had multiple replication slots, each with a different publication. > One of the slots was lagging because of the amount of data being sent > through the slot. For that we wanted to know which tables are being > published through the corresponding publication and what's the replica > identity of each of the tables. > pg_publication_tables gave us the list of tables but in order to get > its replica identity we needed to join it with pg_class again. > pg_publication_tables already joins pg_class. Exposing replica > identity through pg_publication_tables makes it more convenient to get > all the information related to a tables replication through that > publication in a single line without much code change or run time cost. Thanks for explaining the use case. I'm concerned about whether we can correctly display replica identity in the view for partitioned tables. In the case of partitioned tables, we display only the root table in the view if publish_via_partition_root is enabled for the publication. However, the logged information depends on the replica identity (RI) of each child partition. Displaying only the root table's RI could be a bit confusing since each partition might have different RI settings, making it challenging to represent all of them in the entry for the root table. Best Regards, Hou zj
On Wed, Jul 2, 2025 at 1:46 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > I'm concerned about whether we can correctly display replica identity in the > view for partitioned tables. > > In the case of partitioned tables, we display only the root table in the view > if publish_via_partition_root is enabled for the publication. However, the > logged information depends on the replica identity (RI) of each child > partition. Displaying only the root table's RI could be a bit confusing since > each partition might have different RI settings, making it challenging to > represent all of them in the entry for the root table. > I see following possibilities: 1. If a publication has publish_via_partition_root true, then for all the partitioned tables we show "partitioned" or "partition root" in replica identity column with a note in documentation that it means that the replica identity of the actual leaf partition would be used instead of replica identity of the partitioned table. And then it will be for the user to find out all the replica identities. 2. If a publication has publish_via_partition_root true, then for all the partitioned tables we show a set of distinct replica identities of the leaf partitions. That way, if all the leaf partitions have the same replica identity setting, that will be reported. Otherwise, the user will be able to know all the possible replica identity settings that may be used. Again, documentation should clarify what this value means for a partitioned table. But then somehow we need to indicate whether a given table is partitioned or not. 3. We report "partitioned" or "partition root" followed by all the distinct replica identities, thus letting the user know that the table is partitioned and what replica identity settings its partitions have. And document. Since the targeted use of this column is to know the replica identities of the tables published by a given publication so that they can guess what might be WAL logged, these options seem ok along with the clarifying documentation. What do you think? -- Best Wishes, Ashutosh Bapat
On Fri, Jul 4, 2025 at 8:16 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Jul 2, 2025 at 1:46 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > I'm concerned about whether we can correctly display replica identity in the > > view for partitioned tables. > > > > In the case of partitioned tables, we display only the root table in the view > > if publish_via_partition_root is enabled for the publication. However, the > > logged information depends on the replica identity (RI) of each child > > partition. Displaying only the root table's RI could be a bit confusing since > > each partition might have different RI settings, making it challenging to > > represent all of them in the entry for the root table. > > > > I see following possibilities: > 1. If a publication has publish_via_partition_root true, then for all > the partitioned tables we show "partitioned" or "partition root" in > replica identity column with a note in documentation that it means > that the replica identity of the actual leaf partition would be used > instead of replica identity of the partitioned table. And then it will > be for the user to find out all the replica identities. > 2. If a publication has publish_via_partition_root true, then for all > the partitioned tables we show a set of distinct replica identities of > the leaf partitions. That way, if all the leaf partitions have the > same replica identity setting, that will be reported. Otherwise, the > user will be able to know all the possible replica identity settings > that may be used. Again, documentation should clarify what this value > means for a partitioned table. But then somehow we need to indicate > whether a given table is partitioned or not. > 3. We report "partitioned" or "partition root" followed by all the > distinct replica identities, thus letting the user know that the table > is partitioned and what replica identity settings its partitions have. > And document. > > Since the targeted use of this column is to know the replica > identities of the tables published by a given publication so that they > can guess what might be WAL logged, these options seem ok along with > the clarifying documentation. > > What do you think? I researched this a bit more and then I found some inconsistency between the code and the documentation At https://www.postgresql.org/docs/current/sql-createpublication.html, we mention that changes to a partition table will use identity of a partititioned table when publish_via_partition_root is true. --- quote documentation This parameter determines whether changes in a partitioned table (or on its partitions) contained in the publication will be published using the identity and schema of the partitioned table rather than that of the individual partitions that are actually changed; the latter is the default. Enabling this allows the changes to be replicated into a non-partitioned table or a partitioned table consisting of a different set of partitions. There can be a case where a subscription combines multiple publications. If a partitioned table is published by any subscribed publications which set publish_via_partition_root = true, changes on this partitioned table (or on its partitions) will be published using the identity and schema of this partitioned table rather than that of the individual partitions. --- unquote documentation Which identity are we talking about here? If that's replica identity, the documentation is wrong since code says otherwise In pub_rf_contains_invalid_column() /* * For a partition, if pubviaroot is true, find the topmost ancestor that * is published via this publication as we need to use its row filter * expression to filter the partition's changes. * * Note that even though the row filter used is for an ancestor, the * REPLICA IDENTITY used will be for the actual child table. */ Let me revise possible solutions so that this view can report replica identity even in case of publish_via_partition_root 1. If a publication has publish_via_partition_root true, then for all the partitioned tables we show replica identity of the root partition suffixed by (mixed). E.g. full (mixed). Add a note in documentation explaining the meaning of suffix "mixed". 2. If a publication has publish_via_partition_root true, then for all the partitioned tables we show a set of distinct replica identities of its partitions and also the replica identity of the root. If all the partitions have the same replica identity setting, only one replica identity type will be reported. Otherwise, the user will be able to know all the possible replica identity settings that may be used. Add a note to the documentation explaining the meaning of this value for a partitioned table. 3. Let the view report all the partitions as well as the partition root. Against each relation, report its replica identity. Expand the pg_publication_tables view to also report the relation via which the changes are published (under column name "publish_via"). When publish_via_partition_root is true, publish_via reports the top partitioned relation. I think that gives full information about tables being published through the publication. What do you think? -- Best Wishes, Ashutosh Bapat