Обсуждение: Proposal: Conflict log history table for Logical Replication

Поиск
Список
Период
Сортировка

Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
Currently we log conflicts to the server's log file and updates, this
approach has limitations, 1) Difficult to query and analyze, parsing
plain text log files for conflict details is inefficient. 2) Lack of
structured data, key conflict attributes (table, operation, old/new
data, LSN, etc.) are not readily available in a structured, queryable
format. 3) Difficult for external monitoring tools or custom
resolution scripts to consume conflict data directly.

This proposal aims to address these limitations by introducing a
conflict log history table, providing a structured, and queryable
record of all logical replication conflicts.  This should be a
configurable option whether to log into the conflict log history
table, server logs or both.

This proposal has two main design questions:
===================================

1. How do we store conflicting tuples from different tables?
Using a JSON column to store the row data seems like the most flexible
solution, as it can accommodate different table schemas.

2. Should this be a system table or a user table?
a) System Table: Storing this in a system catalog is simple, but
catalogs aren't designed for ever-growing data. While pg_large_object
is an exception, this is not what we generally do IMHO.
b) User Table: This offers more flexibility. We could allow a user to
specify the table name during CREATE SUBSCRIPTION.  Then we choose to
either create the table internally or let the user create the table
with a predefined schema.

A potential drawback is that a user might drop or alter the table.
However, we could mitigate this risk by simply logging a WARNING if
the table is configured but an insertion fails.
I am currently working on a POC patch for the same, but will post that
once we have some thoughts on design choices.

Schema for the conflict log history table may look like this, although
there is a room for discussion on this.

Note:  I think these fields are self explanatory so I haven't
explained them here.

conflict_log_table (
    logid  SERIAL PRIMARY KEY,
    subid                OID,
    schema_id          OID,
    table_id            OID,
    conflict_type        TEXT NOT NULL,
    operation_type       TEXT NOT NULL,
    replication_origin   TEXT,
    remote_commit_ts TIMESTAMPTZ,
    local_commit_ts TIMESTAMPTZ,
    ri_key                    JSON,
    remote_tuple         JSON,
    local_tuple          JSON,
);

Credit:  Thanks to Amit Kapila for discussing this offlist and
providing some valuable suggestions.

-- 
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Currently we log conflicts to the server's log file and updates, this
> approach has limitations, 1) Difficult to query and analyze, parsing
> plain text log files for conflict details is inefficient. 2) Lack of
> structured data, key conflict attributes (table, operation, old/new
> data, LSN, etc.) are not readily available in a structured, queryable
> format. 3) Difficult for external monitoring tools or custom
> resolution scripts to consume conflict data directly.
>
> This proposal aims to address these limitations by introducing a
> conflict log history table, providing a structured, and queryable
> record of all logical replication conflicts.  This should be a
> configurable option whether to log into the conflict log history
> table, server logs or both.
>

+1 for the idea.

> This proposal has two main design questions:
> ===================================
>
> 1. How do we store conflicting tuples from different tables?
> Using a JSON column to store the row data seems like the most flexible
> solution, as it can accommodate different table schemas.

Yes, that is one option. I have not looked into details myself, but
you can also explore 'anyarray' used in pg_statistics to store 'Column
data values of the appropriate kind'.

> 2. Should this be a system table or a user table?
> a) System Table: Storing this in a system catalog is simple, but
> catalogs aren't designed for ever-growing data. While pg_large_object
> is an exception, this is not what we generally do IMHO.
> b) User Table: This offers more flexibility. We could allow a user to
> specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> either create the table internally or let the user create the table
> with a predefined schema.
>
> A potential drawback is that a user might drop or alter the table.
> However, we could mitigate this risk by simply logging a WARNING if
> the table is configured but an insertion fails.

I believe it makes more sense for this to be a catalog table rather
than a user table. I wanted to check if we already have a large
catalog table of this kind, and I think pg_statistic could be an
example of a sizable catalog table. To get a rough idea of how size
scales with data, I ran a quick experiment: I created 1000 tables,
each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
I inserted 1000 rows into each table and ran ANALYZE to collect
statistics. Here’s what I observed on a fresh database before and
after:

Before:
pg_statistic row count: 412
Table size: ~256 kB

After:
pg_statistic row count: 6,412
Table size: ~5.3 MB

Although it isn’t an exact comparison, this gives us some insight into
how the statistics catalog table size grows with the number of rows.
It doesn’t seem excessively large with 6k rows, given the fact that
pg_statistic itself is a complex table having many 'anyarray'-type
columns.

That said, irrespective of what we decide, it would be ideal to offer
users an option for automatic purging, perhaps via a retention period
parameter like conflict_stats_retention_period (say default to 30
days), or a manual purge API such as purge_conflict_stats('older than
date'). I wasn’t able to find any such purge mechanism for PostgreSQL
stats tables, but Oracle does provide such purging options for some of
their statistics tables (not related to conflicts), see [1], [2].
And to manage it better, it could be range partitioned on timestamp.


> I am currently working on a POC patch for the same, but will post that
> once we have some thoughts on design choices.
>
> Schema for the conflict log history table may look like this, although
> there is a room for discussion on this.
>
> Note:  I think these fields are self explanatory so I haven't
> explained them here.
>
> conflict_log_table (
>     logid  SERIAL PRIMARY KEY,
>     subid                OID,
>     schema_id          OID,
>     table_id            OID,
>     conflict_type        TEXT NOT NULL,
>     operation_type       TEXT NOT NULL,

I feel operation_type is not needed when we already have
conflict_type. The name of 'conflict_type' is enough to give us info
on operation-type.

>     replication_origin   TEXT,
>     remote_commit_ts TIMESTAMPTZ,
>     local_commit_ts TIMESTAMPTZ,
>     ri_key                    JSON,
>     remote_tuple         JSON,
>     local_tuple          JSON,
> );
>
> Credit:  Thanks to Amit Kapila for discussing this offlist and
> providing some valuable suggestions.
>

[1]

https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_STATS.html#GUID-8E6413D5-F827-4F57-9FAD-7EC56362A98C

[2]

https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_STATS.html#GUID-A04AE1C0-5DE1-4AFC-91F8-D35D41DF98A2

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Currently we log conflicts to the server's log file and updates, this
> > approach has limitations, 1) Difficult to query and analyze, parsing
> > plain text log files for conflict details is inefficient. 2) Lack of
> > structured data, key conflict attributes (table, operation, old/new
> > data, LSN, etc.) are not readily available in a structured, queryable
> > format. 3) Difficult for external monitoring tools or custom
> > resolution scripts to consume conflict data directly.
> >
> > This proposal aims to address these limitations by introducing a
> > conflict log history table, providing a structured, and queryable
> > record of all logical replication conflicts.  This should be a
> > configurable option whether to log into the conflict log history
> > table, server logs or both.
> >
>
> +1 for the idea.
>
> > This proposal has two main design questions:
> > ===================================
> >
> > 1. How do we store conflicting tuples from different tables?
> > Using a JSON column to store the row data seems like the most flexible
> > solution, as it can accommodate different table schemas.
>
> Yes, that is one option. I have not looked into details myself, but
> you can also explore 'anyarray' used in pg_statistics to store 'Column
> data values of the appropriate kind'.
>
> > 2. Should this be a system table or a user table?
> > a) System Table: Storing this in a system catalog is simple, but
> > catalogs aren't designed for ever-growing data. While pg_large_object
> > is an exception, this is not what we generally do IMHO.
> > b) User Table: This offers more flexibility. We could allow a user to
> > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > either create the table internally or let the user create the table
> > with a predefined schema.
> >
> > A potential drawback is that a user might drop or alter the table.
> > However, we could mitigate this risk by simply logging a WARNING if
> > the table is configured but an insertion fails.
>
> I believe it makes more sense for this to be a catalog table rather
> than a user table. I wanted to check if we already have a large
> catalog table of this kind, and I think pg_statistic could be an
> example of a sizable catalog table. To get a rough idea of how size
> scales with data, I ran a quick experiment: I created 1000 tables,
> each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> I inserted 1000 rows into each table and ran ANALYZE to collect
> statistics. Here’s what I observed on a fresh database before and
> after:
>
> Before:
> pg_statistic row count: 412
> Table size: ~256 kB
>
> After:
> pg_statistic row count: 6,412
> Table size: ~5.3 MB
>
> Although it isn’t an exact comparison, this gives us some insight into
> how the statistics catalog table size grows with the number of rows.
> It doesn’t seem excessively large with 6k rows, given the fact that
> pg_statistic itself is a complex table having many 'anyarray'-type
> columns.
>
> That said, irrespective of what we decide, it would be ideal to offer
> users an option for automatic purging, perhaps via a retention period
> parameter like conflict_stats_retention_period (say default to 30
> days), or a manual purge API such as purge_conflict_stats('older than
> date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> stats tables, but Oracle does provide such purging options for some of
> their statistics tables (not related to conflicts), see [1], [2].
> And to manage it better, it could be range partitioned on timestamp.
>

It seems BDR also has one such conflict-log table which is a catalog
table and is also partitioned on time. It has a default retention
period of 30 days. See 'bdr.conflict_history' mentioned under
'catalogs' in [1]

[1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:

Thanks Shveta for your opinion on the design.

> > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >

> > > This proposal aims to address these limitations by introducing a
> > > conflict log history table, providing a structured, and queryable
> > > record of all logical replication conflicts.  This should be a
> > > configurable option whether to log into the conflict log history
> > > table, server logs or both.
> > >
> >
> > +1 for the idea.

Thanks

> >
> > > This proposal has two main design questions:
> > > ===================================
> > >
> > > 1. How do we store conflicting tuples from different tables?
> > > Using a JSON column to store the row data seems like the most flexible
> > > solution, as it can accommodate different table schemas.
> >
> > Yes, that is one option. I have not looked into details myself, but
> > you can also explore 'anyarray' used in pg_statistics to store 'Column
> > data values of the appropriate kind'.

I think conversion from row to json and json to row is convenient and
also other extensions like pgactive/bdr also provide as JSON.  But we
can explore this alternative options as well, thanks

> > > 2. Should this be a system table or a user table?
> > > a) System Table: Storing this in a system catalog is simple, but
> > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > is an exception, this is not what we generally do IMHO.
> > > b) User Table: This offers more flexibility. We could allow a user to
> > > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > > either create the table internally or let the user create the table
> > > with a predefined schema.
> > >
> > > A potential drawback is that a user might drop or alter the table.
> > > However, we could mitigate this risk by simply logging a WARNING if
> > > the table is configured but an insertion fails.
> >
> > I believe it makes more sense for this to be a catalog table rather
> > than a user table. I wanted to check if we already have a large
> > catalog table of this kind, and I think pg_statistic could be an
> > example of a sizable catalog table. To get a rough idea of how size
> > scales with data, I ran a quick experiment: I created 1000 tables,
> > each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> > I inserted 1000 rows into each table and ran ANALYZE to collect
> > statistics. Here’s what I observed on a fresh database before and
> > after:
> >
> > Before:
> > pg_statistic row count: 412
> > Table size: ~256 kB
> >
> > After:
> > pg_statistic row count: 6,412
> > Table size: ~5.3 MB
> >
> > Although it isn’t an exact comparison, this gives us some insight into
> > how the statistics catalog table size grows with the number of rows.
> > It doesn’t seem excessively large with 6k rows, given the fact that
> > pg_statistic itself is a complex table having many 'anyarray'-type
> > columns.

Yeah that's good analysis, apart from this pg_largeobject is also a
catalog which grows with each large object and growth rate for that
will be very high because it stores large object data in catalog.

> >
> > That said, irrespective of what we decide, it would be ideal to offer
> > users an option for automatic purging, perhaps via a retention period
> > parameter like conflict_stats_retention_period (say default to 30
> > days), or a manual purge API such as purge_conflict_stats('older than
> > date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> > stats tables, but Oracle does provide such purging options for some of
> > their statistics tables (not related to conflicts), see [1], [2].
> > And to manage it better, it could be range partitioned on timestamp.

Yeah that's an interesting suggestion to timestamp based partitioning
it for purging.

> It seems BDR also has one such conflict-log table which is a catalog
> table and is also partitioned on time. It has a default retention
> period of 30 days. See 'bdr.conflict_history' mentioned under
> 'catalogs' in [1]
>
> [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views

Actually bdr is an extension and this table is under extension
namespace (bdr.conflict_history) so this is not really a catalog but
its a extension managed table.  So logically for PostgreSQL its an
user table but yeah this is created and managed by the extension.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks Shveta for your opinion on the design.
>
> > > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
>
> > > > This proposal aims to address these limitations by introducing a
> > > > conflict log history table, providing a structured, and queryable
> > > > record of all logical replication conflicts.  This should be a
> > > > configurable option whether to log into the conflict log history
> > > > table, server logs or both.
> > > >
> > >
> > > +1 for the idea.
>
> Thanks
>
> > >
> > > > This proposal has two main design questions:
> > > > ===================================
> > > >
> > > > 1. How do we store conflicting tuples from different tables?
> > > > Using a JSON column to store the row data seems like the most flexible
> > > > solution, as it can accommodate different table schemas.
> > >
> > > Yes, that is one option. I have not looked into details myself, but
> > > you can also explore 'anyarray' used in pg_statistics to store 'Column
> > > data values of the appropriate kind'.
>
> I think conversion from row to json and json to row is convenient and
> also other extensions like pgactive/bdr also provide as JSON.

Okay. Agreed.

> But we
> can explore this alternative options as well, thanks
>
> > > > 2. Should this be a system table or a user table?
> > > > a) System Table: Storing this in a system catalog is simple, but
> > > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > > is an exception, this is not what we generally do IMHO.
> > > > b) User Table: This offers more flexibility. We could allow a user to
> > > > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > > > either create the table internally or let the user create the table
> > > > with a predefined schema.
> > > >
> > > > A potential drawback is that a user might drop or alter the table.
> > > > However, we could mitigate this risk by simply logging a WARNING if
> > > > the table is configured but an insertion fails.
> > >
> > > I believe it makes more sense for this to be a catalog table rather
> > > than a user table. I wanted to check if we already have a large
> > > catalog table of this kind, and I think pg_statistic could be an
> > > example of a sizable catalog table. To get a rough idea of how size
> > > scales with data, I ran a quick experiment: I created 1000 tables,
> > > each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> > > I inserted 1000 rows into each table and ran ANALYZE to collect
> > > statistics. Here’s what I observed on a fresh database before and
> > > after:
> > >
> > > Before:
> > > pg_statistic row count: 412
> > > Table size: ~256 kB
> > >
> > > After:
> > > pg_statistic row count: 6,412
> > > Table size: ~5.3 MB
> > >
> > > Although it isn’t an exact comparison, this gives us some insight into
> > > how the statistics catalog table size grows with the number of rows.
> > > It doesn’t seem excessively large with 6k rows, given the fact that
> > > pg_statistic itself is a complex table having many 'anyarray'-type
> > > columns.
>
> Yeah that's good analysis, apart from this pg_largeobject is also a
> catalog which grows with each large object and growth rate for that
> will be very high because it stores large object data in catalog.
>
> > >
> > > That said, irrespective of what we decide, it would be ideal to offer
> > > users an option for automatic purging, perhaps via a retention period
> > > parameter like conflict_stats_retention_period (say default to 30
> > > days), or a manual purge API such as purge_conflict_stats('older than
> > > date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> > > stats tables, but Oracle does provide such purging options for some of
> > > their statistics tables (not related to conflicts), see [1], [2].
> > > And to manage it better, it could be range partitioned on timestamp.
>
> Yeah that's an interesting suggestion to timestamp based partitioning
> it for purging.
>
> > It seems BDR also has one such conflict-log table which is a catalog
> > table and is also partitioned on time. It has a default retention
> > period of 30 days. See 'bdr.conflict_history' mentioned under
> > 'catalogs' in [1]
> >
> > [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views
>
> Actually bdr is an extension and this table is under extension
> namespace (bdr.conflict_history) so this is not really a catalog but
> its a extension managed table.

Yes, right. Sorry for confusion.

> So logically for PostgreSQL its an
> user table but yeah this is created and managed by the extension.
>

Any idea if the user can alter/drop or perform any DML on it? I could
not find any details on this part.

> --
> Regards,
> Dilip Kumar
> Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > So logically for PostgreSQL its an
> > user table but yeah this is created and managed by the extension.
> >
>
> Any idea if the user can alter/drop or perform any DML on it? I could
> not find any details on this part.

In my experience, for such extension managed tables where we want them
to behave like catalog, generally users are just granted with SELECT
permission.  So although it is not a catalog but for accessibility
wise for non admin users it is like a catalog.  IMHO, even if we
choose to create a user table for conflict log history we can also
control the permissions similarly.  What's your opinion on this?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > So logically for PostgreSQL its an
> > > user table but yeah this is created and managed by the extension.
> > >
> >
> > Any idea if the user can alter/drop or perform any DML on it? I could
> > not find any details on this part.
>
> In my experience, for such extension managed tables where we want them
> to behave like catalog, generally users are just granted with SELECT
> permission.  So although it is not a catalog but for accessibility
> wise for non admin users it is like a catalog.  IMHO, even if we
> choose to create a user table for conflict log history we can also
> control the permissions similarly.
>

Yes, it can be done. Technically there is nothing preventing us from
doing it. But in my experience, I have never seen any
system-maintained statistics tables to be a user table rather than
catalog table. Extensions are a different case; they typically manage
their own tables, which are not part of the system catalog. But if any
such stats related functionality is part of the core database, it
generally makes more sense to implement it as a catalog table
(provided there are no major obstacles to doing so). But I am curious
to know what others think here.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > So logically for PostgreSQL its an
> > > user table but yeah this is created and managed by the extension.
> > >
> >
> > Any idea if the user can alter/drop or perform any DML on it? I could
> > not find any details on this part.
>
> In my experience, for such extension managed tables where we want them
> to behave like catalog, generally users are just granted with SELECT
> permission.  So although it is not a catalog but for accessibility
> wise for non admin users it is like a catalog.  IMHO, even if we
> choose to create a user table for conflict log history we can also
> control the permissions similarly.  What's your opinion on this?
>

Yes, I think it is important to control permissions on this table even
if it is a user table. How about giving SELECT, DELETE, TRUNCATE
permissions to subscription owner assuming we create one such table
per subscription?

It should be a user table due to following reasons (a) It is an ever
growing table by definition and we need some level of user control to
manage it (like remove the old data); (b) We may want some sort of
partitioning streategy to manage it, even though, we decide to do it
ourselves now but in future, we should allow user to also specify it;
(c) We may also want user to specify what exact information she wants
to get stored considering in future we want resolutions to also be
stored in it. See a somewhat similar proposal to store errors during
copy by Tom [1]; (d) In a near-by thread, we are discussing storing
errors during copy in user table [2] and we have some similarity with
that proposal as well.

If we agree on this then the next thing to consider is whether we
allow users to create such a table or do it ourselves. In the long
term, we may want both but for simplicity, we can auto-create
ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
decide to let user create it then we can consider the idea of TYPED
tables as discussed in emails [3][4].

For user tables, we need to consider how to avoid replicating these
tables for publications that use FOR ALL TABLES specifier. One idea is
to use EXCLUDE table functionality as being discussed in thread [5]
but that would also be a bit tricky especially if we decide to create
such a table automatically. One naive idea is that internally we skip
sending changes from this table for "FOR ALL TABLES" publication, and
we shouldn't allow creating publication for this table. OTOH, if we
allow the user to create and specify this table, we can ask her to
specify with EXCLUDE syntax in publication. This needs more thoughts.

[1] - https://www.postgresql.org/message-id/flat/752672.1699474336%40sss.pgh.pa.us#b8450be5645c4252d7d02cf7aca1fc7b
[2] - https://www.postgresql.org/message-id/CACJufxH_OJpVra%3D0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/28c420cf-f25d-44f1-89fd-04ef0b2dd3db%40dunslane.net
[4] -
https://www.postgresql.org/message-id/CADrsxdYG%2B%2BK%3DiKjRm35u03q-Nb0tQPJaqjxnA2mGt5O%3DDht7sw%40mail.gmail.com
[5] -
https://www.postgresql.org/message-id/CANhcyEW%2BuJB_bvQLEaZCgoRTc1%3Di%2BQnrPPHxZ2%3D0SBSCyj9pkg%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Alastair Turner
Дата:
On Wed, 13 Aug 2025 at 11:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > So logically for PostgreSQL its an
> > > user table but yeah this is created and managed by the extension.
> > >
> >
> > Any idea if the user can alter/drop or perform any DML on it? I could
> > not find any details on this part.
>
> In my experience, for such extension managed tables where we want them
> to behave like catalog, generally users are just granted with SELECT
> permission.  So although it is not a catalog but for accessibility
> wise for non admin users it is like a catalog.  IMHO, even if we
> choose to create a user table for conflict log history we can also
> control the permissions similarly.  What's your opinion on this?
>

Yes, I think it is important to control permissions on this table even
if it is a user table. How about giving SELECT, DELETE, TRUNCATE
permissions to subscription owner assuming we create one such table
per subscription?

It should be a user table due to following reasons (a) It is an ever
growing table by definition and we need some level of user control to
manage it (like remove the old data); (b) We may want some sort of
partitioning streategy to manage it, even though, we decide to do it
ourselves now but in future, we should allow user to also specify it;
(c) We may also want user to specify what exact information she wants
to get stored considering in future we want resolutions to also be
stored in it. See a somewhat similar proposal to store errors during
copy by Tom [1]; (d) In a near-by thread, we are discussing storing
errors during copy in user table [2] and we have some similarity with
that proposal as well.

If we agree on this then the next thing to consider is whether we
allow users to create such a table or do it ourselves. In the long
term, we may want both but for simplicity, we can auto-create
ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
decide to let user create it then we can consider the idea of TYPED
tables as discussed in emails [3][4].

Having it be a user table, and specifying the table per subscription sounds good. This is very similar to how the load error tables for CloudBerry behave, for instance. To have both options for table creation, CREATE ... IF NOT EXISTS semantics work well - if the option on CREATE SUBSCRIPTION specifies an existing table of the right type use it, or create one with the name supplied. This would also give the user control over whether to have one table per subscription, one central table or anything in between. Rather than constraining permissions on the table, the CREATE SUBSCRIPTION command could create a dependency relationship between the table and the subscription.This would prevent removal of the table, even by a superuser.
 
For user tables, we need to consider how to avoid replicating these
tables for publications that use FOR ALL TABLES specifier. One idea is
to use EXCLUDE table functionality as being discussed in thread [5]
but that would also be a bit tricky especially if we decide to create
such a table automatically. One naive idea is that internally we skip
sending changes from this table for "FOR ALL TABLES" publication, and
we shouldn't allow creating publication for this table. OTOH, if we
allow the user to create and specify this table, we can ask her to
specify with EXCLUDE syntax in publication. This needs more thoughts.

If a dependency relationship is established between the error table and the subscription, could this be used as a basis for filtering the error tables from FOR ALL TABLES subscriptions?

Regards

Alastair 

Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Thu, Aug 14, 2025 at 4:26 PM Alastair Turner <minion@decodable.me> wrote:
>
> On Wed, 13 Aug 2025 at 11:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> >
>> > On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
>> > >
>> > > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> > > >
>> > > > So logically for PostgreSQL its an
>> > > > user table but yeah this is created and managed by the extension.
>> > > >
>> > >
>> > > Any idea if the user can alter/drop or perform any DML on it? I could
>> > > not find any details on this part.
>> >
>> > In my experience, for such extension managed tables where we want them
>> > to behave like catalog, generally users are just granted with SELECT
>> > permission.  So although it is not a catalog but for accessibility
>> > wise for non admin users it is like a catalog.  IMHO, even if we
>> > choose to create a user table for conflict log history we can also
>> > control the permissions similarly.  What's your opinion on this?
>> >
>>
>> Yes, I think it is important to control permissions on this table even
>> if it is a user table. How about giving SELECT, DELETE, TRUNCATE
>> permissions to subscription owner assuming we create one such table
>> per subscription?
>>
>> It should be a user table due to following reasons (a) It is an ever
>> growing table by definition and we need some level of user control to
>> manage it (like remove the old data); (b) We may want some sort of
>> partitioning streategy to manage it, even though, we decide to do it
>> ourselves now but in future, we should allow user to also specify it;
>> (c) We may also want user to specify what exact information she wants
>> to get stored considering in future we want resolutions to also be
>> stored in it. See a somewhat similar proposal to store errors during
>> copy by Tom [1]; (d) In a near-by thread, we are discussing storing
>> errors during copy in user table [2] and we have some similarity with
>> that proposal as well.
>>
>> If we agree on this then the next thing to consider is whether we
>> allow users to create such a table or do it ourselves. In the long
>> term, we may want both but for simplicity, we can auto-create
>> ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
>> decide to let user create it then we can consider the idea of TYPED
>> tables as discussed in emails [3][4].
>
>
> Having it be a user table, and specifying the table per subscription sounds good. This is very similar to how the
loaderror tables for CloudBerry behave, for instance. To have both options for table creation, CREATE ... IF NOT EXISTS
semanticswork well - if the option on CREATE SUBSCRIPTION specifies an existing table of the right type use it, or
createone with the name supplied. This would also give the user control over whether to have one table per
subscription,one central table or anything in between. 
>

Sounds reasonable. I think the first version we can let such a table
be created automatically with some option(s) with subscription. Then,
in subsequent versions, we can extend the functionality to allow
existing tables.

>
> Rather than constraining permissions on the table, the CREATE SUBSCRIPTION command could create a dependency
relationshipbetween the table and the subscription.This would prevent removal of the table, even by a superuser. 
>

Okay, that makes sense. But, we still probably want to disallow users
from inserting or updating rows in the conflict table.

>>
>> For user tables, we need to consider how to avoid replicating these
>> tables for publications that use FOR ALL TABLES specifier. One idea is
>> to use EXCLUDE table functionality as being discussed in thread [5]
>> but that would also be a bit tricky especially if we decide to create
>> such a table automatically. One naive idea is that internally we skip
>> sending changes from this table for "FOR ALL TABLES" publication, and
>> we shouldn't allow creating publication for this table. OTOH, if we
>> allow the user to create and specify this table, we can ask her to
>> specify with EXCLUDE syntax in publication. This needs more thoughts.
>
>
> If a dependency relationship is established between the error table and the subscription, could this be used as a
basisfor filtering the error tables from FOR ALL TABLES subscriptions? 
>

Yeah, that is worth considering.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Wed, Aug 13, 2025 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > So logically for PostgreSQL its an
> > > > user table but yeah this is created and managed by the extension.
> > > >
> > >
> > > Any idea if the user can alter/drop or perform any DML on it? I could
> > > not find any details on this part.
> >
> > In my experience, for such extension managed tables where we want them
> > to behave like catalog, generally users are just granted with SELECT
> > permission.  So although it is not a catalog but for accessibility
> > wise for non admin users it is like a catalog.  IMHO, even if we
> > choose to create a user table for conflict log history we can also
> > control the permissions similarly.  What's your opinion on this?
> >
>
> Yes, I think it is important to control permissions on this table even
> if it is a user table. How about giving SELECT, DELETE, TRUNCATE
> permissions to subscription owner assuming we create one such table
> per subscription?

Right, we need to control the permission.  I am not sure whether we
want a per subscription table or a common one. Earlier I was thinking
of a single table, but I think per subscription is not a bad idea
especially for managing the permissions.  And there can not be a
really huge number of subscriptions that we need to worry about
creating many conflict log history tables and that too we will only
create such tables when users pass that subscription option.


> It should be a user table due to following reasons (a) It is an ever
> growing table by definition and we need some level of user control to
> manage it (like remove the old data); (b) We may want some sort of
> partitioning streategy to manage it, even though, we decide to do it
> ourselves now but in future, we should allow user to also specify it;

Maybe we can partition by range on date (when entry is inserted) .
That way it would be easy to get rid of older partitions for users.

> (c) We may also want user to specify what exact information she wants
> to get stored considering in future we want resolutions to also be
> stored in it. See a somewhat similar proposal to store errors during
> copy by Tom [1]; (d) In a near-by thread, we are discussing storing
> errors during copy in user table [2] and we have some similarity with
> that proposal as well.

Right, we may consider that as well.

> If we agree on this then the next thing to consider is whether we
> allow users to create such a table or do it ourselves. In the long
> term, we may want both but for simplicity, we can auto-create
> ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
> decide to let user create it then we can consider the idea of TYPED
> tables as discussed in emails [3][4].

Yeah that's an interesting option.

>
> For user tables, we need to consider how to avoid replicating these
> tables for publications that use FOR ALL TABLES specifier. One idea is
> to use EXCLUDE table functionality as being discussed in thread [5]
> but that would also be a bit tricky especially if we decide to create
> such a table automatically. One naive idea is that internally we skip
> sending changes from this table for "FOR ALL TABLES" publication, and
> we shouldn't allow creating publication for this table. OTOH, if we
> allow the user to create and specify this table, we can ask her to
> specify with EXCLUDE syntax in publication. This needs more thoughts.

Yes this needs more thought, I will think more on this point and respond.

Yet another question is about table names, whether we keep some
standard name like conflict_log_history_$subid or let users pass the
name.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Fri, Aug 15, 2025 at 2:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Yet another question is about table names, whether we keep some
> standard name like conflict_log_history_$subid or let users pass the
> name.
>

It would be good if we can let the user specify the table_name and if
she didn't specify then use an internally generated name. I think it
will be somewhat similar to slot_name. However, in this case, there is
one challenge which is how can we decide whether the schema of the
user provided table_name is correct or not? Do we compare it with the
standard schema we are planning to use?

One idea to keep things simple for the first version is that we allow
users to specify the table_name for storing conflicts but the table
should be created internally and if the same name table already
exists, we can give an ERROR. Then we can later extend the
functionality to even allow storing conflicts in pre-created tables
with more checks about its schema.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 15, 2025 at 2:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Yet another question is about table names, whether we keep some
> > standard name like conflict_log_history_$subid or let users pass the
> > name.
> >
>
> It would be good if we can let the user specify the table_name and if
> she didn't specify then use an internally generated name. I think it
> will be somewhat similar to slot_name. However, in this case, there is
> one challenge which is how can we decide whether the schema of the
> user provided table_name is correct or not? Do we compare it with the
> standard schema we are planning to use?

Ideally we can do that, if you see in this thread [1] there is a patch
[2] which first try to validate the table schema and if it doesn't
exist it creates it on its own.  And it seems fine to me.

> One idea to keep things simple for the first version is that we allow
> users to specify the table_name for storing conflicts but the table
> should be created internally and if the same name table already
> exists, we can give an ERROR. Then we can later extend the
> functionality to even allow storing conflicts in pre-created tables
> with more checks about its schema.

That's fair too.  I am wondering what namespace we should create this
user table in. If we are creating internally, I assume the user should
provide a schema qualified name right?


[1] https://www.postgresql.org/message-id/flat/752672.1699474336%40sss.pgh.pa.us#b8450be5645c4252d7d02cf7aca1fc7b
[2] https://www.postgresql.org/message-id/attachment/152792/v8-0001-Add-a-new-COPY-option-SAVE_ERROR.patch


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Wed, Aug 20, 2025 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > One idea to keep things simple for the first version is that we allow
> > users to specify the table_name for storing conflicts but the table
> > should be created internally and if the same name table already
> > exists, we can give an ERROR. Then we can later extend the
> > functionality to even allow storing conflicts in pre-created tables
> > with more checks about its schema.
>
> That's fair too.  I am wondering what namespace we should create this
> user table in. If we are creating internally, I assume the user should
> provide a schema qualified name right?
>

Yeah, but if not provided then we should create it based on
search_path similar to what we do when user created the table from
psql.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Wed, Aug 20, 2025 at 5:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 20, 2025 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > One idea to keep things simple for the first version is that we allow
> > > users to specify the table_name for storing conflicts but the table
> > > should be created internally and if the same name table already
> > > exists, we can give an ERROR. Then we can later extend the
> > > functionality to even allow storing conflicts in pre-created tables
> > > with more checks about its schema.
> >
> > That's fair too.  I am wondering what namespace we should create this
> > user table in. If we are creating internally, I assume the user should
> > provide a schema qualified name right?
> >
>
> Yeah, but if not provided then we should create it based on
> search_path similar to what we do when user created the table from
> psql.

Yeah that makes sense.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Aug 21, 2025 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Aug 20, 2025 at 5:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 20, 2025 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > > One idea to keep things simple for the first version is that we allow
> > > > users to specify the table_name for storing conflicts but the table
> > > > should be created internally and if the same name table already
> > > > exists, we can give an ERROR. Then we can later extend the
> > > > functionality to even allow storing conflicts in pre-created tables
> > > > with more checks about its schema.
> > >
> > > That's fair too.  I am wondering what namespace we should create this
> > > user table in. If we are creating internally, I assume the user should
> > > provide a schema qualified name right?
> > >
> >
> > Yeah, but if not provided then we should create it based on
> > search_path similar to what we do when user created the table from
> > psql.

While working on the patch, I see there are some open questions

1. We decided to pass the conflict history table name during
subscription creation. And it makes sense to create this table when
the CREATE SUBSCRIPTION command is executed. A potential concern is
that the subscription owner will also own this table, having full
control over it, including the ability to drop or alter its schema.
This might not be an issue. If an INSERT into the conflict table
fails, we can check the table's existence and schema. If they are not
as expected, the conflict log history option can be disabled and
re-enabled later via ALTER SUBSCRIPTION.

2. A further challenge is how to exclude these tables from publishing
changes. If we support a subscription-level log history table and the
user publishes ALL TABLES, the output plugin uses
is_publishable_relation() to check if a table is publishable. However,
applying the same logic here would require checking each subscription
on the node to see if the table is designated as a conflict log
history table for any subscription, which could be costly.

3. And one last thing is about should we consider dropping this table
when we drop the subscription, I think this makes sense as we are
internally creating it while creating the subscription.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Alastair Turner
Дата:
Hi Dilip

Thanks for working on this, I think it will make conflict detection a lot more useful. 

On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
While working on the patch, I see there are some open questions

1. We decided to pass the conflict history table name during
subscription creation. And it makes sense to create this table when
the CREATE SUBSCRIPTION command is executed. A potential concern is
that the subscription owner will also own this table, having full
control over it, including the ability to drop or alter its schema. 
...

Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If the subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists.
 
2. A further challenge is how to exclude these tables from publishing
changes. If we support a subscription-level log history table and the
user publishes ALL TABLES, the output plugin uses
is_publishable_relation() to check if a table is publishable. However,
applying the same logic here would require checking each subscription
on the node to see if the table is designated as a conflict log
history table for any subscription, which could be costly.

 Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
 
3. And one last thing is about should we consider dropping this table
when we drop the subscription, I think this makes sense as we are
internally creating it while creating the subscription.

Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.

Regards
Alastair

Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Sun, Sep 7, 2025 at 1:42 PM Alastair Turner <minion@decodable.me> wrote:
>
> Hi Dilip
>
> Thanks for working on this, I think it will make conflict detection a lot more useful.

Thanks for the suggestions, please find my reply inline.

> On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
>>
>> While working on the patch, I see there are some open questions
>>
>> 1. We decided to pass the conflict history table name during
>> subscription creation. And it makes sense to create this table when
>> the CREATE SUBSCRIPTION command is executed. A potential concern is
>> that the subscription owner will also own this table, having full
>> control over it, including the ability to drop or alter its schema.

>
> Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If
thesubscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists. 

Yeah type table can be useful here, but only concern is when do we
create this type.  One option is whenever we can create a catalog
relation say "conflict_log_history" that will create a type and then
for each subscription if we need to create the conflict history table
we can create it as "conflict_log_history" type, but this might not be
a best option as we are creating catalog just for using this type.
Second option is to create a type while creating a table itself but
then again the problem remains the same as subscription owners get
control over altering the schema of the type itself.  So the goal is
we want this type to be created such that it can not be altered so
IMHO option1 is more suitable i.e. creating conflict_log_history as
catalog and per subscription table can be created as this type.

>>
>> 2. A further challenge is how to exclude these tables from publishing
>> changes. If we support a subscription-level log history table and the
>> user publishes ALL TABLES, the output plugin uses
>> is_publishable_relation() to check if a table is publishable. However,
>> applying the same logic here would require checking each subscription
>> on the node to see if the table is designated as a conflict log
>> history table for any subscription, which could be costly.
>
>
>  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less
costlyoperation to add to is_publishable_relation() 
+1

>
>>
>> 3. And one last thing is about should we consider dropping this table
>> when we drop the subscription, I think this makes sense as we are
>> internally creating it while creating the subscription.
>
>
> Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed
asa side effect of another operation. I would strongly suggest leaving the table in place when the subscription is
dropped.

Thanks for the input, I would like to hear opinions from others as
well here.  I agree that implicitly getting rid of the conflict
history might be problematic but we also need to consider that we are
considering dropping this when the whole subscription is dropped.  Not
sure even after subscription drop users will be interested in conflict
history, if yes then they need to be aware of preserving that isn't
it.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Mon, Sep 8, 2025 at 12:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Sep 7, 2025 at 1:42 PM Alastair Turner <minion@decodable.me> wrote:
> >
> > Hi Dilip
> >
> > Thanks for working on this, I think it will make conflict detection a lot more useful.
>
> Thanks for the suggestions, please find my reply inline.
>
> > On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
> >>
> >> While working on the patch, I see there are some open questions
> >>
> >> 1. We decided to pass the conflict history table name during
> >> subscription creation. And it makes sense to create this table when
> >> the CREATE SUBSCRIPTION command is executed. A potential concern is
> >> that the subscription owner will also own this table, having full
> >> control over it, including the ability to drop or alter its schema.
>
> >
> > Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed.
Ifthe subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription
exists.
>
> Yeah type table can be useful here, but only concern is when do we
> create this type.
>

How about having this as a built-in type?

>  One option is whenever we can create a catalog
> relation say "conflict_log_history" that will create a type and then
> for each subscription if we need to create the conflict history table
> we can create it as "conflict_log_history" type, but this might not be
> a best option as we are creating catalog just for using this type.
> Second option is to create a type while creating a table itself but
> then again the problem remains the same as subscription owners get
> control over altering the schema of the type itself.  So the goal is
> we want this type to be created such that it can not be altered so
> IMHO option1 is more suitable i.e. creating conflict_log_history as
> catalog and per subscription table can be created as this type.
>

I think having it as a catalog table has drawbacks like who will clean
this ever growing table. The one thing is not clear from Alastair's
response is that he said to make subscription as a dependency of
table, if we do so, then won't it be difficult to even drop
subscription and also doesn't that sound reverse of what we want.

> >>
> >> 2. A further challenge is how to exclude these tables from publishing
> >> changes. If we support a subscription-level log history table and the
> >> user publishes ALL TABLES, the output plugin uses
> >> is_publishable_relation() to check if a table is publishable. However,
> >> applying the same logic here would require checking each subscription
> >> on the node to see if the table is designated as a conflict log
> >> history table for any subscription, which could be costly.
> >
> >
> >  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far
lesscostly operation to add to is_publishable_relation() 
> +1
>
> >
> >>
> >> 3. And one last thing is about should we consider dropping this table
> >> when we drop the subscription, I think this makes sense as we are
> >> internally creating it while creating the subscription.
> >
> >
> > Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data
destroyedas a side effect of another operation. I would strongly suggest leaving the table in place when the
subscriptionis dropped. 
>
> Thanks for the input, I would like to hear opinions from others as
> well here.
>

But OTOH, there could be users who want such a table to be dropped.
One possibility is that if the user provided us a pre-created table
then we leave it to the user to remove the table; otherwise, we can
remove it with DROP SUBSCRIPTION. BTW, did we decide that we want a
conflict-table-per-subscription or one table for all subscriptions? If
the latter, then I guess the problem would be that it has to be a shared
table across databases.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Sep 10, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 8, 2025 at 12:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sun, Sep 7, 2025 at 1:42 PM Alastair Turner <minion@decodable.me> wrote:
> > >
> > > Hi Dilip
> > >
> > > Thanks for working on this, I think it will make conflict detection a lot more useful.
> >
> > Thanks for the suggestions, please find my reply inline.
> >
> > > On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
> > >>
> > >> While working on the patch, I see there are some open questions
> > >>
> > >> 1. We decided to pass the conflict history table name during
> > >> subscription creation. And it makes sense to create this table when
> > >> the CREATE SUBSCRIPTION command is executed. A potential concern is
> > >> that the subscription owner will also own this table, having full
> > >> control over it, including the ability to drop or alter its schema.
> >
> > >
> > > Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If the subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists.
> >
> > Yeah type table can be useful here, but only concern is when do we
> > create this type.
> >
>
> How about having this as a built-in type?

Here we would have to create a built-in composite type, which I
think means typcategory => 'C', and if we create such a type it has to
be supplied with a "typrelid", which means there should be a backing
catalog table. At least that's my understanding.

> >  One option is to create a catalog
> > relation, say "conflict_log_history", which will create a type, and then
> > for each subscription that needs a conflict history table
> > we can create it as being of the "conflict_log_history" type; but this
> > might not be the best option, as we would be creating a catalog just to
> > use this type.  The second option is to create the type while creating
> > the table itself, but then the problem remains the same, as subscription
> > owners get control over altering the schema of the type itself.  So the
> > goal is that this type be created such that it cannot be altered, so
> > IMHO option 1 is more suitable, i.e. creating conflict_log_history as a
> > catalog and creating the per-subscription table as being of this type.
> >
>
> I think having it as a catalog table has drawbacks, like who will clean
> this ever-growing table.

No, I didn't mean an ever-growing catalog table. I was suggesting the
option of creating a catalog table just to create a built-in type; we
would then create an actual log history table of this built-in type
for each subscription while creating the subscription.  So this
catalog table would exist but nothing would ever be inserted into it,
and whenever the user supplies a conflict log history table name while
creating a subscription, we would create the actual table at that
time, with the catalog table's row type.  I agree that creating a
catalog table for this purpose might not be worth it, but I have not
yet figured out how to create a built-in composite type without
creating a backing table.

> One thing that is not clear from Alastair's
> response: he said to make the subscription a dependency of the
> table, but if we do so, won't it be difficult to even drop the
> subscription, and doesn't that sound like the reverse of what we want?

I assume he means the subscription will be dependent on the log table,
which means we cannot drop the log table while the subscription
depends on it.
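
To illustrate the intended behaviour (a sketch only; the table and
subscription names are illustrative, and the exact error text may
differ):

DROP TABLE sub1_conflict_log;
ERROR:  cannot drop table sub1_conflict_log because other objects depend on it
DETAIL:  subscription sub1 depends on table sub1_conflict_log

So the dependency protects the log table, while DROP SUBSCRIPTION itself
remains unaffected.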

> > >>
> > >> 2. A further challenge is how to exclude these tables from publishing
> > >> changes. If we support a subscription-level log history table and the
> > >> user publishes ALL TABLES, the output plugin uses
> > >> is_publishable_relation() to check if a table is publishable. However,
> > >> applying the same logic here would require checking each subscription
> > >> on the node to see if the table is designated as a conflict log
> > >> history table for any subscription, which could be costly.
> > >
> > >
> > >  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
> > +1
> >
> > >
> > >>
> > >> 3. And one last thing is about should we consider dropping this table
> > >> when we drop the subscription, I think this makes sense as we are
> > >> internally creating it while creating the subscription.
> > >
> > >
> > > Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.
> >
> > Thanks for the input, I would like to hear opinions from others as
> > well here.
> >
>
> But OTOH, there could be users who want such a table to be dropped.
> One possibility is that if the user provided us a pre-created table
> then we leave it to the user to remove the table; otherwise, we can
> remove it with DROP SUBSCRIPTION.

Thanks, that makes sense.

> BTW, did we decide that we want a
> conflict-table-per-subscription or one table for all subscriptions? If
> the latter, then I guess the problem would be that it has to be a shared
> table across databases.

Right, and I don't think there is an option to create a user-defined
shared table. I also don't see any issue with creating a per-subscription
conflict log history table, except that the subscription owner needs
permission to create the table in the database while creating the
subscription. But I think this is expected: the user can either be
granted sufficient privileges or disable the conflict log history table
option.
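
For example (a sketch; the role and database names are illustrative),
the required privilege could be granted as simply as:

GRANT CREATE ON DATABASE postgres TO sub_owner;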

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Alastair Turner
Date:


On Wed, 10 Sept 2025 at 11:15, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Sep 10, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > ...
> >
> > How about having this as a built-in type?
>
> Here we would have to create a built-in composite type, which I
> think means typcategory => 'C', and if we create such a type it has to
> be supplied with a "typrelid", which means there should be a backing
> catalog table. At least that's my understanding.
A composite type can be used for building a table; it's not necessary to create a table when creating the type. In user SQL:

CREATE TYPE conflict_log_type AS (
  conflictid         UUID,
  subid              OID,
  tableid            OID,
  conflicttype       TEXT,
  operationtype      TEXT,
  replication_origin TEXT,
  remote_commit_ts   TIMESTAMPTZ,
  local_commit_ts    TIMESTAMPTZ,
  ri_key             JSON,
  remote_tuple       JSON,
  local_tuple        JSON
);

CREATE TABLE my_subscription_conflicts OF conflict_log_type;
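
For illustration, altering the typed table's schema directly should then
fail along these lines (error text as I would expect from the typed-table
restriction):

ALTER TABLE my_subscription_conflicts ADD COLUMN note TEXT;
ERROR:  cannot add column to typed table

The schema can only change through ALTER TYPE on the owning type, which is
exactly the operation we can withhold from the subscription owner.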
 
...

> One thing that is not clear from Alastair's
> response: he said to make the subscription a dependency of the
> table, but if we do so, won't it be difficult to even drop the
> subscription, and doesn't that sound like the reverse of what we want?

> I assume he means the subscription will be dependent on the log table,
> which means we cannot drop the log table while the subscription
> depends on it.
 
Yes, that's what I was proposing.
 
> > >>
> > >> 2. A further challenge is how to exclude these tables from publishing
> > >> changes. If we support a subscription-level log history table and the
> > >> user publishes ALL TABLES, the output plugin uses
> > >> is_publishable_relation() to check if a table is publishable. However,
> > >> applying the same logic here would require checking each subscription
> > >> on the node to see if the table is designated as a conflict log
> > >> history table for any subscription, which could be costly.
> > >
> > >
> > >  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
> > +1
> >
> > >
> > >>
> > >> 3. And one last thing is about should we consider dropping this table
> > >> when we drop the subscription, I think this makes sense as we are
> > >> internally creating it while creating the subscription.
> > >
> > >
> > > Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.
> >
> > Thanks for the input, I would like to hear opinions from others as
> > well here.
> >
>
> But OTOH, there could be users who want such a table to be dropped.
> One possibility is that if the user provided us a pre-created table
> then we leave it to the user to remove the table; otherwise, we can
> remove it with DROP SUBSCRIPTION.

> Thanks, that makes sense.

> BTW, did we decide that we want a
> conflict-table-per-subscription or one table for all subscriptions? If
> the latter, then I guess the problem would be that it has to be a shared
> table across databases.

> Right, and I don't think there is an option to create a user-defined
> shared table. I also don't see any issue with creating a per-subscription
> conflict log history table, except that the subscription owner needs
> permission to create the table in the database while creating the
> subscription. But I think this is expected: the user can either be
> granted sufficient privileges or disable the conflict log history table
> option.

Since subscriptions are created in a particular database, it seems reasonable that error tables would also be created in a particular database.

Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Sep 10, 2025 at 4:32 PM Alastair Turner <minion@decodable.me> wrote:
>
>> Here we would have to create a built-in composite type, which I
>> think means typcategory => 'C', and if we create such a type it has to
>> be supplied with a "typrelid", which means there should be a backing
>> catalog table. At least that's my understanding.
>
> A composite type can be used for building a table; it's not necessary to create a table when creating the type. In user SQL:
>
> CREATE TYPE conflict_log_type AS (
>   conflictid         UUID,
>   subid              OID,
>   tableid            OID,
>   conflicttype       TEXT,
>   operationtype      TEXT,
>   replication_origin TEXT,
>   remote_commit_ts   TIMESTAMPTZ,
>   local_commit_ts    TIMESTAMPTZ,
>   ri_key             JSON,
>   remote_tuple       JSON,
>   local_tuple        JSON
> );
>
> CREATE TABLE my_subscription_conflicts OF conflict_log_type;

The problem is that if you CREATE TYPE just before creating the table,
the subscription owner gets full control over the type as well, meaning
they can alter the type itself.  So logically this TYPE should
be a built-in type, so that subscription owners do not have the ability
to ALTER the type but do have permission to create a table from it.
But the problem is that whenever you create a composite type it gets a
corresponding relid in pg_class; in fact, you can create a type as
per your example and see[1] that it gets a corresponding entry in
pg_class.

So the problem is that if you create a user-defined type, it will be
owned by the subscription owner, which defeats the purpose of
disallowing alteration of the type; OTOH, if we create a built-in type,
it needs to have a corresponding entry in pg_class.

So what's your proposal: create this type while creating a
subscription, create it as a built-in type, or something else?


[1]
postgres[1948123]=# CREATE TYPE conflict_log_type AS (conflictid UUID);
postgres[1948123]=# select oid, typrelid, typcategory from pg_type
where typname='conflict_log_type';

  oid  | typrelid | typcategory
-------+----------+-------------
 16386 |    16384 | C
(1 row)

postgres[1948123]=# select relname from pg_class where oid=16384;
      relname
-------------------
 conflict_log_type
(1 row)


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Bharath Rupireddy
Date:
Hi,

On Tue, Aug 5, 2025 at 5:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Currently we log conflicts to the server's log file and updates, this
> approach has limitations, 1) Difficult to query and analyze, parsing
> plain text log files for conflict details is inefficient. 2) Lack of
> structured data, key conflict attributes (table, operation, old/new
> data, LSN, etc.) are not readily available in a structured, queryable
> format. 3) Difficult for external monitoring tools or custom
> resolution scripts to consume conflict data directly.
>
> This proposal aims to address these limitations by introducing a
> conflict log history table, providing a structured, and queryable
> record of all logical replication conflicts.  This should be a
> configurable option whether to log into the conflict log history
> table, server logs or both.

+1 for the overall idea. Having an option to separate out the
conflicts helps in analyzing data correctness issues and understanding
the behavior of conflicts.

Parsing server log files for analysis and debugging is a typical
requirement, variously met with tools like log_fdw, by capturing server
logs in CSV format for parsing, or by doing text search and analysis,
etc.

> This proposal has two main design questions:
> ===================================
>
> 1. How do we store conflicting tuples from different tables?
> Using a JSON column to store the row data seems like the most flexible
> solution, as it can accommodate different table schemas.

How good is storing conflicts on the table? Is it okay to generate WAL
traffic? Is it okay to physically replicate this log table to all
replicas? Is it okay to logically replicate this log table to all
subscribers and logical decoding clients? How does this table get
truncated? If truncation gets delayed, won't it unnecessarily fill up
storage?

> 2. Should this be a system table or a user table?
> a) System Table: Storing this in a system catalog is simple, but
> catalogs aren't designed for ever-growing data. While pg_large_object
> is an exception, this is not what we generally do IMHO.
> b) User Table: This offers more flexibility. We could allow a user to
> specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> either create the table internally or let the user create the table
> with a predefined schema.

-1 for the system table for sure.

> A potential drawback is that a user might drop or alter the table.
> However, we could mitigate this risk by simply logging a WARNING if
> the table is configured but an insertion fails.
> I am currently working on a POC patch for the same, but will post that
> once we have some thoughts on design choices.

How about streaming the conflicts in fixed format to a separate log
file other than regular postgres server log file?  All the
rules/settings that apply to regular postgres server log files also
apply for conflicts server log files (rotation, GUCs, format
CSV/JSON/TEXT etc.). This way there's no additional WAL, and we don't
have to worry about drop/alter, truncate, delete, update/insert,
permission model, physical replication, logical replication, storage
space etc.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Thu, Sep 11, 2025 at 12:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Aug 5, 2025 at 5:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Currently we log conflicts to the server's log file and updates, this
> > approach has limitations, 1) Difficult to query and analyze, parsing
> > plain text log files for conflict details is inefficient. 2) Lack of
> > structured data, key conflict attributes (table, operation, old/new
> > data, LSN, etc.) are not readily available in a structured, queryable
> > format. 3) Difficult for external monitoring tools or custom
> > resolution scripts to consume conflict data directly.
> >
> > This proposal aims to address these limitations by introducing a
> > conflict log history table, providing a structured, and queryable
> > record of all logical replication conflicts.  This should be a
> > configurable option whether to log into the conflict log history
> > table, server logs or both.
>
> +1 for the overall idea. Having an option to separate out the
> conflicts helps analyze the data correctness issues and understand the
> behavior of conflicts.
>
> Parsing server logs file for analysis and debugging is a typical
> requirement differently met with tools like log_fdw or capture server
> logs in CSV format for parsing or do text search and analyze etc.
>
> > This proposal has two main design questions:
> > ===================================
> >
> > 1. How do we store conflicting tuples from different tables?
> > Using a JSON column to store the row data seems like the most flexible
> > solution, as it can accommodate different table schemas.
>
> How good is storing conflicts on the table? Is it okay to generate WAL
> traffic?
>

Yes, I think so. One would like to query conflicts and the resolutions
for those conflicts at a later point to ensure consistency. BTW, if
you are worried about WAL traffic, please note that conflicts shouldn't
be very frequent events, so the additional WAL should be okay. OTOH, if
conflicts are frequent, the performance won't be great anyway, as each
conflict is a kind of ERROR that we have to deal with by having a
resolution for it.

> Is it okay to physically replicate this log table to all
> replicas?
>

Yes, that should be okay as we want the conflict_tables to be present
after failover.

> Is it okay to logically replicate this log table to all
> subscribers and logical decoding clients?
>

I think we should avoid this.

> How does this table get
> truncated? If truncation gets delayed, won't it unnecessarily fill up
> storage?
>

I think it should be the user's responsibility to clean this table, as
they know best when the data in the table is obsolete. Eventually, we
can also have some policies, via options or some other way, to get it
truncated. IIRC, we also discussed having these as partitioned tables so
that it is easy to discard data. However, for the initial version, we
may want something simpler.
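
For the partitioning idea, an eventual shape could be something like the
following sketch (column list abbreviated; all names are illustrative):

CREATE TABLE conflict_log_history (
    conflict_type TEXT NOT NULL,
    remote_commit_ts TIMESTAMPTZ,
    local_tuple JSON,
    remote_tuple JSON
) PARTITION BY RANGE (remote_commit_ts);

CREATE TABLE conflict_log_history_2025_09
    PARTITION OF conflict_log_history
    FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');

-- discarding a month of old conflicts is then a cheap metadata operation:
DROP TABLE conflict_log_history_2025_09;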

> > 2. Should this be a system table or a user table?
> > a) System Table: Storing this in a system catalog is simple, but
> > catalogs aren't designed for ever-growing data. While pg_large_object
> > is an exception, this is not what we generally do IMHO.
> > b) User Table: This offers more flexibility. We could allow a user to
> > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > either create the table internally or let the user create the table
> > with a predefined schema.
>
> -1 for the system table for sure.
>
> > A potential drawback is that a user might drop or alter the table.
> > However, we could mitigate this risk by simply logging a WARNING if
> > the table is configured but an insertion fails.
> > I am currently working on a POC patch for the same, but will post that
> > once we have some thoughts on design choices.
>
> How about streaming the conflicts in fixed format to a separate log
> file other than regular postgres server log file?
>

I would prefer this info to be stored in tables, as that makes it easy
to query. If we use separate LOGs then we should provide some views
to query the LOG.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Sep 11, 2025 at 8:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 11, 2025 at 12:53 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Aug 5, 2025 at 5:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > Currently we log conflicts to the server's log file and updates, this
> > > approach has limitations, 1) Difficult to query and analyze, parsing
> > > plain text log files for conflict details is inefficient. 2) Lack of
> > > structured data, key conflict attributes (table, operation, old/new
> > > data, LSN, etc.) are not readily available in a structured, queryable
> > > format. 3) Difficult for external monitoring tools or custom
> > > resolution scripts to consume conflict data directly.
> > >
> > > This proposal aims to address these limitations by introducing a
> > > conflict log history table, providing a structured, and queryable
> > > record of all logical replication conflicts.  This should be a
> > > configurable option whether to log into the conflict log history
> > > table, server logs or both.
> >
> > +1 for the overall idea. Having an option to separate out the
> > conflicts helps analyze the data correctness issues and understand the
> > behavior of conflicts.
> >
> > Parsing server logs file for analysis and debugging is a typical
> > requirement differently met with tools like log_fdw or capture server
> > logs in CSV format for parsing or do text search and analyze etc.
> >
> > > This proposal has two main design questions:
> > > ===================================
> > >
> > > 1. How do we store conflicting tuples from different tables?
> > > Using a JSON column to store the row data seems like the most flexible
> > > solution, as it can accommodate different table schemas.
> >
> > How good is storing conflicts on the table? Is it okay to generate WAL
> > traffic?
> >
>
> Yes, I think so. One would like to query conflicts and resolutions
> for those conflicts at a later point to ensure consistency. BTW, if
> you are worried about WAL traffic, please note conflicts shouldn't be
> a very often event, so additional WAL should be okay. OTOH, if the
> conflicts are frequent, anyway, the performance won't be that great as
> that means there is a kind of ERROR which we have to deal by having
> resolution for it.
>
> > Is it okay to physically replicate this log table to all
> > replicas?
> >
>
> Yes, that should be okay as we want the conflict_tables to be present
> after failover.
>
> > Is it okay to logically replicate this log table to all
> > subscribers and logical decoding clients?
> >
>
> I think we should avoid this.
>
> > How does this table get
> > truncated? If truncation gets delayed, won't it unnecessarily fill up
> > storage?
> >
>
> I think it should be users responsibility to clean this table as they
> better know when the data in the table is obsolete. Eventually, we can
> also have some policies via options or some other way to get it
> truncated. IIRC, we also discussed having these as partition tables so
> that it is easy to discard data. However, for initial version, we may
> want something simpler.
>
> > > 2. Should this be a system table or a user table?
> > > a) System Table: Storing this in a system catalog is simple, but
> > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > is an exception, this is not what we generally do IMHO.
> > > b) User Table: This offers more flexibility. We could allow a user to
> > > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > > either create the table internally or let the user create the table
> > > with a predefined schema.
> >
> > -1 for the system table for sure.
> >
> > > A potential drawback is that a user might drop or alter the table.
> > > However, we could mitigate this risk by simply logging a WARNING if
> > > the table is configured but an insertion fails.
> > > I am currently working on a POC patch for the same, but will post that
> > > once we have some thoughts on design choices.
> >
> > How about streaming the conflicts in fixed format to a separate log
> > file other than regular postgres server log file?
> >
>
> I would prefer this info to be stored in tables as it would be easy to
> query them. If we use separate LOGs then we should provide some views
> to query the LOG.

I was looking into another thread that provides an error table for
COPY [1]; it requires the user to pre-create the error table, and
the COPY command then validates the table. Validation in that
context is a one-time process checking for: (1) table existence, (2)
the ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
matching column names and data types. This approach avoids concerns
about the user's DROP or ALTER permissions.

Our requirement for the logical replication conflict log table
differs, as we must validate the target table upon every conflict
insertion, not just at subscription creation. A more robust
alternative is to perform validation and acquire a lock on the
conflict table whenever the subscription worker starts. This prevents
modifications (like ALTER or DROP) while the worker is active. When
the worker gets restarted, we can re-validate the table and
automatically disable the conflict logging feature if validation
fails. The feature can then be re-enabled via ALTER SUBSCRIPTION by
setting the option again.
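
The worker-start validation could boil down to a few checks plus taking
a lock, roughly like the following (a sketch in SQL terms; the real
checks would live in C, and the table name is illustrative):

-- existence and INSERT privilege:
SELECT has_table_privilege(current_user,
                           'myschema.conflict_log_history', 'INSERT');

-- held within the worker's transaction, this blocks ALTER/DROP, which
-- need ACCESS EXCLUSIVE:
LOCK TABLE myschema.conflict_log_history IN ROW EXCLUSIVE MODE;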

And if we want, in the first version we can expect the user to create
the table as per the expected schema and supply it. This avoids the need
to handle excluding it from publication, as that will be the user's
responsibility. Then, in follow-up patches, we can also allow creating
the table internally if it doesn't exist, and work out a solution to
keep it from being published when ALL TABLES are published.

Thoughts?

[1] https://www.postgresql.org/message-id/CACJufxEo-rsH5v__S3guUhDdXjakC7m7N5wj%3DmOB5rPiySBoQg%40mail.gmail.com

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Bharath Rupireddy
Date:
Hi,

On Wed, Sep 10, 2025 at 8:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > How about streaming the conflicts in fixed format to a separate log
> > file other than regular postgres server log file?
>
> I would prefer this info to be stored in tables as it would be easy to
> query them. If we use separate LOGs then we should provide some views
> to query the LOG.

Providing views to query the conflicts LOG is easier than
having tables (probably we should provide both: logging conflicts to
tables and to separate LOG files). However, wanting the conflict logs
after failovers is something that makes me think the table approach is
better. I'm open to more thoughts here.
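
If we did go the LOG-file route, querying could still be reasonably
convenient via file_fdw over a CSV conflict log, along these lines (a
sketch; the file name and column list are invented for illustration):

CREATE EXTENSION file_fdw;
CREATE SERVER conflict_logs FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE conflict_log_csv (
    log_time TIMESTAMPTZ,
    conflict_type TEXT,
    remote_tuple JSON,
    local_tuple JSON
) SERVER conflict_logs
  OPTIONS (filename 'log/conflicts.csv', format 'csv');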

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Bharath Rupireddy
Date:
Hi,

On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I was looking into another thread where we provide an error table for
> COPY [1], it requires the user to pre-create the error table. And
> inside the COPY command we will validate the table, validation in that
> context is a one-time process checking for: (1) table existence, (2)
> ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> matching column names and data types. This approach avoids concerns
> about the user's DROP or ALTER permissions.
>
> Our requirement for the logical replication conflict log table
> differs, as we must validate the target table upon every conflict
> insertion, not just at subscription creation. A more robust
> alternative is to perform validation and acquire a lock on the
> conflict table whenever the subscription worker starts. This prevents
> modifications (like ALTER or DROP) while the worker is active. When
> the worker gets restarted, we can re-validate the table and
> automatically disable the conflict logging feature if validation
> fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> option again.

Having to worry about ALTER/DROP and adding code to protect against
them seems like overkill.

> And if we want in first version we can expect user to create the table
> as per the expected schema and supply it, this will avoid the need of
> handling how to avoid it from publishing as it will be user's
> responsibility and then in top up patches we can also allow to create
> the table internally if tables doesn't exist and then we can find out
> solution to avoid it from being publish when ALL TABLES are published.

This looks much simpler to start with.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for the feedback, Bharath.

> On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I was looking into another thread where we provide an error table for
> > COPY [1], it requires the user to pre-create the error table. And
> > inside the COPY command we will validate the table, validation in that
> > context is a one-time process checking for: (1) table existence, (2)
> > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > matching column names and data types. This approach avoids concerns
> > about the user's DROP or ALTER permissions.
> >
> > Our requirement for the logical replication conflict log table
> > differs, as we must validate the target table upon every conflict
> > insertion, not just at subscription creation. A more robust
> > alternative is to perform validation and acquire a lock on the
> > conflict table whenever the subscription worker starts. This prevents
> > modifications (like ALTER or DROP) while the worker is active. When
> > the worker gets restarted, we can re-validate the table and
> > automatically disable the conflict logging feature if validation
> > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > option again.
>
> Having to worry about ALTER/DROP and adding code to protect seems like
> an overkill.

IMHO, eventually being able to control that is a good goal to have, so
that we can avoid failures during conflict insertion. We may argue that
it's the user's responsibility not to alter the table, and we can just
check validity during CREATE/ALTER SUBSCRIPTION.

> > And if we want in first version we can expect user to create the table
> > as per the expected schema and supply it, this will avoid the need of
> > handling how to avoid it from publishing as it will be user's
> > responsibility and then in top up patches we can also allow to create
> > the table internally if tables doesn't exist and then we can find out
> > solution to avoid it from being publish when ALL TABLES are published.
>
> This looks much more simple to start with.

Right.

PFA the attached WIP patches: 0001 allows a user-created table to be
supplied as the conflict history table, with the table validated during
CREATE/ALTER SUBSCRIPTION; 0002 adds an option to internally create the
table if it does not exist.

TODO:
- The patches are still WIP and need more work and testing for different
failure cases
- Need to explore an option to create a built-in type (I will start a
separate thread for that)
- Need to add test cases
- Need to explore options to keep the table from getting published,
though maybe we only need to avoid this when we create the table
internally?

Here is a basic test I tried:

psql -d postgres -c "CREATE TABLE test(a int, b int, primary key(a));"
psql -d postgres -p 5433 -c "CREATE SCHEMA myschema"
psql -d postgres -p 5433 -c "CREATE TABLE test(a int, b int, primary key(a));"
psql -d postgres -p 5433 -c "GRANT INSERT, UPDATE, SELECT, DELETE ON test TO dk"
psql -d postgres -c "CREATE PUBLICATION pub FOR ALL TABLES;"

psql -d postgres -p 5433 -c "CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=5432' PUBLICATION pub WITH (conflict_log_table = myschema.conflict_log_history);"
psql -d postgres -p 5432 -c "INSERT INTO test VALUES(1,2);"
psql -d postgres -p 5433 -c "UPDATE test SET b=10 WHERE a=1;"
psql -d postgres -p 5432 -c "UPDATE test SET b=20 WHERE a=1;"

postgres[1202034]=# select * from myschema.conflict_log_history ;
-[ RECORD 1 ]-----+------------------------------
relid             | 16385
local_xid         | 763
remote_xid        | 757
local_lsn         | 0/00000000
remote_commit_lsn | 0/0174AB30
local_commit_ts   | 2025-09-14 06:45:00.828874+00
remote_commit_ts  | 2025-09-14 06:45:05.845614+00
table_schema      | public
table_name        | test
conflict_type     | update_origin_differs
local_origin      |
remote_origin     | pg_16396
key_tuple         | {"a":1,"b":20}
local_tuple       | {"a":1,"b":10}
remote_tuple      | {"a":1,"b":20}


--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Sun, Sep 14, 2025 at 12:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for the feedback Bharath
>
> > On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I was looking into another thread where we provide an error table for
> > > COPY [1], it requires the user to pre-create the error table. And
> > > inside the COPY command we will validate the table, validation in that
> > > context is a one-time process checking for: (1) table existence, (2)
> > > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > > matching column names and data types. This approach avoids concerns
> > > about the user's DROP or ALTER permissions.
> > >
> > > Our requirement for the logical replication conflict log table
> > > differs, as we must validate the target table upon every conflict
> > > insertion, not just at subscription creation. A more robust
> > > alternative is to perform validation and acquire a lock on the
> > > conflict table whenever the subscription worker starts. This prevents
> > > modifications (like ALTER or DROP) while the worker is active. When
> > > the worker gets restarted, we can re-validate the table and
> > > automatically disable the conflict logging feature if validation
> > > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > > option again.
> >
> > Having to worry about ALTER/DROP and adding code to protect seems like
> > an overkill.
>
> IMHO eventually if we can control that I feel this is a good goal to
> have.  So that we can avoid failure during conflict insertion.  We may
> argue its user's responsibility to not alter the table and we can just
> check the validity during create/alter subscription.
>

If we compare the conflict_history_table with the slot that gets created
with a subscription, one can say the same thing about slots: users can
drop the slots and the whole replication will stop. I think this table
will be created with the same privileges as the owner of the
subscription, which can be either a superuser or a user with the
privileges of the pg_create_subscription role, so we can rely on such
users.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Sep 18, 2025 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Sep 14, 2025 at 12:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Thanks for the feedback Bharath
> >
> > > On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I was looking into another thread where we provide an error table for
> > > > COPY [1], it requires the user to pre-create the error table. And
> > > > inside the COPY command we will validate the table, validation in that
> > > > context is a one-time process checking for: (1) table existence, (2)
> > > > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > > > matching column names and data types. This approach avoids concerns
> > > > about the user's DROP or ALTER permissions.
> > > >
> > > > Our requirement for the logical replication conflict log table
> > > > differs, as we must validate the target table upon every conflict
> > > > insertion, not just at subscription creation. A more robust
> > > > alternative is to perform validation and acquire a lock on the
> > > > conflict table whenever the subscription worker starts. This prevents
> > > > modifications (like ALTER or DROP) while the worker is active. When
> > > > the worker gets restarted, we can re-validate the table and
> > > > automatically disable the conflict logging feature if validation
> > > > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > > > option again.
> > >
> > > Having to worry about ALTER/DROP and adding code to protect seems like
> > > an overkill.
> >
> > IMHO eventually if we can control that I feel this is a good goal to
> > have.  So that we can avoid failure during conflict insertion.  We may
> > argue its user's responsibility to not alter the table and we can just
> > check the validity during create/alter subscription.
> >
>
> If we compare conflict_history_table with the slot that gets created
> with subscription, one can say the same thing about slots. Users can
> drop the slots and whole replication will stop. I think this table
> will be created with the same privileges as the owner of a
> subscription which can be either a superuser or a user with the
> privileges of the pg_create_subscription role, so we can rely on such
> users.

Yeah, that's a valid point.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Masahiko Sawada
Date:
On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Sep 14, 2025 at 12:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Thanks for the feedback Bharath
> >
> > > On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I was looking into another thread where we provide an error table for
> > > > COPY [1], it requires the user to pre-create the error table. And
> > > > inside the COPY command we will validate the table, validation in that
> > > > context is a one-time process checking for: (1) table existence, (2)
> > > > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > > > matching column names and data types. This approach avoids concerns
> > > > about the user's DROP or ALTER permissions.
> > > >
> > > > Our requirement for the logical replication conflict log table
> > > > differs, as we must validate the target table upon every conflict
> > > > insertion, not just at subscription creation. A more robust
> > > > alternative is to perform validation and acquire a lock on the
> > > > conflict table whenever the subscription worker starts. This prevents
> > > > modifications (like ALTER or DROP) while the worker is active. When
> > > > the worker gets restarted, we can re-validate the table and
> > > > automatically disable the conflict logging feature if validation
> > > > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > > > option again.
> > >
> > > Having to worry about ALTER/DROP and adding code to protect seems like
> > > an overkill.
> >
> > IMHO eventually if we can control that I feel this is a good goal to
> > have.  So that we can avoid failure during conflict insertion.  We may
> > argue its user's responsibility to not alter the table and we can just
> > check the validity during create/alter subscription.
> >
>
> If we compare conflict_history_table with the slot that gets created
> with subscription, one can say the same thing about slots. Users can
> drop the slots and whole replication will stop. I think this table
> will be created with the same privileges as the owner of a
> subscription which can be either a superuser or a user with the
> privileges of the pg_create_subscription role, so we can rely on such
> users.

We might want to consider which role inserts the conflict info into
the history table. For example, if any table created by a user can be
used as the history table for a subscription and the conflict info
insertion is performed by the subscription owner, we would end up
having the same security issue that was addressed by the run_as_owner
subscription option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > If we compare conflict_history_table with the slot that gets created
> > with subscription, one can say the same thing about slots. Users can
> > drop the slots and whole replication will stop. I think this table
> > will be created with the same privileges as the owner of a
> > subscription which can be either a superuser or a user with the
> > privileges of the pg_create_subscription role, so we can rely on such
> > users.
>
> We might want to consider which role inserts the conflict info into
> the history table. For example, if any table created by a user can be
> used as the history table for a subscription and the conflict info
> insertion is performed by the subscription owner, we would end up
> having the same security issue that was addressed by the run_as_owner
> subscription option.
>

Yeah, I don't think we want to open that door. For user-created
tables, we should perform actions with the table owner's privileges. In
such a case, if one wants to create a subscription with the run_as_owner
option, she should grant DML permissions to the subscription
owner. OTOH, if we create this table internally (via the subscription
owner) then, irrespective of run_as_owner, we will always insert as the
subscription owner.

AFAIR, one open point for internally created tables is whether we
should skip changes to the conflict_history table while replicating
changes. The table would be considered part of FOR ALL TABLES
publications, if defined. Ideally, these should behave as catalog
tables, so one option is to mark them as 'user_catalog_table'; the
other option is to have some hard-coded checks during replication. The
first option has the advantage that it won't write the additional WAL
for these tables that is otherwise required under wal_level=logical.
What other options do we have?

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Masahiko Sawada
Date:
On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > If we compare conflict_history_table with the slot that gets created
> > > with subscription, one can say the same thing about slots. Users can
> > > drop the slots and whole replication will stop. I think this table
> > > will be created with the same privileges as the owner of a
> > > subscription which can be either a superuser or a user with the
> > > privileges of the pg_create_subscription role, so we can rely on such
> > > users.
> >
> > We might want to consider which role inserts the conflict info into
> > the history table. For example, if any table created by a user can be
> > used as the history table for a subscription and the conflict info
> > insertion is performed by the subscription owner, we would end up
> > having the same security issue that was addressed by the run_as_owner
> > subscription option.
> >
>
> Yeah, I don't think we want to open that door. For user created
> tables, we should perform actions with table_owner's privilege. In
> such a case, if one wants to create a subscription with run_as_owner
> option, she should give DML operation permissions to the subscription
> owner. OTOH, if we create this table internally (via subscription
> owner) then irrespective of run_as_owner, we will always insert as
> subscription_owner.

Agreed.

>
> AFAIR, one open point for internally created tables is whether we
> should skip changes to conflict_history table while replicating
> changes? The table will be considered under for ALL TABLES
> publications, if defined? Ideally, these should behave as catalog
> tables, so one option is to mark them as 'user_catalog_table', or the
> other option is we have some hard-code checks during replication. The
> first option has the advantage that it won't write additional WAL for
> these tables which is otherwise required under wal_level=logical. What
> other options do we have?

I think conflict history information is subscriber-local information,
so it doesn't have to be replicated to another subscriber. Also, it
could be problematic in cross-major-version replication cases if we
break the compatibility of the history table definition. I would expect
the history table to work as a catalog table in terms of logical
decoding/replication. It would probably make sense to reuse the
user_catalog_table option for that purpose. If we have a history table
for each subscription that wants to record conflict history (I
believe so), it would be hard to go with the second option (having
hard-coded checks).
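
For reference, marking a table this way uses the existing storage
parameter, e.g. (table name illustrative):

ALTER TABLE conflict_log_history SET (user_catalog_table = true);

or WITH (user_catalog_table = true) at CREATE TABLE time.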

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > AFAIR, one open point for internally created tables is whether we
> > should skip changes to conflict_history table while replicating
> > changes? The table will be considered under for ALL TABLES
> > publications, if defined? Ideally, these should behave as catalog
> > tables, so one option is to mark them as 'user_catalog_table', or the
> > other option is we have some hard-code checks during replication. The
> > first option has the advantage that it won't write additional WAL for
> > these tables which is otherwise required under wal_level=logical. What
> > other options do we have?
>
> I think conflict history information is subscriber local information
> so doesn't have to be replicated to another subscriber. Also it could
> be problematic in cross-major-version replication cases if we break
> the compatibility of history table definition.
>

Right, this is another reason not to replicate it.

> I would expect that the
> history table works as a catalog table in terms of logical
> decoding/replication. It would probably make sense to reuse the
> user_catalog_table option for that purpose. If we have a history table
> for each subscription that wants to record the conflict history (I
> believe so), it would be hard to go with the second option (having
> hard-code checks).
>

Agreed. Let's wait and see what Dilip or others have to say on this.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > If we compare conflict_history_table with the slot that gets created
> > > > with subscription, one can say the same thing about slots. Users can
> > > > drop the slots and whole replication will stop. I think this table
> > > > will be created with the same privileges as the owner of a
> > > > subscription which can be either a superuser or a user with the
> > > > privileges of the pg_create_subscription role, so we can rely on such
> > > > users.
> > >
> > > We might want to consider which role inserts the conflict info into
> > > the history table. For example, if any table created by a user can be
> > > used as the history table for a subscription and the conflict info
> > > insertion is performed by the subscription owner, we would end up
> > > having the same security issue that was addressed by the run_as_owner
> > > subscription option.
> > >
> >
> > Yeah, I don't think we want to open that door. For user created
> > tables, we should perform actions with table_owner's privilege. In
> > such a case, if one wants to create a subscription with run_as_owner
> > option, she should give DML operation permissions to the subscription
> > owner. OTOH, if we create this table internally (via subscription
> > owner) then irrespective of run_as_owner, we will always insert as
> > subscription_owner.
>
> Agreed.

Yeah, that makes sense to me as well.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Sep 24, 2025 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > AFAIR, one open point for internally created tables is whether we
> > > should skip changes to conflict_history table while replicating
> > > changes? The table will be considered under for ALL TABLES
> > > publications, if defined? Ideally, these should behave as catalog
> > > tables, so one option is to mark them as 'user_catalog_table', or the
> > > other option is we have some hard-code checks during replication. The
> > > first option has the advantage that it won't write additional WAL for
> > > these tables which is otherwise required under wal_level=logical. What
> > > other options do we have?
> >
> > I think conflict history information is subscriber local information
> > so doesn't have to be replicated to another subscriber. Also it could
> > be problematic in cross-major-version replication cases if we break
> > the compatibility of history table definition.
> >
>
> Right, this is another reason not to replicate it.
>
> > I would expect that the
> > history table works as a catalog table in terms of logical
> > decoding/replication. It would probably make sense to reuse the
> > user_catalog_table option for that purpose. If we have a history table
> > for each subscription that wants to record the conflict history (I
> > believe so), it would be hard to go with the second option (having
> > hard-code checks).
> >
>
> Agreed. Let's wait and see what Dilip or others have to say on this.

Yeah, I think it makes sense to create these as 'user_catalog_table'
tables when we create them internally. However, IMHO when a user
provides their own table, I believe we should not enforce the
restriction that the table be created as a 'user_catalog_table'; or do
you think we should enforce that property?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Masahiko Sawada
Date:
On Wed, Sep 24, 2025 at 4:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > >
> > > > AFAIR, one open point for internally created tables is whether we
> > > > should skip changes to conflict_history table while replicating
> > > > changes? The table will be considered under for ALL TABLES
> > > > publications, if defined? Ideally, these should behave as catalog
> > > > tables, so one option is to mark them as 'user_catalog_table', or the
> > > > other option is we have some hard-code checks during replication. The
> > > > first option has the advantage that it won't write additional WAL for
> > > > these tables which is otherwise required under wal_level=logical. What
> > > > other options do we have?
> > >
> > > I think conflict history information is subscriber local information
> > > so doesn't have to be replicated to another subscriber. Also it could
> > > be problematic in cross-major-version replication cases if we break
> > > the compatibility of history table definition.
> > >
> >
> > Right, this is another reason not to replicate it.
> >
> > > I would expect that the
> > > history table works as a catalog table in terms of logical
> > > decoding/replication. It would probably make sense to reuse the
> > > user_catalog_table option for that purpose. If we have a history table
> > > for each subscription that wants to record the conflict history (I
> > > believe so), it would be hard to go with the second option (having
> > > hard-code checks).
> > >
> >
> > Agreed. Let's wait and see what Dilip or others have to say on this.
>
> Yeah I think this makes sense to create as 'user_catalog_table' tables
> when we internally create them.  However, IMHO when a user provides
> its own table, I believe we should not enforce the restriction for
> that table to be created as a 'user_catalog_table' table, or do you
> think we should enforce that property?

I think that's the user's responsibility, so I would not enforce that
property for user-provided tables.

BTW, what is the main use case for supporting user-provided
tables as the history table? I think we basically don't want the
history table to be updated by any processes other than apply workers,
so it would make more sense for such a table to be created internally
and tied to the subscription. I'm less convinced that it has enough
upside to warrant the complexity.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sat, Sep 20, 2025 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > If we compare conflict_history_table with the slot that gets created
> > > with subscription, one can say the same thing about slots. Users can
> > > drop the slots and whole replication will stop. I think this table
> > > will be created with the same privileges as the owner of a
> > > subscription which can be either a superuser or a user with the
> > > privileges of the pg_create_subscription role, so we can rely on such
> > > users.
> >
> > We might want to consider which role inserts the conflict info into
> > the history table. For example, if any table created by a user can be
> > used as the history table for a subscription and the conflict info
> > insertion is performed by the subscription owner, we would end up
> > having the same security issue that was addressed by the run_as_owner
> > subscription option.
> >
>
> Yeah, I don't think we want to open that door. For user created
> tables, we should perform actions with table_owner's privilege. In
> such a case, if one wants to create a subscription with run_as_owner
> option, she should give DML operation permissions to the subscription
> owner. OTOH, if we create this table internally (via subscription
> owner) then irrespective of run_as_owner, we will always insert as
> subscription_owner.
>
> AFAIR, one open point for internally created tables is whether we
> should skip changes to conflict_history table while replicating
> changes? The table will be considered under for ALL TABLES
> publications, if defined? Ideally, these should behave as catalog
> tables, so one option is to mark them as 'user_catalog_table', or the
> other option is we have some hard-code checks during replication. The
> first option has the advantage that it won't write additional WAL for
> these tables which is otherwise required under wal_level=logical. What
> other options do we have?

I was doing more analysis and testing of 'user_catalog_table'. What
I found is that when a table is marked as 'user_catalog_table', it logs
extra information, i.e. the CID[1], so that these tables can also be
scanned during decoding, like catalog tables, using a historical
snapshot. I have also checked the code and tested that a
'user_catalog_table' does get streamed with the FOR ALL TABLES option.
Am I missing something, or are we thinking of changing the behavior of
user_catalog_table so that such tables do not get decoded? I think that
would change the existing behaviour, so it might not be a good option.
Yet another idea is to invent some other purpose-specific option, say
'conflict_history_purpose', but maybe that doesn't justify a new option
IMHO.

[1]
/*
* For logical decode we need combo CIDs to properly decode the
* catalog
*/
if (RelationIsAccessibleInLogicalDecoding(relation))
log_heap_new_cid(relation, &tp);
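
To make the tested behavior concrete, here is a minimal psql sketch
(table and publication names are illustrative, not from the patch):

CREATE TABLE clt_demo (payload jsonb) WITH (user_catalog_table = true);
CREATE PUBLICATION pub_all FOR ALL TABLES;
-- clt_demo is still listed, so its changes would be decoded and streamed:
SELECT * FROM pg_publication_tables WHERE tablename = 'clt_demo';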


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Sep 25, 2025 at 11:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 20, 2025 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > If we compare conflict_history_table with the slot that gets created
> > > > with subscription, one can say the same thing about slots. Users can
> > > > drop the slots and whole replication will stop. I think this table
> > > > will be created with the same privileges as the owner of a
> > > > subscription which can be either a superuser or a user with the
> > > > privileges of the pg_create_subscription role, so we can rely on such
> > > > users.
> > >
> > > We might want to consider which role inserts the conflict info into
> > > the history table. For example, if any table created by a user can be
> > > used as the history table for a subscription and the conflict info
> > > insertion is performed by the subscription owner, we would end up
> > > having the same security issue that was addressed by the run_as_owner
> > > subscription option.
> > >
> >
> > Yeah, I don't think we want to open that door. For user created
> > tables, we should perform actions with table_owner's privilege. In
> > such a case, if one wants to create a subscription with run_as_owner
> > option, she should give DML operation permissions to the subscription
> > owner. OTOH, if we create this table internally (via subscription
> > owner) then irrespective of run_as_owner, we will always insert as
> > subscription_owner.
> >
> > AFAIR, one open point for internally created tables is whether we
> > should skip changes to conflict_history table while replicating
> > changes? The table will be considered under for ALL TABLES
> > publications, if defined? Ideally, these should behave as catalog
> > tables, so one option is to mark them as 'user_catalog_table', or the
> > other option is we have some hard-code checks during replication. The
> > first option has the advantage that it won't write additional WAL for
> > these tables which is otherwise required under wal_level=logical. What
> > other options do we have?
>
> I was doing more analysis and testing for 'use_catalog_table', so what
> I found is when a table is marked as  'use_catalog_table', it will log
> extra information i.e. CID[1] so that these tables can be used for
> scanning as well during decoding like catalog tables using historical
> snapshot.  And I have checked the code and tested as well
> 'use_catalog_table' does get streamed with ALL TABLE options.  Am I
> missing something or are we thinking of changing the behavior of
> use_catalog_table so that they do not get decoded, but I think that
> will change the existing behaviour so might not be a good option, yet
> another idea is to invent some other option for which purpose called
> 'conflict_history_purpose' but maybe that doesn't justify the purpose
> of the new option IMHO.
>
> [1]
> /*
> * For logical decode we need combo CIDs to properly decode the
> * catalog
> */
> if (RelationIsAccessibleInLogicalDecoding(relation))
> log_heap_new_cid(relation, &tp);
>

Meanwhile, I am also exploring the option of simply doing a CREATE
TYPE in initialize_data_directory() during initdb. Basically, we would
create this type in template1 so that it is available in all
databases, which would simplify the table creation whether we create
the table internally or let the user create it.  And in
is_publishable_class() we can check for the type and avoid publishing
those tables.
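
A rough sketch of what that could look like (the type name and columns
are hypothetical, not settled):

-- executed against template1 during initdb, so every database inherits it
CREATE TYPE pg_conflict_log_row AS (
    conflict_type  text,
    remote_tuple   json,
    local_tuple    json
);
-- the history table, whether created internally or by the user, could
-- then simply be a typed table:
CREATE TABLE my_conflict_table OF pg_conflict_log_row;
-- and is_publishable_class() would skip relations whose rowtype derives
-- from this type.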


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > [1]
> > /*
> > * For logical decode we need combo CIDs to properly decode the
> > * catalog
> > */
> > if (RelationIsAccessibleInLogicalDecoding(relation))
> > log_heap_new_cid(relation, &tp);
> >
>
> Meanwhile I am also exploring the option where we can just CREATE TYPE
> in initialize_data_directory() during initdb, basically we will create
> this type in template1 so that it will be available in all the
> databases, and that would simplify the table creation whether we
> create internally or we allow user to create it.  And while checking
> is_publishable_class we can check the type and avoid publishing those
> tables.
>

Based on my off-list discussion with Amit, one option could be to set
the HEAP_INSERT_NO_LOGICAL option while inserting tuples into the
conflict history table. For that we cannot use the SPI interface for
the insert; instead we would have to call heap_insert() directly in
order to pass this option.  Since we do not want to create any
triggers etc. on this table, a direct insert should be fine, but if we
plan to make this a partitioned table in the future then a direct heap
insert might not work.
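
For reference, one way to observe whether those inserts get decoded is
the test_decoding module (a sketch; it requires wal_level=logical, the
slot name is illustrative, and it assumes a conflict is logged by the
apply worker in between):

SELECT pg_create_logical_replication_slot('check_slot', 'test_decoding');
-- ... trigger a replication conflict here, so the apply worker inserts
-- a row into the conflict history table ...
SELECT data FROM pg_logical_slot_get_changes('check_slot', NULL, NULL);
-- with HEAP_INSERT_NO_LOGICAL set on the internal insert, no change for
-- the conflict history table should appear in this output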

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > [1]
> > > /*
> > > * For logical decode we need combo CIDs to properly decode the
> > > * catalog
> > > */
> > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > log_heap_new_cid(relation, &tp);
> > >
> >
> > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > in initialize_data_directory() during initdb, basically we will create
> > this type in template1 so that it will be available in all the
> > databases, and that would simplify the table creation whether we
> > create internally or we allow user to create it.  And while checking
> > is_publishable_class we can check the type and avoid publishing those
> > tables.
> >
>
> Based on my off list discussion with Amit, one option could be to set
> HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> history table, for that we can not use SPI interface to insert instead
> we will have to directly call the heap_insert() to add this option.
> Since we do not want to create any trigger etc on this table, direct
> insert should be fine, but if we plan to create this table as
> partitioned table in future then direct heap insert might not work.

Upon further reflection, I realized that while this approach avoids
streaming inserts to the conflict log history table, it still requires
that table to exist on any downstream node that subscribes via FOR ALL
TABLES, which isn't ideal.

We have two main options to address this:

Option1:
In pg_get_publication_tables(), if the 'alltables' option is used, we
can scan all subscriptions and explicitly ignore (filter out) all
conflict history tables.  This will not be very costly, as the scan of
the subscriptions happens only when pg_get_publication_tables() is
called, which is only during CREATE SUBSCRIPTION/ALTER SUBSCRIPTION on
the remote node.

Option2:
Alternatively, we could introduce a table creation option, like a
'non-publishable' flag, to prevent a table from being streamed
entirely. I believe this would be a valuable, independent feature for
users who want to create certain tables without including them in
logical replication.

I prefer option-2, as I feel it can add value independent of this patch.
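
For reference, the filtering Option1 implies could amount to a catalog
query along these lines (a sketch; it assumes a hypothetical
pg_subscription column, subconflicttable, recording each subscription's
conflict history table OID, with 0 meaning none):

SELECT c.oid::regclass
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.oid NOT IN (SELECT subconflicttable
                    FROM pg_subscription
                    WHERE subconflicttable <> 0);
-- pg_get_publication_tables() would apply the same exclusion when
-- expanding a FOR ALL TABLES publication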


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > > [1]
> > > > /*
> > > > * For logical decode we need combo CIDs to properly decode the
> > > > * catalog
> > > > */
> > > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > > log_heap_new_cid(relation, &tp);
> > > >
> > >
> > > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > > in initialize_data_directory() during initdb, basically we will create
> > > this type in template1 so that it will be available in all the
> > > databases, and that would simplify the table creation whether we
> > > create internally or we allow user to create it.  And while checking
> > > is_publishable_class we can check the type and avoid publishing those
> > > tables.
> > >
> >
> > Based on my off list discussion with Amit, one option could be to set
> > HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> > history table, for that we can not use SPI interface to insert instead
> > we will have to directly call the heap_insert() to add this option.
> > Since we do not want to create any trigger etc on this table, direct
> > insert should be fine, but if we plan to create this table as
> > partitioned table in future then direct heap insert might not work.
>
> Upon further reflection, I realized that while this approach avoids
> streaming inserts to the conflict log history table, it still requires
> that table to exist on the subscriber node upon subscription creation,
> which isn't ideal.
>

I am not able to understand what exact problem you are seeing here. I
was thinking that during the CREATE SUBSCRIPTION command a new table
with the user-provided name would be created, similar to how we create
a slot. The difference is that we create the slot on the
remote/publisher node, whereas this table would be created locally.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sat, Sep 27, 2025 at 8:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I am not able to understand what exact problem you are seeing here. I
> was thinking that during the CREATE SUBSCRIPTION command, a new table
> with user provided name will be created similar to how we create a
> slot. The difference would be that we create a slot on the
> remote/publisher node but this table will be created locally.
>
That's not an issue. The problem we are discussing is that the
conflict history table created on the subscriber node should not be
published when that subscriber node in turn creates a publication with
the FOR ALL TABLES option.  We found an option for inserting into this
table with the HEAP_INSERT_NO_LOGICAL flag so that those inserts will
not be decoded, but what about another node subscribing to this
publisher? It would still be expected to have this table, because when
ALL TABLES are published the subscribing node expects every user table
to be present there, even if its changes are not published.  Consider
the example below:

Node1:
CREATE PUBLICATION pub_node1..

Node2:
CREATE SUBSCRIPTION sub.. PUBLICATION pub_node1
WITH(conflict_history_table='my_conflict_table');
CREATE PUBLICATION pub_node2 FOR ALL TABLES;

Node3:
CREATE SUBSCRIPTION sub1.. PUBLICATION pub_node2; -- this will expect
'my_conflict_table' to exist here, because when it calls
pg_get_publication_tables() on Node2 it will get 'my_conflict_table'
along with the other user tables.

As a solution, I wanted this table to be skipped when
pg_get_publication_tables() is called.
Option1: If a table is listed as a conflict history table in any of
the subscriptions on Node2, we ignore it.
Option2: Provide a new table option to mark a table as non-publishable
when the ALL TABLES option is used; I think this option can be useful
independently as well.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Sat, Sep 27, 2025 at 9:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 27, 2025 at 8:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I am not able to understand what exact problem you are seeing here. I
> > was thinking that during the CREATE SUBSCRIPTION command, a new table
> > with user provided name will be created similar to how we create a
> > slot. The difference would be that we create a slot on the
> > remote/publisher node but this table will be created locally.
> >
> That's not an issue, the problem here we are discussing is the
> conflict history table which is created on the subscriber node should
> not be published when this node subscription node create another
> publisher with ALL TABLE option.  So we found a option for inserting
> into this table with HEAP_INSERT_NO_LOGICAL flag so that those insert
> will not be decoded, but what about another not subscribing from this
> publisher, they should have this table because when ALL TABLES are
> published subscriber node expect all user table to present there even
> if its changes are not published.  Consider below example
>
> Node1:
> CREATE PUBLICATION pub_node1..
>
> Node2:
> CREATE SUBSCRIPTION sub.. PUBLICATION pub_node1
> WITH(conflict_history_table='my_conflict_table');
> CREATE PUBLICATION pub_node2 FOR ALL TABLE;
>
> Node3:
> CREATE SUBSCRIPTION sub1.. PUBLICATION pub_node2; --this will expect
> 'my_conflict_table' to exist here because when it will call
> pg_get_publication_tables() from Node2 it will also get the
> 'my_conflict_table' along with other user tables.
>
> And as a solution I wanted to avoid this table to be avoided when
> pg_get_publication_tables() is being called.
> Option1: We can see if table name is listed as conflict history table
> in any of the subscribers on Node2 we will ignore this.
> Option2: Provide a new table option to mark table as non publishable
> table when ALL TABLE option is provided, I think this option can be
> useful independently as well.
>

I agree that option-2 is useful and, IIUC, we are already working on
something similar in thread [1]. However, it is better to use option-1
here because we are using a non-user-specified mechanism to skip
changes during replication, so following the same approach at other
times is preferable. Once we have that other feature [1], we can
probably optimize this code to use it without taking input from the
user. The other reason for not going with option-2 as you propose it
is that it doesn't seem like a good idea to have multiple ways of
specifying that tables be skipped from publishing. I find the approach
being discussed in thread [1] more generic and better than a new
table-level option.

[1] - https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com
--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sun, Sep 28, 2025 at 2:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

> I agree that option-2 is useful and IIUC, we are already working on
> something similar in thread [1]. However, it is better to use option-1
> here because we are using non-user specified mechanism to skip changes
> during replication, so following the same during other times is
> preferable. Once we have that other feature [1], we can probably
> optimize this code to use it without taking input from the user. The
> other reason of not going with the option-2 in the way you are
> proposing is that it doesn't seem like a good idea to have multiple
> ways to specify skipping tables from publishing. I find the approach
> being discussed in thread [1] a generic and better than a new
> table-level option.
>
> [1] - https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com

I understand the current discussion revolves around using an EXCEPT
clause (for tables/schemas/columns) during publication creation.  But
what we want is to mark certain tables as permanently excluded from
publication, because we cannot expect users to explicitly exclude them
every time they create a publication.

So, I propose we add a "non-publishable" property to tables
themselves. This is a more valuable option for users who know that
certain tables should never be replicated.

By marking a table as non-publishable, we save users the effort of
repeatedly listing it in the EXCEPT option for every new publication.
Both methods have merit, but the proposed table property addresses the
need for a permanent, system-wide exclusion.

The test below, done with a quick hack, shows what I am referring to.

postgres[2730657]=# CREATE TABLE test(a int) WITH
(NON_PUBLISHABLE_TABLE = true);
CREATE TABLE
postgres[2730657]=# CREATE PUBLICATION pub FOR ALL TABLES ;
CREATE PUBLICATION
postgres[2730657]=# select pg_get_publication_tables('pub');
 pg_get_publication_tables
---------------------------
(0 rows)


But I agree this is an additional table option which might need
consensus, so meanwhile we can proceed with option-1. I will prepare
patches with option-1, and as an add-on patch I will propose option-2.
That option-2 patch can be discussed in a separate thread as well.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sun, Sep 28, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Sep 28, 2025 at 2:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > I agree that option-2 is useful and IIUC, we are already working on
> > something similar in thread [1]. However, it is better to use option-1
> > here because we are using non-user specified mechanism to skip changes
> > during replication, so following the same during other times is
> > preferable. Once we have that other feature [1], we can probably
> > optimize this code to use it without taking input from the user. The
> > other reason of not going with the option-2 in the way you are
> > proposing is that it doesn't seem like a good idea to have multiple
> > ways to specify skipping tables from publishing. I find the approach
> > being discussed in thread [1] a generic and better than a new
> > table-level option.
> >
> > [1] - https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com
>
> I understand the current discussion revolves around using an EXCEPT
> clause (for tables/schemas/columns) during publication creation.  But
> what we want is to mark some table which will be excluded permanently
> from publication, because we can not expect users to explicitly
> exclude them while creating publication.
>
> So, I propose we add a "non-publishable" property to tables
> themselves. This is a more valuable option for users who are certain
> that certain tables should never be replicated.
>
> By marking a table as non-publishable, we save users the effort of
> repeatedly listing it in the EXCEPT option for every new publication.
> Both methods have merit, but the proposed table property addresses the
> need for a permanent, system-wide exclusion.
>
> See below test with a quick hack, what I am referring to.
>
> postgres[2730657]=# CREATE TABLE test(a int) WITH
> (NON_PUBLISHABLE_TABLE = true);
> CREATE TABLE
> postgres[2730657]=# CREATE PUBLICATION pub FOR ALL TABLES ;
> CREATE PUBLICATION
> postgres[2730657]=# select pg_get_publication_tables('pub');
>  pg_get_publication_tables
> ---------------------------
> (0 rows)
>
>
> But I agree this is an additional table option which might need
> consensus, so meanwhile we can proceed with option2, I will prepare
> patches with option-2 and as a add on patch I will propose option-1.
> And this option-1 patch can be discussed in a separate thread as well.

So here is the patch set using option-1. With this, when the ALL
TABLES option is used and pg_get_publication_tables() is called, each
relid is checked against the conflict history tables of the local
subscriptions, and those tables are not added to the list.  I will
start a separate thread to propose the patch I sent in the previous
email.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Mon, Sep 29, 2025 at 3:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Sep 28, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sun, Sep 28, 2025 at 2:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > I agree that option-2 is useful and IIUC, we are already working on
> > > something similar in thread [1]. However, it is better to use option-1
> > > here because we are using non-user specified mechanism to skip changes
> > > during replication, so following the same during other times is
> > > preferable. Once we have that other feature [1], we can probably
> > > optimize this code to use it without taking input from the user. The
> > > other reason of not going with the option-2 in the way you are
> > > proposing is that it doesn't seem like a good idea to have multiple
> > > ways to specify skipping tables from publishing. I find the approach
> > > being discussed in thread [1] a generic and better than a new
> > > table-level option.
> > >
> > > [1] -
https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com
> >
> > I understand the current discussion revolves around using an EXCEPT
> > clause (for tables/schemas/columns) during publication creation.  But
> > what we want is to mark some table which will be excluded permanently
> > from publication, because we can not expect users to explicitly
> > exclude them while creating publication.
> >
> > So, I propose we add a "non-publishable" property to tables
> > themselves. This is a more valuable option for users who are certain
> > that certain tables should never be replicated.
> >
> > By marking a table as non-publishable, we save users the effort of
> > repeatedly listing it in the EXCEPT option for every new publication.
> > Both methods have merit, but the proposed table property addresses the
> > need for a permanent, system-wide exclusion.
> >
> > See below test with a quick hack, what I am referring to.
> >
> > postgres[2730657]=# CREATE TABLE test(a int) WITH
> > (NON_PUBLISHABLE_TABLE = true);
> > CREATE TABLE
> > postgres[2730657]=# CREATE PUBLICATION pub FOR ALL TABLES ;
> > CREATE PUBLICATION
> > postgres[2730657]=# select pg_get_publication_tables('pub');
> >  pg_get_publication_tables
> > ---------------------------
> > (0 rows)
> >
> >
> > But I agree this is an additional table option which might need
> > consensus, so meanwhile we can proceed with option2, I will prepare
> > patches with option-2 and as a add on patch I will propose option-1.
> > And this option-1 patch can be discussed in a separate thread as well.
>
> So here is the patch set using option-2, with this when alltable
> option is used and we get pg_get_publication_tables(), this will check
> the relid against the conflict history tables in the subscribers and
> those tables will not be added to the list.  I will start a separate
> thread for proposing the patch I sent in previous email.
>

I have started going through this thread. Is it possible to rebase the
patches and post?

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Tue, Nov 11, 2025 at 3:49 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 29, 2025 at 3:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> I have started going through this thread. Is it possible to rebase the
> patches and post?

Thanks Shveta, I will post the rebased patch by tomorrow.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > > [1]
> > > > /*
> > > > * For logical decode we need combo CIDs to properly decode the
> > > > * catalog
> > > > */
> > > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > > log_heap_new_cid(relation, &tp);
> > > >
> > >
> > > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > > in initialize_data_directory() during initdb, basically we will create
> > > this type in template1 so that it will be available in all the
> > > databases, and that would simplify the table creation whether we
> > > create internally or we allow user to create it.  And while checking
> > > is_publishable_class we can check the type and avoid publishing those
> > > tables.
> > >
> >
> > Based on my off list discussion with Amit, one option could be to set
> > HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> > history table, for that we can not use SPI interface to insert instead
> > we will have to directly call the heap_insert() to add this option.
> > Since we do not want to create any trigger etc on this table, direct
> > insert should be fine, but if we plan to create this table as
> > partitioned table in future then direct heap insert might not work.
>
> Upon further reflection, I realized that while this approach avoids
> streaming inserts to the conflict log history table, it still requires
> that table to exist on the subscriber node upon subscription creation,
> which isn't ideal.
>
> We have two main options to address this:
>
> Option1:
> When calling pg_get_publication_tables(), if the 'alltables' option is
> used, we can scan all subscriptions and explicitly ignore (filter out)
> all conflict history tables.  This will not be very costly as this
> will scan the subscriber when pg_get_publication_tables() is called,
> which is only called during create subscription/alter subscription on
> the remote node.
>
> Option2:
> Alternatively, we could introduce a table creation option, like a
> 'non-publishable' flag, to prevent a table from being streamed
> entirely. I believe this would be a valuable, independent feature for
> users who want to create certain tables without including them in
> logical replication.
>
> I prefer option2, as I feel this can add value independent of this patch.
>

I agree that marking tables with a flag to easily exclude them during
publishing would be cleaner. In the current patch, for an ALL-TABLES
publication, we scan pg_subscription for each table in pg_class to
check its subconflicttable and decide whether to ignore it. But since
this only happens during create/alter subscription and refresh
publication, the overhead should be acceptable.

Introducing a 'NON_PUBLISHABLE_TABLE' option would be a good
enhancement, but since we already have the EXCEPT list being built in
a separate thread, that might be sufficient for now. IMO, such
conflict tables should be marked internally (for example, with a
'non_publishable' or 'conflict_log_table' flag) so they can be easily
identified within the system, without requiring users to explicitly
specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
see what others think about this.
For the time being, the current implementation looks fine, considering
it runs only during a few publication-related DDL operations.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >

> I agree that marking tables with a flag to easily exclude them during
> publishing would be cleaner. In the current patch, for an ALL-TABLES
> publication, we scan pg_subscription for each table in pg_class to
> check its subconflicttable and decide whether to ignore it. But since
> this only happens during create/alter subscription and refresh
> publication, the overhead should be acceptable.

Thanks for your opinion.

> Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> enhancement but since we already have the EXCEPT list built in a
> separate thread, that might be sufficient for now. IMO, such
> conflict-tables should be marked internally (for example, with a
> ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> identified within the system, without requiring users to explicitly
> specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> see what others think on this.
> For the time being, the current implementation looks fine, considering
> it runs only during a few publication-related DDL operations.

+1

Here is the rebased patch. Changes apart from the rebase:
1) Dropped the conflict history table during DROP SUBSCRIPTION
2) Added test cases for the conflict history table behavior
with CREATE/ALTER/DROP SUBSCRIPTION

TODO:
1) The table schema needs more thought: whether we need to capture
more items, or whether we should drop some fields that are not
necessary.
2) A logical replication test that generates conflicts and verifies
they are captured in the conflict history table.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 12, 2025 at 2:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
>
> > I agree that marking tables with a flag to easily exclude them during
> > publishing would be cleaner. In the current patch, for an ALL-TABLES
> > publication, we scan pg_subscription for each table in pg_class to
> > check its subconflicttable and decide whether to ignore it. But since
> > this only happens during create/alter subscription and refresh
> > publication, the overhead should be acceptable.
>
> Thanks for your opinion.
>
> > Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> > enhancement but since we already have the EXCEPT list built in a
> > separate thread, that might be sufficient for now. IMO, such
> > conflict-tables should be marked internally (for example, with a
> > ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> > identified within the system, without requiring users to explicitly
> > specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> > see what others think on this.
> > For the time being, the current implementation looks fine, considering
> > it runs only during a few publication-related DDL operations.
>
> +1
>
> Here is the rebased patch, changes apart from rebasing it
> 1) Dropped the conflict history table during drop subscription
> 2) Added test cases for testing the conflict history table behavior
> with CREATE/ALTER/DROP subscription

Thanks.

> TODO:
> 1) Need more thoughts on the table schema whether we need to capture
> more items or shall we drop some fields if we think those are not
> necessary.

Yes, this needs some more thought. I will review.

Since the design is somewhat agreed upon, I feel we can move on to
code correction/completion. I have not looked at the rebased patch
yet, but here are a few comments based on the old version.

Few observations related to publication.
------------------------------

(In the below comments, clt/CLT implies Conflict Log Table)

1)
'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

2)
'\d+ clt'   shows the name of the all-tables publication. I feel we
should not show that for a clt.

3)
I am able to create a publication for a clt table; should that be allowed?

create subscription sub1 connection '...' publication pub1
WITH(conflict_log_table='clt');
create publication pub3 for table clt;

4)
Is there a reason we have not made the '!IsConflictHistoryRelid' check
part of is_publishable_class() itself? If we did, other code paths
would also always see the clt as non-publishable (and that would solve
a few of the above issues, I think). IIUC, there is no place where we
want to treat the CLT as publishable, or is there?

5) Also, I feel we can add some documentation now, to help others
understand/review the patch better without going through the long
thread.


Few observations related to conflict-logging:
------------------------------
1)
I found that for the conflicts which ultimately result in an error, we
do not insert any conflict record into the clt.

a)
Example: insert_exists, update_exists
create table tab1 (i int primary key, j int);
sub: insert into tab1 values(30,10);
pub: insert into tab1 values(30,10);
ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
No record in clt.

sub:
<some pre-data needed>
update tab1 set i=40 where i = 30;
pub: update tab1 set i=40 where i = 20;
ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
No record in clt.

b)
Another question related to this is, since these conflicts (which
results in error) keep on happening until user resolves these or skips
these or 'disable_on_error' is set. Then are we going to insert these
multiple times? We do count these in 'confl_insert_exists' and
'confl_update_exists' everytime, so it makes sense to log those each
time in clt as well. Thoughts?

2)
For conflicts where the row is missing on the subscriber, local_ts is
inserted incorrectly: it is '2000-01-01 05:30:00+05:30'. Should it be
NULL, or something indicating that it is not applicable for this
conflict type?

Example: delete_missing, update_missing
pub:
 insert into tab1 values(10,10);
 insert into tab1 values(20,10);
 sub:  delete from tab1 where i=10;
 pub:  delete from tab1 where i=10;


thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 12, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 2:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> >
> > > I agree that marking tables with a flag to easily exclude them during
> > > publishing would be cleaner. In the current patch, for an ALL-TABLES
> > > publication, we scan pg_subscription for each table in pg_class to
> > > check its subconflicttable and decide whether to ignore it. But since
> > > this only happens during create/alter subscription and refresh
> > > publication, the overhead should be acceptable.
> >
> > Thanks for your opinion.
> >
> > > Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> > > enhancement but since we already have the EXCEPT list built in a
> > > separate thread, that might be sufficient for now. IMO, such
> > > conflict-tables should be marked internally (for example, with a
> > > ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> > > identified within the system, without requiring users to explicitly
> > > specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> > > see what others think on this.
> > > For the time being, the current implementation looks fine, considering
> > > it runs only during a few publication-related DDL operations.
> >
> > +1
> >
> > Here is the rebased patch, changes apart from rebasing it
> > 1) Dropped the conflict history table during drop subscription
> > 2) Added test cases for testing the conflict history table behavior
> > with CREATE/ALTER/DROP subscription
>
> Thanks.
>
> > TODO:
> > 1) Need more thoughts on the table schema whether we need to capture
> > more items or shall we drop some fields if we think those are not
> > necessary.
>
> Yes, this needs some more thoughts. I will review.
>
> I feel since design is somewhat agreed upon, we may handle
> code-correction/completion. I have not looked at the rebased patch
> yet, but here are a few comments based on old-version.
>
> Few observations related to publication.
> ------------------------------
>
> (In the below comments, clt/CLT implies Conflict Log Table)
>
> 1)
> 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> 2)
> '\d+ clt'   shows all-tables publication name. I feel we should not
> show that for clt.
>
> 3)
> I am able to create a publication for clt table, should it be allowed?
>
> create subscription sub1 connection '...' publication pub1
> WITH(conflict_log_table='clt');
> create publication pub3 for table clt;
>
> 4)
> Is there a reason we have not made '!IsConflictHistoryRelid' check as
> part of is_publishable_class() itself? If we do so, other code-logics
> will also get clt as non-publishable always (and will solve a few of
> the above issues I think). IIUC, there is no place where we want to
> mark CLT as publishable or is there any?
>
> 5) Also, I feel we can add some documentation now to help others to
> understand/review the patch better without going through the long
> thread.
>
>
> Few observations related to conflict-logging:
> ------------------------------
> 1)
> I found that for the conflicts which ultimately result in Error, we do
> not insert any conflict-record in clt.
>
> a)
> Example: insert_exists, update_Exists
> create table tab1 (i int primary key, j int);
> sub: insert into tab1 values(30,10);
> pub: insert into tab1 values(30,10);
> ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> No record in clt.
>
> sub:
> <some pre-data needed>
> update tab1 set i=40 where i = 30;
> pub: update tab1 set i=40 where i = 20;
> ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> No record in clt.
>
> b)
> Another question related to this is, since these conflicts (which
> results in error) keep on happening until user resolves these or skips
> these or 'disable_on_error' is set. Then are we going to insert these
> multiple times? We do count these in 'confl_insert_exists' and
> 'confl_update_exists' everytime, so it makes sense to log those each
> time in clt as well. Thoughts?
>
> 2)
> Conflicts where row on sub is missing, local_ts incorrectly inserted.
> It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> indicating that it is not applicable for this conflict-type?
>
> Example: delete_missing, update_missing
> pub:
>  insert into tab1 values(10,10);
>  insert into tab1 values(20,10);
>  sub:  delete from tab1 where i=10;
>  pub:  delete from tab1 where i=10;
>

3)
We also need to think how we are going to display the info in case of
multiple_unique_conflicts as there could be multiple local and remote
tuples conflicting for one single operation. Example:

create table conf_tab (a int primary key, b int unique, c int unique);

sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);

pub: insert into conf_tab values (2,3,4);

ERROR:  conflict detected on relation "public.conf_tab":
conflict=multiple_unique_conflicts
DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_b_key", modified locally
in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_c_key", modified locally
in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
CONTEXT:  processing remote data for replication origin "pg_16392"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 781, finished at 0/017FDDA0

Currently in the clt we have singular columns such as 'key_tuple',
'local_tuple', and 'remote_tuple'.  Shall we insert multiple rows? But
it does not look reasonable to insert multiple rows for a single
conflict raised. I will think more about this.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > Few observations related to publication.
> > ------------------------------

Thanks, Shveta, for testing and sharing your thoughts.  IMHO for
conflict log tables it should be good enough to restrict them when the
ALL TABLES option is used; I don't think we need to put in extra
effort to restrict them completely even if users want to explicitly
list them in a publication.

> >
> > (In the below comments, clt/CLT implies Conflict Log Table)
> >
> > 1)
> > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

This function is used while publishing every single change, and I
don't think we want to add the cost of checking each subscription to
identify whether the table is listed as a CLT.

> > 2)
> > '\d+ clt'   shows all-tables publication name. I feel we should not
> > show that for clt.

I think we should fix this.

> > 3)
> > I am able to create a publication for clt table, should it be allowed?

I believe we should not do any specific handling to restrict this, but
I am open to opinions.

> > create subscription sub1 connection '...' publication pub1
> > WITH(conflict_log_table='clt');
> > create publication pub3 for table clt;
> >
> > 4)
> > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > part of is_publishable_class() itself? If we do so, other code-logics
> > will also get clt as non-publishable always (and will solve a few of
> > the above issues I think). IIUC, there is no place where we want to
> > mark CLT as publishable or is there any?

IMHO the main reason is performance.

> > 5) Also, I feel we can add some documentation now to help others to
> > understand/review the patch better without going through the long
> > thread.

Makes sense; I will do that in the next version.

> >
> > Few observations related to conflict-logging:
> > ------------------------------
> > 1)
> > I found that for the conflicts which ultimately result in Error, we do
> > not insert any conflict-record in clt.
> >
> > a)
> > Example: insert_exists, update_Exists
> > create table tab1 (i int primary key, j int);
> > sub: insert into tab1 values(30,10);
> > pub: insert into tab1 values(30,10);
> > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > No record in clt.
> >
> > sub:
> > <some pre-data needed>
> > update tab1 set i=40 where i = 30;
> > pub: update tab1 set i=40 where i = 20;
> > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > No record in clt.

Yeah, that's interesting; we need to think about how to commit this
record when the outer transaction is aborted, as we do not have
autonomous transactions, which are generally used for this kind of
logging.  But we can explore more options, like inserting into the
conflict log table outside the outer transaction.

> > b)
> > Another question related to this is, since these conflicts (which
> > results in error) keep on happening until user resolves these or skips
> > these or 'disable_on_error' is set. Then are we going to insert these
> > multiple times? We do count these in 'confl_insert_exists' and
> > 'confl_update_exists' everytime, so it makes sense to log those each
> > time in clt as well. Thoughts?

I think it makes sense to insert a row every time we see the conflict,
but it would be good to have opinions from others as well.

> > 2)
> > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > indicating that it is not applicable for this conflict-type?
> >
> > Example: delete_missing, update_missing
> > pub:
> >  insert into tab1 values(10,10);
> >  insert into tab1 values(20,10);
> >  sub:  delete from tab1 where i=10;
> >  pub:  delete from tab1 where i=10;

Sure I will test this.

>
> 3)
> We also need to think how we are going to display the info in case of
> multiple_unique_conflicts as there could be multiple local and remote
> tuples conflicting for one single operation. Example:
>
> create table conf_tab (a int primary key, b int unique, c int unique);
>
> sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
>
> pub: insert into conf_tab values (2,3,4);
>
> ERROR:  conflict detected on relation "public.conf_tab":
> conflict=multiple_unique_conflicts
> DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> Key already exists in unique index "conf_tab_b_key", modified locally
> in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> Key already exists in unique index "conf_tab_c_key", modified locally
> in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> CONTEXT:  processing remote data for replication origin "pg_16392"
> during message type "INSERT" for replication target relation
> "public.conf_tab" in transaction 781, finished at 0/017FDDA0
>
> Currently in clt, we have singular terms such as 'key_tuple',
> 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> But it does not look reasonable to have multiple rows inserted for a
> single conflict raised. I will think more about this.

Currently I am inserting multiple records into the conflict history
table, one per conflicting tuple, the same way each tuple is logged; I
couldn't find any better way to do this. Another option is to use an
array of tuples instead of a single tuple, but I am not sure, as that
might make things more complicated for an external tool to process.
But you are right, this needs more discussion.
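
To make the array idea concrete, the single clt row for the example
above could carry something like the following (a sketch; the
pluralized column name is hypothetical):

SELECT json_build_array('{"a":2,"b":2,"c":2}'::json,
                        '{"a":3,"b":3,"c":3}'::json,
                        '{"a":4,"b":4,"c":4}'::json) AS local_tuples,
       '{"a":2,"b":3,"c":4}'::json AS remote_tuple;
-- one element per conflicting unique index, instead of one clt row each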

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Few observations related to publication.
> > > ------------------------------
>
> Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> conflict log tables it should be good enough if we restrict it when
> ALL TABLE options are used, I don't think we need to put extra effort
> to completely restrict it even if users want to explicitly list it
> into the publication.
>
> > >
> > > (In the below comments, clt/CLT implies Conflict Log Table)
> > >
> > > 1)
> > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

After giving it more thought, I have changed this to return false for
the clt, as this is just an exposed function and is not called by the
pgoutput layer.

> > > 2)
> > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > show that for clt.
>
Fixed

>
> > > 3)
> > > I am able to create a publication for clt table, should it be allowed?
>
> I believe we should not do any specific handling to restrict this but
> I am open for the opinions.

I am restricting this as well; let's see what others think.


>
> > > 5) Also, I feel we can add some documentation now to help others to
> > > understand/review the patch better without going through the long
> > > thread.
>
> Make sense, I will do that in the next version.
Done, but I have not compiled the docs as I don't currently have the
setup, so it is added as a WIP patch.


> > > 2)
> > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > indicating that it is not applicable for this conflict-type?
> > >
> > > Example: delete_missing, update_missing
> > > pub:
> > >  insert into tab1 values(10,10);
> > >  insert into tab1 values(20,10);
> > >  sub:  delete from tab1 where i=10;
> > >  pub:  delete from tab1 where i=10;
>
> Sure I will test this.

I have fixed this.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Few observations related to publication.
> > > ------------------------------
>
> Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> conflict log tables it should be good enough if we restrict it when
> ALL TABLE options are used, I don't think we need to put extra effort
> to completely restrict it even if users want to explicitly list it
> into the publication.
>
> > >
> > > (In the below comments, clt/CLT implies Conflict Log Table)
> > >
> > > 1)
> > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> This function is used while publishing every single change and I don't
> think we want to add a cost to check each subscription to identify
> whether the table is listed as CLT.
>
> > > 2)
> > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > show that for clt.
>
> I think we should fix this.
>
> > > 3)
> > > I am able to create a publication for clt table, should it be allowed?
>
> I believe we should not do any specific handling to restrict this but
> I am open for the opinions.
>
> > > create subscription sub1 connection '...' publication pub1
> > > WITH(conflict_log_table='clt');
> > > create publication pub3 for table clt;
> > >
> > > 4)
> > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > part of is_publishable_class() itself? If we do so, other code-logics
> > > will also get clt as non-publishable always (and will solve a few of
> > > the above issues I think). IIUC, there is no place where we want to
> > > mark CLT as publishable or is there any?
>
> IMHO the main reason is performance.
>
> > > 5) Also, I feel we can add some documentation now to help others to
> > > understand/review the patch better without going through the long
> > > thread.
>
> Make sense, I will do that in the next version.
>
> > >
> > > Few observations related to conflict-logging:
> > > ------------------------------
> > > 1)
> > > I found that for the conflicts which ultimately result in Error, we do
> > > not insert any conflict-record in clt.
> > >
> > > a)
> > > Example: insert_exists, update_Exists
> > > create table tab1 (i int primary key, j int);
> > > sub: insert into tab1 values(30,10);
> > > pub: insert into tab1 values(30,10);
> > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > No record in clt.
> > >
> > > sub:
> > > <some pre-data needed>
> > > update tab1 set i=40 where i = 30;
> > > pub: update tab1 set i=40 where i = 20;
> > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > No record in clt.
>
> Yeah that interesting need to put thought on how to commit this record
> when an outer transaction is aborted as we do not have autonomous
> transactions which are generally used for this kind of logging.

Right

> But
> we can explore more options like inserting into conflict log tables
> outside the outer transaction.

Yes, that seems like the way to go to me. I could not find any
existing reference/usage of that in the code, though.

>
> > > b)
> > > Another question related to this is, since these conflicts (which
> > > results in error) keep on happening until user resolves these or skips
> > > these or 'disable_on_error' is set. Then are we going to insert these
> > > multiple times? We do count these in 'confl_insert_exists' and
> > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > time in clt as well. Thoughts?
>
> I think it make sense to insert every time we see the conflict, but it
> would be good to have opinion from others as well.
>
> > > 2)
> > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > indicating that it is not applicable for this conflict-type?
> > >
> > > Example: delete_missing, update_missing
> > > pub:
> > >  insert into tab1 values(10,10);
> > >  insert into tab1 values(20,10);
> > >  sub:  delete from tab1 where i=10;
> > >  pub:  delete from tab1 where i=10;
>
> Sure I will test this.
>
> >
> > 3)
> > We also need to think how we are going to display the info in case of
> > multiple_unique_conflicts as there could be multiple local and remote
> > tuples conflicting for one single operation. Example:
> >
> > create table conf_tab (a int primary key, b int unique, c int unique);
> >
> > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> >
> > pub: insert into conf_tab values (2,3,4);
> >
> > ERROR:  conflict detected on relation "public.conf_tab":
> > conflict=multiple_unique_conflicts
> > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_b_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_c_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > CONTEXT:  processing remote data for replication origin "pg_16392"
> > during message type "INSERT" for replication target relation
> > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> >
> > Currently in clt, we have singular terms such as 'key_tuple',
> > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > But it does not look reasonable to have multiple rows inserted for a
> > single conflict raised. I will think more about this.
>
> Currently I am inserting multiple records in the conflict history
> table, the same as each tuple is logged, but couldn't find any better
> way for this. Another option is to use an array of tuples instead of a
> single tuple but not sure this might make things more complicated to
> process by any external tool.

It’s arguable and hard to say what the correct behaviour should be.
I’m slightly leaning toward having a single row per conflict. IMO,
overall the confl_* counters in pg_stat_subscription_stats should
align with the number of entries in the conflict history table, which
implies one row even for multiple_unique_conflicts. But I also
understand that this approach could make things complicated for
external tools. For now, we can proceed with logging multiple rows for
a single multiple_unique_conflicts occurrence and wait to hear others’
opinions.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Mon, Nov 17, 2025 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > Few observations related to publication.
> > > > ------------------------------
> >
> > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > conflict log tables it should be good enough if we restrict it when
> > ALL TABLE options are used, I don't think we need to put extra effort
> > to completely restrict it even if users want to explicitly list it
> > into the publication.
> >
> > > >
> > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > >
> > > > 1)
> > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> After putting more thought I have changed this to return false for
> clt, as this is just an exposed function not called by pgoutput layer.
>
> > > > 2)
> > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > show that for clt.
> >
> Fixed
>
> >
> > > > 3)
> > > > I am able to create a publication for clt table, should it be allowed?
> >
> > I believe we should not do any specific handling to restrict this but
> > I am open for the opinions.
>
> Restricting this as well, lets see what others think.
>
>
> >
> > > > 5) Also, I feel we can add some documentation now to help others to
> > > > understand/review the patch better without going through the long
> > > > thread.
> >
> > Make sense, I will do that in the next version.
> Done that but not compiled the docs as I don't currently have the
> setup so added as WIP patch.
>
>
> > > > 2)
> > > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > > indicating that it is not applicable for this conflict-type?
> > > >
> > > > Example: delete_missing, update_missing
> > > > pub:
> > > >  insert into tab1 values(10,10);
> > > >  insert into tab1 values(20,10);
> > > >  sub:  delete from tab1 where i=10;
> > > >  pub:  delete from tab1 where i=10;
> >
> > Sure I will test this.
>
> I have fixed this.

Thanks for the patch.  Some feedback about the clt:

1)
local_origin is always NULL in my tests for all conflict types I tried.

2)
Do we need 'key_tuple' as such or replica_identity is enough/better?
I see 'key_tuple' inserted as {"i":10,"j":null} for delete_missing
case where query was 'delete from tab1 where i=10'; here 'i' is PK;
which seems okay.
But it is '{"i":20,"j":200}' for update_origin_differ case where query
was 'update tab1 set j=200 where i =20'. Here too RI is 'i' alone. I
feel 'j' should not be part of the key but let me know if I have
misunderstood. IMO, 'j' being part of remote_tuple should be good
enough.

3)
Do we need to have a timestamp column as well to say when conflict was
recorded? Or local_commit_ts, remote_commit_ts are sufficient?
Thoughts

4)
Also, it makes sense if we have 'conflict_type' next to 'relid'. I
feel relid and conflict_type are primary columns and rest are related
details.

5)
Do we need table_schema, table_name when we have relid already? If we
want to retain these, we can name them as schemaname and relname to be
consistent with all other stats tables. IMO, then the order can be:
relid, schemaname, relname, conflict_type and then the rest of the
details.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip.

I started to look at this thread. Here are some comments for patch v4-0001.


=====
GENERAL

1.
There's some inconsistency in how this new table is referred to at different times:
a) "conflict table"
b) "conflict log table"
c) "conflict log history table"
d) "conflict history"

My preference was (b). Making this consistent will have impacts on
many macros, variables, comments, function names, etc.

~~~

2.
What about enhancements to description \dRs+ so the subscription
conflict log table is displayed?

~~~

3.
What about enhancements to the tab-complete code?

======
src/backend/commands/subscriptioncmds.c

4.
 #define SUBOPT_MAX_RETENTION_DURATION 0x00008000
 #define SUBOPT_LSN 0x00010000
 #define SUBOPT_ORIGIN 0x00020000
+#define SUBOPT_CONFLICT_TABLE 0x00030000

Bug? Shouldn't that be 0x00040000.
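
To spell out the overlap (standalone illustration, not patch code):

/* Each SUBOPT_* flag is meant to be a distinct single bit. */
#define SUBOPT_LSN            0x00010000
#define SUBOPT_ORIGIN         0x00020000
#define SUBOPT_CONFLICT_TABLE 0x00030000    /* buggy: not a single bit */

/*
 * 0x00030000 == SUBOPT_LSN | SUBOPT_ORIGIN, so a mask test for
 * SUBOPT_CONFLICT_TABLE would also match when the LSN and ORIGIN
 * options are both set; 0x00040000 is the next free bit.
 */
StaticAssertDecl(SUBOPT_CONFLICT_TABLE == (SUBOPT_LSN | SUBOPT_ORIGIN),
                 "0x00030000 overlaps the existing flags");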

~~~

5.
+ char    *conflicttable;
  XLogRecPtr lsn;
 } SubOpts;

IMO 'conflicttable' looks too much like 'conflictable', which may
cause some confusion on first reading.

~~~

6.
+static void CreateConflictLogTable(Oid namespaceId, char *conflictrel);
+static void DropConflictLogTable(Oid namespaceId, char *conflictrel);

AFAIK it is more conventional for the static functions to be
snake_case and the extern functions to use CamelCase. So these would
be:
- create_conflict_log_table
- drop_conflict_log_table

~~~

CreateSubscription:

7.
+ /* If conflict log table name is given than create the table. */
+ if (opts.conflicttable)
+ CreateConflictLogTable(conflict_table_nspid, conflict_table);
+

typo: /If conflict/If a conflict/

typo: "than"

~~~

AlterSubscription:

8.
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN |
+   SUBOPT_CONFLICT_TABLE);

The line wrapping doesn't seem necessary.

~~~

9.
+ replaces[Anum_pg_subscription_subconflictnspid - 1] = true;
+ replaces[Anum_pg_subscription_subconflicttable - 1] = true;
+
+ CreateConflictLogTable(nspid, relname);
+ }
+

What are the rules regarding replacing one log table with a different
log table for the same subscription? I didn't see anything about this
scenario, nor any test cases.

~~~

CreateConflictLogTable:

10.
+ /*
+ * Check if table with same name already present, if so report an error
+ * as currently we do not support user created table as conflict log
+ * table.
+ */

Is the comment about "user-created table" strictly correct? e.g. Won't
you encounter the same problem if there are 2 subscriptions trying to
set the same-named conflict log table?

SUGGESTION
Report an error if the specified conflict log table already exists.

~~~

DropConflictLogTable:

11.
+ /*
+ * Drop conflict log table if exist, use if exists ensures the command
+ * won't error if the table is already gone.
+ */

The reason for EXISTS was already mentioned in the function comment.

SUGGESTION
Drop the conflict log table if it exists.

======
src/backend/replication/logical/conflict.c

12.
+static Datum TupleTableSlotToJsonDatum(TupleTableSlot *slot);
+
+static void InsertConflictLog(Relation rel,
+   TransactionId local_xid,
+   TimestampTz local_ts,
+   ConflictType conflict_type,
+   RepOriginId origin_id,
+   TupleTableSlot *searchslot,
+   TupleTableSlot *localslot,
+   TupleTableSlot *remoteslot);

Same as earlier comment #6 -- isn't it conventional to use snake_case
for the static function names?

~~~

TupleTableSlotToJsonDatum:

13.
+ * This would be a new internal helper function for logical replication
+ * Needs to handle various data types and potentially TOASTed data

What's this comment about? Something doesn't look quite right.

~~~

InsertConflictLog:

14.
+ /* TODO: proper error code */
+ relid = get_relname_relid(relname, nspid);
+ if (!OidIsValid(relid))
+ elog(ERROR, "conflict log history table does not exists");
+ conflictrel = table_open(relid, RowExclusiveLock);
+ if (conflictrel == NULL)
+ elog(ERROR, "could not open conflict log history table");

14a.
What's the TODO comment for? Are you going to replace these elogs?

~

14b.
Typo: "does not exists"

~

14c.
An unnecessary double-blank line follows this code fragment.

~~~

15.
+ /* Populate the values and nulls arrays */
+ attno = 0;
+ values[attno] = ObjectIdGetDatum(RelationGetRelid(rel));
+ attno++;
+
+ if (TransactionIdIsValid(local_xid))
+ values[attno] = TransactionIdGetDatum(local_xid);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (TransactionIdIsValid(remote_xid))
+ values[attno] = TransactionIdGetDatum(remote_xid);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ values[attno] = LSNGetDatum(remote_final_lsn);
+ attno++;
+
+ if (local_ts > 0)
+ values[attno] = TimestampTzGetDatum(local_ts);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (remote_commit_ts > 0)
+ values[attno] = TimestampTzGetDatum(remote_commit_ts);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ values[attno] =
+ CStringGetTextDatum(get_namespace_name(RelationGetNamespace(rel)));
+ attno++;
+
+ values[attno] = CStringGetTextDatum(RelationGetRelationName(rel));
+ attno++;
+
+ values[attno] = CStringGetTextDatum(ConflictTypeNames[conflict_type]);
+ attno++;
+
+ if (origin_id != InvalidRepOriginId)
+ replorigin_by_oid(origin_id, true, &origin);
+
+ if (origin != NULL)
+ values[attno] = CStringGetTextDatum(origin);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (replorigin_session_origin != InvalidRepOriginId)
+ replorigin_by_oid(replorigin_session_origin, true, &remote_origin);
+
+ if (remote_origin != NULL)
+ values[attno] = CStringGetTextDatum(remote_origin);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (searchslot != NULL)
+ values[attno] = TupleTableSlotToJsonDatum(searchslot);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (localslot != NULL)
+ values[attno] = TupleTableSlotToJsonDatum(localslot);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (remoteslot != NULL)
+ values[attno] = TupleTableSlotToJsonDatum(remoteslot);
+ else
+ nulls[attno] = true;
+

15a.
It might be simpler to just post-increment that 'attno' in all the
assignments and save a dozen lines of code:
e.g. values[attno++] = ...

~

15b.
Also, put a sanity Assert check at the end, like:
Assert(attno + 1 == MAX_CONFLICT_ATTR_NUM);
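
i.e. something like this (sketch; only the first two columns shown --
note that with post-increments on every column, attno ends up equal to
the attribute count, so the Assert simplifies a little):

attno = 0;

values[attno++] = ObjectIdGetDatum(RelationGetRelid(rel));

if (TransactionIdIsValid(local_xid))
    values[attno++] = TransactionIdGetDatum(local_xid);
else
    nulls[attno++] = true;

/* ... the remaining columns follow the same pattern ... */

Assert(attno == MAX_CONFLICT_ATTR_NUM);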


======
src/backend/utils/cache/lsyscache.c

16.
+ if (isnull)
+ {
+ ReleaseSysCache(tup);
+ return NULL;
+ }
+
+ *nspid = subform->subconflictnspid;
+ relname = pstrdup(TextDatumGetCString(datum));
+
+ ReleaseSysCache(tup);
+
+ return relname;

It would be tidier to have a single release/return by coding this
slightly differently.

SUGGESTION:

char *relname = NULL;
...
if (!isnull)
{
  *nspid = subform->subconflictnspid;
  relname = pstrdup(TextDatumGetCString(datum));
}

ReleaseSysCache(tup);
return relname;

======
src/include/catalog/pg_subscription.h

17.
+ Oid subconflictnspid; /* Namespace Oid in which the conflict history
+ * table is created. */

Would it be better to make these 2 new member names more alike, since
they go together. e.g.
confl_table_nspid
confl_table_name

======
src/include/replication/conflict.h

18.
+#define MAX_CONFLICT_ATTR_NUM 15

I felt this doesn't really belong here. Just define it atop/within the
function InsertConflictLog()

~~~

19.
 extern void InitConflictIndexes(ResultRelInfo *relInfo);
+
 #endif

Spurious whitespace change not needed for this patch.

======
src/test/regress/sql/subscription.sql

20.
How about adding some more test scenarios (sketched below):
e.g.1. ALTER the conflict log table of some subscription that already has one
e.g.2. Have multiple subscriptions that specify the same conflict log table
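
For instance (sketches only; subscription and table names are made up,
option spelling per the later patch versions):

-- e.g.1: ALTER the conflict log table of a subscription that already has one
ALTER SUBSCRIPTION regress_sub SET (conflict_log_table = 'public.clt_a');
ALTER SUBSCRIPTION regress_sub SET (conflict_log_table = 'public.clt_b');

-- e.g.2: a second subscription specifying the same conflict log table
ALTER SUBSCRIPTION regress_sub2 SET (conflict_log_table = 'public.clt_b');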

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Here are some comments for the patch v4-0002.

======
GENERAL

1.
The patch should include test cases:

- to confirm an error happens when attempting to publish clt
- to confirm \dt+ clt is not showing the ALL TABLES publication
- to confirm that SQL function pg_relation_is_publishable gives the
expected result
- etc.

======
Commit Message

2.
When all table option is used with publication don't publish the
conflict history tables.

~

Maybe reword that using uppercase for keywords, like:

SUGGESTION
A conflict log table will not be published by a FOR ALL TABLES publication.

======
src/backend/catalog/pg_publication.c

check_publication_add_relation:

3.
+ /* Can't be created as conflict log table */
+ if (IsConflictLogRelid(RelationGetRelid(targetrel)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add relation \"%s\" to publication",
+ RelationGetRelationName(targetrel)),
+ errdetail("This operation is not supported for conflict log tables.")));

3a.
Typo in comment.

SUGGESTION
Can't be a conflict log table

~

3b.
I was wondering if this check should be moved to the bottom of the function.

I think IsConflictLogRelid() is the most inefficient of all these
conditions, so it is better to give the other ones a chance to fail
quickly before needing to check for clt.

~~~

pg_relation_is_publishable:

4.
 /*
- * SQL-callable variant of the above
+ * SQL-callable variant of the above and this should not be a conflict log rel
  *
  * This returns null when the relation does not exist.  This is intended to be
  * used for example in psql to avoid gratuitous errors when there are

I felt this new comment should be in the code, instead of in the
function comment.

SUGGESTION
/* subscription conflict log tables are not published */
result = is_publishable_class(relid, (Form_pg_class) GETSTRUCT(tuple)) &&
  !IsConflictLogRelid(relid);

~~~

5.
It seemed strange that function
pg_relation_is_publishable(PG_FUNCTION_ARGS) is checking
IsConflictLogRelid, but function is_publishable_relation(Relation rel)
is not.

~~~

GetAllPublicationRelations:

6.
+ /* conflict history tables are not published. */
  if (is_publishable_class(relid, relForm) &&
+ !IsConflictLogRelid(relid) &&
  !(relForm->relispartition && pubviaroot))
  result = lappend_oid(result, relid);

Inconsistent "history table" terminology.

Maybe this comment should be identical to the other one above. e.g.
/* subscription conflict log tables are not published */

======
src/backend/commands/subscriptioncmds.c

IsConflictLogRelid:

8.
+/*
+ * Is relation used as a conflict log table
+ *
+ * Scan all the subscription and check whether the relation is used as
+ * conflict log table.
+ */

typo: "all the subscription"

Also, the 2nd sentence repeats the purpose of the function;  I don't
think you need to say it twice.

SUGGESTION
Check if the specified relation is used as a conflict log table by any
subscription.

~~~

9.
+ if (relname == NULL)
+ continue;
+ if (relid == get_relname_relid(relname, nspid))
+ {
+ found = true;
+ break;
+ }

It seemed unnecessary to separate out the 'continue' like that.

In passing, consider renaming that generic 'found' to be the proper
meaning of the boolean.

SUGGESTION
if (relname && relid == get_relname_relid(relname, nspid))
{
  is_clt = true;
  break;
}

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip,

FYI, patch v4-0003 (docs) needs rebasing due to ada78cd.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 18, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch.  Some feedback about the clt:
>
> 1)
> local_origin is always NULL in my tests for all conflict types I tried.

You need to set the replication origin as shown below
On subscriber side:
---------------------------
SELECT pg_replication_origin_create('my_remote_source_2');
SELECT pg_replication_origin_session_setup('my_remote_source_2');
UPDATE test SET b=200 where a=1;

On remote:
---------------
UPDATE test SET b=300 where a=1; -- conflicting operation with local node

On subscriber
------------------
postgres[1514377]=# select local_origin, remote_origin from
myschema.conflict_log_history2 ;
    local_origin    | remote_origin
--------------------+---------------------
 my_remote_source_2 | pg_16396

> 2)
> Do we need 'key_tuple' as such or replica_identity is enough/better?
> I see 'key_tuple' inserted as {"i":10,"j":null} for delete_missing
> case where query was 'delete from tab1 where i=10'; here 'i' is PK;
> which seems okay.
> But it is '{"i":20,"j":200}' for update_origin_differ case where query
> was 'update tab1 set j=200 where i =20'. Here too RI is 'i' alone. I
> feel 'j' should not be part of the key but let me know if I have
> misunderstood. IMO, 'j' being part of remote_tuple should be good
> enough.

Yeah, we should display the replica identity only. I assumed that in
ReportApplyConflict() the searchslot would only have the RI tuple, but
it is sending a remote tuple in the searchslot, so we might need to
extract the RI from this slot. I will work on this.

> 3)
> Do we need to have a timestamp column as well to say when conflict was
> recorded? Or local_commit_ts, remote_commit_ts are sufficient?
> Thoughts

You mean we can record the current timestamp while inserting? I'm not
sure it will add more meaningful information than remote_commit_ts, but
let's see what others think.

> 4)
> Also, it makes sense if we have 'conflict_type' next to 'relid'. I
> feel relid and conflict_type are primary columns and rest are related
> details.

Sure

> 5)
> Do we need table_schema, table_name when we have relid already? If we
> want to retain these, we can name them as schemaname and relname to be
> consistent with all other stats tables. IMO, then the order can be:
> > relid, schemaname, relname, conflict_type and then the rest of the
> details.

Yeah, this makes the table denormalized, as we can fetch this
information by joining with pg_class, but I think it might be better
for readability. Let's see what others think; for now I will reorder as
suggested.
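
e.g. without those extra columns, the names could still be fetched like
this (using the clt from the earlier example):

select n.nspname, c.relname, l.conflict_type
  from myschema.conflict_log_history2 l
       join pg_class c on c.oid = l.relid
       join pg_namespace n on n.oid = c.relnamespace;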

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 19, 2025 at 3:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Thanks for the patch.  Some feedback about the clt:
> >
> > 1)
> > local_origin is always NULL in my tests for all conflict types I tried.
>
> You need to set the replication origin as shown below
> On subscriber side:
> ---------------------------
> SELECT pg_replication_origin_create('my_remote_source_2');
> SELECT pg_replication_origin_session_setup('my_remote_source_2');
> UPDATE test SET b=200 where a=1;
>
> On remote:
> ---------------
> UPDATE test SET b=300 where a=1; -- conflicting operation with local node
>
> On subscriber
> ------------------
> postgres[1514377]=# select local_origin, remote_origin from
> myschema.conflict_log_history2 ;
>     local_origin    | remote_origin
> --------------------+---------------------
>  my_remote_source_2 | pg_16396

Okay, I see, thanks!

>
> > 2)
> > Do we need 'key_tuple' as such or replica_identity is enough/better?
> > I see 'key_tuple' inserted as {"i":10,"j":null} for delete_missing
> > case where query was 'delete from tab1 where i=10'; here 'i' is PK;
> > which seems okay.
> > But it is '{"i":20,"j":200}' for update_origin_differ case where query
> > was 'update tab1 set j=200 where i =20'. Here too RI is 'i' alone. I
> > feel 'j' should not be part of the key but let me know if I have
> > misunderstood. IMO, 'j' being part of remote_tuple should be good
> > enough.
>
> Yeah, we should display the replica identity only. I assumed that in
> ReportApplyConflict() the searchslot would only have the RI tuple, but
> it is sending a remote tuple in the searchslot, so we might need to
> extract the RI from this slot. I will work on this.

Yeah, we have already extracted it in
errdetail_apply_conflict()->build_tuple_value_details(). See how it
dumps it in the log:

LOG:  conflict detected on relation "public.tab1":
conflict=update_origin_differs
DETAIL:  Updating the row that was modified locally in transaction 768
at 2025-11-18 12:09:19.658502+05:30.
        Existing local row (20, 100); remote row (20, 200); replica
identity (i)=(20).

We somehow need to reuse it.

>
> > 3)
> > Do we need to have a timestamp column as well to say when conflict was
> > recorded? Or local_commit_ts, remote_commit_ts are sufficient?
> > Thoughts
>
> You mean we can record the current timestamp while inserting? I'm not
> sure it will add more meaningful information than remote_commit_ts, but
> let's see what others think.
>

On rethinking, we can skip it. The commit-ts of both sides are enough.

> > 4)
> > Also, it makes sense if we have 'conflict_type' next to 'relid'. I
> > feel relid and conflict_type are primary columns and rest are related
> > details.
>
> Sure
>
> > 5)
> > Do we need table_schema, table_name when we have relid already? If we
> > want to retain these, we can name them as schemaname and relname to be
> > consistent with all other stats tables. IMO, then the order can be:
> > relid, schemaname, relname, conflict_type and then the rest of the
> > details.
>
> Yeah, this makes the table denormalized, as we can fetch this
> information by joining with pg_class, but I think it might be better
> for readability. Let's see what others think; for now I will reorder as
> suggested.
>

Okay, works for me if we want to keep these. I see that most of the
other statistics tables (pg_stat_all_indexes, pg_statio_all_tables,
pg_statio_all_sequences, etc.) that maintain a relid also retain the
names.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Nov 19, 2025 at 7:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip.
>
> I started to look at this thread. Here are some comments for patch v4-0001.

Thanks Peter for your review; I've worked on most of the comments for 0001.
>
> =====
> GENERAL
>
> 1.
> There's some inconsistency in how this new table is called at different times :
> a) "conflict table"
> b) "conflict log table"
> c) "conflict log history table"
> d) "conflict history"
>
> My preference was (b). Making this consistent will have impacts on
> many macros, variables, comments, function names, etc.

Yeah even my preference is b) so used everywhere.

> ~~~
>
> 2.
> What about enhancements to description \dRs+ so the subscription
> conflict log table is displayed?

Done. I have displayed the conflict log table name; I'm not sure
whether we should display the complete schema-qualified name. If so,
we might need to join with pg_namespace.
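
If we do want the qualified name there, something like this should work
(sketch; the sub* columns are the ones added by this patch):

select s.subname,
       n.nspname || '.' || s.subconflictlogtable as conflict_log_table
  from pg_subscription s
       left join pg_namespace n on n.oid = s.subconflictlognspid;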

> ~~~
>
> 3.
> What about enhancements to the tab-complete code?

Done

> ======
> src/backend/commands/subscriptioncmds.c
>
> 4.
>  #define SUBOPT_MAX_RETENTION_DURATION 0x00008000
>  #define SUBOPT_LSN 0x00010000
>  #define SUBOPT_ORIGIN 0x00020000
> +#define SUBOPT_CONFLICT_TABLE 0x00030000
>
> Bug? Shouldn't that be 0x00040000.

Yeah, fixed.

> ~~~
>
> 5.
> + char    *conflicttable;
>   XLogRecPtr lsn;
>  } SubOpts;
>
> IMO 'conflicttable' looks too much like 'conflictable', which may
> cause some confusion on first reading.

Changed to conflictlogtable

> ~~~
>
> 6.
> +static void CreateConflictLogTable(Oid namespaceId, char *conflictrel);
> +static void DropConflictLogTable(Oid namespaceId, char *conflictrel);
>
> AFAIK it is more conventional for the static functions to be
> snake_case and the extern functions to use CamelCase. So these would
> be:
> - create_conflict_log_table
> - drop_conflict_log_table

Done

> ~~~
>
> CreateSubscription:
>
> 7.
> + /* If conflict log table name is given than create the table. */
> + if (opts.conflicttable)
> + CreateConflictLogTable(conflict_table_nspid, conflict_table);
> +
>
> typo: /If conflict/If a conflict/
>
> typo: "than"

Fixed

> ~~~
>
> AlterSubscription:
>
> 8.
> -   SUBOPT_ORIGIN);
> +   SUBOPT_ORIGIN |
> +   SUBOPT_CONFLICT_TABLE);
>
> The line wrapping doesn't seem necessary.

Without wrapping, it crosses the 80-character-per-line limit.

> ~~~
>
> 9.
> + replaces[Anum_pg_subscription_subconflictnspid - 1] = true;
> + replaces[Anum_pg_subscription_subconflicttable - 1] = true;
> +
> + CreateConflictLogTable(nspid, relname);
> + }
> +
>
> What are the rules regarding replacing one log table with a different
> log table for the same subscription? I didn't see anything about this
> scenario, nor any test cases.

Added a test and updated the code as well: if we set a different log
table, we will drop the old one and create the new table; however, if
you set the same table, just a NOTICE will be issued and the table will
not be created again.

> ~~~
>
> CreateConflictLogTable:
>
> 10.
> + /*
> + * Check if table with same name already present, if so report an error
> + * as currently we do not support user created table as conflict log
> + * table.
> + */
>
> Is the comment about "user-created table" strictly correct? e.g. Won't
> you encounter the same problem if there are 2 subscriptions trying to
> set the same-named conflict log table?
>
> SUGGESTION
> Report an error if the specified conflict log table already exists.

Done

> ~~~
>
> DropConflictLogTable:
>
> 11.
> + /*
> + * Drop conflict log table if exist, use if exists ensures the command
> + * won't error if the table is already gone.
> + */
>
> The reason for EXISTS was already mentioned in the function comment.
>
> SUGGESTION
> Drop the conflict log table if it exists.

Done

> ======
> src/backend/replication/logical/conflict.c
>
> 12.
> +static Datum TupleTableSlotToJsonDatum(TupleTableSlot *slot);
> +
> +static void InsertConflictLog(Relation rel,
> +   TransactionId local_xid,
> +   TimestampTz local_ts,
> +   ConflictType conflict_type,
> +   RepOriginId origin_id,
> +   TupleTableSlot *searchslot,
> +   TupleTableSlot *localslot,
> +   TupleTableSlot *remoteslot);
>
> Same as earlier comment #6 -- isn't it conventional to use snake_case
> for the static function names?

Done

> ~~~
>
> TupleTableSlotToJsonDatum:
>
> 13.
> + * This would be a new internal helper function for logical replication
> + * Needs to handle various data types and potentially TOASTed data
>
> What's this comment about? Something doesn't look quite right.

Hmm, that's bad, fixed.

> ~~~
>
> InsertConflictLog:
>
> 14.
> + /* TODO: proper error code */
> + relid = get_relname_relid(relname, nspid);
> + if (!OidIsValid(relid))
> + elog(ERROR, "conflict log history table does not exists");
> + conflictrel = table_open(relid, RowExclusiveLock);
> + if (conflictrel == NULL)
> + elog(ERROR, "could not open conflict log history table");
>
> 14a.
> What's the TODO comment for? Are you going to replace these elogs?

replaced with ereport
> ~
>
> 14b.
> Typo: "does not exists"

fixed

> ~
>
> 14c.
> An unnecessary double-blank line follows this code fragment.

fixed

> ~~~
>
> 15.
> + /* Populate the values and nulls arrays */
> + attno = 0;
> + values[attno] = ObjectIdGetDatum(RelationGetRelid(rel));
> + attno++;
> +
> + if (TransactionIdIsValid(local_xid))
> + values[attno] = TransactionIdGetDatum(local_xid);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (TransactionIdIsValid(remote_xid))
> + values[attno] = TransactionIdGetDatum(remote_xid);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + values[attno] = LSNGetDatum(remote_final_lsn);
> + attno++;
> +
> + if (local_ts > 0)
> + values[attno] = TimestampTzGetDatum(local_ts);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (remote_commit_ts > 0)
> + values[attno] = TimestampTzGetDatum(remote_commit_ts);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + values[attno] =
> + CStringGetTextDatum(get_namespace_name(RelationGetNamespace(rel)));
> + attno++;
> +
> + values[attno] = CStringGetTextDatum(RelationGetRelationName(rel));
> + attno++;
> +
> + values[attno] = CStringGetTextDatum(ConflictTypeNames[conflict_type]);
> + attno++;
> +
> + if (origin_id != InvalidRepOriginId)
> + replorigin_by_oid(origin_id, true, &origin);
> +
> + if (origin != NULL)
> + values[attno] = CStringGetTextDatum(origin);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (replorigin_session_origin != InvalidRepOriginId)
> + replorigin_by_oid(replorigin_session_origin, true, &remote_origin);
> +
> + if (remote_origin != NULL)
> + values[attno] = CStringGetTextDatum(remote_origin);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (searchslot != NULL)
> + values[attno] = TupleTableSlotToJsonDatum(searchslot);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (localslot != NULL)
> + values[attno] = TupleTableSlotToJsonDatum(localslot);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (remoteslot != NULL)
> + values[attno] = TupleTableSlotToJsonDatum(remoteslot);
> + else
> + nulls[attno] = true;
> +
>
> 15a.
> It might be simpler to just post-increment that 'attno' in all the
> assignments and save a dozen lines of code:
> e.g. values[attno++] = ...

Yeah done that

> ~
>
> 15b.
> Also, put a sanity Assert check at the end, like:
> Assert(attno + 1 == MAX_CONFLICT_ATTR_NUM);

Done
>
> ======
> src/backend/utils/cache/lsyscache.c
>
> 16.
> + if (isnull)
> + {
> + ReleaseSysCache(tup);
> + return NULL;
> + }
> +
> + *nspid = subform->subconflictnspid;
> + relname = pstrdup(TextDatumGetCString(datum));
> +
> + ReleaseSysCache(tup);
> +
> + return relname;
>
> It would be tidier to have a single release/return by coding this
> slightly differently.
>
> SUGGESTION:
>
> char *relname = NULL;
> ...
> if (!isnull)
> {
>   *nspid = subform->subconflictnspid;
>   relname = pstrdup(TextDatumGetCString(datum));
> }
>
> ReleaseSysCache(tup);
> return relname;

Right, changed it.

> ======
> src/include/catalog/pg_subscription.h
>
> 17.
> + Oid subconflictnspid; /* Namespace Oid in which the conflict history
> + * table is created. */
>
> Would it be better to make these 2 new member names more alike, since
> they go together. e.g.
> confl_table_nspid
> confl_table_name

In pg_subscription.h all fields follow the same convention without "_",
so I have changed them to

subconflictlognspid
subconflictlogtable


> ======
> src/include/replication/conflict.h
>
> 18.
> +#define MAX_CONFLICT_ATTR_NUM 15
>
> I felt this doesn't really belong here. Just define it atop/within the
> function InsertConflictLog()

Done
> ~~~
>
> 19.
>  extern void InitConflictIndexes(ResultRelInfo *relInfo);
> +
>  #endif
>
> Spurious whitespace change not needed for this patch.

Fixed

> ======
> src/test/regress/sql/subscription.sql
>
> 20.
> How about adding some more test scenarios:
> e.g.1. ALTER the conflict log table of some subscription that already has one
> e.g.2. Have multiple subscriptions that specify the same conflict log table

Added

Pending:
1) fixed review comments of 0002 and 0003
2) Need to add replica identity tuple instead of full tuple - reported by Shveta
3) Keeping the logs in case of outer transaction failure by moving log
insertion outside the main transaction - reported by Shveta

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Thanks for addressing all my previous review comments of v4.

Here are some more comments for the latest patch v5-0001.

======
GENERAL

1.
There are still a couple of places remaining where this new table was
not consistently called a "Conflict Log Table" (e.g. search for
"history").

e.g. Subject: [PATCH v5] Add configurable conflict log history table
for Logical Replication
e.g. + /* Insert conflict details to log history table. */
e.g. +-- CONFLICT LOG HISTORY TABLE TESTS

~~~

2.
Is automatically dropping the log tables always what the user might
want to happen? Maybe someone wants them lying around afterwards for
later analysis -- I don't really know the answer; just wondering if
this is (a) good to be tidy or (b) bad to remove user flexibility. Or
maybe the answer is to leave it, but make sure to add more documentation
to say "if you are going to want to do some post analysis then be sure
to copy this table data before it gets automatically dropped".

======
Commit message.

3.
User-Defined Table: The conflict log is stored in a user-managed table
rather than a system catalog.

~

I felt "User-defined" makes it sound like the user does CREATE TABLE
themselves and has some control over the schema. Maybe say
"User-Managed Table:" instead?

======
src/backend/commands/subscriptioncmds.c

4.
 #define SUBOPT_LSN 0x00010000
 #define SUBOPT_ORIGIN 0x00020000
+#define SUBOPT_CONFLICT_LOG_TABLE 0x00040000

Whitespace alignment.

~~~

AlterSubscription:

5.
+ values[Anum_pg_subscription_subconflictlognspid - 1] =
+ ObjectIdGetDatum(nspid);
+ values[Anum_pg_subscription_subconflictlogtable - 1] =
+ CStringGetTextDatum(relname);
+
+ replaces[Anum_pg_subscription_subconflictlognspid - 1] = true;
+ replaces[Anum_pg_subscription_subconflictlogtable - 1] = true;

Something feels back-to-front, because if the same clt is being
re-used (like the NOTICE part that follows) then why do you need to
reassign and say replaces[] = true here?

~~~

6.
+ /*
+ * If the subscription already has the conflict log table
+ * set to the exact same name and namespace currently being
+ * specified, and that table exists, just give notice and
+ * skip creation.
+ */

Is there a simpler way to say the same thing?

SUGGESTION
If the subscription already uses this conflict log table and it
exists, just issue a notice.

~~~

7.
+ ereport(NOTICE,
+ (errmsg("skipping table creation because \"%s.%s\" is already set as
conflict log table",
+ nspname, relname)));

I wasn't sure you need to say "skipping table creation because"... it
seems like an internal detail. How about just:

\"%s.%s\" is already in use as the conflict log table for this subscription

~~~

8.
+ /*
+ * Drop the existing conflict log table if we are
+ * setting a new table.
+ */

The comment didn't feel right by implying there is something to drop.

SUGGESTION
Create the conflict log table after dropping any pre-existing one.

~~~

drop_conflict_log_table:

9.
+ /* Drop the conflict log table if it exist. */

typo: /exist./exists./

======
src/backend/replication/logical/conflict.c

10.
+static Datum
+tuple_table_slot_to_json_datum(TupleTableSlot *slot)
+{
+ HeapTuple tuple = ExecCopySlotHeapTuple(slot);
+ Datum datum = heap_copy_tuple_as_datum(tuple, slot->tts_tupleDescriptor);
+ Datum json;
+
+ if (TupIsNull(slot))
+ return 0;
+
+ json = DirectFunctionCall1(row_to_json, datum);
+ heap_freetuple(tuple);
+
+ return json;
+}

Bug? Shouldn't that TupIsNull(slot) check *precede* using that slot
for the tuple/datum assignments?
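
i.e. just moving the check up front, something like:

static Datum
tuple_table_slot_to_json_datum(TupleTableSlot *slot)
{
    HeapTuple   tuple;
    Datum       datum;
    Datum       json;

    /* Check for an empty slot before touching its contents. */
    if (TupIsNull(slot))
        return (Datum) 0;

    tuple = ExecCopySlotHeapTuple(slot);
    datum = heap_copy_tuple_as_datum(tuple, slot->tts_tupleDescriptor);
    json = DirectFunctionCall1(row_to_json, datum);
    heap_freetuple(tuple);

    return json;
}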

~~~

insert_conflict_log:

11.
+ Datum values[MAX_CONFLICT_ATTR_NUM];
+ bool nulls[MAX_CONFLICT_ATTR_NUM];
+ Oid nspid;
+ Oid relid;
+ Relation conflictrel = NULL;
+ int attno;
+ int options = HEAP_INSERT_NO_LOGICAL;
+ char    *relname;
+ char    *origin = NULL;
+ char    *remote_origin = NULL;
+ HeapTuple tup;

I felt some of these var names can be confusing:

11A.
e.g. "conflictlogrel" (instead of 'conflictrel') would emphasise this
is the rel of the log file, not the rel that encountered a conflict.

~

11B.
Similarly, maybe 'relname' could be 'conflictlogtable', which is also
what it was called elsewhere.

~

11C.
AFAICT, the 'relid' is really the relid of the conflict log. So, maybe
name it 'confliglogreid', otherwise it seems confusing when there is
already a parameter called 'rel' that is unrelated to this 'relid'.

~~~

12.
+ if (searchslot != NULL)
+ values[attno++] = tuple_table_slot_to_json_datum(searchslot);
+ else
+ nulls[attno++] = true;
+
+ if (localslot != NULL)
+ values[attno++] = tuple_table_slot_to_json_datum(localslot);
+ else
+ nulls[attno++] = true;
+
+ if (remoteslot != NULL)
+ values[attno++] = tuple_table_slot_to_json_datum(remoteslot);
+ else
+ nulls[attno++] = true;

That function tuple_table_slot_to_json_datum() has potential to return
0. Is that something that needs checking, so you can assign nulls[] =
true?

======
src/backend/replication/logical/worker.c

13.
+char *
+get_subscription_conflict_log_table(Oid subid, Oid *nspid)
+{
+ HeapTuple tup;
+ Datum datum;
+ bool isnull;
+ char    *relname = NULL;
+ Form_pg_subscription subform;
+
+ tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
+
+ if (!HeapTupleIsValid(tup))
+ return NULL;
+
+ subform = (Form_pg_subscription) GETSTRUCT(tup);
+
+ /* Get conflict log table name. */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subconflictlogtable,
+ &isnull);
+ if (!isnull)
+ {
+ *nspid = subform->subconflictlognspid;
+ relname = pstrdup(TextDatumGetCString(datum));
+ }
+
+ ReleaseSysCache(tup);
+ return relname;
+}

You could consider assigning *nspid = InvalidOid when 'isnull' is
true, so then you don't have to rely on the caller pre-assigning a
default sane value. YMMV.

======
src/bin/psql/tab-complete.in.c

14.
- COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
+ COMPLETE_WITH("binary", "connect", "conflict_log_table",
"copy_data", "create_slot",

'conflict_log_table' comes before 'connect' alphabetically.

======
src/test/regress/sql/subscription.sql

15.
+-- ok - change the conlfict log table name for existing subscription
already had old table
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'public.regress_conflict_log3');
+SELECT subname, subconflictlogtable, subconflictlognspid = (SELECT
oid FROM pg_namespace WHERE nspname = 'public') AS is_public_schema
+FROM pg_subscription WHERE subname = 'regress_conflict_test2';
+

typos in comment.
- /conlfict/conflict/
- /for existing subscription already had old table/for an existing
subscription that already had one/

~~~

16.
+-- check new table should be created and old should be dropped

SUGGESTION
check the new table was created and the old table was dropped

~~~

17.
+-- ok (NOTICE) - try to set the conflict log table which is used by
same subscription
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'public.regress_conflict_log3');
+
+-- fail - try to use the conflict log table being used by some other
subscription
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'public.regress_conflict_log1');

Make those 2 comment more alike:

SUGGESTIONS
-- ok (NOTICE) - set conflict_log_table to one already used by this subscription
...
-- fail - set conflict_log_table to one already used by a different subscription

~~~

18.
Missing tests for describe \dRs+.

e.g. there are already dozens of \dRs+ examples where there is no clt
assigned, but I did not see any tests where the clt *is* assigned.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Nov 20, 2025 at 5:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
I was working on these pending items; there is something where I got
stuck. I am exploring this more, but would like to share the problem.
> 2) Need to add replica identity tuple instead of full tuple - reported by Shveta
I have worked on fixing this along with other comments by Peter; now
only the RI tuple is inserted as part of the key_tuple. IMHO, let's
keep the name as key_tuple, as it will use the primary key or a unique
key if no explicit replica identity is set. Thoughts?

postgres[3048044]=# select * from myschema.conflict_log_history2;
-[ RECORD 1 ]-----+------------------------------
relid             | 16385
schemaname        | public
relname           | test
conflict_type     | update_origin_differs
local_xid         | 765
remote_xid        | 759
remote_commit_lsn | 0/0174F2E8
local_commit_ts   | 2025-11-24 06:16:50.468263+00
remote_commit_ts  | 2025-11-24 06:16:55.483507+00
local_origin      |
remote_origin     | pg_16396
key_tuple         | {"a":1}
local_tuple       | {"a":1,"b":10}
remote_tuple      | {"a":1,"b":20}

Now pending work status
1) fixed review comments of 0002 and 0003 - Pending
2) Need to add replica identity tuple instead of full tuple -- Done
3) Keeping the logs in case of outer transaction failure by moving log
insertion outside the main transaction - reported by Shveta - Pending
4) Run pgindent -- planning to do it after we complete the first level
of review - Pending
5) Subscription test cases for logging the actual conflicts - Pending



--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip.

Here are a couple of review comments for v6-0001.

======
GENERAL.

1.
Firstly, here is one of my "what if" ideas...

The current patch is described as making a "structured, queryable
record of all logical replication conflicts".

What if we go bigger than that? What if this were made a more generic
"structured, queryable record of logical replication activity"?

AFAIK, there don't have to be too many logic changes to achieve this.
e.g. I'm imagining it is mostly:

* Rename the subscription parameter "conflict_log_table" to
"log_table" or similar.
* Remove/modify the "conflict_" name part from many of the variables
and function names.
* Add another 'type' column to the log table -- e.g. everything this
patch writes can be type="CONFL", or type='c', or whatever.
* Maybe tweak/add some of the other columns for more generic future use

Anyway, it might be worth considering this now, before everything
becomes set in stone with a conflict-only focus, making it too
difficult to add more potential/unknown log table enhancements later.

Thoughts?

======
src/backend/replication/logical/conflict.c

2.
+#include "funcapi.h"
+#include "funcapi.h"

double include of the same header.

~~~

3.
+static Datum tuple_table_slot_to_ri_json_datum(EState *estate,
+    Relation localrel,
+    Oid replica_index,
+    TupleTableSlot *slot);
+
+static void insert_conflict_log(EState *estate, Relation rel,
+ TransactionId local_xid,
+ TimestampTz local_ts,
+ ConflictType conflict_type,
+ RepOriginId origin_id,
+ TupleTableSlot *searchslot,
+ TupleTableSlot *localslot,
+ TupleTableSlot *remoteslot);

There were no spaces between any of the other static declarations, so
why is this one different?

~~~

insert_conflict_log:

4.
+#define MAX_CONFLICT_ATTR_NUM 15
+ Datum values[MAX_CONFLICT_ATTR_NUM];
+ bool nulls[MAX_CONFLICT_ATTR_NUM];
+ Oid nspid;
+ Oid confliglogreid;
+ Relation conflictlogrel = NULL;
+ int attno;
+ int options = HEAP_INSERT_NO_LOGICAL;
+ char    *conflictlogtable;
+ char    *origin = NULL;
+ char    *remote_origin = NULL;
+ HeapTuple tup;

Typo: Oops. Looks like that typo originated from my previous review
comment, and you took it as-is.

/confliglogreid/confliglogrelid/

~~~

5.
+ if (searchslot != NULL && !TupIsNull(searchslot))
  {
- tableslot = table_slot_create(localrel, &estate->es_tupleTable);
- tableslot = ExecCopySlot(tableslot, slot);
+ Oid replica_index = GetRelationIdentityOrPK(rel);
+
+ /*
+ * If the table has a valid replica identity index, build the index
+ * json datum from key value. Otherwise, construct it from the complete
+ * tuple in REPLICA IDENTITY FULL cases.
+ */
+ if (OidIsValid(replica_index))
+ values[attno++] = tuple_table_slot_to_ri_json_datum(estate, rel,
+ replica_index,
+ searchslot);
+ else
+ values[attno++] = tuple_table_slot_to_json_datum(searchslot);
  }
+ else
+ nulls[attno++] = true;

- /*
- * Initialize ecxt_scantuple for potential use in FormIndexDatum when
- * index expressions are present.
- */
- GetPerTupleExprContext(estate)->ecxt_scantuple = tableslot;
+ if (localslot != NULL && !TupIsNull(localslot))
+ values[attno++] = tuple_table_slot_to_json_datum(localslot);
+ else
+ nulls[attno++] = true;

- /*
- * The values/nulls arrays passed to BuildIndexValueDescription should be
- * the results of FormIndexDatum, which are the "raw" input to the index
- * AM.
- */
- FormIndexDatum(BuildIndexInfo(indexDesc), tableslot, estate, values, isnull);
+ if (remoteslot != NULL && !TupIsNull(remoteslot))
+ values[attno++] = tuple_table_slot_to_json_datum(remoteslot);
+ else
+ nulls[attno++] = true;

AFAIK, the TupIsNull() already includes the NULL check anyway, so you
don't need to double up those. I saw at least 3 conditions above where
the code could be simpler. e.g.

BEFORE
+ if (remoteslot != NULL && !TupIsNull(remoteslot))

SUGGESTION
if (!TupIsNull(remoteslot))

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 25, 2025 at 9:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip.
>
> Here are a couple of review comments for v6-0001.
>
> ======
> GENERAL.
>
> 1.
> Firstly, here is one of my "what if" ideas...
>
> The current patch is described as making a "structured, queryable
> record of all logical replication conflicts".
>
> What if we go bigger than that? What if this were made a more generic
> "structured, queryable record of logical replication activity"?
>
> AFAIK, there don't have to be too many logic changes to achieve this.
> e.g. I'm imagining it is mostly:
>
> * Rename the subscription parameter "conflict_log_table" to
> "log_table" or similar.
> * Remove/modify the "conflict_" name part from many of the variables
> and function names.
> * Add another 'type' column to the log table -- e.g. everything this
> patch writes can be type="CONFL", or type='c', or whatever.
> * Maybe tweak/add some of the other columns for more generic future use
>
> Anyway, it might be worth considering this now, before everything
> becomes set in stone with a conflict-only focus, making it too
> difficult to add more potential/unknown log table enhancements later.
>
> Thoughts?

Yeah, that's an interesting thought for sure, but honestly I believe a
conflict log table used only for storing conflict and
conflict-resolution related data is the standard followed across
databases that provide an active-active setup, e.g. Oracle GoldenGate,
BDR, pgactive. So IMHO, to keep the feature clean and focused, we
should follow the same.

I will work on other review comments and post the patch soon.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>

On a separate note, I've been considering how to manage conflict log
insertions when an error causes the outer transaction to abort, which
seems to be non-trivial.

Here is what I have in mind:
======================
First, prepare_conflict_log() would be executed from
ReportApplyConflict(). This function would handle all preliminary
work, such as preparing the tuple for the conflict log table. Second,
insert_conflict_log() would be executed. If the error level in
ReportApplyConflict() is LOG, the insertion would occur directly.
Otherwise, the log information would be stored in a global variable
and inserted in a separate transaction once we exit start_apply() due
to the error.

@shveta malik @Amit Kapila let me know what you think?  Or do you
think it can be simplified?
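
A very rough sketch of the flow I have in mind (the
'pending_conflict_log_tuple' variable is hypothetical; names and exact
placement are not final):

/* In ReportApplyConflict(): */
HeapTuple   tup = prepare_conflict_log_tuple(...);  /* build only, no insert */

if (elevel < ERROR)
    insert_conflict_log(tup);           /* LOG case: insert directly */
else
    pending_conflict_log_tuple = tup;   /* ERROR case: defer */

/* Around the apply loop, e.g. in start_apply(): */
PG_TRY();
{
    LogicalRepApplyLoop(origin_startpos);
}
PG_CATCH();
{
    /*
     * The apply transaction is doomed; insert the deferred log tuple
     * in a fresh transaction so it survives the rollback.
     */
    if (pending_conflict_log_tuple != NULL)
    {
        AbortOutOfAnyTransaction();
        StartTransactionCommand();
        insert_conflict_log(pending_conflict_log_tuple);
        CommitTransactionCommand();
    }
    PG_RE_THROW();
}
PG_END_TRY();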


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> On a separate note, I've been considering how to manage conflict log
> insertions when an error causes the outer transaction to abort, which
> seems to be non-trivial.
>
> Here is what I have in mind:
> ======================
> First, prepare_conflict_log() would be executed from
> ReportApplyConflict(). This function would handle all preliminary
> work, such as preparing the tuple for the conflict log table. Second,
> insert_conflict_log() would be executed. If the error level in
> ReportApplyConflict() is LOG, the insertion would occur directly.
> Otherwise, the log information would be stored in a global variable
> and inserted in a separate transaction once we exit start_apply() due
> to the error.
>
> @shveta malik @Amit Kapila let me know what you think?  Or do you
> think it can be simplified?

While digging more into this, I am wondering why
CT_MULTIPLE_UNIQUE_CONFLICTS is reported as an ERROR while all other
conflicts are reported as LOG?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> On a separate note, I've been considering how to manage conflict log
> insertions when an error causes the outer transaction to abort, which
> seems to be non-trivial.
>
> Here is what I have in mind:
> ======================
> First, prepare_conflict_log() would be executed from
> ReportApplyConflict(). This function would handle all preliminary
> work, such as preparing the tuple for the conflict log table. Second,
> insert_conflict_log() would be executed. If the error level in
> ReportApplyConflict() is LOG, the insertion would occur directly.
> Otherwise, the log information would be stored in a global variable
> and inserted in a separate transaction once we exit start_apply() due
> to the error.
>
> @shveta malik @Amit Kapila let me know what you think?  Or do you
> think it can be simplified?
>

I could not think of a better way. This idea works for me. I had
doubts if it will be okay to start a new transaction in catch-block
(if we plan to do it in start_apply's), but then I found a few other
functions doing it (see do_autovacuum, perform_work_item,
_SPI_commit). So IMO, we should be good.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 26, 2025 at 2:05 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> >
> > On a separate note, I've been considering how to manage conflict log
> > insertions when an error causes the outer transaction to abort, which
> > seems to be non-trivial.
> >
> > Here is what I have in mind:
> > ======================
> > First, prepare_conflict_log() would be executed from
> > ReportApplyConflict(). This function would handle all preliminary
> > work, such as preparing the tuple for the conflict log table. Second,
> > insert_conflict_log() would be executed. If the error level in
> > ReportApplyConflict() is LOG, the insertion would occur directly.
> > Otherwise, the log information would be stored in a global variable
> > and inserted in a separate transaction once we exit start_apply() due
> > to the error.
> >
> > @shveta malik @Amit Kapila let me know what you think?  Or do you
> > think it can be simplified?
> >
>
> I could not think of a better way. This idea works for me. I had
> doubts if it will be okay to start a new transaction in catch-block
> (if we plan to do it in start_apply's), but then I found a few other
> functions doing it (see do_autovacuum, perform_work_item,
> _SPI_commit). So IMO, we should be good.
>

On re-reading, I think you were not suggesting handling it in the
CATCH block. Where exactly would it be done once we exit start_apply?
But since the situation will arise only in case of ERROR, I thought
handling it in the catch block could be one option.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Nov 26, 2025 at 4:15 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Nov 26, 2025 at 2:05 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > >
> > > On a separate note, I've been considering how to manage conflict log
> > > insertions when an error causes the outer transaction to abort, which
> > > seems to be non-trivial.
> > >
> > > Here is what I have in mind:
> > > ======================
> > > First, prepare_conflict_log() would be executed from
> > > ReportApplyConflict(). This function would handle all preliminary
> > > work, such as preparing the tuple for the conflict log table. Second,
> > > insert_conflict_log() would be executed. If the error level in
> > > ReportApplyConflict() is LOG, the insertion would occur directly.
> > > Otherwise, the log information would be stored in a global variable
> > > and inserted in a separate transaction once we exit start_apply() due
> > > to the error.
> > >
> > > @shveta malik @Amit Kapila let me know what you think?  Or do you
> > > think it can be simplified?
> > >
> >
> > I could not think of a better way. This idea works for me. I had
> > doubts if it will be okay to start a new transaction in catch-block
> > (if we plan to do it in start_apply's), but then I found a few other
> > functions doing it (see do_autovacuum, perform_work_item,
> > _SPI_commit). So IMO, we should be good.
> >
>
> On re-reading, I think you were not suggesting handling it in the
> CATCH block. Where exactly would it be done once we exit start_apply?
> But since the situation will arise only in case of ERROR, I thought
> handling it in the catch block could be one option.

Yeah, it makes sense to handle it in the catch block. I have done that
in the attached patch and also handled the other comments by Peter.

Now pending work status
1) fixed review comments of 0002 and 0003 - Pending
2) Need to add replica identity tuple instead of full tuple -- Done
3) Keeping the logs in case of outer transaction failure by moving log
insertion outside the main transaction - reported by Shveta - Done
(might need more validation and testing)
4) Run pgindent -- planning to do it after we complete the first level
of review - Pending
5) Subscription test cases for logging the actual conflicts - Pending

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip. Some review comments for v7-0001.

======
src/backend/replication/logical/conflict.c

1.
+ /* Insert conflict details to conflict log table. */
+ if (conflictlogrel)
+ {
+ /*
+ * Prepare the conflict log tuple. If the error level is below
+ * ERROR, insert it immediately. Otherwise, defer the insertion to
+ * a new transaction after the current one aborts, ensuring the log
+ * tuple is not rolled back.
+ */
+ conflictlogtuple = prepare_conflict_log_tuple(estate,
+ relinfo->ri_RelationDesc,
+ conflictlogrel,
+ conflicttuple->xmin,
+ conflicttuple->ts, type,
+ conflicttuple->origin,
+ searchslot, conflicttuple->slot,
+ remoteslot);
+ if (elevel < ERROR)
+ {
+ InsertConflictLogTuple(conflictlogrel, conflictlogtuple);
+ heap_freetuple(conflictlogtuple);
+ }
+ else
+ MyLogicalRepWorker->conflict_log_tuple = conflictlogtuple;
+
+ table_close(conflictlogrel, AccessExclusiveLock);
+ }
+ }
+

IMO, some refactoring would help simplify conflictlogtuple processing. e.g.

i)   You don't need any separate 'conflictlogtuple' var
- Use MyLogicalRepWorker->conflict_log_tuple always for this purpose
ii)  prepare_conflict_log_tuple()
- Change this to return void; it will always side-effect
MyLogicalRepWorker->conflict_log_tuple
- Assert that MyLogicalRepWorker->conflict_log_tuple is NULL on entry
iii) InsertConflictLogTuple()
- The 2nd param is not needed if you always use
MyLogicalRepWorker->conflict_log_tuple
- Asserts MyLogicalRepWorker->conflict_log_tuple is not NULL, then writes it
- BTW, I felt that heap_freetuple could also be done here
- Finally, sets MyLogicalRepWorker->conflict_log_tuple to NULL
(ready for the next conflict)
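
Putting (i)-(iii) together, the insert function might end up looking
something like this (sketch only):

void
InsertConflictLogTuple(Relation conflictlogrel)
{
    Assert(MyLogicalRepWorker->conflict_log_tuple != NULL);

    /* HEAP_INSERT_NO_LOGICAL blocks logical decoding, as before. */
    heap_insert(conflictlogrel, MyLogicalRepWorker->conflict_log_tuple,
                GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL, NULL);

    heap_freetuple(MyLogicalRepWorker->conflict_log_tuple);

    /* Ready for the next conflict. */
    MyLogicalRepWorker->conflict_log_tuple = NULL;
}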

~~~

InsertConflictLogTuple:

2.
+/*
+ * InsertConflictLogTuple
+ *
+ * Persistently records the input conflict log tuple into the conflict log
+ * table. It uses HEAP_INSERT_NO_LOGICAL to explicitly block logical decoding
+ * of the tuple inserted into the conflict log table.
+ */
+void
+InsertConflictLogTuple(Relation conflictlogrel, HeapTuple tup)
+{
+ int options = HEAP_INSERT_NO_LOGICAL;
+
+ heap_insert(conflictlogrel, tup, GetCurrentCommandId(true), options, NULL);
+}

See the above review comment (iii), for some suggested changes to this function.

~~~

prepare_conflict_log_tuple:

3.
+ * The caller is responsible for explicitly freeing the returned heap tuple
+ * after inserting.
+ */
+static HeapTuple
+prepare_conflict_log_tuple(EState *estate, Relation rel,

As per the above review comment (iii), I thought the Insert function
could handle the freeing.

~~~

4.
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ tup = heap_form_tuple(RelationGetDescr(conflictlogrel), values, nulls);
+ MemoryContextSwitchTo(oldctx);

- return index_value;
+ return tup;

Per the above comment (ii), change this to assign to
MyLogicalRepWorker->conflict_log_tuple.

======
src/backend/replication/logical/worker.c

start_apply:

5.
+ /*
+ * Insert the pending conflict log tuple under a new transaction.
+ */

/Insert the/Insert any/

~~~

6.
+ InsertConflictLogTuple(conflictlogrel,
+    MyLogicalRepWorker->conflict_log_tuple);
+ heap_freetuple(MyLogicalRepWorker->conflict_log_tuple);
+ MyLogicalRepWorker->conflict_log_tuple = NULL;

Per earlier review comment (iii), remove the 2nd param to
InsertConflictLogTuple, and those other 2 statements can also be
handled within InsertConflictLogTuple.

======
src/include/replication/worker_internal.h

7.
+ /* Store conflict log tuple to be inserted before worker exit. */
+ HeapTuple conflict_log_tuple;
+

Per my above suggestions, this member comment becomes something more
like: /* A conflict log tuple which is prepared but not yet written. */

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Nov 27, 2025 at 6:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip. Some review comments for v7-0001.
>
> ======
> src/backend/replication/logical/conflict.c
>
> 1.
> + /* Insert conflict details to conflict log table. */
> + if (conflictlogrel)
> + {
> + /*
> + * Prepare the conflict log tuple. If the error level is below
> + * ERROR, insert it immediately. Otherwise, defer the insertion to
> + * a new transaction after the current one aborts, ensuring the log
> + * tuple is not rolled back.
> + */
> + conflictlogtuple = prepare_conflict_log_tuple(estate,
> + relinfo->ri_RelationDesc,
> + conflictlogrel,
> + conflicttuple->xmin,
> + conflicttuple->ts, type,
> + conflicttuple->origin,
> + searchslot, conflicttuple->slot,
> + remoteslot);
> + if (elevel < ERROR)
> + {
> + InsertConflictLogTuple(conflictlogrel, conflictlogtuple);
> + heap_freetuple(conflictlogtuple);
> + }
> + else
> + MyLogicalRepWorker->conflict_log_tuple = conflictlogtuple;
> +
> + table_close(conflictlogrel, AccessExclusiveLock);
> + }
> + }
> +
>
> IMO, some refactoring would help simplify conflictlogtuple processing. e.g.
>
> i)   You don't need any separate 'conflictlogtuple' var
> - Use MyLogicalRepWorker->conflict_log_tuple always for this purpose
> ii)  prepare_conflict_log_tuple()
> - Change this to a void; it will always side-effect
> MyLogicalRepWorker->conflict_log_tuple
> - Assert MyLogicalRepWorker->conflict_log_tuple must be NULL on entry
> iii) InsertConflictLogTuple()
> - The 2nd param it not needed if you always use
> MyLogicalRepWorker->conflict_log_tuple
> - Asserts MyLogicalRepWorker->conflict_log_tuple is not NULL, then writes it
> - BTW, I felt that heap_freetuple could also be done here too
> - Finally, sets to MyLogicalRepWorker->conflict_log_tuple to NULL
> (ready for the next conflict)
>
> ~~~
>
> InsertConflictLogTuple:
>
> 2.
> +/*
> + * InsertConflictLogTuple
> + *
> + * Persistently records the input conflict log tuple into the conflict log
> + * table. It uses HEAP_INSERT_NO_LOGICAL to explicitly block logical decoding
> + * of the tuple inserted into the conflict log table.
> + */
> +void
> +InsertConflictLogTuple(Relation conflictlogrel, HeapTuple tup)
> +{
> + int options = HEAP_INSERT_NO_LOGICAL;
> +
> + heap_insert(conflictlogrel, tup, GetCurrentCommandId(true), options, NULL);
> +}
>
> See the above review comment (iii), for some suggested changes to this function.
>
> ~~~
>
> prepare_conflict_log_tuple:
>
> 3.
> + * The caller is responsible for explicitly freeing the returned heap tuple
> + * after inserting.
> + */
> +static HeapTuple
> +prepare_conflict_log_tuple(EState *estate, Relation rel,
>
> As per the above review comment (iii), I thought the Insert function
> could handle the freeing.
>
> ~~~
>
> 4.
> + oldctx = MemoryContextSwitchTo(ApplyContext);
> + tup = heap_form_tuple(RelationGetDescr(conflictlogrel), values, nulls);
> + MemoryContextSwitchTo(oldctx);
>
> - return index_value;
> + return tup;
>
> Per the above comment (ii), change this to assign to
> MyLogicalRepWorker->conflict_log_tuple.
>
> ======
> src/backend/replication/logical/worker.c
>
> start_apply:
>
> 5.
> + /*
> + * Insert the pending conflict log tuple under a new transaction.
> + */
>
> /Insert the/Insert any/
>
> ~~~
>
> 6.
> + InsertConflictLogTuple(conflictlogrel,
> +    MyLogicalRepWorker->conflict_log_tuple);
> + heap_freetuple(MyLogicalRepWorker->conflict_log_tuple);
> + MyLogicalRepWorker->conflict_log_tuple = NULL;
>
> Per earlier reqview comment (iii), remove the 2nd parm to
> InsertConflictLogTuple, and those other 2 statements can also be
> handled within InsertConflictLogTuple.
>
> ======
> src/include/replication/worker_internal.h
>
> 7.
> + /* Store conflict log tuple to be inserted before worker exit. */
> + HeapTuple conflict_log_tuple;
> +
>
> Per my above suggestions, this member comment becomes something more
> like "A conflict log tuple which is prepared but not yet written. */
>

I have fixed all these comments and also the comments on 0002; now I
feel we can actually merge 0001 and 0002, so I have merged both of
them.

Now, the pending work status:
1) Fix review comments on 0003
2) Run pgindent -- planning to do it after we complete the first level
of review
3) Subscription TAP test for logging the actual conflicts

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip.

Some review comments for v8-0001.

======
Commit message

1.
When the patches 0001 and 0002 got merged, I think the commit message
should also have been updated, to say something along the lines of:

When ALL TABLES or ALL TABLES IN SCHEMA is used, the publication won't
publish the clt.

======
src/backend/catalog/pg_publication.c

check_publication_add_relation:

2.
+ /* Can't be conflict log table */
+ if (IsConflictLogRelid(RelationGetRelid(targetrel)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add relation \"%s\" to publication",
+ RelationGetRelationName(targetrel)),
+ errdetail("This operation is not supported for conflict log tables.")));

Should it also show the schema name of the clt in the message?

======
src/backend/commands/subscriptioncmds.c

3.
+/*
+ * Check if the specified relation is used as a conflict log table by any
+ * subscription.
+ */
+bool
+IsConflictLogRelid(Oid relid)

Most places refer to the clt. Wondering if this function ought to be
called 'IsConflictLogTable'.

======
src/backend/replication/logical/conflict.c

InsertConflictLogTuple:

4.
+ /* A valid tuple must be prepared and store into MyLogicalRepWorker. */

typo: /store into/stored in/

~~~

prepare_conflict_log_tuple:

5.
- index_close(indexDesc, NoLock);
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ tup = heap_form_tuple(RelationGetDescr(conflictlogrel), values, nulls);
+ MemoryContextSwitchTo(oldctx);

- return index_value;
+ /* Store conflict_log_tuple into the worker slot for inserting it later. */
+ MyLogicalRepWorker->conflict_log_tuple = tup;

5a.
I don't think you need the 'tup' variable. Just assign directly to
MyLogicalRepWorker->conflict_log_tuple.

~

5b.
"worker slot" -- I don't think this is a "slot".

======
src/backend/replication/logical/worker.c

6.
+ /* Open conflict log table. */
+ conflictlogrel = GetConflictLogTableRel();
+ InsertConflictLogTuple(conflictlogrel);
+ MyLogicalRepWorker->conflict_log_tuple = NULL;
+ table_close(conflictlogrel, AccessExclusiveLock);

Maybe that comment should say:
/* Open conflict log table and write the tuple. */


======
src/include/replication/conflict.h

7.
+ /* A conflict log tuple which is prepared but not yet inserted. */
+ HeapTuple conflict_log_tuple;
+

typo: /which/that/  (sorry, this one is my bad from a previous review comment)


======
src/test/regress/expected/subscription.out

8.
+-- ok - change the conflict log table name for an existing
subscription that already had one
+CREATE SCHEMA clt;
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'clt.regress_conflict_log3');
+SELECT subname, subconflictlogtable, subconflictlognspid = (SELECT
oid FROM pg_namespace WHERE nspname = 'public') AS is_public_schema
+FROM pg_subscription WHERE subname = 'regress_conflict_test2';
+        subname         |  subconflictlogtable  | is_public_schema
+------------------------+-----------------------+------------------
+ regress_conflict_test2 | regress_conflict_log3 | f
+(1 row)
+
+\dRs+
+                                                                                                                                                                List of subscriptions
+          Name          |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Retain dead tuples | Max retention duration | Retention active | Synchronous commit |          Conninfo           |  Skip LSN  |  Conflict log table
+------------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------+------------------+--------------------+-----------------------------+------------+-----------------------
+ regress_conflict_test1 | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | f                  |                      0 | f                | off                | dbname=regress_doesnotexist | 0/00000000 | regress_conflict_log1
+ regress_conflict_test2 | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | f                  |                      0 | f                | off                | dbname=regress_doesnotexist | 0/00000000 | regress_conflict_log3
+(2 rows)

~

After going to the trouble of specifying the CLT on a different
schema, that information is lost by the \dRs+. How about also showing
the CLT schema name (at least when it is not "public") in the \dRs+
output?

~~~

9.
+-- ok - conflict_log_table should not be published with ALL TABLE
+CREATE PUBLICATION pub FOR TABLES IN SCHEMA clt;
+SELECT * FROM pg_publication_tables WHERE pubname = 'pub';
+ pubname | schemaname | tablename | attnames | rowfilter
+---------+------------+-----------+----------+-----------
+(0 rows)

Perhaps you should repeat this same test but using FOR ALL TABLES,
instead of only FOR TABLES IN SCHEMA.
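
Untested sketch of what I mean (the publication name is just an example):

CREATE PUBLICATION pub_all FOR ALL TABLES;
SELECT * FROM pg_publication_tables
WHERE pubname = 'pub_all' AND tablename = 'regress_conflict_log3';
-- expect 0 rows if the CLT is correctly excluded
DROP PUBLICATION pub_all;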

======
src/test/regress/sql/subscription.sql

10.
In one of the tests, you could call the function
pg_relation_is_publishable(clt) to verify that it returns false.
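
e.g. something like this (reusing the CLT from the earlier test; untested):

SELECT pg_relation_is_publishable('clt.regress_conflict_log3'::regclass);
-- expect: f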

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
vignesh C
Date:
On Thu, 27 Nov 2025 at 17:50, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 6:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have fixed all these comments and also the comments of 0002, now I
> feel we can actually merge 0001 and 0002, so I have merged both of
> them.

I just started to have a look at the patch; while using it I found that the
lock level used is not correct. I felt the reason is that the table is
opened with RowExclusiveLock but closed with AccessExclusiveLock:

+       /* If conflict log table is not set for the subscription just return. */
+       conflictlogtable = get_subscription_conflict_log_table(
+
MyLogicalRepWorker->subid, &nspid);
+       if (conflictlogtable == NULL)
+       {
+               pfree(conflictlogtable);
+               return NULL;
+       }
+
+       conflictlogrelid = get_relname_relid(conflictlogtable, nspid);
+       if (OidIsValid(conflictlogrelid))
+               conflictlogrel = table_open(conflictlogrelid, RowExclusiveLock);

....
+                       if (elevel < ERROR)
+                               InsertConflictLogTuple(conflictlogrel);
+
+                       table_close(conflictlogrel, AccessExclusiveLock);
....

2025-11-28 12:17:55.631 IST [504133] WARNING:  you don't own a lock of
type AccessExclusiveLock
2025-11-28 12:17:55.631 IST [504133] CONTEXT:  processing remote data
for replication origin "pg_16402" during message type "INSERT" for
replication target relation "public.t1" in transaction 761, finished
at 0/01789AB8
2025-11-28 12:17:58.033 IST [504133] WARNING:  you don't own a lock of
type AccessExclusiveLock
2025-11-28 12:17:58.033 IST [504133] ERROR:  conflict detected on
relation "public.t1": conflict=insert_exists
2025-11-28 12:17:58.033 IST [504133] DETAIL:  Key already exists in
unique index "t1_pkey", modified in transaction 766.
        Key (c1)=(1); existing local row (1, 1); remote row (1, 1).
2025-11-28 12:17:58.033 IST [504133] CONTEXT:  processing remote data
for replication origin "pg_16402" during message type "INSERT" for
replication target relation "public.t1" in transaction 761, finished
at 0/01789AB8

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Thu, Nov 27, 2025 at 5:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
>
> I have fixed all these comments and also the comments of 0002, now I
> feel we can actually merge 0001 and 0002, so I have merged both of
> them.
>
> Now pending work status
> 1) fixed review comments of 0003
> 2) Run pgindent -- planning to do it after we complete the first level
> of review
> 3) Subscription TAP test for logging the actual conflicts
>

Thanks  for the patch. A few observations:

1)
It seems that, as per the LOG, 'key' and 'replica-identity' are different
when it comes to insert_exists, update_exists and
multiple_unique_conflicts, while in the CLT I believe the key is the
replica-identity, i.e. there are no 2 separate terms. Please see below:

a)
Update_Exists:
2025-11-28 14:08:56.179 IST [60383] ERROR:  conflict detected on
relation "public.tab1": conflict=update_exists
2025-11-28 14:08:56.179 IST [60383] DETAIL:  Key already exists in
unique index "tab1_pkey", modified locally in transaction 790 at
2025-11-28 14:07:17.578887+05:30.
Key (i)=(40); existing local row (40, 10); remote row (40, 200);
replica identity (i)=(20).

postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple
from clt where conflict_type='update_exists';
 conflict_type | key_tuple |   local_tuple   |   remote_tuple
---------------+-----------+-----------------+------------------
 update_exists | {"i":20}  | {"i":40,"j":10} | {"i":40,"j":200}

b)
insert_Exists:
ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
DETAIL:  Key already exists in unique index "tab1_pkey", modified
locally in transaction 767 at 2025-11-28 13:59:22.431097+05:30.
Key (i)=(30); existing local row (30, 10); remote row (30, 10).

postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple from clt;
 conflict_type  | key_tuple |   local_tuple   |  remote_tuple
----------------+-----------+-----------------+-----------------
 insert_exists  |               | {"i":30,"j":10} | {"i":30,"j":10}

Case a) has key_tuple the same as the replica-identity shown in the LOG;
case b) does not have a replica-identity and thus key_tuple is NULL.

Does that mean we need to maintain both key_tuple and RI separately in
CLT? Thoughts?
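
If yes, a rough sketch could be something like this (the column name
'replica_identity' is just a placeholder):

ALTER TABLE clt ADD COLUMN replica_identity json;
-- case a) update_exists would then give:
--   key_tuple = {"i":40}, replica_identity = {"i":20}
-- case b) insert_exists would give:
--   key_tuple = {"i":30}, replica_identity = NULL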


2)
For multiple_unique_conflict (testcase is same as I shared earlier),
it asserts here:
CONTEXT:  processing remote data for replication origin "pg_16390"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 778, finished at 0/017E6DE8
TRAP: failed Assert("MyLogicalRepWorker->conflict_log_tuple == NULL"),
File: "conflict.c", Line: 749, PID: 60627

I have not checked it, but maybe
'MyLogicalRepWorker->conflict_log_tuple' is left over from the
previous few tests I tried?

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Tue, Nov 18, 2025 at 3:40 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > 3)
> > > We also need to think how we are going to display the info in case of
> > > multiple_unique_conflicts as there could be multiple local and remote
> > > tuples conflicting for one single operation. Example:
> > >
> > > create table conf_tab (a int primary key, b int unique, c int unique);
> > >
> > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > >
> > > pub: insert into conf_tab values (2,3,4);
> > >
> > > ERROR:  conflict detected on relation "public.conf_tab":
> > > conflict=multiple_unique_conflicts
> > > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > CONTEXT:  processing remote data for replication origin "pg_16392"
> > > during message type "INSERT" for replication target relation
> > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > >
> > > Currently in clt, we have singular terms such as 'key_tuple',
> > > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > > But it does not look reasonable to have multiple rows inserted for a
> > > single conflict raised. I will think more about this.
> >
> > Currently I am inserting multiple records in the conflict history
> > table, the same as each tuple is logged, but couldn't find any better
> > way for this.
> >

The biggest drawback of this approach is data bloat. The incoming data
row will be stored multiple times.

> > Another option is to use an array of tuples instead of a
> > single tuple but not sure this might make things more complicated to
> > process by any external tool.
>
> It’s arguable and hard to say what the correct behaviour should be.
> I’m slightly leaning toward having a single row per conflict.
>

Yeah, it is better to either have a single row per conflict or have
two tables, conflict_history and conflict_history_details, to avoid the
data bloat pointed out above. For example, a two-table approach could be:

1. The Header Table (Incoming Data)
This stores the data that tried to be applied.
CREATE TABLE conflict_header (
    conflict_id     SERIAL PRIMARY KEY,
    source_tx_id    VARCHAR(100),    -- Transaction ID from source
    table_name      VARCHAR(100),
    operation       CHAR(1),         -- 'I' for Insert
    incoming_data   JSONB,           -- Store the incoming row as JSON
...
);

2. The Detail Table (Existing Conflicting Data)
This stores the actual rows currently in the database that caused the
violations.
CREATE TABLE conflict_details (
    detail_id       SERIAL PRIMARY KEY,
    conflict_id     INT REFERENCES conflict_header(conflict_id),
    constraint_name/key_tuple VARCHAR(100),
    conflicting_row_data JSONB       -- The existing row in the DB that blocked the insert
);

Please don't consider these exact columns; you can use something along
the lines of what is proposed in the patch. This is just to show how
the conflict data can be rearranged. Now, one argument against this is
that users need a JOIN to query the data, but that is still better than
bloating the table. Alternatively, the single-table idea could be changed
to have columns like violated_constraints TEXT[] (e.g., ['uk_email',
'uk_phone']) and error_details JSONB (e.g., [{"const": "uk_email",
"val": "a@b.com"}, ...]). If we want to store multiple conflicting
tuples in a single column, we need to ensure it is queryable via a
JSONB column. The point in favour of a single JSONB column combining
multiple conflicting tuples is that we need this combination only for
one kind of conflict.
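
For example, a user could still filter such a single-table layout on one
constraint (the table and column names here are illustrative only):

SELECT conflict_id, incoming_data
FROM conflict_log
WHERE violated_constraints @> ARRAY['uk_email']           -- TEXT[] containment
  AND error_details @> '[{"const": "uk_email"}]'::jsonb;  -- JSONB containment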

Both approaches have their pros and cons. I feel we should dig a bit
deeper into both by laying out the details for each method, and see what
others think.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Nov 28, 2025 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 3:40 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > 3)
> > > > We also need to think how we are going to display the info in case of
> > > > multiple_unique_conflicts as there could be multiple local and remote
> > > > tuples conflicting for one single operation. Example:
> > > >
> > > > create table conf_tab (a int primary key, b int unique, c int unique);
> > > >
> > > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > > >
> > > > pub: insert into conf_tab values (2,3,4);
> > > >
> > > > ERROR:  conflict detected on relation "public.conf_tab":
> > > > conflict=multiple_unique_conflicts
> > > > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > > CONTEXT:  processing remote data for replication origin "pg_16392"
> > > > during message type "INSERT" for replication target relation
> > > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > > >
> > > > Currently in clt, we have singular terms such as 'key_tuple',
> > > > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > > > But it does not look reasonable to have multiple rows inserted for a
> > > > single conflict raised. I will think more about this.
> > >
> > > Currently I am inserting multiple records in the conflict history
> > > table, the same as each tuple is logged, but couldn't find any better
> > > way for this.
> > >
>
> The biggest drawback of this approach is data bloat. The incoming data
> row will be stored multiple times.
>
> > > Another option is to use an array of tuples instead of a
> > > single tuple but not sure this might make things more complicated to
> > > process by any external tool.
> >
> > It’s arguable and hard to say what the correct behaviour should be.
> > I’m slightly leaning toward having a single row per conflict.
> >
>
> Yeah, it is better to either have a single row per conflict or have
> two tables conflict_history and conflict_history_details to avoid data
> bloat as pointed above. For example, two-table approach could be:
>
> 1. The Header Table (Incoming Data)
> This stores the data that tried to be applied.
> SQL
> CREATE TABLE conflict_header (
>     conflict_id     SERIAL PRIMARY KEY,
>     source_tx_id    VARCHAR(100),    -- Transaction ID from source
>     table_name      VARCHAR(100),
>     operation       CHAR(1),         -- 'I' for Insert
>     incoming_data   JSONB,           -- Store the incoming row as JSON
> ...
> );
>
> 2. The Detail Table (Existing Conflicting Data)
> This stores the actual rows currently in the database that caused the
> violations.
> CREATE TABLE conflict_details (
>     detail_id       SERIAL PRIMARY KEY,
>     conflict_id     INT REFERENCES conflict_header(conflict_id),
>     constraint_name/key_tuple VARCHAR(100),
>     conflicting_row_data JSONB       -- The existing row in the DB
> that blocked the insert
> );
>
> Please don't consider these exact columns; you can use something on
> the lines of what is proposed in the patch. This is just to show how
> the conflict data can be rearranged. Now, one argument against this is
> that users need to use JOIN to query data but still better than
> bloating the table. The idea to store in a single table could be
> changed to have columns like violated_constraints TEXT[],      --
> e.g., ['uk_email', 'uk_phone'], error_details   JSONB  -- e.g.,
> [{"const": "uk_email", "val": "a@b.com"}, ...]. If we want to store
> multiple conflicting tuples in a single column, we need to ensure it
> is queryable via a JSONB column. The point in favour of a single JSONB
> column to combine multiple conflicting tuples is that we need this
> combination only for one kind of conflict.
>
> Both the approaches have their pros and cons. I feel we should dig a
> bit deeper for both by laying out details for each method and see what
> others think.

The specific scenario we are discussing, where a single row from the
publisher attempts to apply an operation that causes a conflict across
multiple unique keys, with each of those unique key violations
conflicting with a different local row on the subscriber, is very
rare.  IMHO this low-frequency scenario does not justify
overcomplicating the design with an array field or a multi-level
table.

Consider the infrequency of the root causes:
- How often does a table have more than 3 to 4 unique keys?
- How frequently would each of these keys conflict with a unique row
on the subscriber side?

If resolving this occasional, synthetic conflict requires inserting
two or three rows instead of a single one, that is an acceptable
trade-off considering how rarely it can occur.  Anyway, this is my
opinion and I am open to opinions from others.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Nov 28, 2025 at 12:24 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 27 Nov 2025 at 17:50, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 27, 2025 at 6:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I have fixed all these comments and also the comments of 0002, now I
> > feel we can actually merge 0001 and 0002, so I have merged both of
> > them.
>
> I just started to have a look at the patch, while using I found lock
> level used is not correct:
> I felt the reason is that table is opened with RowExclusiveLock but
> closed in AccessExclusiveLock:
>
> +       /* If conflict log table is not set for the subscription just return. */
> +       conflictlogtable = get_subscription_conflict_log_table(
> +
> MyLogicalRepWorker->subid, &nspid);
> +       if (conflictlogtable == NULL)
> +       {
> +               pfree(conflictlogtable);
> +               return NULL;
> +       }
> +
> +       conflictlogrelid = get_relname_relid(conflictlogtable, nspid);
> +       if (OidIsValid(conflictlogrelid))
> +               conflictlogrel = table_open(conflictlogrelid, RowExclusiveLock);
>
> ....
> +                       if (elevel < ERROR)
> +                               InsertConflictLogTuple(conflictlogrel);
> +
> +                       table_close(conflictlogrel, AccessExclusiveLock);
> ....
>
> 2025-11-28 12:17:55.631 IST [504133] WARNING:  you don't own a lock of
> type AccessExclusiveLock
> 2025-11-28 12:17:55.631 IST [504133] CONTEXT:  processing remote data
> for replication origin "pg_16402" during message type "INSERT" for
> replication target relation "public.t1" in transaction 761, finished
> at 0/01789AB8
> 2025-11-28 12:17:58.033 IST [504133] WARNING:  you don't own a lock of
> type AccessExclusiveLock
> 2025-11-28 12:17:58.033 IST [504133] ERROR:  conflict detected on
> relation "public.t1": conflict=insert_exists
> 2025-11-28 12:17:58.033 IST [504133] DETAIL:  Key already exists in
> unique index "t1_pkey", modified in transaction 766.
>         Key (c1)=(1); existing local row (1, 1); remote row (1, 1).
> 2025-11-28 12:17:58.033 IST [504133] CONTEXT:  processing remote data
> for replication origin "pg_16402" during message type "INSERT" for
> replication target relation "public.t1" in transaction 761, finished
> at 0/01789AB8

Thanks, I will fix this.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Nov 28, 2025 at 2:32 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 5:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > I have fixed all these comments and also the comments of 0002, now I
> > feel we can actually merge 0001 and 0002, so I have merged both of
> > them.
> >
> > Now pending work status
> > 1) fixed review comments of 0003
> > 2) Run pgindent -- planning to do it after we complete the first level
> > of review
> > 3) Subscription TAP test for logging the actual conflicts
> >
>
> Thanks  for the patch. A few observations:
>
> 1)
> It seems, as per LOG, 'key' and 'replica-identity' are different when
> it comes to insert_exists, update_exists and
> multiple_unique_conflicts, while I believe in CLT, key is
> replica-identity i.e. there are no 2 separate terms. Please see below:
>
> a)
> Update_Exists:
> 2025-11-28 14:08:56.179 IST [60383] ERROR:  conflict detected on
> relation "public.tab1": conflict=update_exists
> 2025-11-28 14:08:56.179 IST [60383] DETAIL:  Key already exists in
> unique index "tab1_pkey", modified locally in transaction 790 at
> 2025-11-28 14:07:17.578887+05:30.
> Key (i)=(40); existing local row (40, 10); remote row (40, 200);
> replica identity (i)=(20).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple
> from clt where conflict_type='update_exists';
>  conflict_type | key_tuple |   local_tuple   |   remote_tuple
> ---------------+-----------+-----------------+------------------
>  update_exists | {"i":20}  | {"i":40,"j":10} | {"i":40,"j":200}
>
> b)
> insert_Exists:
> ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> DETAIL:  Key already exists in unique index "tab1_pkey", modified
> locally in transaction 767 at 2025-11-28 13:59:22.431097+05:30.
> Key (i)=(30); existing local row (30, 10); remote row (30, 10).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple from clt;
>  conflict_type  | key_tuple |   local_tuple   |  remote_tuple
> ----------------+-----------+-----------------+-----------------
>  insert_exists  |               | {"i":30,"j":10} | {"i":30,"j":10}
>
> case a) has key_tuple same as replica-identity of LOG
> case b) does not have replica-identity and thus key_tuple is NULL.
>
> Does that mean we need to maintain both key_tuple and RI separately in
> CLT? Thoughts?

Maybe we should then have a place for both the key_tuple as well as the
replica identity, as we log both. What do others think about this case?

> 2)
> For multiple_unique_conflict (testcase is same as I shared earlier),
> it asserts here:
> CONTEXT:  processing remote data for replication origin "pg_16390"
> during message type "INSERT" for replication target relation
> "public.conf_tab" in transaction 778, finished at 0/017E6DE8
> TRAP: failed Assert("MyLogicalRepWorker->conflict_log_tuple == NULL"),
> File: "conflict.c", Line: 749, PID: 60627
>
> I have not checked it, but maybe
> 'MyLogicalRepWorker->conflict_log_tuple' is left over from the
> previous few tests I tried?

Yeah, prepare_conflict_log_tuple() is called in a loop, and when there
are multiple tuples we need to collect all of them before inserting
at worker exit, so the current code has a bug. I will see how we can
fix it; I think this also depends upon the other discussion we are
having about how to insert multiple unique conflicts.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Few observations related to publication.
> > > ------------------------------
>
> Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> conflict log tables it should be good enough if we restrict it when
> ALL TABLE options are used, I don't think we need to put extra effort
> to completely restrict it even if users want to explicitly list it
> into the publication.
>
> > >
> > > (In the below comments, clt/CLT implies Conflict Log Table)
> > >
> > > 1)
> > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> This function is used while publishing every single change and I don't
> think we want to add a cost to check each subscription to identify
> whether the table is listed as CLT.
>
> > > 2)
> > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > show that for clt.
>
> I think we should fix this.
>
> > > 3)
> > > I am able to create a publication for clt table, should it be allowed?
>
> I believe we should not do any specific handling to restrict this but
> I am open for the opinions.
>
> > > create subscription sub1 connection '...' publication pub1
> > > WITH(conflict_log_table='clt');
> > > create publication pub3 for table clt;
> > >
> > > 4)
> > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > part of is_publishable_class() itself? If we do so, other code-logics
> > > will also get clt as non-publishable always (and will solve a few of
> > > the above issues I think). IIUC, there is no place where we want to
> > > mark CLT as publishable or is there any?
>
> IMHO the main reason is performance.
>
> > > 5) Also, I feel we can add some documentation now to help others to
> > > understand/review the patch better without going through the long
> > > thread.
>
> Make sense, I will do that in the next version.
>
> > >
> > > Few observations related to conflict-logging:
> > > ------------------------------
> > > 1)
> > > I found that for the conflicts which ultimately result in Error, we do
> > > not insert any conflict-record in clt.
> > >
> > > a)
> > > Example: insert_exists, update_Exists
> > > create table tab1 (i int primary key, j int);
> > > sub: insert into tab1 values(30,10);
> > > pub: insert into tab1 values(30,10);
> > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > No record in clt.
> > >
> > > sub:
> > > <some pre-data needed>
> > > update tab1 set i=40 where i = 30;
> > > pub: update tab1 set i=40 where i = 20;
> > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > No record in clt.
>
> Yeah that interesting need to put thought on how to commit this record
> when an outer transaction is aborted as we do not have autonomous
> transactions which are generally used for this kind of logging.  But
> we can explore more options like inserting into conflict log tables
> outside the outer transaction.
>
> > > b)
> > > Another question related to this is, since these conflicts (which
> > > results in error) keep on happening until user resolves these or skips
> > > these or 'disable_on_error' is set. Then are we going to insert these
> > > multiple times? We do count these in 'confl_insert_exists' and
> > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > time in clt as well. Thoughts?
>
> I think it make sense to insert every time we see the conflict, but it
> would be good to have opinion from others as well.

Since there is a concern that multiple rows for
multiple_unique_conflicts can cause data bloat, it made me rethink that
this case is actually more prone to causing data bloat if the conflict is
not resolved on time, as it seems a far more frequent scenario. So shall
we keep inserting the record, or insert it once and avoid inserting it
again based on the LSN?  Thoughts?

>
> > > 2)
> > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > indicating that it is not applicable for this conflict-type?
> > >
> > > Example: delete_missing, update_missing
> > > pub:
> > >  insert into tab1 values(10,10);
> > >  insert into tab1 values(20,10);
> > >  sub:  delete from tab1 where i=10;
> > >  pub:  delete from tab1 where i=10;
>
> Sure I will test this.
>
> >
> > 3)
> > We also need to think how we are going to display the info in case of
> > multiple_unique_conflicts as there could be multiple local and remote
> > tuples conflicting for one single operation. Example:
> >
> > create table conf_tab (a int primary key, b int unique, c int unique);
> >
> > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> >
> > pub: insert into conf_tab values (2,3,4);
> >
> > ERROR:  conflict detected on relation "public.conf_tab":
> > conflict=multiple_unique_conflicts
> > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_b_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_c_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > CONTEXT:  processing remote data for replication origin "pg_16392"
> > during message type "INSERT" for replication target relation
> > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> >
> > Currently in clt, we have singular terms such as 'key_tuple',
> > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > But it does not look reasonable to have multiple rows inserted for a
> > single conflict raised. I will think more about this.
>
> Currently I am inserting multiple records in the conflict history
> table, the same as each tuple is logged, but couldn't find any better
> way for this. Another option is to use an array of tuples instead of a
> single tuple but not sure this might make things more complicated to
> process by any external tool.  But you are right, this needs more
> discussion.
>
> --
> Regards,
> Dilip Kumar
> Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > Few observations related to publication.
> > > > ------------------------------
> >
> > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > conflict log tables it should be good enough if we restrict it when
> > ALL TABLE options are used, I don't think we need to put extra effort
> > to completely restrict it even if users want to explicitly list it
> > into the publication.
> >
> > > >
> > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > >
> > > > 1)
> > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
> >
> > This function is used while publishing every single change and I don't
> > think we want to add a cost to check each subscription to identify
> > whether the table is listed as CLT.
> >
> > > > 2)
> > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > show that for clt.
> >
> > I think we should fix this.
> >
> > > > 3)
> > > > I am able to create a publication for clt table, should it be allowed?
> >
> > I believe we should not do any specific handling to restrict this but
> > I am open for the opinions.
> >
> > > > create subscription sub1 connection '...' publication pub1
> > > > WITH(conflict_log_table='clt');
> > > > create publication pub3 for table clt;
> > > >
> > > > 4)
> > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > will also get clt as non-publishable always (and will solve a few of
> > > > the above issues I think). IIUC, there is no place where we want to
> > > > mark CLT as publishable or is there any?
> >
> > IMHO the main reason is performance.
> >
> > > > 5) Also, I feel we can add some documentation now to help others to
> > > > understand/review the patch better without going through the long
> > > > thread.
> >
> > Make sense, I will do that in the next version.
> >
> > > >
> > > > Few observations related to conflict-logging:
> > > > ------------------------------
> > > > 1)
> > > > I found that for the conflicts which ultimately result in Error, we do
> > > > not insert any conflict-record in clt.
> > > >
> > > > a)
> > > > Example: insert_exists, update_Exists
> > > > create table tab1 (i int primary key, j int);
> > > > sub: insert into tab1 values(30,10);
> > > > pub: insert into tab1 values(30,10);
> > > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > > No record in clt.
> > > >
> > > > sub:
> > > > <some pre-data needed>
> > > > update tab1 set i=40 where i = 30;
> > > > pub: update tab1 set i=40 where i = 20;
> > > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > > No record in clt.
> >
> > Yeah that interesting need to put thought on how to commit this record
> > when an outer transaction is aborted as we do not have autonomous
> > transactions which are generally used for this kind of logging.  But
> > we can explore more options like inserting into conflict log tables
> > outside the outer transaction.
> >
> > > > b)
> > > > Another question related to this is, since these conflicts (which
> > > > results in error) keep on happening until user resolves these or skips
> > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > time in clt as well. Thoughts?
> >
> > I think it make sense to insert every time we see the conflict, but it
> > would be good to have opinion from others as well.
>
> Since there is a concern that multiple rows for
> multiple_unique_conflicts can cause data-bloat, it made me rethink
> that this is actually more prone to causing data-bloat if it is not
> resolved on time, as it seems a far more frequent scenario. So shall
> we keep inserting the record or insert it once and avoid inserting it
> again based on lsn?  Thoughts?

I agree, this is the real problem related to bloat, so maybe we can check
whether the same tuple already exists and avoid inserting it again,
although I haven't put thought into how we would distinguish between a
new conflict on the same row vs. the same conflict being inserted
multiple times due to a worker restart.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > > Few observations related to publication.
> > > > > ------------------------------
> > >
> > > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > > conflict log tables it should be good enough if we restrict it when
> > > ALL TABLE options are used, I don't think we need to put extra effort
> > > to completely restrict it even if users want to explicitly list it
> > > into the publication.
> > >
> > > > >
> > > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > > >
> > > > > 1)
> > > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
> > >
> > > This function is used while publishing every single change and I don't
> > > think we want to add a cost to check each subscription to identify
> > > whether the table is listed as CLT.
> > >
> > > > > 2)
> > > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > > show that for clt.
> > >
> > > I think we should fix this.
> > >
> > > > > 3)
> > > > > I am able to create a publication for clt table, should it be allowed?
> > >
> > > I believe we should not do any specific handling to restrict this but
> > > I am open for the opinions.
> > >
> > > > > create subscription sub1 connection '...' publication pub1
> > > > > WITH(conflict_log_table='clt');
> > > > > create publication pub3 for table clt;
> > > > >
> > > > > 4)
> > > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > > will also get clt as non-publishable always (and will solve a few of
> > > > > the above issues I think). IIUC, there is no place where we want to
> > > > > mark CLT as publishable or is there any?
> > >
> > > IMHO the main reason is performance.
> > >
> > > > > 5) Also, I feel we can add some documentation now to help others to
> > > > > understand/review the patch better without going through the long
> > > > > thread.
> > >
> > > Make sense, I will do that in the next version.
> > >
> > > > >
> > > > > Few observations related to conflict-logging:
> > > > > ------------------------------
> > > > > 1)
> > > > > I found that for the conflicts which ultimately result in Error, we do
> > > > > not insert any conflict-record in clt.
> > > > >
> > > > > a)
> > > > > Example: insert_exists, update_Exists
> > > > > create table tab1 (i int primary key, j int);
> > > > > sub: insert into tab1 values(30,10);
> > > > > pub: insert into tab1 values(30,10);
> > > > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > > > No record in clt.
> > > > >
> > > > > sub:
> > > > > <some pre-data needed>
> > > > > update tab1 set i=40 where i = 30;
> > > > > pub: update tab1 set i=40 where i = 20;
> > > > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > > > No record in clt.
> > >
> > > Yeah that interesting need to put thought on how to commit this record
> > > when an outer transaction is aborted as we do not have autonomous
> > > transactions which are generally used for this kind of logging.  But
> > > we can explore more options like inserting into conflict log tables
> > > outside the outer transaction.
> > >
> > > > > b)
> > > > > Another question related to this is, since these conflicts (which
> > > > > results in error) keep on happening until user resolves these or skips
> > > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > > time in clt as well. Thoughts?
> > >
> > > I think it make sense to insert every time we see the conflict, but it
> > > would be good to have opinion from others as well.
> >
> > Since there is a concern that multiple rows for
> > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > that this is actually more prone to causing data-bloat if it is not
> > resolved on time, as it seems a far more frequent scenario. So shall
> > we keep inserting the record or insert it once and avoid inserting it
> > again based on lsn?  Thoughts?
>
> I agree, this is the real problem related to bloat so maybe we can see
> if the same tuple exists we can avoid inserting it again, although I
> haven't put thought on how to we distinguish between the new conflict
> on the same row vs the same conflict being inserted multiple times due
> to worker restart.
>

If there is consensus on this approach, IMO, it appears safe to rely
on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
the given 'conflict_type' before we insert a new record.
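
e.g. the duplicate check could look something like this (assuming the CLT
carries those columns; untested):

SELECT EXISTS (
    SELECT 1 FROM clt
    WHERE conflict_type = 'insert_exists'
      AND remote_origin = 'pg_16402'
      AND remote_commit_lsn = '0/01789AB8'::pg_lsn);
-- insert the new record only when this returns false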

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 1, 2025 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > Since there is a concern that multiple rows for
> > > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > > that this is actually more prone to causing data-bloat if it is not
> > > resolved on time, as it seems a far more frequent scenario. So shall
> > > we keep inserting the record or insert it once and avoid inserting it
> > > again based on lsn?  Thoughts?
> >
> > I agree, this is the real problem related to bloat so maybe we can see
> > if the same tuple exists we can avoid inserting it again, although I
> > haven't put thought on how to we distinguish between the new conflict
> > on the same row vs the same conflict being inserted multiple times due
> > to worker restart.
> >
>
> If there is consensus on this approach, IMO, it appears safe to rely
> on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
> the given 'conflict_type' before we insert a new record.
>

What happens if, as part of multiple_unique_conflicts, only some of the
rows conflict in the next apply round (say, in the meantime the user has
removed a few conflicting rows)? I think the ideal way for users to
avoid such multiple occurrences is to configure the subscription with
disable_on_error. I think we should LOG errors again on retry, and it
is better to keep this consistent with what we print in the LOG, because
we may want to give users an option in future of where to log the
conflicts (in the conflict_history_table, the LOG, or both).

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 1, 2025 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > Since there is a concern that multiple rows for
> > > > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > > > that this is actually more prone to causing data-bloat if it is not
> > > > resolved on time, as it seems a far more frequent scenario. So shall
> > > > we keep inserting the record or insert it once and avoid inserting it
> > > > again based on lsn?  Thoughts?
> > >
> > > I agree, this is the real problem related to bloat so maybe we can see
> > > if the same tuple exists we can avoid inserting it again, although I
> > > haven't put thought on how to we distinguish between the new conflict
> > > on the same row vs the same conflict being inserted multiple times due
> > > to worker restart.
> > >
> >
> > If there is consensus on this approach, IMO, it appears safe to rely
> > on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
> > the given 'conflict_type' before we insert a new record.
> >
>
> What happens if as part of multiple_unique_conflict, in the next apply
> round only some of the rows conflict (say in the meantime user has
> removed a few conflicting rows)? I think the ideal way for users to
> avoid such multiple occurrences is to configure subscription with
> disable_on_error. I think we should LOG errors again on retry and it
> is better to keep it consistent with what we print in LOG because we
> may want to give an option to users in future where to LOG (in
> conflict_history_table, LOG, or both) the conflicts.
>

Yeah, that makes sense, because if the user tried to fix the conflict
and it still didn't get fixed, then from next time onward the user would
have no way to know that the conflict reoccurred.  And it also makes
sense to maintain consistency with the LOGs.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Nov 28, 2025 at 6:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Some review comments for v8-0001.

Thanks Peter, yes these all make sense and I will fix them in the next
version along with the other comments by Vignesh/Shveta and Amit, except
for one comment:

> 9.
> +-- ok - conflict_log_table should not be published with ALL TABLE
> +CREATE PUBLICATION pub FOR TABLES IN SCHEMA clt;
> +SELECT * FROM pg_publication_tables WHERE pubname = 'pub';
> + pubname | schemaname | tablename | attnames | rowfilter
> +---------+------------+-----------+----------+-----------
> +(0 rows)
>
> Perhaps you should repeat this same test but using FOR ALL TABLES,
> instead of only FOR TABLES IN SCHEMA

I will have to see how we can safely do this in testing without having
any side effects on the concurrent tests. Generally we run
publication.sql and subscription.sql concurrently in the regression
tests, so if we do FOR ALL TABLES they can affect each other. One option
is to not run these 2 tests concurrently; I think we can do that, as
there is no real concurrency being tested by running them concurrently.
Any thoughts on this?


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Fri, Nov 28, 2025 at 2:32 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 5:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > I have fixed all these comments and also the comments of 0002, now I
> > feel we can actually merge 0001 and 0002, so I have merged both of
> > them.
> >
> > Now pending work status
> > 1) fixed review comments of 0003
> > 2) Run pgindent -- planning to do it after we complete the first level
> > of review
> > 3) Subscription TAP test for logging the actual conflicts
> >
>
> Thanks  for the patch. A few observations:
>
> 1)
> It seems, as per LOG, 'key' and 'replica-identity' are different when
> it comes to insert_exists, update_exists and
> multiple_unique_conflicts, while I believe in CLT, key is
> replica-identity i.e. there are no 2 separate terms. Please see below:
>
> a)
> Update_Exists:
> 2025-11-28 14:08:56.179 IST [60383] ERROR:  conflict detected on
> relation "public.tab1": conflict=update_exists
> 2025-11-28 14:08:56.179 IST [60383] DETAIL:  Key already exists in
> unique index "tab1_pkey", modified locally in transaction 790 at
> 2025-11-28 14:07:17.578887+05:30.
> Key (i)=(40); existing local row (40, 10); remote row (40, 200);
> replica identity (i)=(20).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple
> from clt where conflict_type='update_exists';
>  conflict_type | key_tuple |   local_tuple   |   remote_tuple
> ---------------+-----------+-----------------+------------------
>  update_exists | {"i":20}  | {"i":40,"j":10} | {"i":40,"j":200}
>
> b)
> insert_Exists:
> ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> DETAIL:  Key already exists in unique index "tab1_pkey", modified
> locally in transaction 767 at 2025-11-28 13:59:22.431097+05:30.
> Key (i)=(30); existing local row (30, 10); remote row (30, 10).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple from clt;
>  conflict_type  | key_tuple |   local_tuple   |  remote_tuple
> ----------------+-----------+-----------------+-----------------
>  insert_exists  |               | {"i":30,"j":10} | {"i":30,"j":10}
>
> case a) has key_tuple same as replica-identity of LOG
> case b) does not have replica-identity and thus key_tuple is NULL.
>
> Does that mean we need to maintain both key_tuple and RI separately in
> CLT? Thoughts?
>

Yeah, it could be useful to display the RI values separately. What
should the column name be? A few options could be: remote_val_for_ri,
remote_value_ri, or something else. I think it may also be useful to
display the conflicting_index but, OTOH, it would be difficult to decide
in the first version what other information could be required, so it is
better to stick with what is being displayed in the LOG.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Wed, Nov 19, 2025 at 3:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
>
> > 3)
> > Do we need to have a timestamp column as well to say when conflict was
> > recorded? Or local_commit_ts, remote_commit_ts are sufficient?
> > Thoughts
>
> You mean we can record the timestamp now while inserting, not sure if
> it will add some more meaningful information than remote_commit_ts,
> but let's see what others think.
>

local_commit_ts and remote_commit_ts sound sufficient, as one can
identify the truth of the information from those two. The key/schema
values displayed in this table could change later, but the information
about a particular row is based on the time shown by those two
columns.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> The specific scenario we are discussing is when a single row from the
> publisher attempts to apply an operation that causes a conflict across
> multiple unique keys, with each of those unique key violations
> conflicting with a different local row on the subscriber, is very
> rare.  IMHO this low-frequency scenario does not justify
> overcomplicating the design with an array field or a multi-level
> table.
>

I did some analysis and searched the internet to answer your
following two questions.

> Consider the infrequency of the root causes:
> - How often does a table have more than 3 to 4 unique keys?

It is extremely common—in fact, it is considered the industry "best
practice" for modern database design.

One can find this pattern in almost every enterprise system (e.g.
banking apps, CRMs). It relies on distinguishing between Technical
Identity (for the database) and Business Identity (for the real
world).

1. The Design Pattern: Surrogate vs. Natural Keys
Primary Key (Surrogate Key): Usually a meaningless number (e.g.,
10452) or a UUID. It is used strictly for the database to join tables
efficiently. It never changes.
Unique Key (Natural Key): A real-world value (e.g., john@email.com or
SSN-123). This is how humans or external systems identify the row. It
can change (e.g., someone updates their email).

2. Common Real-World Use Cases
A. User Management (The most classic example)
Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table.
Unique Key 1: email (Varchar). Prevents two people from registering
with the same email.
Unique Key 2: username (Varchar). Ensures unique display names.
Why? If a user changes their email address, you only update one field
in one table. If you used email as the Primary Key, you would have to
update millions of rows in the ORDERS table that reference that email.

B. Inventory / E-Commerce
Primary Key: product_id (Integer). Used internally by the code.
Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC).
Why? Companies often re-organize their SKU formats. If the SKU was the
Primary Key, a format change would require a massive database
migration.

C. Government / HR Systems
Primary Key: employee_id (Integer).
Unique Key: National_ID (SSN, Aadhaar, Passport Number).
Why? Privacy and security. You do not want to expose a National ID in
every URL or API call (e.g., api/employee/552 is safer than
api/employee/SSN-123).
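
As a rough SQL sketch (hypothetical names), the User Management example
above would be:

CREATE TABLE users (
    user_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key
    email    text NOT NULL UNIQUE,  -- natural key 1
    username text NOT NULL UNIQUE   -- natural key 2
);

Here a single incoming row can violate users_pkey, users_email_key,
and users_username_key at the same time, each against a different
local row.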

> - How frequently would each of these keys conflict with a unique row
> on the subscriber side?
>

It can occur with medium-to-high probability in the following cases:
(a) in bi-directional replication systems; for example, if two users
create the same "User Profile" on two different servers at the same
time, the row will conflict on every unique field (ID, Email, SSN)
simultaneously. (b) The chances of bloat are high when retrying to fix
the error, as mentioned by Shveta: if the Ops team fixes errors by just
"trying again" without checking the full row, you will hit the ID
error, fix it, then immediately hit the Email error. (c) The chances
are medium during an initial data load; if a user is loading data from
a legacy system with "dirty" data, rows often violate multiple rules
(e.g., a duplicate user with both a reused ID and a reused Email).

> If resolving this occasional, synthetic conflict requires inserting
> two or three rows instead of a single one, this is an acceptable
> trade-off considering how rare it can occur.
>

As per the above analysis and the retry point Shveta raises, I don't
think we can ignore the possibility of data bloat, especially for this
multiple_unique_key conflict. We can consider logging multiple local
conflicting rows as a JSON array.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> As per the above analysis and the retry point Shveta raises, I don't
> think we can ignore the possibility of data bloat, especially for this
> multiple_unique_key conflict. We can consider logging multiple local
> conflicting rows as a JSON array.

Okay, I will try to make multiple local rows as JSON Array in the next version.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Okay, I will try to make multiple local rows as JSON Array in the next version.
>
Just to clarify, so that we are on the same page: along with the local
tuple, the other local fields like local_xid, local_commit_ts, and
local_origin will also be converted into arrays. I hope that makes
sense?

So we will change the table like this. I am not sure whether it makes
sense to keep all the local array fields next to each other in the
table, or to keep each one near the corresponding remote field, as we
currently do with remote_xid and local_xid together, etc.

      Column       |            Type            | Collation | Nullable | Default
-------------------+----------------------------+-----------+----------+---------
 relid             | oid                        |           |          |
 schemaname        | text                       |           |          |
 relname           | text                       |           |          |
 conflict_type     | text                       |           |          |
 local_xid         | xid[]                      |           |          |
 remote_xid        | xid                        |           |          |
 remote_commit_lsn | pg_lsn                     |           |          |
 local_commit_ts   | timestamp with time zone[] |           |          |
 remote_commit_ts  | timestamp with time zone   |           |          |
 local_origin      | text[]                     |           |          |
 remote_origin     | text                       |           |          |
 key_tuple         | json                       |           |          |
 local_tuple       | json[]                     |           |          |
 remote_tuple      | json                       |           |          |

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Tue, Dec 2, 2025 at 12:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > Okay, I will try to make multiple local rows as JSON Array in the next version.
> >
> Just to clarify, so that we are on the same page: along with the local
> tuple, the other local fields like local_xid, local_commit_ts, and
> local_origin will also be converted into arrays. I hope that makes
> sense?
>

Yes, what about key_tuple or RI?

> So we will change the table like this. I am not sure whether it makes
> sense to keep all the local array fields next to each other in the
> table, or to keep each one near the corresponding remote field, as we
> currently do with remote_xid and local_xid together, etc.
>

It is better to keep the array fields together at the end. I think
that would be easier to read via the CLI. Also, it may take more space
due to padding/alignment if we store fixed-width and variable-width
columns interleaved, and similarly access will be slower in the
interleaved case.

Having said that, can we consider an alternative way to store all
local_conflict_info together as a JSONB column (that can be used to
store an array of objects)? For example, the multiple conflicting
tuple information can be stored as:

[
  { "xid": "1001", "commit_ts": "2023-10-27 10:00:00", "origin": "node_A",
    "tuple": { "id": 1, "email": "a@b.com" } },
  { "xid": "1005", "commit_ts": "2023-10-27 10:01:00", "origin": "node_B",
    "tuple": { "id": 2, "phone": "555-0199" } }
]

To access JSON array columns, I think one needs to use the unnest
function, whereas JSONB could be accessed with something like: SELECT
* FROM conflicts WHERE local_conflicts @> '[{"xid": "1001"}]'.
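
For example (untested sketches, assuming the conflict table is named
"conflicts" as above):

-- json[] column: expand the array first, then filter
SELECT c.*
FROM conflicts c,
     unnest(c.local_conflicts) AS lc(elem)
WHERE lc.elem->>'xid' = '1001';

-- jsonb column: containment works directly and can use a GIN index
SELECT *
FROM conflicts
WHERE local_conflicts @> '[{"xid": "1001"}]';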

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Having said that, can we consider an alternative way to store all
> local_conflict_info together as a JSONB column (that can be used to
> store an array of objects)?

Yeah we can do that as well, maybe that's a better idea compared to
creating separate array fields for each local element.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Yeah we can do that as well, maybe that's a better idea compared to
> creating separate array fields for each local element.

So I tried the POC with this approach and tested it with one of the
test cases given by Shveta; the conflict log table entry now looks
like this. You can see the local_conflicts field, which is an array of
JSON values, each entry formed from (xid, commit_ts, origin, json
tuple). I will send the updated patch tomorrow after some more cleanup
and testing.

relid             | 16391
schemaname        | public
relname           | conf_tab
conflict_type     | multiple_unique_conflicts
remote_xid        | 761
remote_commit_lsn | 0/01761400
remote_commit_ts  | 2025-12-02 15:02:07.045935+00
remote_origin     | pg_16406
key_tuple         |
remote_tuple      | {"a":2,"b":3,"c":4}
local_conflicts   |
{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}",
 "{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}",
 "{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
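
For example, the individual local tuples can be pulled out of the
array with something like this (untested; assuming the conflict log
table is named clt, as in Shveta's earlier example):

SELECT relname, conflict_type,
       lc->>'xid'       AS local_xid,
       lc->>'commit_ts' AS local_commit_ts,
       lc->'tuple'      AS local_tuple
FROM clt, unnest(local_conflicts) AS lc;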


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Tue, Dec 2, 2025 at 8:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> So I tried the POC with this approach and tested it with one of the
> test cases given by Shveta; the conflict log table entry now looks
> like this. You can see the local_conflicts field, which is an array of
> JSON values, each entry formed from (xid, commit_ts, origin, json
> tuple). I will send the updated patch tomorrow after some more cleanup
> and testing.

Thanks, it looks good. For the benefit of others, could you include a
brief note, perhaps in the commit message for now, describing how to
access or read this array column? We can remove it later.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> Thanks, it looks good. For the benefit of others, could you include a
> brief note, perhaps in the commit message for now, describing how to
> access or read this array column? We can remove it later.

Thanks, okay; temporarily I have added to the commit message how we
can fetch the data from the JSON array field. In the next version I
will add a test that stores a conflict in the conflict log history
table and fetches it.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Masahiko Sawada
Date:
On Wed, Dec 3, 2025 at 3:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Thanks, okay; temporarily I have added to the commit message how we
> can fetch the data from the JSON array field. In the next version I
> will add a test that stores a conflict in the conflict log history
> table and fetches it.
>

I've reviewed the v9 patch and here are some comments:

The patch utilizes SPI for creating and dropping the conflict history
table, but I'm really not sure if it's okay because it's actually
affected by some GUC parameters such as default_tablespace and
default_toast_compression etc. Also, probably some hooks and event
triggers could be fired during the creation and removal. Is it
intentional behavior? I'm concerned that it would make investigation
harder if an issue happened in the user environment.

---
+   /* build and execute the CREATE TABLE query. */
+   appendStringInfo(&querybuf,
+                    "CREATE TABLE %s.%s ("
+                    "relid Oid,"
+                    "schemaname TEXT,"
+                    "relname TEXT,"
+                    "conflict_type TEXT,"
+                    "remote_xid xid,"
+                    "remote_commit_lsn pg_lsn,"
+                    "remote_commit_ts TIMESTAMPTZ,"
+                    "remote_origin TEXT,"
+                    "key_tuple     JSON,"
+                    "remote_tuple  JSON,"
+                    "local_conflicts JSON[])",
+                    quote_identifier(get_namespace_name(namespaceId)),
+                    quote_identifier(conflictrel));

If we want to use SPI for history table creation, we should use
qualified names in all the places including data types.

---
The patch doesn't create the dependency between the subscription and
the conflict history table. So users can entirely drop the schema
(with CASCADE option) where the history table is created. And once
dropping the schema along with the history table, ALTER SUBSCRIPTION
... SET (conflict_history_table = '') seems not to work (I got a
SEGV).

---
Currently the history table can be created in the pg_temp namespace,
but that should not be allowed.
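
For example, something like this (hypothetical command; I have not
re-checked the exact option syntax) currently succeeds but should be
rejected:

create subscription sub connection 'dbname=postgres port=5551'
publication pub with (conflict_log_table = 'pg_temp.logtable');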

---
I think the conflict history table should not be transferred to the
new cluster when pg_upgrade since the table definition could be
different across major versions.

I got the following log when the publisher disables track_commit_timestamp:

local_conflicts   |
{"{\"xid\":\"790\",\"commit_ts\":\"1999-12-31T16:00:00-08:00\",\"origin\":\"\",\"tuple\":{\"c\":1}}"}

I think we can omit commit_ts when it's not available.

---
I think we should keep the history table name case-sensitive:

postgres(1:351685)=# create subscription sub connection
'dbname=postgres port=5551' publication pub with (conflict_log_table =
'LOGTABLE');
CREATE SUBSCRIPTION
postgres(1:351685)=# \d
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | test     | table | masahiko
 public | logtable | table | masahiko
(2 rows)

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Dec 4, 2025 at 7:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've reviewed the v9 patch and here are some comments:

Thanks for reviewing this and your valuable comments.

> The patch utilizes SPI for creating and dropping the conflict history
> table, but I'm really not sure if it's okay because it's actually
> affected by some GUC parameters such as default_tablespace and
> default_toast_compression etc. Also, probably some hooks and event
> triggers could be fired during the creation and removal. Is it
> intentional behavior? I'm concerned that it would make investigation
> harder if an issue happened in the user environment.

Hmm, interesting point. Well, we can control the values of the default
parameters while creating the table using SPI, but I don't see any
reason not to use heap_create_with_catalog() directly, so maybe that's
a better choice than using SPI, because then we don't need to bother
about any event triggers/utility hooks etc. Although I don't see any
specific issue with SPI, unless the user intentionally wants to create
trouble while creating this table. What do others think about it?

> ---
> +   /* build and execute the CREATE TABLE query. */
> +   appendStringInfo(&querybuf,
> +                    "CREATE TABLE %s.%s ("
> +                    "relid Oid,"
> +                    "schemaname TEXT,"
> +                    "relname TEXT,"
> +                    "conflict_type TEXT,"
> +                    "remote_xid xid,"
> +                    "remote_commit_lsn pg_lsn,"
> +                    "remote_commit_ts TIMESTAMPTZ,"
> +                    "remote_origin TEXT,"
> +                    "key_tuple     JSON,"
> +                    "remote_tuple  JSON,"
> +                    "local_conflicts JSON[])",
> +                    quote_identifier(get_namespace_name(namespaceId)),
> +                    quote_identifier(conflictrel));
>
> If we want to use SPI for history table creation, we should use
> qualified names in all the places including data types.

That's true; that way we can avoid interference from any user-created types.

> ---
> The patch doesn't create a dependency between the subscription and
> the conflict history table. So users can entirely drop the schema
> (with the CASCADE option) where the history table is created.

I think as part of the initial discussion we assumed that, since the
table is created under the subscription owner's privileges, only that
user can drop it, and if the user intentionally drops it, conflicts
will simply not be recorded, which is acceptable. But now I think it
would be a good idea to maintain a dependency on the subscription so
that users cannot drop the table without dropping the subscription.

> And once the schema is dropped along with the history table, ALTER
> SUBSCRIPTION ... SET (conflict_history_table = '') seems not to work
> (I got a SEGV).

I will check this, thanks

> ---
> We can create the history table in pg_temp namespace but it should not
> be allowed.

Right, will check this and also add the test for the same.

> ---
> I think the conflict history table should not be transferred to the
> new cluster when pg_upgrade since the table definition could be
> different across major versions.

Let me think more about this with respect to the behaviour of other
factors like subscriptions etc.

> I got the following log when the publisher disables track_commit_timestamp:
>
> local_conflicts   |
> {"{\"xid\":\"790\",\"commit_ts\":\"1999-12-31T16:00:00-08:00\",\"origin\":\"\",\"tuple\":{\"c\":1}}"}
>
> I think we can omit commit_ts when it's omitted.

+1

> ---
> I think we should keep the history table name case-sensitive:

Yeah, we can do that; it looks good to me. What do others think about it?


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Dec 3, 2025 at 4:57 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> > Thanks, it looks good. For the benefit of others, could you include a
> > brief note, perhaps in the commit message for now, describing how to
> > access or read this array column? We can remove it later.
>
> Thanks, okay, temporarily I have added in a commit message how we can
> fetch the data from the JSON array field.  In next version I will add
> a test to get the conflict stored in conflict log history table and
> fetch from it.
>

Thanks, I have not looked at the patch in detail yet, but a few things:

1)
Assert is hit here:
 LOG:  logical replication apply worker for subscription "sub1" has started
TRAP: failed Assert("slot != NULL"), File: "conflict.c", Line: 669, PID: 137604

Steps: create table tab1 (i int primary key, j int);
Pub: insert into tab1 values(10,10); insert into tab1 values(20,10);
Sub:  delete from tab1 where i=10;
Pub:  delete from tab1 where i=10;

2)
I see that key_tuple still points to the RI and there is no separate
RI field added. It seems that the discussion at [1] was missed in this
patch.

[1]: https://www.postgresql.org/message-id/CAA4eK1L3umixUUik7Ef1eU%3Dx-JMb8iXD7rWWExBMP4dmOGTS9A%40mail.gmail.com

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi. Some review comments for v9-0001.

======
Commit message.

1.
Note: A single remote tuple may conflict with multiple local conflict
when conflict type
is CT_MULTIPLE_UNIQUE_CONFLICTS, so for handling this case we create a
single row in
conflict log table with respect to each remote conflict row even if it
conflicts with
multiple local rows and we store the multiple conflict tuples as a
single JSON array
element in format as
[ { "xid": "1001", "commit_ts": "...", "origin": "...", "tuple": {...} }, ... ]
We can extract the elements from local tuple as given in below example

~

Something seems broken/confused with this description:

1a.
"A single remote tuple may conflict with multiple local conflict"
Should that say "... with multiple local tuples" ?

~

1b.
There is a mixture of terminology here, "row" vs "tuple", which
doesn't seem correct.

~

1c.
"We can extract the elements from local tuple"
Should that say "... elements of the local tuples from the CLT row ..."

======
src/backend/replication/logical/conflict.c

2.
+
+#define N_LOCAL_CONFLICT_INFO_ATTRS 4

I felt it would be better to put this where it is used. e.g. IMO put
it within the build_conflict_tupledesc().

~~~

InsertConflictLogTuple:

3.
+ /* A valid tuple must be prepared and store in MyLogicalRepWorker. */

Typo still here: /store in/stored in/

~~~

4.
+static TupleDesc
+build_conflict_tupledesc(void)
+{
+ TupleDesc tupdesc;
+
+ tupdesc = CreateTemplateTupleDesc(N_LOCAL_CONFLICT_INFO_ATTRS);
+
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid",
+ XIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "commit_ts",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "origin",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "tuple",
+ JSONOID, -1, 0);

If you had some incrementing attno instead of hard-wiring the
(1,2,3,4) then you'd be able to add a sanity check like Assert(attno +
1 ==  N_LOCAL_CONFLICT_INFO_ATTRS); that can safeguard against future
mistakes in case something changes without updating the constant.

~~~

build_local_conflicts_json_array:

5.
+ /* Process local conflict tuple list and prepare a array of JSON. */
+ foreach(lc, conflicttuples)
  {
- tableslot = table_slot_create(localrel, &estate->es_tupleTable);
- tableslot = ExecCopySlot(tableslot, slot);
+ ConflictTupleInfo *conflicttuple = (ConflictTupleInfo *) lfirst(lc);

5a.
typo in comment: /a array/an array/

~

5b.
SUGGESTION
foreach_ptr(ConflictTupleInfo, conflicttuple, conflicttuples)
{

~~~

6.
+ i = 0;
+ foreach(lc, json_datums)
+ {
+ json_datum_array[i] = (Datum) lfirst(lc);
+ json_null_array[i] = false;
+ i++;
+ }

6a.
The loop seemed to be unnecessarily complicated since you already know
the size. Isn't it the same as below?

SUGGESTION
for (int i = 0; i < num_conflicts; i++)
{
  json_datum_array[i] = (Datum) list_nth(json_datums, i);
  json_null_array[i] = false;
}

6b.
Also, there is probably no need to do json_null_array[i] = false; at
every iteration here, because you could have just used palloc0 for the
whole array in the first place.

======
src/test/regress/expected/subscription.out

7.
+-- check if the table exists and has the correct schema (15 columns)
+SELECT count(*) FROM pg_attribute WHERE attrelid =
'public.regress_conflict_log1'::regclass AND attnum > 0;
+ count
+-------
+    11
+(1 row)
+

That comment is wrong; there aren't 15 columns anymore.

~~~

8.
(mentioned in a previous review)

I felt that \dRs should display the CLT's schema name in the "Conflict
log table" field -- at least when it's not "public". Otherwise, it
won't be easy for the user to know it.

I did not see a test case for this.

~~~

9.
(mentioned in a previous review)

You could have another test case to explicitly call the function
pg_relation_is_publishable(clt) to verify it returns false for a CLT
table.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Thu, Dec 4, 2025 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 4, 2025 at 7:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
>
> Hmm, interesting point. Well, we can control the values of the default
> parameters while creating the table using SPI, but I don't see any
> reason not to use heap_create_with_catalog() directly, so maybe that's
> a better choice than using SPI, because then we don't need to bother
> about any event triggers/utility hooks etc. Although I don't see any
> specific issue with SPI, unless the user intentionally wants to create
> trouble while creating this table. What do others think about it?
>
> I think as part of the initial discussion we assumed that, since the
> table is created under the subscription owner's privileges, only that
> user can drop it, and if the user intentionally drops it, conflicts
> will simply not be recorded, which is acceptable. But now I think it
> would be a good idea to maintain a dependency on the subscription so
> that users cannot drop the table without dropping the subscription.
>

Yeah, it seems reasonable to maintain its dependency on the
subscription in this model. BTW, for this it would be easier to record
the dependency if we use heap_create_with_catalog(), as we do for
create_toast_table(). The other places where we use the SPI interface
to execute statements are either places where we need to execute
multiple SQL statements or non-CREATE TABLE statements. So, for this
patch's purpose, I feel heap_create_with_catalog() suits better.

I was also wondering whether it would be a good idea to create one
global conflict table and let all subscriptions use it. However, it
has disadvantages: whenever a user drops any subscription, we need to
DELETE all conflict rows for that subscription, creating the need for
vacuum. We would also somehow need to ensure that conflicts from one
subscription owner are not visible to another subscription owner via
some RLS policy. So a catalog table per subscription (aka the current
way) appears better.

Also, shall we give the user an option for where she wants to see
conflict/resolution information? One idea to achieve this is to
provide subscription options like (a) conflict_resolution_format,
whose values could be log and table for now; in the future, one could
extend it to other options like xml, json, etc. (b) conflict_log_table:
here the user can specify the conflict table name; this can be optional
such that if the user omits it and conflict_resolution_format is
table, then we will use an internally generated table name like
pg_conflicts_<subscription_id>.
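
So something like this (hypothetical syntax; the option names above
are only proposals):

CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=5551'
    PUBLICATION pub
    WITH (conflict_resolution_format = 'table');
-- conflict_log_table omitted, so conflicts would go to an internally
-- named table like pg_conflicts_<subscription_id>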

> > I think the conflict history table should not be transferred to the
> > new cluster when pg_upgrade since the table definition could be
> > different across major versions.
>
> Let me think more about this with respect to the behaviour of other
> factors like subscriptions etc.
>

Can we deal with different schemas of the table across versions via
pg_dump/restore during the upgrade?

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
vignesh C
Date:
On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Thanks, okay; temporarily I have added to the commit message how we
> can fetch the data from the JSON array field. In the next version I
> will add a test that stores a conflict in the conflict log history
> table and fetches it.

I noticed that the table structure can get changed by the time the
conflict record is prepared. In ReportApplyConflict(), the code
currently prepares the conflict log tuple before deciding whether the
insertion will be immediate or deferred:
+       /* Insert conflict details to conflict log table. */
+       if (conflictlogrel)
+       {
+               /*
+                * Prepare the conflict log tuple. If the error level
is below ERROR,
+                * insert it immediately. Otherwise, defer the
insertion to a new
+                * transaction after the current one aborts, ensuring
the insertion of
+                * the log tuple is not rolled back.
+                */
+               prepare_conflict_log_tuple(estate,
+
relinfo->ri_RelationDesc,
+
conflictlogrel,
+                                                                  type,
+                                                                  searchslot,
+
conflicttuples,
+                                                                  remoteslot);
+               if (elevel < ERROR)
+                       InsertConflictLogTuple(conflictlogrel);
+
+               table_close(conflictlogrel, RowExclusiveLock);
+       }

If the conflict history table definition is changed just before
prepare_conflict_log_tuple, the tuple creation will crash:
Program received signal SIGSEGV, Segmentation fault.
0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
../../../../src/include/varatt.h:419
419 return VARATT_IS_4B_U(PTR) &&
(gdb) bt
#0  0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
../../../../src/include/varatt.h:419
#1  0x00005a342e01e5ed in heap_compute_data_size
(tupleDesc=0x7ab405e5dda8, values=0x7ffd7af3ad20,
isnull=0x7ffd7af3ad15) at heaptuple.c:239
#2  0x00005a342e0200dd in heap_form_tuple
(tupleDescriptor=0x7ab405e5dda8, values=0x7ffd7af3ad20,
isnull=0x7ffd7af3ad15) at heaptuple.c:1158
#3  0x00005a342e55e8c2 in prepare_conflict_log_tuple
(estate=0x5a3467944530, rel=0x7ab405e594e8,
conflictlogrel=0x7ab405e5da88, conflict_type=CT_INSERT_EXISTS,
searchslot=0x0,
    conflicttuples=0x5a3467942da0, remoteslot=0x5a346792e498) at conflict.c:936
#4  0x00005a342e55cea6 in ReportApplyConflict (estate=0x5a3467944530,
relinfo=0x5a346792e778, elevel=21, type=CT_INSERT_EXISTS,
searchslot=0x0, remoteslot=0x5a346792e498,
    conflicttuples=0x5a3467942da0) at conflict.c:168
#5  0x00005a342e348c35 in CheckAndReportConflict
(resultRelInfo=0x5a346792e778, estate=0x5a3467944530,
type=CT_INSERT_EXISTS, recheckIndexes=0x5a3467942648, searchslot=0x0,
    remoteslot=0x5a346792e498) at execReplication.c:793

This can be reproduced by the following steps:
CREATE PUBLICATION pub;
CREATE SUBSCRIPTION sub ... WITH (conflict_log_table = 'conflict');
ALTER TABLE conflict RENAME TO conflict1;
CREATE TABLE conflict(c1 varchar, c2 varchar);
-- Cause a conflict, this will crash while trying to prepare the
conflicting tuple

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

From
vignesh C
Date:
On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Thanks, okay; temporarily I have added to the commit message how we
> can fetch the data from the JSON array field. In the next version I
> will add a test that stores a conflict in the conflict log history
> table and fetches it.

A few comments:
1) Currently pg_dump does not dump the conflict_log_table option; I
felt it should be included in the dump.

2) Is there a way to unset the conflict log table after we create the
subscription with the conflict_log_table option?

3) Any reason why this table should not be allowed to be added to a publication:
+       /* Can't be conflict log table */
+       if (IsConflictLogTable(RelationGetRelid(targetrel)))
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("cannot add relation \"%s.%s\"
to publication",
+
get_namespace_name(RelationGetNamespace(targetrel)),
+
RelationGetRelationName(targetrel)),
+                                errdetail("This operation is not
supported for conflict log tables.")));

Is the reason that the same table could be a conflict table on the
subscriber, and this prevents corruption on the subscriber?

4) I did not find any documentation for this feature; can we include
documentation in create_subscription.sgml, alter_subscription.sgml and
logical_replication.sgml?

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Dec 4, 2025 at 8:05 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > relid             | 16391
> > > > schemaname        | public
> > > > relname           | conf_tab
> > > > conflict_type     | multiple_unique_conflicts
> > > > remote_xid        | 761
> > > > remote_commit_lsn | 0/01761400
> > > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > > remote_origin     | pg_16406
> > > > key_tuple         |
> > > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > > local_conflicts   |
> > > >
{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"
> > > >
773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T
> > > > 15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > > >
> > >
> > > Thanks, it looks good. For the benefit of others, could you include a
> > > brief note, perhaps in the commit message for now, describing how to
> > > access or read this array column? We can remove it later.
> >
> > Thanks, okay, temporarily I have added in a commit message how we can
> > fetch the data from the JSON array field.  In next version I will add
> > a test to get the conflict stored in conflict log history table and
> > fetch from it.
>
> I noticed that the table structure can get changed by the time the
> conflict record is prepared. In ReportApplyConflict(), the code
> currently prepares the conflict log tuple before deciding whether the
> insertion will be immediate or deferred:
> +       /* Insert conflict details to conflict log table. */
> +       if (conflictlogrel)
> +       {
> +               /*
> +                * Prepare the conflict log tuple. If the error level
> is below ERROR,
> +                * insert it immediately. Otherwise, defer the
> insertion to a new
> +                * transaction after the current one aborts, ensuring
> the insertion of
> +                * the log tuple is not rolled back.
> +                */
> +               prepare_conflict_log_tuple(estate,
> +
> relinfo->ri_RelationDesc,
> +
> conflictlogrel,
> +                                                                  type,
> +                                                                  searchslot,
> +
> conflicttuples,
> +                                                                  remoteslot);
> +               if (elevel < ERROR)
> +                       InsertConflictLogTuple(conflictlogrel);
> +
> +               table_close(conflictlogrel, RowExclusiveLock);
> +       }
>
> If the conflict history table defintion is changed just before
> prepare_conflict_log_tuple, the tuple creation will crash:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
> ../../../../src/include/varatt.h:419
> 419 return VARATT_IS_4B_U(PTR) &&
> (gdb) bt
> #0  0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
> ../../../../src/include/varatt.h:419
> #1  0x00005a342e01e5ed in heap_compute_data_size
> (tupleDesc=0x7ab405e5dda8, values=0x7ffd7af3ad20,
> isnull=0x7ffd7af3ad15) at heaptuple.c:239
> #2  0x00005a342e0200dd in heap_form_tuple
> (tupleDescriptor=0x7ab405e5dda8, values=0x7ffd7af3ad20,
> isnull=0x7ffd7af3ad15) at heaptuple.c:1158
> #3  0x00005a342e55e8c2 in prepare_conflict_log_tuple
> (estate=0x5a3467944530, rel=0x7ab405e594e8,
> conflictlogrel=0x7ab405e5da88, conflict_type=CT_INSERT_EXISTS,
> searchslot=0x0,
>     conflicttuples=0x5a3467942da0, remoteslot=0x5a346792e498) at conflict.c:936
> #4  0x00005a342e55cea6 in ReportApplyConflict (estate=0x5a3467944530,
> relinfo=0x5a346792e778, elevel=21, type=CT_INSERT_EXISTS,
> searchslot=0x0, remoteslot=0x5a346792e498,
>     conflicttuples=0x5a3467942da0) at conflict.c:168
> #5  0x00005a342e348c35 in CheckAndReportConflict
> (resultRelInfo=0x5a346792e778, estate=0x5a3467944530,
> type=CT_INSERT_EXISTS, recheckIndexes=0x5a3467942648, searchslot=0x0,
>     remoteslot=0x5a346792e498) at execReplication.c:793
>
> This can be reproduced by the following steps:
> CREATE PUBLICATION pub;
> CREATE SUBSCRIPTION sub ... WITH (conflict_log_table = 'conflict');
> ALTER TABLE conflict RENAME TO conflict1;
> CREATE TABLE conflict(c1 varchar, c2 varchar);
> -- Cause a conflict, this will crash while trying to prepare the
> conflicting tuple

Yeah, while it is allowed to drop or alter the conflict log table, it
should not seg fault; IMHO an error is acceptable as per the initial
discussion. I will look into this and tighten the logic so that it
throws an error whenever it cannot insert into the conflict log table.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Dec 5, 2025 at 9:24 AM vignesh C <vignesh21@gmail.com> wrote:
>
> [...]
>
> Few comments:
> 1) Currently pg_dump is not dumping the conflict_log_table option; I
> felt it should be included while dumping.

Yeah, we should.

> 2) Is there a way to unset the conflict log table after we create the
> subscription with the conflict_log_table option?

IMHO we can use ALTER SUBSCRIPTION ... WITH (conflict_log_table = '')
to unset it? What do others think?

> 3) Any reason why this table should not be allowed to be added to a publication:
> +       /* Can't be conflict log table */
> +       if (IsConflictLogTable(RelationGetRelid(targetrel)))
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                errmsg("cannot add relation \"%s.%s\" to publication",
> +                                       get_namespace_name(RelationGetNamespace(targetrel)),
> +                                       RelationGetRelationName(targetrel)),
> +                                errdetail("This operation is not supported for conflict log tables.")));
>
> Is the reason that the same table can be a conflict table on the
> subscriber, and to prevent corruption on the subscriber?

The main reason was that, since these tables are internally created for
maintaining conflict information, which is very much node-specific
internal detail, there is no reason anyone would want to replicate them.
So we blocked them under the ALL TABLES option, and then, based on a
suggestion from Shveta, we blocked them from being added to a
publication as well. So there is no strong reason to disallow forcefully
adding them to a publication; OTOH, there is no reason why someone would
want to do that, considering these are internally managed tables.

> 4) I did not find any documentation for this feature; can we include
> documentation in create_subscription.sgml, alter_subscription.sgml and
> logical_replication.sgml?

Yeah, in the initial version I posted a doc patch, but since the first
patch is still changing and some behavior might change as well, I will
postpone it to a later stage, after we have consensus on most of the
behaviour.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Fri, Dec 5, 2025 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 5, 2025 at 9:24 AM vignesh C <vignesh21@gmail.com> wrote:
> >
>
> > 2) Is there a way to unset the conflict log table after we create the
> > subscription with the conflict_log_table option?
>
> IMHO we can use ALTER SUBSCRIPTION ... WITH (conflict_log_table = '')
> to unset it? What do others think?
>

We already have the syntax ALTER SUBSCRIPTION name SET (
subscription_parameter [= value] [, ... ] ), which can be used to
set or unset this new subscription option.
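For example, something like this (a sketch, assuming the option keeps
its current name and that an empty value unsets it, which is still
under discussion):

ALTER SUBSCRIPTION sub SET (conflict_log_table = 'conflict_history');
ALTER SUBSCRIPTION sub SET (conflict_log_table = '');   -- unset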

> > 3) Any reason why this table should not be allowed to be added to a publication:
> > [...]
> >
> > Is the reason that the same table can be a conflict table on the
> > subscriber, and to prevent corruption on the subscriber?
>
> The main reason was that, since these tables are internally created
> for maintaining conflict information, which is very much node-specific
> internal detail, there is no reason anyone would want to replicate
> them. So we blocked them under the ALL TABLES option, and then, based
> on a suggestion from Shveta, we blocked them from being added to a
> publication as well. So there is no strong reason to disallow
> forcefully adding them to a publication; OTOH, there is no reason why
> someone would want to do that, considering these are internally
> managed tables.
>

I also don't see any reason to allow such internal tables to be
replicated. So, it is okay to prohibit them for now. If we see any use
case, we can allow it.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Also, shall we give the user an option for where she wants to see the
> conflict/resolution information? One idea to achieve this is to provide
> subscription options like (a) conflict_resolution_format: the values
> could be log and table for now; in future, one could extend it to other
> options like xml, json, etc. (b) conflict_log_table: here the user can
> specify the conflict table name; this can be optional, such that if the
> user omits it and conflict_resolution_format is table, then we will use
> an internally generated table name like pg_conflicts_<subscription_id>.
>

In this idea, we can keep the name of the second option as
conflict_log_name instead of conflict_log_table. This can help us LOG
the conflicts to a totally separate conflict file instead of the server
log. Say the user provides conflict_resolution_format = 'log' and
conflict_log_name = 'conflict_report'; then we can report conflicts in
this separate file, appending the subid to distinguish it. And if the
user gives only the first option, conflict_resolution_format = 'log',
then we keep reporting the information in the server log files.
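For illustration, usage could look something like this (the option
names and values here are only proposals, not final syntax):

-- conflicts recorded in a table named conflict_report
CREATE SUBSCRIPTION sub ... WITH (conflict_resolution_format = 'table',
                                  conflict_log_name = 'conflict_report');

-- conflicts written to a separate file, e.g. conflict_report_<subid>
CREATE SUBSCRIPTION sub ... WITH (conflict_resolution_format = 'log',
                                  conflict_log_name = 'conflict_report');

-- conflicts keep going to the server log
CREATE SUBSCRIPTION sub ... WITH (conflict_resolution_format = 'log');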

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Fri, Dec 5, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]

+1 on the idea.
Instead of using conflict_resolution_format, I feel it should be
conflict_log_format, as we are referring to LOGs and not resolutions.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Dec 5, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]

Yeah, that looks good. Considering the extensibility, I think we can
use the option name 'conflict_log_name' from the first version itself,
even if we don't provide all the options in the first version.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Dec 5, 2025 at 10:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]
>
> Yeah, while it is allowed to drop or alter the conflict log table, it
> should not seg fault; IMHO an error is acceptable as per the initial
> discussion. I will look into this and tighten the logic so that it
> throws an error whenever it cannot insert into the conflict log table.

I was thinking about what we should do if the table definition has
changed. One option is that whenever we try to prepare the tuple, after
acquiring the lock, we validate the table definition; if it does not
match the standard conflict log table schema, we can ERROR out. IMHO
that should not be an issue, as we are only doing this during conflict
logging.
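The real check would be against the relation's TupleDesc in C after
acquiring the lock; just to illustrate, this query shows the catalog
information such a check would compare with the expected conflict log
schema (using the table name from the reproduction steps earlier in
the thread):

SELECT attnum, attname, atttypid::regtype
  FROM pg_attribute
 WHERE attrelid = 'public.conflict'::regclass
   AND attnum > 0
   AND NOT attisdropped
 ORDER BY attnum;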

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
vignesh C
Date:
On Sat, 6 Dec 2025 at 20:36, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]
>
> I was thinking about what we should do if the table definition has
> changed. One option is that whenever we try to prepare the tuple, after
> acquiring the lock, we validate the table definition; if it does not
> match the standard conflict log table schema, we can ERROR out. IMHO
> that should not be an issue, as we are only doing this during conflict
> logging.

Should we emit a warning instead of an error, to stay consistent with
the other exception case, where a warning is raised when the conflict
log table does not exist?
+       /* Conflict log table is dropped or not accessible. */
+       if (conflictlogrel == NULL)
+               ereport(WARNING,
+                               (errcode(ERRCODE_UNDEFINED_TABLE),
+                                errmsg("conflict log table \"%s.%s\" does not exist",
+                                       get_namespace_name(nspid), conflictlogtable)));

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 8, 2025 at 9:12 AM vignesh C <vignesh21@gmail.com> wrote:
>
> [...]
>
> Should we emit a warning instead of an error, to stay consistent with
> the other exception case, where a warning is raised when the conflict
> log table does not exist?
> +       /* Conflict log table is dropped or not accessible. */
> +       if (conflictlogrel == NULL)
> +               ereport(WARNING,
> +                               (errcode(ERRCODE_UNDEFINED_TABLE),
> +                                errmsg("conflict log table \"%s.%s\" does not exist",
> +                                       get_namespace_name(nspid), conflictlogtable)));

Yes, this should be a WARNING.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 4, 2025 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > ---
> > > I think the conflict history table should not be transferred to the
> > > new cluster when pg_upgrade since the table definition could be
> > > different across major versions.
> >
> > Let me think more on this with respect to behaviour of other factors
> > like subscriptions etc.
> >
>
> Can we deal with different schema of tables across versions via
> pg_dump/restore during upgrade?
>

While handling the conflict_log_table option during pg_dump, I realized
that the restore tries to create the conflict log table in two different
places: 1) as part of the regular table dump, and 2) as part of CREATE
SUBSCRIPTION when the conflict_log_table option is set.

So one option is to avoid dumping the conflict log tables as part of the
regular table dump, if we think we do not need the conflict log table
data, and let the table get created by the CREATE SUBSCRIPTION command.
OTOH, if we think we want to keep the conflict log table data, let it
get dumped as part of the regular tables, and in CREATE SUBSCRIPTION we
will just set the option but not create the table. Although we might
need special handling for this case: if we allow existing tables to be
set as conflict log tables, then it may allow other user tables to be
set as well, so we need to think about how to handle this if we go with
this option.
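To make the duplication concrete, restoring a plain dump would
effectively replay something like this (a sketch, column list elided):

-- 1) from the regular table dump
CREATE TABLE public.conflict (...);

-- 2) from the subscription dump; with the current patch this would try
--    to create public.conflict a second time
CREATE SUBSCRIPTION sub ... WITH (conflict_log_table = 'conflict');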

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 8, 2025 at 10:25 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]

We want to retain the conflict history after an upgrade. This is
required for various reasons: (a) after the upgrade, the DBA will still
need to resolve the pending unresolved conflicts; (b) regulations often
require keeping audit trails for a long period of time, so if a conflict
occurred at time X (within the regulatory retention window) regarding a
financial transaction, that record must survive the upgrade; (c) if
something breaks after the upgrade (e.g., missing rows, constraint
violations), conflict history helps trace root causes, showing whether
issues existed before the upgrade or were introduced during migration;
(d) as users can query the conflict history tables, they should be
treated similarly to user tables.

BTW, we are also planning to migrate commit_ts data in thread [1], which
would be helpful for conflict resolution after an upgrade.

> let it get dumped as part of the regular tables, and in CREATE
> SUBSCRIPTION we will just set the option but not create the table.
>

Yeah, we can set this option during CREATE SUBSCRIPTION in a way that
doesn't try to create the table again.

> Although we might need special handling for this case: if we allow
> existing tables to be set as conflict log tables, then it may allow
> other user tables to be set as well, so we need to think about how to
> handle this if we go with this option.
>

Yeah, probably, but it should be allowed internally only, not to users.
I think we can split out this upgrade handling as a top-up patch, at
least for the purpose of review.

[1] - https://www.postgresql.org/message-id/182311743703924%40mail.yandex.ru

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 8, 2025 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]
>
> Yeah, probably, but it should be allowed internally only, not to users.

Yeah, I wanted to do that, but the problem is with dump and restore. I
mean, if you dump into a SQL file and then execute that file, the
CREATE SUBSCRIPTION with the conflict_log_table option will fail, as
the table already exists because it was restored as part of the dump.
I know that under binary upgrade we have the binary_upgrade flag so we
can do special handling, but I am not sure how to distinguish SQL
executed as part of a restore from normal SQL executed by a user.

> I think we can split out this upgrade handling as a top-up patch, at
> least for the purpose of review.

Makes sense.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 8, 2025 at 3:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]
>
> Yeah, I wanted to do that, but the problem is with dump and restore. I
> mean, if you dump into a SQL file and then execute that file, the
> CREATE SUBSCRIPTION with the conflict_log_table option will fail, as
> the table already exists because it was restored as part of the dump.
> I know that under binary upgrade we have the binary_upgrade flag so we
> can do special handling, but I am not sure how to distinguish SQL
> executed as part of a restore from normal SQL executed by a user.
>

See dumpSubscription(). We always use (connect = false) while dumping
subscriptions, so, similarly, we should always dump the new option with
a default value that does not create the history table. Won't that be
sufficient?
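So the dumped command could look something like this (a sketch:
connect = false and slot_name are what dumpSubscription() already
emits today, while the no-create flag is purely hypothetical, named
here only to illustrate the idea):

CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub
    WITH (connect = false, slot_name = 'sub',
          conflict_log_table = 'conflict',
          create_conflict_log_table = false);   -- hypothetical flag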

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 8, 2025 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]
>
> See dumpSubscription(). We always use (connect = false) while dumping
> subscriptions, so, similarly, we should always dump the new option with
> a default value that does not create the history table. Won't that be
> sufficient?

Thinking out loud: basically, what we need is to create the
subscription and set the conflict log table in the subscription's
catalog entry in pg_subscription, but without creating the conflict
log table itself. So it seems we need to invent something new that
sets the conflict log table in the catalog but does not create the
table. Currently we have a single option: if conflict_log_table =
'table_name' is set, then we create the table as well as set the
table name in the catalog. So we need to think of something along the
lines of separating these two, or something more innovative.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 8, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]
>
> Thinking out loud: basically, what we need is to create the
> subscription and set the conflict log table in the subscription's
> catalog entry in pg_subscription, but without creating the conflict
> log table itself. So it seems we need to invent something new that
> sets the conflict log table in the catalog but does not create the
> table. Currently we have a single option: if conflict_log_table =
> 'table_name' is set, then we create the table as well as set the
> table name in the catalog. So we need to think of something along the
> lines of separating these two, or something more innovative.
>

This needs more thought and discussion, so it is better to separate out
this part at this stage; let's try to review the core patch first. BTW,
a few days back I suggested having two options (instead of the single
option conflict_log_table) to allow extending the ways we LOG the
conflict data.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 9, 2025 at 10:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 8, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Dec 8, 2025 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Dec 8, 2025 at 3:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 8, 2025 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Dec 8, 2025 at 10:25 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 4, 2025 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > > I think the conflict history table should not be transferred to the
> > > > > > > > > new cluster when pg_upgrade since the table definition could be
> > > > > > > > > different across major versions.
> > > > > > > >
> > > > > > > > Let me think more on this with respect to behaviour of other factors
> > > > > > > > like subscriptions etc.
> > > > > > > >
> > > > > > >
> > > > > > > Can we deal with different schema of tables across versions via
> > > > > > > pg_dump/restore during upgrade?
> > > > > > >
> > > > > >
> > > > > > While handling the case of conflict_log_table option during pg_dump, I
> > > > > > realized that the restore is trying to create conflict log table 2
> > > > > > different places 1) As part of the regular table dump 2) As part of
> > > > > > the CREATE SUBSCRIPTION when conflict_log_table option is set.
> > > > > >
> > > > > > So one option is we can avoid dumping the conflict log tables as part
> > > > > > of the regular table dump if we think that we do not need to conflict
> > > > > > log table data and let it get created as part of the create
> > > > > > subscription command, OTOH if we think we want to keep the conflict
> > > > > > log table data,
> > > > > >
> > > > >
> > > > > We want to retain conflict_history after upgrade. This is required for
> > > > > various reasons (a) after upgrade DBA user will still require to
> > > > > resolved the pending unresolved conflicts, (b) Regulations often
> > > > > require keeping audit trails for a longer period of time. If a
> > > > > conflict occurred at time X (which is less than the regulations
> > > > > requirement) regarding a financial transaction, that record must
> > > > > survive the upgrade, (c)
> > > > > If something breaks after the upgrade (e.g., missing rows, constraint
> > > > > violations), conflict history helps trace root causes. It shows
> > > > > whether issues existed before the upgrade or were introduced during
> > > > > migration, (d) as users can query the conflict_history tables, it
> > > > > should be treated similar to user tables.
> > > > >
> > > > > BTW, we are also planning to migrate commit_ts data in thread [1]
> > > > > which would be helpful for conflict_resolutions after upgrade.
> > > > >
> > > > >  let it get dumped as part of the regular tables and in
> > > > > > CREATE SUBSCRIPTION we will just set the option but do not create the
> > > > > > table,
> > > > > >
> > > > >
> > > > > Yeah, we can turn this option during CREATE SUBSCRIPTION so that it
> > > > > doesn't try to create the table again.
> > > > >
> > > > > > although we might need to do special handling of this case
> > > > > > because if we allow the existing tables to be set as conflict log
> > > > > > tables then it may allow other user tables to be set, so need to think
> > > > > > how to handle this if we need to go with this option.
> > > > > >
> > > > >
> > > > > Yeah, probably but it should be allowed internally only not to users.
> > > >
> > > > Yeah I wanted to do that, but problem is with dump and restore, I mean
> > > > if you just dump into a sql file and execute the sql file at that time
> > > > the CREATE SUBSCRIPTION with conflict_log_table option will fail as
> > > > the table already exists because it was restored as part of the dump.
> > > > I know under binary upgrade we have binary_upgrade flag so can do
> > > > special handling not sure how to distinguish the sql executing as part
> > > > of the restore or normal sql execution by user?
> > > >
> > >
> > > See dumpSubscription(). We always use (connect = false) while dumping
> > > a subscription, so, similarly, we should always dump the new option
> > > with the default value, which does not create the history table. Won't
> > > that be sufficient?
> >
> > Thinking out loud: basically, what we need is to create the
> > subscription and set the conflict log table in the catalog entry of
> > the subscription in pg_subscription, but without creating the conflict
> > log table itself. So it seems we need to invent something new which
> > sets the conflict log table in the catalog but does not create the
> > table.  Currently we have a single option: if
> > conflict_log_table='table_name' is set, then we create the table as
> > well as set the table name in the catalog, so we need to think of
> > something along the lines of separating these, or something more
> > innovative.
> >
>
> This needs more thought and discussion, so it is better to separate
> out this part at this stage and let's try to review the core patch
> first.

+1

> BTW, I told a few days back to have two options (instead of a
> single option conflict_log_table) to allow extension of more ways to
> LOG the conflict data.

Yeah, I will put that in as an add-on patch, once I fix all the
option issues of the core patch.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Tue, Dec 9, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 9, 2025 at 10:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 8, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Dec 8, 2025 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 8, 2025 at 3:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Mon, Dec 8, 2025 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Dec 8, 2025 at 10:25 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Dec 4, 2025 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > > I think the conflict history table should not be transferred to the
> > > > > > > > > > new cluster when pg_upgrade since the table definition could be
> > > > > > > > > > different across major versions.
> > > > > > > > >
> > > > > > > > > Let me think more on this with respect to behaviour of other factors
> > > > > > > > > like subscriptions etc.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can we deal with different schemas of tables across versions via
> > > > > > > > pg_dump/restore during upgrade?
> > > > > > > >
> > > > > > >
> > > > > > > While handling the case of the conflict_log_table option during
> > > > > > > pg_dump, I realized that the restore tries to create the conflict
> > > > > > > log table in 2 different places: 1) as part of the regular table
> > > > > > > dump, and 2) as part of CREATE SUBSCRIPTION when the
> > > > > > > conflict_log_table option is set.
> > > > > > >
> > > > > > > So one option is that we can avoid dumping the conflict log
> > > > > > > tables as part of the regular table dump, if we think that we do
> > > > > > > not need the conflict log table data, and let the table get
> > > > > > > created as part of the CREATE SUBSCRIPTION command; OTOH, if we
> > > > > > > think we want to keep the conflict log table data,
> > > > > > >
> > > > > >
> > > > > > We want to retain conflict_history after the upgrade. This is
> > > > > > required for various reasons: (a) after the upgrade, the DBA will
> > > > > > still need to resolve the pending unresolved conflicts, (b)
> > > > > > regulations often require keeping audit trails for a longer period
> > > > > > of time; if a conflict occurred at time X (which is within the
> > > > > > regulatory retention requirement) regarding a financial
> > > > > > transaction, that record must survive the upgrade, (c) if something
> > > > > > breaks after the upgrade (e.g., missing rows, constraint
> > > > > > violations), conflict history helps trace root causes, as it shows
> > > > > > whether issues existed before the upgrade or were introduced during
> > > > > > migration, (d) as users can query the conflict_history tables, they
> > > > > > should be treated similarly to user tables.
> > > > > >
> > > > > > BTW, we are also planning to migrate commit_ts data in thread [1]
> > > > > > which would be helpful for conflict_resolutions after upgrade.
> > > > > >
> > > > > > > let it get dumped as part of the regular tables, and in
> > > > > > > CREATE SUBSCRIPTION we will just set the option but do not create the
> > > > > > > table,
> > > > > > >
> > > > > >
> > > > > > Yeah, we can set this option during CREATE SUBSCRIPTION so that it
> > > > > > doesn't try to create the table again.
> > > > > >
> > > > > > > although we might need to do special handling for this case,
> > > > > > > because if we allow existing tables to be set as conflict log
> > > > > > > tables then it may allow other user tables to be set, so we need
> > > > > > > to think about how to handle this if we go with this option.
> > > > > > >
> > > > > >
> > > > > > Yeah, probably, but it should be allowed only internally, not to users.
> > > > >
> > > > > Yeah, I wanted to do that, but the problem is with dump and restore.
> > > > > I mean, if you just dump into an SQL file and then execute that file,
> > > > > the CREATE SUBSCRIPTION with the conflict_log_table option will fail
> > > > > as the table already exists, because it was restored as part of the
> > > > > dump. I know under binary upgrade we have the binary_upgrade flag so
> > > > > we can do special handling, but I am not sure how to distinguish SQL
> > > > > executed as part of the restore from normal SQL executed by a user?
> > > > >
> > > >
> > > > See dumpSubscription(). We always use (connect = false) while dumping
> > > > a subscription, so, similarly, we should always dump the new option
> > > > with the default value, which does not create the history table. Won't
> > > > that be sufficient?
> > >
> > > Thinking out loud: basically, what we need is to create the
> > > subscription and set the conflict log table in the catalog entry of
> > > the subscription in pg_subscription, but without creating the conflict
> > > log table itself. So it seems we need to invent something new which
> > > sets the conflict log table in the catalog but does not create the
> > > table.  Currently we have a single option: if
> > > conflict_log_table='table_name' is set, then we create the table as
> > > well as set the table name in the catalog, so we need to think of
> > > something along the lines of separating these, or something more
> > > innovative.
> > >
> >
> > This needs more thought and discussion, so it is better to separate
> > out this part at this stage and let's try to review the core patch
> > first.
>
> +1
>
> > BTW, I told a few days back to have two options (instead of a
> > single option conflict_log_table) to allow extension of more ways to
> > LOG the conflict data.
>
> Yeah, I will put that in as an add-on patch, once I fix all the
> option issues of the core patch.
>
Here is the updated version of the patch.
What has changed:
1. Table is created using create_heap_with_catalog() instead of SPI, as
suggested by Sawada-San and Amit Kapila.
2. Validated the table schema after acquiring the lock, before
preparing/inserting conflict tuples, for the defects raised by Vignesh.
3. Bug fixes for issues raised by Shveta (segfault).
4. Comments from Peter (except exposing the namespace in \dRs+; that is
still pending).

What's not done/pending:
1. Adding support for key_tuple/RI as pointed out by Shveta - will do
in the next version.
2. Adding a dependency of the subscription on the table so that we are
not allowed to drop the table - I think when we put a dependency on
shared objects, those cannot be dropped even with the CASCADE option,
but I am still exploring more on this.
3. dump/restore and upgrade. I have a partially working patch, but I
need to figure out how to skip table creation while creating the
subscription. While discussing offlist with Hannu, he suggested we can
do something with dump dependency ordering, e.g. we can dump CREATE
SUBSCRIPTION first and then dump the clt data without actually dumping
the clt definition; with that, the table will be created while creating
the subscription and the data will then be restored with the COPY
command. I will explore more on this.
4. Test case for conflict insertion.
5. Documentation patch.


--
Regards,
Dilip Kumar
Google

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Tue, Dec 9, 2025 at 8:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> Here is the updated version of the patch.
> What has changed:
> 1. Table is created using create_heap_with_catalog() instead of SPI, as
> suggested by Sawada-San and Amit Kapila.
> 2. Validated the table schema after acquiring the lock, before
> preparing/inserting conflict tuples, for the defects raised by Vignesh.
> 3. Bug fixes for issues raised by Shveta (segfault).
> 4. Comments from Peter (except exposing the namespace in \dRs+; that is
> still pending).
>

Thanks for the patch.
I tested all conflict types on this version; they (basic scenarios)
seem to work well. Except for the pending key-RI issue, the other
issues seem to be addressed. I will start with code review now.

Few observations:

1)
\dRs+  shows 'Conflict log table' without the namespace, which could be
confusing if the same table exists in multiple schemas.

2)
When we do below:
alter subscription sub1 SET (conflict_log_table=clt2);

the previous conflict log table is dropped. Is this behavior
intentional and discussed/concluded earlier? It’s possible that a user
may want to create a new conflict log table for future events while
still retaining the old one for analysis. If the subscription itself
is dropped, then dropping the CLT makes sense, but I’m not sure this
behavior is intended for ALTER SUBSCRIPTION.  I do understand that
once we unlink CLT from subscription, later even DROP subscription
cannot drop it, but user can always drop it when not needed.

If we plan to keep existing behavior, it should be clearly documented
in a CAUTION section, and the command should explicitly log the table
drop.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Dec 11, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Dec 9, 2025 at 8:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > >
> > Here is the updated version of the patch.
> > What has changed:
> > 1. Table is created using create_heap_with_catalog() instead of SPI, as
> > suggested by Sawada-San and Amit Kapila.
> > 2. Validated the table schema after acquiring the lock, before
> > preparing/inserting conflict tuples, for the defects raised by Vignesh.
> > 3. Bug fixes for issues raised by Shveta (segfault).
> > 4. Comments from Peter (except exposing the namespace in \dRs+; that is
> > still pending).
> >
>
> Thanks for the patch.
> I tested all conflict types on this version; they (basic scenarios)
> seem to work well. Except for the pending key-RI issue, the other
> issues seem to be addressed. I will start with code review now.
>
> Few observations:
>
> 1)
> \dRs+  shows 'Conflict log table' without the namespace, which could be
> confusing if the same table exists in multiple schemas.

Yeah, this comment is not yet fixed; will fix in the next version.

> 2)
> When we do below:
> alter subscription sub1 SET (conflict_log_table=clt2);
>
> the previous conflict log table is dropped. Is this behavior
> intentional and discussed/concluded earlier? It’s possible that a user
> may want to create a new conflict log table for future events while
> still retaining the old one for analysis. If the subscription itself
> is dropped, then dropping the CLT makes sense, but I’m not sure this
> behavior is intended for ALTER SUBSCRIPTION.  I do understand that
> once we unlink CLT from subscription, later even DROP subscription
> cannot drop it, but user can always drop it when not needed.
>
> If we plan to keep existing behavior, it should be clearly documented
> in a CAUTION section, and the command should explicitly log the table
> drop.

Yeah, we discussed this behavior, and the conclusion was that we would
document it; it's the user's responsibility to take the necessary
backup of the conflict log table data if they are setting a new log
table or NONE for the subscription.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Thu, Dec 11, 2025 at 5:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > 2)
> > When we do below:
> > alter subscription sub1 SET (conflict_log_table=clt2);
> >
> > the previous conflict log table is dropped. Is this behavior
> > intentional and discussed/concluded earlier? It’s possible that a user
> > may want to create a new conflict log table for future events while
> > still retaining the old one for analysis. If the subscription itself
> > is dropped, then dropping the CLT makes sense, but I’m not sure this
> > behavior is intended for ALTER SUBSCRIPTION.  I do understand that
> > once we unlink CLT from subscription, later even DROP subscription
> > cannot drop it, but user can always drop it when not needed.
> >
> > If we plan to keep existing behavior, it should be clearly documented
> > in a CAUTION section, and the command should explicitly log the table
> > drop.
>
> Yeah, we discussed this behavior, and the conclusion was that we would
> document it; it's the user's responsibility to take the necessary
> backup of the conflict log table data if they are setting a new log
> table or NONE for the subscription.
>

+1. If we don't do this, then it will be difficult for postgres or
users to track the previous conflict history tables.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Dec 11, 2025 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 5:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > 2)
> > > When we do below:
> > > alter subscription sub1 SET (conflict_log_table=clt2);
> > >
> > > the previous conflict log table is dropped. Is this behavior
> > > intentional and discussed/concluded earlier? It’s possible that a user
> > > may want to create a new conflict log table for future events while
> > > still retaining the old one for analysis. If the subscription itself
> > > is dropped, then dropping the CLT makes sense, but I’m not sure this
> > > behavior is intended for ALTER SUBSCRIPTION.  I do understand that
> > > once we unlink CLT from subscription, later even DROP subscription
> > > cannot drop it, but user can always drop it when not needed.
> > >
> > > If we plan to keep existing behavior, it should be clearly documented
> > > in a CAUTION section, and the command should explicitly log the table
> > > drop.
> >
> > Yeah, we discussed this behavior, and the conclusion was that we would
> > document it; it's the user's responsibility to take the necessary
> > backup of the conflict log table data if they are setting a new log
> > table or NONE for the subscription.
> >
>
> +1. If we don't do this, then it will be difficult for postgres or
> users to track the previous conflict history tables.

Right, it makes sense.

The attached patch fixes most of the open comments:
1) \dRs+ now shows the schema-qualified name.
2) Now the key_tuple and replica_identity tuple are both added to the
conflict log tuple wherever applicable.
3) Refactored the code so that we define the conflict log table
schema only once in the header file, and both create_conflict_log_table
and ValidateConflictLogTable use it.

I was considering the interdependence between the subscription and the
conflict log table (CLT). IMHO, it would be logical to establish the
subscription as dependent on the CLT. This way, if someone attempts to
drop the CLT, the system would recognize the dependency of the
subscription and prevent the drop unless the subscription is removed
first or the CASCADE option is used.

However, while investigating this, I encountered an error [1] stating
that global objects are not supported in this context. This indicates
that global objects cannot be made dependent on local objects.
Although making an object dependent on global/shared objects is
possible for certain types of shared objects [2], this is not our main
objective.

We do not need to make the CLT dependent on the subscription because
the table can be dropped when the subscription is dropped anyway and
we are already doing it as part of drop subscription as well as alter
subscription when CLT is set to NONE or a different table. Therefore,
extending the functionality of shared dependency is unnecessary for
this purpose.

Thoughts?

[1]
doDeletion()
{
....
/*
* These global object types are not supported here.
*/
case AuthIdRelationId:
case DatabaseRelationId:
case TableSpaceRelationId:
case SubscriptionRelationId:
case ParameterAclRelationId:
elog(ERROR, "global objects cannot be deleted by doDeletion");
break;
}

[2]
typedef enum SharedDependencyType
{
SHARED_DEPENDENCY_OWNER = 'o',
SHARED_DEPENDENCY_ACL = 'a',
SHARED_DEPENDENCY_INITACL = 'i',
SHARED_DEPENDENCY_POLICY = 'r',
SHARED_DEPENDENCY_TABLESPACE = 't',
SHARED_DEPENDENCY_INVALID = 0,
} SharedDependencyType;

Pending Items are:
1. Handling dump/upgrade
2. Test case for conflict insertion
3. Documentation patch

--
Regards,
Dilip Kumar
Google

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 5:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > 2)
> > > > When we do below:
> > > > alter subscription sub1 SET (conflict_log_table=clt2);
> > > >
> > > > the previous conflict log table is dropped. Is this behavior
> > > > intentional and discussed/concluded earlier? It’s possible that a user
> > > > may want to create a new conflict log table for future events while
> > > > still retaining the old one for analysis. If the subscription itself
> > > > is dropped, then dropping the CLT makes sense, but I’m not sure this
> > > > behavior is intended for ALTER SUBSCRIPTION.  I do understand that
> > > > once we unlink CLT from subscription, later even DROP subscription
> > > > cannot drop it, but user can always drop it when not needed.
> > > >
> > > > If we plan to keep existing behavior, it should be clearly documented
> > > > in a CAUTION section, and the command should explicitly log the table
> > > > drop.
> > >
> > > Yeah, we discussed this behavior, and the conclusion was that we would
> > > document it; it's the user's responsibility to take the necessary
> > > backup of the conflict log table data if they are setting a new log
> > > table or NONE for the subscription.
> > >
> >
> > +1. If we don't do this, then it will be difficult for postgres or
> > users to track the previous conflict history tables.
>
> Right, it makes sense.

Okay, right.

>
> The attached patch fixes most of the open comments:
> 1) \dRs+ now shows the schema-qualified name.
> 2) Now the key_tuple and replica_identity tuple are both added to the
> conflict log tuple wherever applicable.
> 3) Refactored the code so that we define the conflict log table
> schema only once in the header file, and both create_conflict_log_table
> and ValidateConflictLogTable use it.
>
> I was considering the interdependence between the subscription and the
> conflict log table (CLT). IMHO, it would be logical to establish the
> subscription as dependent on the CLT. This way, if someone attempts to
> drop the CLT, the system would recognize the dependency of the
> subscription and prevent the drop unless the subscription is removed
> first or the CASCADE option is used.
>
> However, while investigating this, I encountered an error [1] stating
> that global objects are not supported in this context. This indicates
> that global objects cannot be made dependent on local objects.
> Although making an object dependent on global/shared objects is
> possible for certain types of shared objects [2], this is not our main
> objective.
>
> We do not need to make the CLT dependent on the subscription because
> the table can be dropped when the subscription is dropped anyway and
> we are already doing it as part of drop subscription as well as alter
> subscription when CLT is set to NONE or a different table. Therefore,
> extending the functionality of shared dependency is unnecessary for
> this purpose.
>
> Thoughts?

I believe the recommendation to create a dependency was meant to
prevent the table from being accidentally dropped during a DROP SCHEMA
or DROP TABLE operation. That risk still remains, regardless of the
fact that dropping or altering a subscription will result in the table
removal. I will give this more thought and let you know if anything
comes to mind.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 12, 2025 at 9:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > We do not need to make the CLT dependent on the subscription because
> > the table can be dropped when the subscription is dropped anyway and
> > we are already doing it as part of drop subscription as well as alter
> > subscription when CLT is set to NONE or a different table. Therefore,
> > extending the functionality of shared dependency is unnecessary for
> > this purpose.
> >
> > Thoughts?
>
> I believe the recommendation to create a dependency was meant to
> prevent the table from being accidentally dropped during a DROP SCHEMA
> or DROP TABLE operation. That risk still remains, regardless of the
> fact that dropping or altering a subscription will result in the table
> removal. I will give this more thought and let you know if anything
> comes to mind.

I mean, we can register the dependency of the subscription on the
table, and that will prevent dropping the table via DROP TABLE/DROP
SCHEMA, but what I do not like is the internal error [1] in
doDeletion() when someone tries to DROP TABLE CLT CASCADE;

I suggest an alternative approach for handling this: implement a check
within the ALTER/DROP table commands. If the table is a CLT (using
IsConflictLogTable() to verify), these operations should be
disallowed. This would enhance the robustness of CLT handling by
entirely preventing external drop/alter actions. What are your
thoughts on this solution? And let's also see what Amit and Sawada-san
think about this solution.
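
To be concrete, a minimal sketch of such a guard (the placement and the
exact message are hypothetical, not from the patch):

/* Hypothetical check in the DROP TABLE path; sketch only. */
if (IsConflictLogTable(relid))
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_IN_USE),
             errmsg("cannot drop \"%s\" because it is a conflict log table",
                    get_rel_name(relid)),
             errhint("Drop the owning subscription or reset its "
                     "conflict_log_table option instead.")));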

[1]
doDeletion()
{
....
/*
* These global object types are not supported here.
*/
case AuthIdRelationId:
case DatabaseRelationId:
case TableSpaceRelationId:
case SubscriptionRelationId:
case ParameterAclRelationId:
elog(ERROR, "global objects cannot be deleted by doDeletion");
break;
}

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Fri, Dec 12, 2025 at 9:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 12, 2025 at 9:19 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > We do not need to make the CLT dependent on the subscription because
> > > the table can be dropped when the subscription is dropped anyway and
> > > we are already doing it as part of drop subscription as well as alter
> > > subscription when CLT is set to NONE or a different table. Therefore,
> > > extending the functionality of shared dependency is unnecessary for
> > > this purpose.
> > >
> > > Thoughts?
> >
> > I believe the recommendation to create a dependency was meant to
> > prevent the table from being accidentally dropped during a DROP SCHEMA
> > or DROP TABLE operation. That risk still remains, regardless of the
> > fact that dropping or altering a subscription will result in the table
> > removal. I will give this more thought and let you know if anything
> > comes to mind.
>
> I mean, we can register the dependency of the subscription on the
> table, and that will prevent dropping the table via DROP TABLE/DROP
> SCHEMA, but what I do not like is the internal error [1] in
> doDeletion() when someone tries to DROP TABLE CLT CASCADE;
>

Yes, I understand that part.

> I suggest an alternative approach for handling this: implement a check
> within the ALTER/DROP table commands. If the table is a CLT (using
> IsConflictLogTable() to verify), these operations should be
> disallowed. This would enhance the robustness of CLT handling by
> entirely preventing external drop/alter actions. What are your
> thoughts on this solution? And let's also see what Amit and Sawada-san
> think about this solution.

I had similar thoughts, but was unsure how this should behave when a
user runs DROP SCHEMA … CASCADE. We can’t simply block the entire
operation with an error just because the schema contains a CLT, but we
also shouldn’t allow it to proceed without notifying the user that the
schema includes a CLT.

>
> [1]
> doDeletion()
> {
> ....
> /*
> * These global object types are not supported here.
> */
> case AuthIdRelationId:
> case DatabaseRelationId:
> case TableSpaceRelationId:
> case SubscriptionRelationId:
> case ParameterAclRelationId:
> elog(ERROR, "global objects cannot be deleted by doDeletion");
> break;
> }
>
> --
> Regards,
> Dilip Kumar
> Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 12, 2025 at 10:02 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Dec 12, 2025 at 9:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Dec 12, 2025 at 9:19 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > We do not need to make the CLT dependent on the subscription because
> > > > the table can be dropped when the subscription is dropped anyway and
> > > > we are already doing it as part of drop subscription as well as alter
> > > > subscription when CLT is set to NONE or a different table. Therefore,
> > > > extending the functionality of shared dependency is unnecessary for
> > > > this purpose.
> > > >
> > > > Thoughts?
> > >
> > > I believe the recommendation to create a dependency was meant to
> > > prevent the table from being accidentally dropped during a DROP SCHEMA
> > > or DROP TABLE operation. That risk still remains, regardless of the
> > > fact that dropping or altering a subscription will result in the table
> > > removal. I will give this more thought and let you know if anything
> > > comes to mind.
> >
> > I mean, we can register the dependency of the subscription on the
> > table, and that will prevent dropping the table via DROP TABLE/DROP
> > SCHEMA, but what I do not like is the internal error [1] in
> > doDeletion() when someone tries to DROP TABLE CLT CASCADE;
> >
>
> Yes, I understand that part.
>
> > I suggest an alternative approach for handling this: implement a check
> > within the ALTER/DROP table commands. If the table is a CLT (using
> > IsConflictLogTable() to verify), these operations should be
> > disallowed. This would enhance the robustness of CLT handling by
> > entirely preventing external drop/alter actions. What are your
> > thoughts on this solution? And let's also see what Amit and Sawada-san
> > think about this solution.
>
> I had similar thoughts, but was unsure how this should behave when a
> user runs DROP SCHEMA … CASCADE. We can’t simply block the entire
> operation with an error just because the schema contains a CLT, but we
> also shouldn’t allow it to proceed without notifying the user that the
> schema includes a CLT.

I understand your concern about whether this restriction is
appropriate, particularly when DROP SCHEMA … CASCADE is used.
However, considering the logical dependency where the subscription
relies on the table (CLT), expecting DROP SCHEMA … CASCADE to drop the
CLT implies it should also drop the dependent subscription, which is
not permitted. Therefore, a more appropriate behavior would be to
issue an error message stating that the table is a conflict log table
and that subscriber "<subname>" depends on it. This message should
instruct the user to either drop the subscription or reset the
conflict log table before proceeding with the drop operation.

OTOH, we can simply let the CLT get dropped or altered and document
this behavior, so that it is the user's responsibility not to
drop/alter the CLT; otherwise conflict logging will be skipped, as we
have now.  While thinking about it more, I feel it might be better to
keep it simple, as we have now, instead of overcomplicating it.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Thu, 11 Dec 2025 at 19:50, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 5:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > 2)
> > > > When we do below:
> > > > alter subscription sub1 SET (conflict_log_table=clt2);
> > > >
> > > > the previous conflict log table is dropped. Is this behavior
> > > > intentional and discussed/concluded earlier? It’s possible that a user
> > > > may want to create a new conflict log table for future events while
> > > > still retaining the old one for analysis. If the subscription itself
> > > > is dropped, then dropping the CLT makes sense, but I’m not sure this
> > > > behavior is intended for ALTER SUBSCRIPTION.  I do understand that
> > > > once we unlink CLT from subscription, later even DROP subscription
> > > > cannot drop it, but user can always drop it when not needed.
> > > >
> > > > If we plan to keep existing behavior, it should be clearly documented
> > > > in a CAUTION section, and the command should explicitly log the table
> > > > drop.
> > >
> > > Yeah, we discussed this behavior, and the conclusion was that we would
> > > document it; it's the user's responsibility to take the necessary
> > > backup of the conflict log table data if they are setting a new log
> > > table or NONE for the subscription.
> > >
> >
> > +1. If we don't do this, then it will be difficult for postgres or
> > users to track the previous conflict history tables.
>
> Right, it makes sense.
>
> The attached patch fixes most of the open comments:
> 1) \dRs+ now shows the schema-qualified name.
> 2) Now the key_tuple and replica_identity tuple are both added to the
> conflict log tuple wherever applicable.
> 3) Refactored the code so that we define the conflict log table
> schema only once in the header file, and both create_conflict_log_table
> and ValidateConflictLogTable use it.
>
> I was considering the interdependence between the subscription and the
> conflict log table (CLT). IMHO, it would be logical to establish the
> subscription as dependent on the CLT. This way, if someone attempts to
> drop the CLT, the system would recognize the dependency of the
> subscription and prevent the drop unless the subscription is removed
> first or the CASCADE option is used.
>
> However, while investigating this, I encountered an error [1] stating
> that global objects are not supported in this context. This indicates
> that global objects cannot be made dependent on local objects.
> Although making an object dependent on global/shared objects is
> possible for certain types of shared objects [2], this is not our main
> objective.
>
> We do not need to make the CLT dependent on the subscription because
> the table can be dropped when the subscription is dropped anyway and
> we are already doing it as part of drop subscription as well as alter
> subscription when CLT is set to NONE or a different table. Therefore,
> extending the functionality of shared dependency is unnecessary for
> this purpose.

I noticed an inconsistency in the checks that prevent adding a
conflict log table to a publication.  At creation time, we explicitly
reject attempts to publish a conflict log table:
/* Can't be conflict log table */
if (IsConflictLogTable(RelationGetRelid(targetrel)))
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("cannot add relation \"%s.%s\" to publication",
                    get_namespace_name(RelationGetNamespace(targetrel)),
                    RelationGetRelationName(targetrel)),
             errdetail("This operation is not supported for conflict
log tables.")));

However, the restriction can be bypassed through a sequence of table
renames like below:
-- Set up logical replication
CREATE PUBLICATION pub_all;
CREATE SUBSCRIPTION sub1 CONNECTION '...' PUBLICATION pub_all  WITH
(conflict_log_table = 'conflict');

-- Rename the conflict log table
ALTER TABLE conflict RENAME TO conflict1;

-- Now this succeeds:
CREATE PUBLICATION pub1 FOR TABLE conflict1;

-- Rename it back
ALTER TABLE conflict1 RENAME TO conflict;

\dRp+ pub1
  Publication pub1
  ...
  Tables:
      public.conflict

Thus, although we prohibit publishing the conflict log table directly,
a publication can still end up referencing it through renaming. This
is inconsistent with the invariant the code attempts to enforce.

Should we extend the checks to handle renames so that a conflict log
table can never end up in a publication?
Alternatively, should the creation-time restriction be relaxed if this
case is acceptable?
If the invariant should be enforced, should we also prevent renaming a
conflict-log table into a published table's name?
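
For the first option, one possible shape is a guard in the rename path
(variable names and placement are hypothetical, sketch only):

/* Hypothetical check in the ALTER TABLE ... RENAME path; sketch only. */
if (IsConflictLogTable(RelationGetRelid(targetrelation)))
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot rename conflict log table \"%s\"",
                    RelationGetRelationName(targetrelation))));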

Thoughts?

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I was considering the interdependence between the subscription and the
> conflict log table (CLT). IMHO, it would be logical to establish the
> subscription as dependent on the CLT. This way, if someone attempts to
> drop the CLT, the system would recognize the dependency of the
> subscription and prevent the drop unless the subscription is removed
> first or the CASCADE option is used.
>
> However, while investigating this, I encountered an error [1] stating
> that global objects are not supported in this context. This indicates
> that global objects cannot be made dependent on local objects.
>

What we need here is an equivalent of DEPENDENCY_INTERNAL for database
objects. For example, consider following case:
postgres=# create table t1(c1 int primary key);
CREATE TABLE
postgres=# \d+ t1
                                           Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 c1     | integer |           | not null |         | plain   |
    |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (c1)
Publications:
    "pub1"
Not-null constraints:
    "t1_c1_not_null" NOT NULL "c1"
Access method: heap
postgres=# drop index t1_pkey;
ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
t1 requires it
HINT:  You can drop constraint t1_pkey on table t1 instead.

Here, the PK index is created as part of the CREATE TABLE operation,
and the PK index is not allowed to be dropped independently.

> Although making an object dependent on global/shared objects is
> possible for certain types of shared objects [2], this is not our main
> objective.
>

As per my understanding from the above example, we need something like
that, only between the shared object (the subscription) and the
(internally created) table.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I was considering the interdependence between the subscription and the
> > conflict log table (CLT). IMHO, it would be logical to establish the
> > subscription as dependent on the CLT. This way, if someone attempts to
> > drop the CLT, the system would recognize the dependency of the
> > subscription and prevent the drop unless the subscription is removed
> > first or the CASCADE option is used.
> >
> > However, while investigating this, I encountered an error [1] stating
> > that global objects are not supported in this context. This indicates
> > that global objects cannot be made dependent on local objects.
> >
>
> What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> objects. For example, consider following case:
> postgres=# create table t1(c1 int primary key);
> CREATE TABLE
> postgres=# \d+ t1
>                                            Table "public.t1"
>  Column |  Type   | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
>  c1     | integer |           | not null |         | plain   |
>     |              |
> Indexes:
>     "t1_pkey" PRIMARY KEY, btree (c1)
> Publications:
>     "pub1"
> Not-null constraints:
>     "t1_c1_not_null" NOT NULL "c1"
> Access method: heap
> postgres=# drop index t1_pkey;
> ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> t1 requires it
> HINT:  You can drop constraint t1_pkey on table t1 instead.
>
> Here, the PK index is created as part of the CREATE TABLE operation,
> and the PK index is not allowed to be dropped independently.
>
> > Although making an object dependent on global/shared objects is
> > possible for certain types of shared objects [2], this is not our main
> > objective.
> >
>
> As per my understanding from the above example, we need something like
> that, only between the shared object (the subscription) and the
> (internally created) table.
>

+1

~~

Few comments for v11:

1)
+#include "executor/spi.h"
+#include "replication/conflict.h"
+#include "utils/fmgroids.h"
+#include "utils/regproc.h"

subscriptioncmds.c compiles without the above inclusions.

2)
postgres=# create subscription sub3 connection '...' publication pub1
WITH(conflict_log_table='pg_temp.clt');
NOTICE:  created replication slot "sub3" on publisher
CREATE SUBSCRIPTION

Should we restrict clt creation in pg_temp?

3)
+ /* Fetch the eixsting conflict table table information. */

typos: eixsting->existing,
          table table -> table

4)
AlterSubscription():
+ values[Anum_pg_subscription_subconflictlognspid - 1] =
+ ObjectIdGetDatum(nspid);
+
+ if (relname != NULL)
+ values[Anum_pg_subscription_subconflictlogtable - 1] =
+ CStringGetTextDatum(relname);
+ else
+ nulls[Anum_pg_subscription_subconflictlogtable - 1] =
+ true;

Should we move the nspid setting inside 'if(relname != NULL)' block?

5)
Is there a way to reset/remove conflict_log_table? I did not see any
such handling in AlterSubscription? It gives error:

postgres=# alter subscription sub3 set (conflict_log_table='');
ERROR:  invalid name syntax

6)
+char *
+get_subscription_conflict_log_table(Oid subid, Oid *nspid)
+{
+ HeapTuple tup;
+ Datum datum;
+ bool isnull;
+ char    *relname = NULL;
+ Form_pg_subscription subform;
+
+ *nspid = InvalidOid;
+
+ tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
+
+ if (!HeapTupleIsValid(tup))
+ return NULL;

Should we have elog(ERROR) here for cache lookup failure? Callers like
AlterSubscription, DropSubscription lock the sub entry, so it being
missing at this stage is not normal. I have not seen all the callers
though.
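
For reference, the usual idiom elsewhere in the tree would be:

if (!HeapTupleIsValid(tup))
    elog(ERROR, "cache lookup failed for subscription %u", subid);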

7)
+#include "access/htup.h"
+#include "access/skey.h"

+#include "access/table.h"
+#include "catalog/pg_attribute.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_type.h"

+#include "executor/spi.h"
+#include "utils/array.h"

conflict.c compiles without above inclusions.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I was considering the interdependence between the subscription and the
> > conflict log table (CLT). IMHO, it would be logical to establish the
> > subscription as dependent on the CLT. This way, if someone attempts to
> > drop the CLT, the system would recognize the dependency of the
> > subscription and prevent the drop unless the subscription is removed
> > first or the CASCADE option is used.
> >
> > However, while investigating this, I encountered an error [1] stating
> > that global objects are not supported in this context. This indicates
> > that global objects cannot be made dependent on local objects.
> >
>
> What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> objects. For example, consider following case:
> postgres=# create table t1(c1 int primary key);
> CREATE TABLE
> postgres=# \d+ t1
>                                            Table "public.t1"
>  Column |  Type   | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
>  c1     | integer |           | not null |         | plain   |
>     |              |
> Indexes:
>     "t1_pkey" PRIMARY KEY, btree (c1)
> Publications:
>     "pub1"
> Not-null constraints:
>     "t1_c1_not_null" NOT NULL "c1"
> Access method: heap
> postgres=# drop index t1_pkey;
> ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> t1 requires it
> HINT:  You can drop constraint t1_pkey on table t1 instead.
>
> Here, the PK index is created as part of the CREATE TABLE operation,
> and the PK index is not allowed to be dropped independently.
>
> > Although making an object dependent on global/shared objects is
> > possible for certain types of shared objects [2], this is not our main
> > objective.
> >
>
> As per my understanding from the above example, we need something like
> that, only between the shared object (the subscription) and the
> (internally created) table.

Yeah, that seems to be exactly what we want, so I tried doing that by
recording a DEPENDENCY_INTERNAL dependency of the CLT on the
subscription [1], and it is behaving as we want [2].  And while
dropping the subscription or altering the CLT, we can delete the
internal dependency so that the CLT gets dropped automatically [3].

I will send an updated patch after testing a few more scenarios and
fixing other pending issues.

[1]
+       ObjectAddressSet(myself, RelationRelationId, relid);
+       ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
+       recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);


[2]
postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
ERROR:  2BP01: cannot drop table myschema.conflict_log_history2
because subscription sub requires it
HINT:  You can drop subscription sub instead.
LOCATION:  findDependentObjects, dependency.c:788
postgres[670778]=#

[3]
ObjectAddressSet(object, SubscriptionRelationId, subid);
performDeletion(&object, DROP_CASCADE,
                           PERFORM_DELETION_INTERNAL |
                           PERFORM_DELETION_SKIP_ORIGINAL);



--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Sun, Dec 14, 2025 at 3:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I was considering the interdependence between the subscription and the
> > > conflict log table (CLT). IMHO, it would be logical to establish the
> > > subscription as dependent on the CLT. This way, if someone attempts to
> > > drop the CLT, the system would recognize the dependency of the
> > > subscription and prevent the drop unless the subscription is removed
> > > first or the CASCADE option is used.
> > >
> > > However, while investigating this, I encountered an error [1] stating
> > > that global objects are not supported in this context. This indicates
> > > that global objects cannot be made dependent on local objects.
> > >
> >
> > What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> > objects. For example, consider following case:
> > postgres=# create table t1(c1 int primary key);
> > CREATE TABLE
> > postgres=# \d+ t1
> >                                            Table "public.t1"
> >  Column |  Type   | Collation | Nullable | Default | Storage |
> > Compression | Stats target | Description
> > --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> >  c1     | integer |           | not null |         | plain   |
> >     |              |
> > Indexes:
> >     "t1_pkey" PRIMARY KEY, btree (c1)
> > Publications:
> >     "pub1"
> > Not-null constraints:
> >     "t1_c1_not_null" NOT NULL "c1"
> > Access method: heap
> > postgres=# drop index t1_pkey;
> > ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> > t1 requires it
> > HINT:  You can drop constraint t1_pkey on table t1 instead.
> >
> > Here, the PK index is created as part of the CREATE TABLE operation,
> > and the PK index is not allowed to be dropped independently.
> >
> > > Although making an object dependent on global/shared objects is
> > > possible for certain types of shared objects [2], this is not our main
> > > objective.
> > >
> >
> > As per my understanding from the above example, we need something like
> > that, only between the shared object (the subscription) and the
> > (internally created) table.
>
> Yeah, that seems to be exactly what we want, so I tried doing that by
> recording a DEPENDENCY_INTERNAL dependency of the CLT on the
> subscription [1], and it is behaving as we want [2].  And while
> dropping the subscription or altering the CLT, we can delete the
> internal dependency so that the CLT gets dropped automatically [3].
>
> I will send an updated patch after testing a few more scenarios and
> fixing other pending issues.
>
> [1]
> +       ObjectAddressSet(myself, RelationRelationId, relid);
> +       ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
> +       recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);
>
>
> [2]
> postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
> ERROR:  2BP01: cannot drop table myschema.conflict_log_history2
> because subscription sub requires it
> HINT:  You can drop subscription sub instead.
> LOCATION:  findDependentObjects, dependency.c:788
> postgres[670778]=#
>
> [3]
> ObjectAddressSet(object, SubscriptionRelationId, subid);
> performDeletion(&object, DROP_CASCADE,
>                            PERFORM_DELETION_INTERNAL |
>                            PERFORM_DELETION_SKIP_ORIGINAL);
>
>

Here is the patch which implements the dependency and fixes other
comments from Shveta.

--
Regards,
Dilip Kumar
Google

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 12, 2025 at 3:33 PM shveta malik <shveta.malik@gmail.com> wrote:
>
>
> Few comments for v11:
>
> 1)
> +#include "executor/spi.h"
> +#include "replication/conflict.h"
> +#include "utils/fmgroids.h"
> +#include "utils/regproc.h"
>
> subscriptioncmds.c compiles without the above inclusions.

I think we need utils/regproc.h for "stringToQualifiedNameList()"

> 2)
> postgres=# create subscription sub3 connection '...' publication pub1
> WITH(conflict_log_table='pg_temp.clt');
> NOTICE:  created replication slot "sub3" on publisher
> CREATE SUBSCRIPTION
>
> Should we restrict clt creation in pg_temp?

Done and added a test.

> 3)
> + /* Fetch the eixsting conflict table table information. */
>
> typos: eixsting->existing,
>           table table -> table

Fixed

> 4)
> AlterSubscription():
> + values[Anum_pg_subscription_subconflictlognspid - 1] =
> + ObjectIdGetDatum(nspid);
> +
> + if (relname != NULL)
> + values[Anum_pg_subscription_subconflictlogtable - 1] =
> + CStringGetTextDatum(relname);
> + else
> + nulls[Anum_pg_subscription_subconflictlogtable - 1] =
> + true;
>
> Should we move the nspid setting inside 'if(relname != NULL)' block?

Since subconflictlognspid is part of the fixed-size structure, we
will always have to set it, so I prefer to keep it out.

> 5)
> Is there a way to reset/remove conflict_log_table? I did not see any
> such handling in AlterSubscription? It gives error:
>
> postgres=# alter subscription sub3 set (conflict_log_table='');
> ERROR:  invalid name syntax

Fixed and added a test case

> 6)
> +char *
> +get_subscription_conflict_log_table(Oid subid, Oid *nspid)
> +{
> + HeapTuple tup;
> + Datum datum;
> + bool isnull;
> + char    *relname = NULL;
> + Form_pg_subscription subform;
> +
> + *nspid = InvalidOid;
> +
> + tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
> +
> + if (!HeapTupleIsValid(tup))
> + return NULL;
>
> Should we have elog(ERROR) here for cache lookup failure? Callers like
> AlterSubscription, DropSubscription lock the sub entry, so it being
> missing at this stage is not normal. I have not seen all the callers
> though.

Yeah we can do that.

> 7)
> +#include "access/htup.h"
> +#include "access/skey.h"
>
> +#include "access/table.h"
> +#include "catalog/pg_attribute.h"
> +#include "catalog/indexing.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_namespace.h"
> +#include "catalog/pg_type.h"
>
> +#include "executor/spi.h"
> +#include "utils/array.h"
>
> conflict.c compiles without above inclusions.

Done


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Sun, Dec 14, 2025 at 9:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>

Thanks for the patch. Few comments:

1)
+ if (isTempNamespace(namespaceId))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot create conflict log table \"%s\" in a temporary namespace",
+ conflictrel),
+ errhint("Use a permanent schema.")));

a)
Shall we use 'temporary schema' instead of 'temporary namespace'? See
other similar errors:

errmsg("cannot move objects into or out of temporary schemas")
errmsg("cannot create relations in temporary schemas of other
sessions"))
errmsg("cannot create temporary relation in non-temporary schema")

b)
Do we really need errhint here? It seems self-explanatory. If we
really want to specify HINT, shall we say:
"Specify a non-temporary schema for conflict log table."

2)
postgres=# alter subscription sub1 set (conflict_log_table='');
ERROR:  conflict log table name cannot be empty
HINT:  Provide a valid table name or omit the parameter.

My idea was to allow the above operation to enable users to reset the
conflict_log_table when the conflict log history is no longer needed.
Is there any other way to reset it, or is this intentionally not
supported?

3)
postgres=# alter subscription sub1 set (conflict_log_table=NULL);
ALTER SUBSCRIPTION
postgres=# alter subscription sub2 set (conflict_log_table=create);
ALTER SUBSCRIPTION
postgres=# \d
         List of relations
 Schema |  Name   | Type  | Owner
--------+---------+-------+--------
 public | create  | table | shveta
 public | null    | table | shveta


It takes reserved keywords and creates tables with those names. It
should be restricted.

4)
postgres=# SELECT c.relname FROM pg_depend d JOIN pg_class c ON c.oid
= d.objid JOIN pg_subscription s ON s.oid = d.refobjid WHERE s.subname
= 'sub1';
 relname
---------
 clt

postgres=#  select count(*) from pg_shdepend  where refobjid = (select
oid from pg_subscription where subname='sub1');
 count
-------
     0

Since the dependency between the sub and the clt involves a shared
object, shouldn't the entry be in pg_shdepend? Or do we allow
such entries in pg_depend as well?

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Mon, Dec 15, 2025 at 2:16 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sun, Dec 14, 2025 at 9:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> Thanks for the patch. Few comments:

>
> 2)
> postgres=# alter subscription sub1 set (conflict_log_table='');
> ERROR:  conflict log table name cannot be empty
> HINT:  Provide a valid table name or omit the parameter.
>
> My idea was to allow the above operation to enable users to reset the
> conflict_log_table when the conflict log history is no longer needed.
> Is there any other way to reset it, or is this intentionally not
> supported?

ALTER SUBSCRIPTION .. SET (conflict_log_table=NONE); this is the same
as how other subscription parameters are reset.

> 3)
> postgres=# alter subscription sub1 set (conflict_log_table=NULL);
> ALTER SUBSCRIPTION
> postgres=# alter subscription sub2 set (conflict_log_table=create);
> ALTER SUBSCRIPTION
> postgres=# \d
>          List of relations
>  Schema |  Name   | Type  | Owner
> --------+---------+-------+--------
>  public | create  | table | shveta
>  public | null    | table | shveta
>
>
> It takes reserved keywords and creates tables with those names. It
> should be restricted.

I somehow assumed table creation would be restricted with these names,
but since we switched from SPI to the internal interface, that's not
true anymore; need to see how we can handle this.
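
One possible shape for such a check, if we stay with the internal
interface, is to consult the core keyword list (sketch only, one
possible approach):

/* Sketch: reject reserved keywords as conflict log table names. */
int kwnum = ScanKeywordLookup(relname, &ScanKeywords);

if (kwnum >= 0 && ScanKeywordCategories[kwnum] == RESERVED_KEYWORD)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("conflict log table name \"%s\" is a reserved keyword",
                    relname)));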

> 4)
> postgres=# SELECT c.relname FROM pg_depend d JOIN pg_class c ON c.oid
> = d.objid JOIN pg_subscription s ON s.oid = d.refobjid WHERE s.subname
> = 'sub1';
>  relname
> ---------
>  clt
>
> postgres=#  select count(*) from pg_shdepend  where refobjid = (select
> oid from pg_subscription where subname='sub1');
>  count
> -------
>      0
>
> Since the dependency between the sub and the clt involves a shared
> object, shouldn't the entry be in pg_shdepend? Or do we allow
> such entries in pg_depend as well?

The primary reason for recording in pg_depend is that the
RemoveRelations() function already includes logic to check for and
report internal dependencies within pg_depend. Consequently, if we
were to record the dependency in pg_shdepend, we would likely need to
modify RemoveRelations() to incorporate handling for pg_shdepend
dependencies.

However, some might argue that when an object ID (objid) is local and
the referenced object ID (refobjid) is shared, such as when a table is
created under a ROLE, establishing a dependency with the owner, the
dependency is currently recorded in pg_shdepend. In this scenario, the
dependent object (the local table) can be dropped independently, while
the referenced object (the shared owner) cannot. However, when aiming
to record an internal dependency, the dependent object should not be
droppable without first dropping the referencing object. Therefore, I
believe the dependency record should be placed in pg_depend, as the
depender is a local object and will check for dependencies there.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Sun, Dec 14, 2025 at 9:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Here is the patch which implements the dependency and fixes other
> comments from Shveta.
>

+/*
+ * Check if the specified relation is used as a conflict log table by any
+ * subscription.
+ */
+bool
+IsConflictLogTable(Oid relid)
+{
+ Relation rel;
+ TableScanDesc scan;
+ HeapTuple tup;
+ bool is_clt = false;
+
+ rel = table_open(SubscriptionRelationId, AccessShareLock);
+ scan = table_beginscan_catalog(rel, 0, NULL);
+
+ while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))

This function has been used at multiple places in the patch, though
not in any performance-critical paths, but still, it seems like the
impact can be noticeable for a large number of subscriptions. Also, I
am not sure it is a good design to scan the entire system table to
find whether some other relation is publishable or not. I see below
kinds of usages for it:

+ /* Subscription conflict log tables are not published */
+ result = is_publishable_class(relid, (Form_pg_class) GETSTRUCT(tuple)) &&
+ !IsConflictLogTable(relid);

In this regard, I see a comment atop is_publishable_class which
suggests as follows:

The best
 * long-term solution may be to add a "relispublishable" bool to pg_class,
 * and depend on that instead of OID checks.
 */
static bool
is_publishable_class(Oid relid, Form_pg_class reltuple)

I feel that is a good idea for reasons mentioned atop
is_publishable_class and for the conflict table. What do you think?
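
To make that concrete, here is a rough sketch, assuming a hypothetical
new pg_class column named "relispublishable" (this is not in the patch;
the column name and the exact checks are illustrative only):

static bool
is_publishable_class(Oid relid, Form_pg_class reltuple)
{
    /*
     * A single flag set at relation-creation time would replace both
     * the OID-range heuristics and the pg_subscription scan done by
     * IsConflictLogTable().  "relispublishable" is hypothetical here.
     */
    return (reltuple->relkind == RELKIND_RELATION ||
            reltuple->relkind == RELKIND_PARTITIONED_TABLE) &&
        reltuple->relispublishable;
}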

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Mon, Dec 15, 2025 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Dec 14, 2025 at 9:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Here is the patch which implements the dependency and fixes other
> > comments from Shveta.
> >
>
> +/*
> + * Check if the specified relation is used as a conflict log table by any
> + * subscription.
> + */
> +bool
> +IsConflictLogTable(Oid relid)
> +{
> + Relation rel;
> + TableScanDesc scan;
> + HeapTuple tup;
> + bool is_clt = false;
> +
> + rel = table_open(SubscriptionRelationId, AccessShareLock);
> + scan = table_beginscan_catalog(rel, 0, NULL);
> +
> + while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
>
> This function is used in multiple places in the patch; though none of
> them are performance-critical paths, the impact could still be
> noticeable with a large number of subscriptions. Also, I
> am not sure it is a good design to scan the entire system table to
> find whether some other relation is publishable or not. I see below
> kinds of usages for it:
>
> + /* Subscription conflict log tables are not published */
> + result = is_publishable_class(relid, (Form_pg_class) GETSTRUCT(tuple)) &&
> + !IsConflictLogTable(relid);
>
> In this regard, I see a comment atop is_publishable_class which
> suggests as follows:
>
> The best
>  * long-term solution may be to add a "relispublishable" bool to pg_class,
>  * and depend on that instead of OID checks.
>  */
> static bool
> is_publishable_class(Oid relid, Form_pg_class reltuple)
>
> I feel that is a good idea for reasons mentioned atop
> is_publishable_class and for the conflict table. What do you think?

On a quick thought, this seems like a good idea and may simplify a
couple of places.  It might also be good for future extension, as we
could mark publishability at the individual-relation level instead of
targeting broad categories like IsCatalogRelationOid() or checking
individual items by their OIDs.  IMHO this can be done as an individual
patch in a separate thread, or as a base patch.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Mon, Dec 15, 2025 at 4:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Dec 15, 2025 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Dec 14, 2025 at 9:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > Here is the patch which implements the dependency and fixes other
> > > comments from Shveta.
> > >
> >
> > +/*
> > + * Check if the specified relation is used as a conflict log table by any
> > + * subscription.
> > + */
> > +bool
> > +IsConflictLogTable(Oid relid)
> > +{
> > + Relation rel;
> > + TableScanDesc scan;
> > + HeapTuple tup;
> > + bool is_clt = false;
> > +
> > + rel = table_open(SubscriptionRelationId, AccessShareLock);
> > + scan = table_beginscan_catalog(rel, 0, NULL);
> > +
> > + while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
> >
> > This function is used in multiple places in the patch; though none of
> > them are performance-critical paths, the impact could still be
> > noticeable with a large number of subscriptions. Also, I
> > am not sure it is a good design to scan the entire system table to
> > find whether some other relation is publishable or not. I see below
> > kinds of usages for it:
> >
> > + /* Subscription conflict log tables are not published */
> > + result = is_publishable_class(relid, (Form_pg_class) GETSTRUCT(tuple)) &&
> > + !IsConflictLogTable(relid);
> >
> > In this regard, I see a comment atop is_publishable_class which
> > suggests as follows:
> >
> > The best
> >  * long-term solution may be to add a "relispublishable" bool to pg_class,
> >  * and depend on that instead of OID checks.
> >  */
> > static bool
> > is_publishable_class(Oid relid, Form_pg_class reltuple)
> >
> > I feel that is a good idea for reasons mentioned atop
> > is_publishable_class and for the conflict table. What do you think?
>
> On a quick thought, this seems like a good idea and may simplify a
> couple of places.  It might also be good for future extension, as we
> could mark publishability at the individual-relation level instead of
> targeting broad categories like IsCatalogRelationOid() or checking
> individual items by their OIDs.  IMHO this can be done as an individual
> patch in a separate thread, or as a base patch.
>

I prefer to do it in a separate thread, so that it can get some more
attention. But it should be done before the main conflict patch. I
think we can subdivide the main patch into (a) DDL handling,
everything except inserting data into conflict table, (b) inserting
data into conflict table, (c) upgrade handling. That way it will be
easier to review.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Mon, Dec 15, 2025 at 2:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > 3)
> > postgres=# alter subscription sub1 set (conflict_log_table=NULL);
> > ALTER SUBSCRIPTION
> > postgres=# alter subscription sub2 set (conflict_log_table=create);
> > ALTER SUBSCRIPTION
> > postgres=# \d
> >          List of relations
> >  Schema |  Name   | Type  | Owner
> > --------+---------+-------+--------
> >  public | create  | table | shveta
> >  public | null    | table | shveta
> >
> >
> > It takes reserved keywords and creates tables with those names. It
> > should be restricted.
>
> I had somehow assumed table creation would be restricted with these
> names, but since we switched from SPI to the internal interface that's
> no longer true; I need to see how we can handle this.

While thinking more on this, I looked at other places where we use
'heap_create_with_catalog()', and I noticed that we always use an
internally generated name. So wouldn't it be nicer to make the conflict
log table option a boolean and use an internally generated name such as
conflict_log_table_$subid$, which we would always create in the
currently active search_path?  Thoughts?
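
For example, the name generation itself would be trivial (a sketch
only; the exact prefix is up for discussion):

    char        relname[NAMEDATALEN];

    snprintf(relname, sizeof(relname), "conflict_log_table_%u", subid);

Since the name embeds the subscription OID, it stays unique per
subscription and sidesteps the reserved-keyword problem above, because
it is never parsed from user input.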

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Peter Smith
Date:
Some review comments for v12-0001.

======
General

1.
There is no documentation. Even if it seems a bit premature, IMO
writing/reviewing the documentation could help identify unanticipated
usability issues.

======
src/backend/commands/subscriptioncmds.c

2.
+
+ /* Setting conflict_log_table = NONE is treated as no table. */
+ if (strcmp(opts->conflictlogtable, "none") == 0)
+ opts->conflictlogtable = NULL;
+ }

2a.
This was unexpected when I came across this code. This feature needs to
be described in the commit message.

~

2b.
Case sensitive?

~~~

CreateSubscription:

3.
+ List   *names;
+
+ /* Explicitly check for empty string before any processing. */
+ if (opts.conflictlogtable[0] == '\0')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("conflict log table name cannot be empty"),
+ errhint("Provide a valid table name or omit the parameter.")));
+
+ names = stringToQualifiedNameList(opts.conflictlogtable, NULL);

Should '' just be equivalent to NONE instead of being another error condition?

~~~

AlterSubscription:

4.
+ Oid     old_nspid = InvalidOid;
+ char   *old_relname = NULL;
+ char   *relname = NULL;
+ List   *names = NIL;

Var 'names' can be declared at a lower scope -- e.g. in the 'if' block.

~~~

DropSubscription:

5.
+ /*
+ * Conflict log tables are recorded as internal dependencies of the
+ * subscription.  We must drop the dependent objects before the
+ * subscription itself is removed.  By using
+ * PERFORM_DELETION_SKIP_ORIGINAL, we ensure that only the conflict log
+ * table is reaped while the  subscription remains for the final deletion
+ * step.
+ */

Double spaces? /the  subscription/the subscription/

~~~

create_conflict_log_table_tupdesc:

6.
+static TupleDesc
+create_conflict_log_table_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ int i;
+
+ tupdesc = CreateTemplateTupleDesc(MAX_CONFLICT_ATTR_NUM);
+
+ for (i = 0; i < MAX_CONFLICT_ATTR_NUM; i++)

Declare 'i' as a for-loop var.

~~~

create_conflict_log_table:

7.
+/*
+ * Create conflict log table.
+ *
+ * The subscription owner becomes the owner of this table and has all
+ * privileges on it.
+ */
+static void
+create_conflict_log_table(Oid namespaceId, char *conflictrel, Oid subid)
+{

I felt that the 'subid' should be the first parameter, not the last.

~~~

8.
namespace > relation, so I felt it is more natural to check for the
temp namespace *before* checking for clashing table names.

======
src/backend/replication/logical/conflict.c

9.
+ if (ValidateConflictLogTable(conflictlogrel))
+ {
+ /*
+ * Prepare the conflict log tuple. If the error level is below
+ * ERROR, insert it immediately. Otherwise, defer the insertion to
+ * a new transaction after the current one aborts, ensuring the
+ * insertion of the log tuple is not rolled back.
+ */
+ prepare_conflict_log_tuple(estate,
+    relinfo->ri_RelationDesc,
+    conflictlogrel,
+    type,
+    searchslot,
+    conflicttuples,
+    remoteslot);
+ if (elevel < ERROR)
+ InsertConflictLogTuple(conflictlogrel);
+ }
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Conflict log table \"%s.%s\" structure changed, skipping insertion",
+ get_namespace_name(RelationGetNamespace(conflictlogrel)),
+ RelationGetRelationName(conflictlogrel)));

9a.
AFAICT in the only few places this function is called it emits exactly
the same warning, so it seems unnecessary duplication. Would it be
better to have that WARNING code inside the ValidateConflictLogTable
(eg always give the warning when returning false). But see also 9b.

~

9b.
I have some doubts about this validation function. It seems
inefficient to be validating the same CLT structure over and over
every time there is a new conflict. Not only is that going to be
slower, but the logfile is going to fill up with warnings. Maybe this
"validation" phase should be a one-time check only during the
CREATE/ALTER SUBSCRIPTION.

Maybe if validation fails it could give some NOTICE that the CLT
logging is broken and then reset the CLT to NONE?

~~~

ValidateConflictLogTable:

10.
+/*
+ * ValidateConflictLogTable - Validate conflict log table
+ *
+ * Validate whether the conflict log table is still suitable for considering as
+ * conflict log table.
+ */
+bool
+ValidateConflictLogTable(Relation rel)

This function comment seems unhelpful. Three times it mentions some
equivalent of "validate conflict log table", but nowhere does it say
what that even means.

 Maybe the later comment (below):

+ /*
+ * Check whether the table definition including its column names, data
+ * types, and column ordering meets the requirements for conflict log
+ * table.
+ */

Should be moved into the function comment part.

~~~

11.
+ Relation    pg_attribute;
+ HeapTuple   atup;
+ ScanKeyData scankey;
+ SysScanDesc scan;
+ Form_pg_attribute attForm;
+ int         attcnt = 0;
+ bool        tbl_ok = true;

'attForm' can be declared within the while loop.

~~~

12.
+ if (attcnt != MAX_CONFLICT_ATTR_NUM || !tbl_ok)
+ return false;

As per previous review comment, this could emit the WARNING log right
here. But see also #9b.

~~~

build_local_conflicts_json_array:

13.
+ Datum values[MAX_LOCAL_CONFLICT_INFO_ATTRS];
+ bool nulls[MAX_LOCAL_CONFLICT_INFO_ATTRS];
+ char    *origin_name = NULL;
+ HeapTuple tuple;
+ Datum json_datum;
+ int attno;
+
+ memset(values, 0, sizeof(Datum) * MAX_LOCAL_CONFLICT_INFO_ATTRS);
+ memset(nulls, 0, sizeof(bool) * MAX_LOCAL_CONFLICT_INFO_ATTRS);

You could also just use designated initializer syntax here and avoid
the memsets.

e.g. = {0}
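
That is, something like:

    Datum       values[MAX_LOCAL_CONFLICT_INFO_ATTRS] = {0};
    bool        nulls[MAX_LOCAL_CONFLICT_INFO_ATTRS] = {0};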

~~~

14.
+ memset(values, 0, sizeof(Datum) * MAX_LOCAL_CONFLICT_INFO_ATTRS);
+ memset(nulls, 0, sizeof(bool) * MAX_LOCAL_CONFLICT_INFO_ATTRS);

Another place where you could've avoided memset and just done = {0};

~~~

15.
+ json_datum_array = (Datum *) palloc(num_conflicts * sizeof(Datum));
+ json_null_array = (bool *) palloc0(num_conflicts * sizeof(bool));

- index_value = BuildIndexValueDescription(indexDesc, values, isnull);
+ i = 0;
+ foreach(lc, json_datums)
+ {
+ json_datum_array[i] = (Datum) lfirst(lc);
+ i++;
+ }

Should these be using new palloc_array instead of palloc?
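
That is (keeping palloc0_array for the second allocation so the zeroed
initialization is unchanged):

    json_datum_array = palloc_array(Datum, num_conflicts);
    json_null_array = palloc0_array(bool, num_conflicts);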

======
src/include/replication/conflict.h

16.
+typedef struct ConflictLogColumnDef
+{
+ const char *attname;    /* Column name */
+ Oid         atttypid;   /* Data type OID */
+} ConflictLogColumnDef;

Add this to typedefs.list

~~~

17.
+/* The single source of truth for the conflict log table schema */
+static const ConflictLogColumnDef ConflictLogSchema[] =
+{
+ { .attname = "relid",            .atttypid = OIDOID },
+ { .attname = "schemaname",       .atttypid = TEXTOID },
+ { .attname = "relname",          .atttypid = TEXTOID },
+ { .attname = "conflict_type",    .atttypid = TEXTOID },
+ { .attname = "remote_xid",       .atttypid = XIDOID },
+ { .attname = "remote_commit_lsn",.atttypid = LSNOID },
+ { .attname = "remote_commit_ts", .atttypid = TIMESTAMPTZOID },
+ { .attname = "remote_origin",    .atttypid = TEXTOID },
+ { .attname = "replica_identity", .atttypid = JSONOID },
+ { .attname = "remote_tuple",     .atttypid = JSONOID },
+ { .attname = "local_conflicts",  .atttypid = JSONARRAYOID }
+};

I like this, but I felt it would be better if all the definitions for
"local_conflicts" were defined here too. Then everythin gis in one
place.
e.g. MAX_LOCAL_CONFLICT_INFO_ATTRS and most of the content of
build_conflict_tupledesc().

~~~

18.
+/* Define the count using the array size */
+#define MAX_CONFLICT_ATTR_NUM (sizeof(ConflictLogSchema) / sizeof(ConflictLogSchema[0]))

This comment just says the same as the code, so it doesn't seem useful.

======
src/test/regress/expected/subscription.out

19.
+\dt+ clt.regress_conflict_log3
+                                              List of tables
+ Schema |         Name          | Type  |           Owner           | Persistence |  Size   | Description
+--------+-----------------------+-------+---------------------------+-------------+---------+-------------
+ clt    | regress_conflict_log3 | table | regress_subscription_user | permanent   | 0 bytes |
+(1 row)


Since the CLT is auto-created internally, and since there is a
"Description" attribute, I wonder whether you should also auto-generate
that description so that here it might say something useful like:
"Conflict log table for subscription XYZ"

~~~

20.
+-- ok - create subscription with conflict_log_table = NONE
+CREATE SUBSCRIPTION regress_conflict_test1 CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, conflict_log_table = NONE);
+SELECT subname, subconflictlogtable FROM pg_subscription WHERE subname = 'regress_conflict_test2';
+        subname         |  subconflictlogtable
+------------------------+-----------------------
+ regress_conflict_test2 | regress_conflict_log3
+(1 row)
+

I didn't understand this test case; you are setting a NONE CLT for
subscription 'regress_conflict_test1', but then you are checking
subname 'regress_conflict_test2'.

Is that a typo?

~~~

21.
+ALTER SUBSCRIPTION regress_conflict_test1 DISABLE;
+ALTER SUBSCRIPTION regress_conflict_test1 SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_conflict_test1;
+-- Clean up remaining test subscription
+ALTER SUBSCRIPTION regress_conflict_test2 DISABLE;
+ALTER SUBSCRIPTION regress_conflict_test2 SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_conflict_test2;

Something seems misplaced. Why aren't all of the cleanups under the
'cleanup' comment?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Dec 15, 2025 at 2:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > 3)
> > > postgres=# alter subscription sub1 set (conflict_log_table=NULL);
> > > ALTER SUBSCRIPTION
> > > postgres=# alter subscription sub2 set (conflict_log_table=create);
> > > ALTER SUBSCRIPTION
> > > postgres=# \d
> > >          List of relations
> > >  Schema |  Name   | Type  | Owner
> > > --------+---------+-------+--------
> > >  public | create  | table | shveta
> > >  public | null    | table | shveta
> > >
> > >
> > > It takes reserved keywords and creates tables with those names. It
> > > should be restricted.
> >
> > I had somehow assumed table creation would be restricted with these
> > names, but since we switched from SPI to the internal interface that's
> > no longer true; I need to see how we can handle this.
>
> While thinking more on this, I looked at other places where we use
> 'heap_create_with_catalog()', and I noticed that we always use an
> internally generated name. So wouldn't it be nicer to make the conflict
> log table option a boolean and use an internally generated name such as
> conflict_log_table_$subid$, which we would always create in the
> currently active search_path?  Thoughts?
>

We could do this as a first step. See the proposal in email [1] where
we have discussed having two options instead of one. The first option
will be conflict_log_format and the values would be log and table. In
this case, the table would be an internally generated one.

[1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
vignesh C
Date:
On Sun, 14 Dec 2025 at 21:17, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Dec 14, 2025 at 3:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I was considering the interdependence between the subscription and the
> > > > conflict log table (CLT). IMHO, it would be logical to establish the
> > > > subscription as dependent on the CLT. This way, if someone attempts to
> > > > drop the CLT, the system would recognize the dependency of the
> > > > subscription and prevent the drop unless the subscription is removed
> > > > first or the CASCADE option is used.
> > > >
> > > > However, while investigating this, I encountered an error [1] stating
> > > > that global objects are not supported in this context. This indicates
> > > > that global objects cannot be made dependent on local objects.
> > > >
> > >
> > > What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> > > objects. For example, consider following case:
> > > postgres=# create table t1(c1 int primary key);
> > > CREATE TABLE
> > > postgres=# \d+ t1
> > >                                            Table "public.t1"
> > >  Column |  Type   | Collation | Nullable | Default | Storage |
> > > Compression | Stats target | Description
> > > --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> > >  c1     | integer |           | not null |         | plain   |
> > >     |              |
> > > Indexes:
> > >     "t1_pkey" PRIMARY KEY, btree (c1)
> > > Publications:
> > >     "pub1"
> > > Not-null constraints:
> > >     "t1_c1_not_null" NOT NULL "c1"
> > > Access method: heap
> > > postgres=# drop index t1_pkey;
> > > ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> > > t1 requires it
> > > HINT:  You can drop constraint t1_pkey on table t1 instead.
> > >
> > > Here, the PK index is created as part for CREATE TABLE operation and
> > > pk_index is not allowed to be dropped independently.
> > >
> > > > Although making an object dependent on global/shared objects is
> > > > possible for certain types of shared objects [2], this is not our main
> > > > objective.
> > > >
> > >
> > > As per my understanding from the above example, we need something like
> > > that only for shared object subscription and (internally created)
> > > table.
> >
> > Yeah that seems to be exactly what we want, so I tried doing that by
> > recording DEPENDENCY_INTERNAL dependency of CLT on subscription[1] and
> > it is behaving as we want[2].  And while dropping the subscription or
> > altering CLT we can delete internal dependency so that CLT get dropped
> > automatically[3]
> >
> > I will send an updated patch after testing a few more scenarios and
> > fixing other pending issues.
> >
> > [1]
> > +       ObjectAddressSet(myself, RelationRelationId, relid);
> > +       ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
> > +       recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);
> >
> >
> > [2]
> > postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
> > ERROR:  2BP01: cannot drop table myschema.conflict_log_history2
> > because subscription sub requires it
> > HINT:  You can drop subscription sub instead.
> > LOCATION:  findDependentObjects, dependency.c:788
> > postgres[670778]=#
> >
> > [3]
> > ObjectAddressSet(object, SubscriptionRelationId, subid);
> > performDeletion(&object, DROP_CASCADE
> >                            PERFORM_DELETION_INTERNAL |
> >                            PERFORM_DELETION_SKIP_ORIGINAL);
> >
> >
>
> Here is the patch which implements the dependency and fixes other
> comments from Shveta.

Thanks for the changes. The new dependency-based implementation
creates a cycle while dumping:
./pg_dump -d postgres -f dump1.txt -p 5433
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: detail: TABLE conflict  (ID 225 OID 16397)
pg_dump: detail: SUBSCRIPTION (ID 3484 OID 16396)
pg_dump: detail: POST-DATA BOUNDARY  (ID 3491)
pg_dump: detail: TABLE DATA t1  (ID 3485 OID 16384)
pg_dump: detail: PRE-DATA BOUNDARY  (ID 3490)

This can be seen with a simple subscription with conflict_log_table.
This was working fine with the v11 version patch.

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Mon, Dec 15, 2025 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Dec 14, 2025 at 9:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Here is the patch which implements the dependency and fixes other
> > comments from Shveta.
> >
>
> +/*
> + * Check if the specified relation is used as a conflict log table by any
> + * subscription.
> + */
> +bool
> +IsConflictLogTable(Oid relid)
> +{
> + Relation rel;
> + TableScanDesc scan;
> + HeapTuple tup;
> + bool is_clt = false;
> +
> + rel = table_open(SubscriptionRelationId, AccessShareLock);
> + scan = table_beginscan_catalog(rel, 0, NULL);
> +
> + while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
>
> This function is used in multiple places in the patch; though none of
> them are performance-critical paths, the impact could still be
> noticeable with a large number of subscriptions. Also, I
> am not sure it is a good design to scan the entire system table to
> find whether some other relation is publishable or not. I see below
> kinds of usages for it:
>
> + /* Subscription conflict log tables are not published */
> + result = is_publishable_class(relid, (Form_pg_class) GETSTRUCT(tuple)) &&
> + !IsConflictLogTable(relid);
>
> In this regard, I see a comment atop is_publishable_class which
> suggests as follows:
>
> The best
>  * long-term solution may be to add a "relispublishable" bool to pg_class,
>  * and depend on that instead of OID checks.
>  */
> static bool
> is_publishable_class(Oid relid, Form_pg_class reltuple)
>
> I feel that is a good idea for reasons mentioned atop
> is_publishable_class and for the conflict table. What do you think?
>

+1.
The OID check may be unreliable, as mentioned in the comment. I tested
this by dropping and recreating information_schema, and observed that
after recreation it became eligible for publication because its relid
no longer falls below FirstNormalObjectId.  Steps:

****Pub****:
create publication pub1;
ALTER PUBLICATION pub1 ADD TABLE information_schema.sql_sizing;
select * from information_schema.sql_sizing where sizing_id=97;

****Sub****:
create subscription sub1 connection '...' publication pub1 with
(copy_data=false);
select * from information_schema.sql_sizing where sizing_id=97;

****Pub****:
alter table information_schema.sql_sizing replica identity full;
--this is not replicated.
UPDATE information_schema.sql_sizing set supported_value=12 where sizing_id=97;

****Sub****:
postgres=# select supported_value from information_schema.sql_sizing
where sizing_id=97;
 supported_value
-----------------
              0

~~

Then drop and recreate and try to perform the above update again, it
gets replicated:

drop schema information_schema cascade;
./psql -d postgres -f ./../../src/backend/catalog/information_schema.sql -p 5433

****Pub****:
ALTER PUBLICATION pub1 ADD TABLE information_schema.sql_sizing;
select * from information_schema.sql_sizing where sizing_id=97;
alter table information_schema.sql_sizing replica identity full;
--This is replicated
UPDATE information_schema.sql_sizing set supported_value=14 where sizing_id=97;

****Sub****:
--This shows supported_value as 14
postgres=# select supported_value from information_schema.sql_sizing
where sizing_id=97;
 supported_value
-----------------
              14

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
vignesh C
Date:
On Thu, 11 Dec 2025 at 19:50, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 5:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > 2)
> > > > When we do below:
> > > > alter subscription sub1 SET (conflict_log_table=clt2);
> > > >
> > > > the previous conflict log table is dropped. Is this behavior
> > > > intentional and discussed/concluded earlier? It’s possible that a user
> > > > may want to create a new conflict log table for future events while
> > > > still retaining the old one for analysis. If the subscription itself
> > > > is dropped, then dropping the CLT makes sense, but I’m not sure this
> > > > behavior is intended for ALTER SUBSCRIPTION.  I do understand that
> > > > once we unlink CLT from subscription, later even DROP subscription
> > > > cannot drop it, but user can always drop it when not needed.
> > > >
> > > > If we plan to keep existing behavior, it should be clearly documented
> > > > in a CAUTION section, and the command should explicitly log the table
> > > > drop.
> > >
> > > Yeah we discussed this behavior and the conclusion was we would
> > > document this behavior and its user's responsibility to take necessary
> > > backup of the conflict log table data if they are setting a new log
> > > table or NONE for the subscription.
> > >
> >
> > +1. If we don't do this then it will be difficult to track for
> > postgres or users the previous conflict history tables.
>
> Right, it makes sense.
>
> The attached patch fixes most of the open comments:
> 1) \dRs+ now shows the schema-qualified name
> 2) Now the key_tuple and replica_identity tuple are both added to the
> conflict log tuple wherever applicable
> 3) Refactored the code so that we can define the conflict log table
> schema only once in the header file and both create_conflict_log_table
> and ValidateConflictLogTable use it.
>
> I was considering the interdependence between the subscription and the
> conflict log table (CLT). IMHO, it would be logical to establish the
> subscription as dependent on the CLT. This way, if someone attempts to
> drop the CLT, the system would recognize the dependency of the
> subscription and prevent the drop unless the subscription is removed
> first or the CASCADE option is used.
>
> However, while investigating this, I encountered an error [1] stating
> that global objects are not supported in this context. This indicates
> that global objects cannot be made dependent on local objects.
> Although making an object dependent on global/shared objects is
> possible for certain types of shared objects [2], this is not our main
> objective.
>
> We do not need to make the CLT dependent on the subscription because
> the table can be dropped when the subscription is dropped anyway and
> we are already doing it as part of drop subscription as well as alter
> subscription when CLT is set to NONE or a different table. Therefore,
> extending the functionality of shared dependency is unnecessary for
> this purpose.
>
> Thoughts?
>
> [1]
> doDeletion()
> {
> ....
> /*
> * These global object types are not supported here.
> */
> case AuthIdRelationId:
> case DatabaseRelationId:
> case TableSpaceRelationId:
> case SubscriptionRelationId:
> case ParameterAclRelationId:
> elog(ERROR, "global objects cannot be deleted by doDeletion");
> break;
> }
>
> [2]
> typedef enum SharedDependencyType
> {
> SHARED_DEPENDENCY_OWNER = 'o',
> SHARED_DEPENDENCY_ACL = 'a',
> SHARED_DEPENDENCY_INITACL = 'i',
> SHARED_DEPENDENCY_POLICY = 'r',
> SHARED_DEPENDENCY_TABLESPACE = 't',
> SHARED_DEPENDENCY_INVALID = 0,
> } SharedDependencyType;
>
> Pending Items are:
> 1. Handling dump/upgrade

The attached patch has the changes for handling dump. It works on top
of the v11 version; it does not work on v12 because of the issue
reported at [1]. Currently the upgrade does not work because of an
existing issue being tracked at [2]; the upgrade works with the patch
attached there.

[1] - https://www.postgresql.org/message-id/CALDaNm1zEYoSdf2Ns-%3DUJRw95E5sbfpB0oaNUWtRJN27Q1Knhw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm2x3rd7C0_HjUpJFbxpAqXgm%3DQtoKfkEWDVA8h%2BJFpa_w%40mail.gmail.com

Regards,
Vignesh

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Mon, Dec 15, 2025 at 2:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Dec 15, 2025 at 2:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Sun, Dec 14, 2025 at 9:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > 4)
> > postgres=# SELECT c.relname FROM pg_depend d JOIN pg_class c ON c.oid
> > = d.objid JOIN pg_subscription s ON s.oid = d.refobjid WHERE s.subname
> > = 'sub1';
> >  relname
> > ---------
> >  clt
> >
> > postgres=#  select count(*) from pg_shdepend  where refobjid = (select
> > oid from pg_subscription where subname='sub1');
> >  count
> > -------
> >      0
> >
> > Since dependency between sub and clt is a dependency involving
> > shared-object, shouldn't the entry be in pg_shdepend? Or do we allow
> > such entries in pg_depend as well?
>
> The primary reason for recording in pg_depend is that the
> RemoveRelations() function already includes logic to check for and
> report internal dependencies within pg_depend. Consequently, if we
> were to record the dependency in pg_shdepend, we would likely need to
> modify RemoveRelations() to incorporate handling for pg_shdepend
> dependencies.
>
> However, some might argue that when an object ID (objid) is local and
> the referenced object ID (refobjid) is shared, such as when a table is
> created under a ROLE, establishing a dependency with the owner, the
> dependency is currently recorded in pg_shdepend. In this scenario, the
> dependent object (the local table) can be dropped independently, while
> the referenced object (the shared owner) cannot.
>

Yes, and the same is true for tablespaces. Consider the case below:
create tablespace tbs location <tbs_location>;
create table t2(c1 int, c2 int) PARTITION BY RANGE(c1) tablespace tbs;

>
> However, when aiming
> to record an internal dependency, the dependent object should not be
> droppable without first dropping the referencing object. Therefore, I
> believe the dependency record should be placed in pg_depend, as the
> depender is a local object and will check for dependencies there.
>

I think it makes sense to add the dependency entry in pg_depend for
this case (the dependent object, the table, is db-local, and the
referenced object, the subscription, is shared across the cluster), as
there is a fundamental architectural difference between
tablespaces/roles and subscriptions that determines why one needs
pg_shdepend and the other is better off with pg_depend.

It comes down to cross-database visibility during the DROP command.

1. The "Tablespace" Scenario (Why it needs pg_shdepend)
A Tablespace is a truly global resource. You can connect to postgres
(database A) and try to drop a tablespace that is being used by app_db
(database B).

The Problem: When you run DROP TABLESPACE tbs from Database A, the
system cannot look inside Database B's pg_depend to see if the
tablespace is in use. It would have to connect to every database in
the cluster to check.

The Solution: We explicitly push this dependency up to the global
pg_shdepend. This allows the DROP command in Database A to instantly
see: "Wait, object 123 in Database B needs this. Block the drop."

2. The "Subscription" Scenario (Why it does NOT need pg_shdepend)
Although pg_subscription is a shared catalog, a Subscription is pinned
to a specific database (subdbid). One can only DROP SUBSCRIPTION while
connected to the database that owns it. Consider a scenario where one
creates a subscription sub_1 in app_db. Now, one cannot connect to
postgres DB and run DROP SUBSCRIPTION sub_1. She must connect to
app_db. Since we need to connect to app_db to drop the subscription,
the system has direct, fast access to the local pg_depend of app_db.
It doesn't need to consult a global "Cross-DB" catalog because there
is no mystery about where the dependencies live.

Does this theory sound more bullet-proof as to why it is desirable to
store dependency entries for this case in pg_depend? If so, I suggest
we add some comments explaining how subscriptions differ from other
shared objects, as future readers may have the same question.
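
As a starting point, the comment where we record the dependency could
read roughly like this (wording is only a draft):

/*
 * Record this dependency in pg_depend rather than pg_shdepend.  Although
 * pg_subscription is a shared catalog, each subscription is pinned to a
 * single database (subdbid) and can only be dropped while connected to
 * that database, so the database-local pg_depend is always visible when
 * the dependency must be checked.  This differs from truly global
 * objects such as tablespaces and roles, whose drops can be issued from
 * any database and therefore need pg_shdepend.
 */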

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Tue, Dec 16, 2025 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:

> The OID check may be unreliable, as mentioned in the comment. I tested
> this by dropping and recreating information_schema, and observed that
> after recreation it became eligible for publication because its relid
> no longer falls below FirstNormalObjectId.  Steps:
>
> ****Pub****:
> create publication pub1;
> ALTER PUBLICATION pub1 ADD TABLE information_schema.sql_sizing;
> select * from information_schema.sql_sizing where sizing_id=97;
>
> ****Sub****:
> create subscription sub1 connection '...' publication pub1 with
> (copy_data=false);
> select * from information_schema.sql_sizing where sizing_id=97;
>
> ****Pub****:
> alter table information_schema.sql_sizing replica identity full;
> --this is not replicated.
> UPDATE information_schema.sql_sizing set supported_value=12 where sizing_id=97;
>
> ****Sub****:
> postgres=# select supported_value from information_schema.sql_sizing
> where sizing_id=97;
>  supported_value
> -----------------
>               0
>
> ~~
>
> Then drop and recreate and try to perform the above update again, it
> gets replicated:
>
> drop schema information_schema cascade;
> ./psql -d postgres -f ./../../src/backend/catalog/information_schema.sql -p 5433
>
> ****Pub****:
> ALTER PUBLICATION pub1 ADD TABLE information_schema.sql_sizing;
> select * from information_schema.sql_sizing where sizing_id=97;
> alter table information_schema.sql_sizing replica identity full;
> --This is replicated
> UPDATE information_schema.sql_sizing set supported_value=14 where sizing_id=97;
>
> ****Sub****:
> --This shows supported_value as 14
> postgres=# select supported_value from information_schema.sql_sizing
> where sizing_id=97;
>  supported_value
> -----------------
>               14

Hmm, I might be missing something: why do we not want to publish what
is in information_schema? Especially since, once the internally created
schema is dropped, a user can create their own schema named
information_schema and create a bunch of tables in it, so why do we
want to block those?  I mean, the example you showed here is pretty
much like a user-created schema and table, no? Or am I missing
something important?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Wed, Dec 17, 2025 at 9:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 16, 2025 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > The OID check may be unreliable, as mentioned in the comment. I tested
> > this by dropping and recreating information_schema, and observed that
> > after recreation it became eligible for publication because its relid
> > no longer falls below FirstNormalObjectId.  Steps:
> >
> > ****Pub****:
> > create publication pub1;
> > ALTER PUBLICATION pub1 ADD TABLE information_schema.sql_sizing;
> > select * from information_schema.sql_sizing where sizing_id=97;
> >
> > ****Sub****:
> > create subscription sub1 connection '...' publication pub1 with
> > (copy_data=false);
> > select * from information_schema.sql_sizing where sizing_id=97;
> >
> > ****Pub****:
> > alter table information_schema.sql_sizing replica identity full;
> > --this is not replicated.
> > UPDATE information_schema.sql_sizing set supported_value=12 where sizing_id=97;
> >
> > ****Sub****:
> > postgres=# select supported_value from information_schema.sql_sizing
> > where sizing_id=97;
> >  supported_value
> > -----------------
> >               0
> >
> > ~~
> >
> > Then drop and recreate and try to perform the above update again, it
> > gets replicated:
> >
> > drop schema information_schema cascade;
> > ./psql -d postgres -f ./../../src/backend/catalog/information_schema.sql -p 5433
> >
> > ****Pub****:
> > ALTER PUBLICATION pub1 ADD TABLE information_schema.sql_sizing;
> > select * from information_schema.sql_sizing where sizing_id=97;
> > alter table information_schema.sql_sizing replica identity full;
> > --This is replicated
> > UPDATE information_schema.sql_sizing set supported_value=14 where sizing_id=97;
> >
> > ****Sub****:
> > --This shows supported_value as 14
> > postgres=# select supported_value from information_schema.sql_sizing
> > where sizing_id=97;
> >  supported_value
> > -----------------
> >               14
>
> Hmm, I might be missing something: why do we not want to publish what
> is in information_schema? Especially since, once the internally created
> schema is dropped, a user can create their own schema named
> information_schema and create a bunch of tables in it, so why do we
> want to block those?  I mean, the example you showed here is pretty
> much like a user-created schema and table, no? Or am I missing
> something important?
>

I don’t think a user intentionally dropping information_schema and
creating their own schema (with different definitions and tables) is a
practical scenario. While it isn’t explicitly restricted, I don’t see
a strong need for it. OTOH, there are scenarios where, after fixing
issues that affect the definition of information_schema on stable
branches, users may be asked to reload information_schema to apply the
updated definitions. One such case can be seen in [1].

Additionally, while reviewing the code, I noticed places where the
logic does not rely solely on relid being less than
FirstNormalObjectId. Instead, it performs name-based comparisons,
explicitly accounting for the possibility that information_schema may
have been dropped and reloaded. This further indicates that such
scenarios are considered practical. See [2].
And if such scenarios are possible, it might be worth considering
keeping the publish behavior consistent, both before and after a
reload of information_schema.

[1]:
https://www.postgresql.org/docs/9.1/release-9-1-2.html

[2]:
pg_upgrade has this:
static DataTypesUsageChecks data_types_usage_checks[] =
{
        /*
         * Look for composite types that were made during initdb *or* belong to
         * information_schema; that's important in case information_schema was
         * dropped and reloaded.
         *
         * The cutoff OID here should match the source cluster's value of
         * FirstNormalObjectId.  We hardcode it rather than using that C #define
         * because, if that #define is ever changed, our own version's value is
         * NOT what to use.  Eventually we may need a test on the
source cluster's
         * version to select the correct value.
         */
        {
                .status = gettext_noop("Checking for system-defined
composite types in user tables"),
                .report_filename = "tables_using_composite.txt",
                .base_query =
                "SELECT t.oid FROM pg_catalog.pg_type t "
                "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
                " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname =
'information_schema')",

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Wed, Dec 17, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> I don’t think a user intentionally dropping information_schema and
> creating their own schema (with different definitions and tables) is a
> practical scenario. While it isn’t explicitly restricted, I don’t see
> a strong need for it. OTOH, there are scenarios where, after fixing
> issues that affect the definition of information_schema on stable
> branches, users may be asked to reload information_schema to apply the
> updated definitions. One such case can be seen in [1].
>
> Additionally, while reviewing the code, I noticed places where the
> logic does not rely solely on relid being less than
> FirstNormalObjectId. Instead, it performs name-based comparisons,
> explicitly accounting for the possibility that information_schema may
> have been dropped and reloaded. This further indicates that such
> scenarios are considered practical. See [2].
> And if such scenarios are possible, it might be worth considering
> keeping the publish behavior consistent, both before and after a
> reload of information_schema.
>
> [1]:
> https://www.postgresql.org/docs/9.1/release-9-1-2.html
>
> [2]:
> pg_upgrade has this:
> static DataTypesUsageChecks data_types_usage_checks[] =
> {
>         /*
>          * Look for composite types that were made during initdb *or* belong to
>          * information_schema; that's important in case information_schema was
>          * dropped and reloaded.
>          *
>          * The cutoff OID here should match the source cluster's value of
>          * FirstNormalObjectId.  We hardcode it rather than using that C #define
>          * because, if that #define is ever changed, our own version's value is
>          * NOT what to use.  Eventually we may need a test on the
> source cluster's
>          * version to select the correct value.
>          */
>         {
>                 .status = gettext_noop("Checking for system-defined
> composite types in user tables"),
>                 .report_filename = "tables_using_composite.txt",
>                 .base_query =
>                 "SELECT t.oid FROM pg_catalog.pg_type t "
>                 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
>                 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname =
> 'information_schema')",

Yeah I agree with your theory.  While the system allows users to
manually create an information_schema or place objects within it, we
are establishing that anything inside this schema will be treated as
an internal object. If a user chooses to bypass these conventions and
then finds the objects are not handled like standard user tables, it
constitutes a usage error rather than a system bug.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Wed, Dec 17, 2025 at 3:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 17, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I don’t think a user intentionally dropping information_schema and
> > creating their own schema (with different definitions and tables) is a
> > practical scenario. While it isn’t explicitly restricted, I don’t see
> > a strong need for it. OTOH, there are scenarios where, after fixing
> > issues that affect the definition of information_schema on stable
> > branches, users may be asked to reload information_schema to apply the
> > updated definitions. One such case can be seen in [1].
> >
> > Additionally, while reviewing the code, I noticed places where the
> > logic does not rely solely on relid being less than
> > FirstNormalObjectId. Instead, it performs name-based comparisons,
> > explicitly accounting for the possibility that information_schema may
> > have been dropped and reloaded. This further indicates that such
> > scenarios are considered practical. See [2].
> > And if such scenarios are possible, it might be worth considering
> > keeping the publish behavior consistent, both before and after a
> > reload of information_schema.
> >
> > [1]:
> > https://www.postgresql.org/docs/9.1/release-9-1-2.html
> >
> > [2]:
> > pg_upgrade has this:
> > static DataTypesUsageChecks data_types_usage_checks[] =
> > {
> >         /*
> >          * Look for composite types that were made during initdb *or* belong to
> >          * information_schema; that's important in case information_schema was
> >          * dropped and reloaded.
> >          *
> >          * The cutoff OID here should match the source cluster's value of
> >          * FirstNormalObjectId.  We hardcode it rather than using that C #define
> >          * because, if that #define is ever changed, our own version's value is
> >          * NOT what to use.  Eventually we may need a test on the
> > source cluster's
> >          * version to select the correct value.
> >          */
> >         {
> >                 .status = gettext_noop("Checking for system-defined
> > composite types in user tables"),
> >                 .report_filename = "tables_using_composite.txt",
> >                 .base_query =
> >                 "SELECT t.oid FROM pg_catalog.pg_type t "
> >                 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
> >                 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname =
> > 'information_schema')",
>
> Yeah I agree with your theory.  While the system allows users to
> manually create an information_schema or place objects within it, we
> are establishing that anything inside this schema will be treated as
> an internal object. If a user chooses to bypass these conventions and
> then finds the objects are not handled like standard user tables, it
> constitutes a usage error rather than a system bug.

Yes, I think so as well. IIUC, we wouldn’t be establishing anything
new here; this behavior is already established. If we look at the code
paths that reference information_schema, it is consistently treated as
similar to system schema rather than a user schema. A few examples
include XML_VISIBLE_SCHEMAS_EXCLUDE, selectDumpableNamespace,
data_types_usage_checks, describeFunctions, describeAggregates, and
others.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

>
> We could do this as a first step. See the proposal in email [1] where
> we have discussed having two options instead of one. The first option
> will be conflict_log_format and the values would be log and table. In
> this case, the table would be an internally generated one.
>
> [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com

So I have put more thought on this and here is what I am proposing

1) Subscription parameter: So in the first version the subscription
parameter will be named 'conflict_log_format', which will accept
'log/table/both'; the default option would be 'log'.
2) If conflict_log_format = log is provided then we do not need to do
anything, as this would work by default.
3) If conflict_log_format = table/both is provided then we will
generate an internal table name, i.e. conflict_log_table_$subid$, and
the table will be created in the current schema.
4) In pg_subscription we will still keep 2 fields: a) the namespace id
of the conflict log table, and b) the conflict log format =
'log/table/both'.
5) If the option is table/both, the name can be generated on the fly,
whether we are creating the table or inserting a conflict into the
table.
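
In catalog terms, point (4) would add something like the following two
columns to pg_subscription (just a sketch; the names, types, and exact
representation of the format are what the questions below are about):

    Oid     subclognspid;   /* namespace of the conflict log table */
    char    subclogformat;  /* conflict log format: log, table, or both */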

Questions:
1) Shall we create the conflict log table in the current schema, or
should we consider anything else? IMHO the current schema should be
fine, and in the future, when we add an option for conflict_log_table,
we can support schema-qualified names as well.
2) In the catalog I am storing the "conflict_log_format" option as a
text field. Is there a better way to store it in a fixed format, maybe
as an enum value stored as an integer? E.g. from the enum below we
could store the integer value in the system catalog for the
"conflict_log_format" field. I am not sure if we have done such a
thing anywhere else.

typedef enum ConflictLogFormat
{
    CONFLICT_LOG_FORMAT_DEFAULT = 0,
    CONFLICT_LOG_FORMAT_LOG,
    CONFLICT_LOG_FORMAT_TABLE,
    CONFLICT_LOG_FORMAT_BOTH
} ConflictLogFormat;

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Thu, Dec 18, 2025 at 2:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> > We could do this as a first step. See the proposal in email [1] where
> > we have discussed having two options instead of one. The first option
> > will be conflict_log_format and the values would be log and table. In
> > this case, the table would be an internally generated one.
> >
> > [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com
>
> So I have put more thought on this and here is what I am proposing
>
> 1) Subscription parameter: So in the first version the subscription
> parameter will be named 'conflict_log_format', which will accept
> 'log/table/both'; the default option would be 'log'.
> 2) If conflict_log_format = log is provided then we do not need to do
> anything, as this would work by default.
> 3) If conflict_log_format = table/both is provided then we will
> generate an internal table name, i.e. conflict_log_table_$subid$, and
> the table will be created in the current schema.
> 4) In pg_subscription we will still keep 2 fields: a) the namespace id
> of the conflict log table, and b) the conflict log format =
> 'log/table/both'.
> 5) If the option is table/both, the name can be generated on the fly,
> whether we are creating the table or inserting a conflict into the
> table.
>
> Questions:
> 1) Shall we create the conflict log table in the current schema, or
> should we consider anything else? IMHO the current schema should be
> fine, and in the future, when we add an option for conflict_log_table,
> we can support schema-qualified names as well.
> 2) In the catalog I am storing the "conflict_log_format" option as a
> text field. Is there a better way to store it in a fixed format, maybe
> as an enum value stored as an integer? E.g. from the enum below we
> could store the integer value in the system catalog for the
> "conflict_log_format" field. I am not sure if we have done such a
> thing anywhere else.
>
> typedef enum ConflictLogFormat
> {
> CONFLICT_LOG_FORMAT_DEFAULT = 0,
> CONFLICT_LOG_FORMAT_LOG,
> CONFLICT_LOG_FORMAT_TABLE,
> CONFLICT_LOG_FORMAT_BOTH
> } ConflictLogFormat;

While exploring other kinds of options, I think we can make it a char,
something like relkind, as shown below. Any other opinion on the same?

#define CONFLICT_LOG_FORMAT_LOG 'l'
#define CONFLICT_LOG_FORMAT_TABLE 't'
#define CONFLICT_LOG_FORMAT_BOTH 'b'
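
Mapping the user-visible option value onto that char would then be
straightforward; a minimal sketch (names illustrative):

static char
parse_conflict_log_format(const char *value)
{
    if (strcmp(value, "log") == 0)
        return CONFLICT_LOG_FORMAT_LOG;
    else if (strcmp(value, "table") == 0)
        return CONFLICT_LOG_FORMAT_TABLE;
    else if (strcmp(value, "both") == 0)
        return CONFLICT_LOG_FORMAT_BOTH;

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("invalid conflict_log_format value: \"%s\"", value)));
    return 0;                   /* keep compiler quiet */
}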

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Thu, Dec 18, 2025 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 2:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > >
> > > We could do this as a first step. See the proposal in email [1] where
> > > we have discussed having two options instead of one. The first option
> > > will be conflict_log_format and the values would be log and table. In
> > > this case, the table would be an internally generated one.
> > >
> > > [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com
> >
> > So I have put more thought on this and here is what I am proposing
> >
> > 1) Subscription parameter: So in the first version the subscription
> > parameter will be named 'conflict_log_format', which will accept
> > 'log/table/both'; the default option would be 'log'.
> > 2) If conflict_log_format = log is provided then we do not need to do
> > anything, as this would work by default.
> > 3) If conflict_log_format = table/both is provided then we will
> > generate an internal table name, i.e. conflict_log_table_$subid$, and
> > the table will be created in the current schema.
> > 4) In pg_subscription we will still keep 2 fields: a) the namespace id
> > of the conflict log table, and b) the conflict log format =
> > 'log/table/both'.
> > 5) If the option is table/both, the name can be generated on the fly,
> > whether we are creating the table or inserting a conflict into the
> > table.
> >
> > Questions:
> > 1) Shall we create the conflict log table in the current schema, or
> > should we consider anything else? IMHO the current schema should be
> > fine, and in the future, when we add an option for conflict_log_table,
> > we can support schema-qualified names as well.
> > 2) In the catalog I am storing the "conflict_log_format" option as a
> > text field. Is there a better way to store it in a fixed format, maybe
> > as an enum value stored as an integer? E.g. from the enum below we
> > could store the integer value in the system catalog for the
> > "conflict_log_format" field. I am not sure if we have done such a
> > thing anywhere else.
> >
> > typedef enum ConflictLogFormat
> > {
> > CONFLICT_LOG_FORMAT_DEFAULT = 0,
> > CONFLICT_LOG_FORMAT_LOG,
> > CONFLICT_LOG_FORMAT_TABLE,
> > CONFLICT_LOG_FORMAT_BOTH
> > } ConflictLogFormat;
>
> While exploring other kinds of options, I think we can make it a char,
> something like relkind, as shown below. Any other opinion on the same?
>
> #define CONFLICT_LOG_FORMAT_LOG 'l'
> #define CONFLICT_LOG_FORMAT_TABLE 't'
> #define CONFLICT_LOG_FORMAT_BOTH 'b'
>

+1. Also, we should expose this to users as an enum type, similar to
auto_explain.log_format or publish_generated_columns.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Masahiko Sawada
Date:
On Thu, Dec 18, 2025 at 1:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> > We could do this as a first step. See the proposal in email [1] where
> > we have discussed having two options instead of one. The first option
> > will be conflict_log_format and the values would be log and table. In
> > this case, the table would be an internally generated one.
> >
> > [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com
>
> So I have put more thought on this and here is what I am proposing
>
> 1) Subscription parameter: So in the first version the subscription
> parameter will be named 'conflict_log_format', which will accept
> 'log/table/both'; the default option would be 'log'.
> 2) If conflict_log_format = log is provided then we do not need to do
> anything, as this would work by default.
> 3) If conflict_log_format = table/both is provided then we will
> generate an internal table name, i.e. conflict_log_table_$subid$, and
> the table will be created in the current schema.
> 4) In pg_subscription we will still keep 2 fields: a) the namespace id
> of the conflict log table, and b) the conflict log format =
> 'log/table/both'.
> 5) If the option is table/both, the name can be generated on the fly,
> whether we are creating the table or inserting a conflict into the
> table.

I have a question: who will be the owner of the conflict log table? I
assume that the subscription owner would own the conflict log table
and the conflict logs are inserted by the owner but not by the table
owner, is that right?

>
> Question:
> 1) Shall we create a conflict log table in the current schema or we
> should consider anything else, IMHO the current schema should be fine
> and in the future when we add an option for conflict_log_table we will
> support schema qualified names as well?

Some questions:

If the same name table already exists, CREATE SUBSCRIPTION will fail, right?

Can the conflict log table be used like normal user tables (e.g.,
creating a trigger/a foreign key, running vacuum, ALTER TABLE etc.)?

> 2) In catalog I am storing the "conflict_log_format" option as a text
> field, is there any better way so that we can store in fixed format
> maybe enum value as an integer we can do e.g. from below enum we can
> store the integer value in system catalog for "conflict_log_format"
> field, not sure if we have done such think anywhere else?
>
> typedef enum ConflictLogFormat
> {
> CONFLICT_LOG_FORMAT_DEFAULT = 0,
> CONFLICT_LOG_FORMAT_LOG,
> CONFLICT_LOG_FORMAT_TABLE,
> CONFLICT_LOG_FORMAT_BOTH
> } ConflictLogFormat;

How about making conflict_log_format accept a list of destinations
instead of having the 'both' option in case where we might add more
destination options in the future?

It seems to me that conflict_log_destination sounds better.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

От
Peter Smith
Дата:
On Thu, Dec 18, 2025 at 8:09 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> > We could do this as a first step. See the proposal in email [1] where
> > we have discussed having two options instead of one. The first option
> > will be conflict_log_format and the values would be log and table. In
> > this case, the table would be an internally generated one.
> >
> > [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com
>
> So I have put more thought on this and here is what I am proposing
>
> 1) Subscription Parameter: Son in first version the subscription
> parameter will be named 'conflict_log_format' which will accept
> 'log/table/both' default option would be log.
> 2) If conflict_log_format = log is provided then we do not need to do
> anything as this would work by default
> 3) If conflict_log_format = table/both is provided then we will
> generate a internal table name i.e. conflict_log_table_$subid$ and the
> table will be created in the current schema
> 4) in pg_subscription we will still keep 2 field a) namespace id of
> the conflict log table b) the conflict log format = 'log/table'both'
> 5) If option is table/both the name can be generated on the fly
> whether we are creating the table or inserting conflict into the
> table.

IIUC, previously you had a "none" value which was a way to "turn off"
any CLT previously defined. How can users do that now with
log/table/both? Would they have to reassign (the default) "log"? That
seems a bit strange.

The word "both" option is too restrictive. What if in the future you
added a 3rd kind of destination -- then what does "both" mean?

Maybe the destination list idea of Sawda-San's is better.
a) it resolves the "none" issue -- e.g., empty string means revert to
default CLT behaviour
b) it resolves the "both" issue.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

От
Peter Smith
Дата:
On Thu, Dec 18, 2025 at 8:09 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
...
>
> Question:
> 1) Shall we create a conflict log table in the current schema or we
> should consider anything else, IMHO the current schema should be fine
> and in the future when we add an option for conflict_log_table we will
> support schema qualified names as well?

You might be able to avoid a proliferation of related options (such as
conflict_log_table) if you renamed the main option to
"conflict_log_destination" like Sawada-San was suggesting.

e.g.

conflict_log_destimation="table" --> use default table named by code
conflict_log_destimation="table=myschema.mytable" --> table name
nominated by user

e.g. if wanted maybe this idea can extend to logs too.

conflict_log_destimation="log" --> use default pg log files
conflict_log_destimation="log=my_clt_log.txt" --> write conflicts to a
separate log file nominated by user

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Fri, Dec 19, 2025 at 4:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 1:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> > 2) In catalog I am storing the "conflict_log_format" option as a text
> > field, is there any better way so that we can store in fixed format
> > maybe enum value as an integer we can do e.g. from below enum we can
> > store the integer value in system catalog for "conflict_log_format"
> > field, not sure if we have done such think anywhere else?
> >
> > typedef enum ConflictLogFormat
> > {
> > CONFLICT_LOG_FORMAT_DEFAULT = 0,
> > CONFLICT_LOG_FORMAT_LOG,
> > CONFLICT_LOG_FORMAT_TABLE,
> > CONFLICT_LOG_FORMAT_BOTH
> > } ConflictLogFormat;
>
> How about making conflict_log_format accept a list of destinations
> instead of having the 'both' option in case where we might add more
> destination options in the future?
>
> It seems to me that conflict_log_destination sounds better.
>

Yeah, this is worth considering. But say, we need to extend it so that
the conflict data goes in xml format file instead of standard log then
won't it look a bit odd to specify via conflict_log_destination. I
thought we could name it similar to the existing
auto_explain.log_format.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 19, 2025 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 4:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 1:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> >
> > > 2) In catalog I am storing the "conflict_log_format" option as a text
> > > field, is there any better way so that we can store in fixed format
> > > maybe enum value as an integer we can do e.g. from below enum we can
> > > store the integer value in system catalog for "conflict_log_format"
> > > field, not sure if we have done such think anywhere else?
> > >
> > > typedef enum ConflictLogFormat
> > > {
> > > CONFLICT_LOG_FORMAT_DEFAULT = 0,
> > > CONFLICT_LOG_FORMAT_LOG,
> > > CONFLICT_LOG_FORMAT_TABLE,
> > > CONFLICT_LOG_FORMAT_BOTH
> > > } ConflictLogFormat;
> >
> > How about making conflict_log_format accept a list of destinations
> > instead of having the 'both' option in case where we might add more
> > destination options in the future?
> >
> > It seems to me that conflict_log_destination sounds better.
> >
>
> Yeah, this is worth considering. But say, we need to extend it so that
> the conflict data goes in xml format file instead of standard log then
> won't it look a bit odd to specify via conflict_log_destination. I
> thought we could name it similar to the existing
> auto_explain.log_format.

IMHO conflict_log_destination sounds more appropriate considering we
are talking about the log destination instead of format no?  And the
option could be log/table/file etc, and for now we can just stick to
log/table.  And in future we can extend it by supporting extra options
like destination_name, where we can provide table name or file name
etc.  So let me list down all the points which need consensus.

1. What should be the name of the option 'conflict_log_destination' vs
'conflict_log_format'
2. Do we want to support multi destination then providing string like
'conflict_log_destination = 'log,table,..' make more sense but then we
would have to store as a string in catalog and parse it everytime we
insert conflicts or alter subscription OTOH currently I have just
support single option log/table/both which make things much easy
because then in catalog we can store as a single char field and don't
need any parsing.  And since the input are taken as a string itself,
even if in future we want to support more options like  'log,table,..'
it would be backward compatible with old options.
3. Do we want to support 'none' destinations? i.e. do not log to anywhere?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 19, 2025 at 5:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 8:09 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > >
> > > We could do this as a first step. See the proposal in email [1] where
> > > we have discussed having two options instead of one. The first option
> > > will be conflict_log_format and the values would be log and table. In
> > > this case, the table would be an internally generated one.
> > >
> > > [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com
> >
> > So I have put more thought on this and here is what I am proposing
> >
> > 1) Subscription Parameter: Son in first version the subscription
> > parameter will be named 'conflict_log_format' which will accept
> > 'log/table/both' default option would be log.
> > 2) If conflict_log_format = log is provided then we do not need to do
> > anything as this would work by default
> > 3) If conflict_log_format = table/both is provided then we will
> > generate a internal table name i.e. conflict_log_table_$subid$ and the
> > table will be created in the current schema
> > 4) in pg_subscription we will still keep 2 field a) namespace id of
> > the conflict log table b) the conflict log format = 'log/table'both'
> > 5) If option is table/both the name can be generated on the fly
> > whether we are creating the table or inserting conflict into the
> > table.
>
> IIUC, previously you had a "none" value which was a way to "turn off"
> any CLT previously defined. How can users do that now with
> log/table/both? Would they have to reassign (the default) "log"? That
> seems a bit strange.

Previously we were supporting only conflict log tables and by default
it was always sent to log.  And "none" was used for clearing the
conflict log table option; it was never meant for not logging anywhere
it was meant to say that there is no conflict log table.  Now also we
can have another option as none but I intentionally avoided it
considering we want to support the case where we don't want to log it
at all, maybe that's not a bad idea either.  Let's see what others
think about it.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Fri, Dec 19, 2025 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 4:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 1:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> >
> > > 2) In catalog I am storing the "conflict_log_format" option as a text
> > > field, is there any better way so that we can store in fixed format
> > > maybe enum value as an integer we can do e.g. from below enum we can
> > > store the integer value in system catalog for "conflict_log_format"
> > > field, not sure if we have done such think anywhere else?
> > >
> > > typedef enum ConflictLogFormat
> > > {
> > > CONFLICT_LOG_FORMAT_DEFAULT = 0,
> > > CONFLICT_LOG_FORMAT_LOG,
> > > CONFLICT_LOG_FORMAT_TABLE,
> > > CONFLICT_LOG_FORMAT_BOTH
> > > } ConflictLogFormat;
> >
> > How about making conflict_log_format accept a list of destinations
> > instead of having the 'both' option in case where we might add more
> > destination options in the future?
> >
> > It seems to me that conflict_log_destination sounds better.
> >
>
> Yeah, this is worth considering. But say, we need to extend it so that
> the conflict data goes in xml format file instead of standard log then
> won't it look a bit odd to specify via conflict_log_destination. I
> thought we could name it similar to the existing
> auto_explain.log_format.
>

One option could be to separate destination and format:
conflict_log_history.destination : log/table
conflict_log_history.format : xml/json/text etc

Another option could be to use a single parameter,
'conflict_log_destination', with values such as:
table, xmllog, jsonlog, stderr/textlog

(where stderr corresponds to logging to log/postgresql.log, similar to
log_destination at [1]). I prefer this approach.

[1]: https://www.postgresql.org/docs/18/runtime-config-logging.html

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Fri, Dec 19, 2025 at 9:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Dec 19, 2025 at 4:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Dec 18, 2025 at 1:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > >
> > > > 2) In catalog I am storing the "conflict_log_format" option as a text
> > > > field, is there any better way so that we can store in fixed format
> > > > maybe enum value as an integer we can do e.g. from below enum we can
> > > > store the integer value in system catalog for "conflict_log_format"
> > > > field, not sure if we have done such think anywhere else?
> > > >
> > > > typedef enum ConflictLogFormat
> > > > {
> > > > CONFLICT_LOG_FORMAT_DEFAULT = 0,
> > > > CONFLICT_LOG_FORMAT_LOG,
> > > > CONFLICT_LOG_FORMAT_TABLE,
> > > > CONFLICT_LOG_FORMAT_BOTH
> > > > } ConflictLogFormat;
> > >
> > > How about making conflict_log_format accept a list of destinations
> > > instead of having the 'both' option in case where we might add more
> > > destination options in the future?
> > >
> > > It seems to me that conflict_log_destination sounds better.
> > >
> >
> > Yeah, this is worth considering. But say, we need to extend it so that
> > the conflict data goes in xml format file instead of standard log then
> > won't it look a bit odd to specify via conflict_log_destination. I
> > thought we could name it similar to the existing
> > auto_explain.log_format.
>
> IMHO conflict_log_destination sounds more appropriate considering we
> are talking about the log destination instead of format no?  And the
> option could be log/table/file etc, and for now we can just stick to
> log/table.  And in future we can extend it by supporting extra options
> like destination_name, where we can provide table name or file name
> etc.  So let me list down all the points which need consensus.
>
> 1. What should be the name of the option 'conflict_log_destination' vs
> 'conflict_log_format'

I prefer conflcit_log_destination.

> 2. Do we want to support multi destination then providing string like
> 'conflict_log_destination = 'log,table,..' make more sense but then we
> would have to store as a string in catalog and parse it everytime we
> insert conflicts or alter subscription OTOH currently I have just
> support single option log/table/both which make things much easy
> because then in catalog we can store as a single char field and don't
> need any parsing.  And since the input are taken as a string itself,
> even if in future we want to support more options like  'log,table,..'
> it would be backward compatible with old options.

I feel, combination of options might be a good idea, similar to how
'log_destination' provides. But it can be done in future versions and
the first draft can be a simple one.

> 3. Do we want to support 'none' destinations? i.e. do not log to anywhere?

IMO, conflict information is an important piece of information to
diagnose data divergence and thus should be logged always.

Let's wait for others' opinions.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Peter Smith
Дата:
On Fri, Dec 19, 2025 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 5:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 8:09 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Dec 16, 2025 at 9:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 15, 2025 at 5:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > >
> > > > We could do this as a first step. See the proposal in email [1] where
> > > > we have discussed having two options instead of one. The first option
> > > > will be conflict_log_format and the values would be log and table. In
> > > > this case, the table would be an internally generated one.
> > > >
> > > > [1] -
https://www.postgresql.org/message-id/CAA4eK1KwqE2y%3D_k5Xc%3Def0S5JXG2x%3DoeWpDJ%2B%3D5k6Anzaw2gdw%40mail.gmail.com
> > >
> > > So I have put more thought on this and here is what I am proposing
> > >
> > > 1) Subscription Parameter: Son in first version the subscription
> > > parameter will be named 'conflict_log_format' which will accept
> > > 'log/table/both' default option would be log.
> > > 2) If conflict_log_format = log is provided then we do not need to do
> > > anything as this would work by default
> > > 3) If conflict_log_format = table/both is provided then we will
> > > generate a internal table name i.e. conflict_log_table_$subid$ and the
> > > table will be created in the current schema
> > > 4) in pg_subscription we will still keep 2 field a) namespace id of
> > > the conflict log table b) the conflict log format = 'log/table'both'
> > > 5) If option is table/both the name can be generated on the fly
> > > whether we are creating the table or inserting conflict into the
> > > table.
> >
> > IIUC, previously you had a "none" value which was a way to "turn off"
> > any CLT previously defined. How can users do that now with
> > log/table/both? Would they have to reassign (the default) "log"? That
> > seems a bit strange.
>
> Previously we were supporting only conflict log tables and by default
> it was always sent to log.  And "none" was used for clearing the
> conflict log table option; it was never meant for not logging anywhere
> it was meant to say that there is no conflict log table.  Now also we
> can have another option as none but I intentionally avoided it
> considering we want to support the case where we don't want to log it
> at all, maybe that's not a bad idea either.  Let's see what others
> think about it.
>

I didn't mean to suggest we should allow "not logging anywhere". I
only wanted to ask how the user is expected to revert the conflict
logging back to the default after they had set it to something else.

e.g.

CREATE SUBSCRIPTION mysub2 ... WITH(conflict_log_destination=table)
Now, how to ALTER SUBSCRIPTION to revert that back to default?

It seems there is no "reset to default" so is the user required to do
this explicitly?
ALTER SUBSCRIPTION mysub2 SET (conflict_log_destination=log);

Maybe that's fine --- I was just looking for some examples/clarification.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 19, 2025 at 11:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I didn't mean to suggest we should allow "not logging anywhere". I
> only wanted to ask how the user is expected to revert the conflict
> logging back to the default after they had set it to something else.

Okay understood, thanks for the clarification.

> e.g.
>
> CREATE SUBSCRIPTION mysub2 ... WITH(conflict_log_destination=table)
> Now, how to ALTER SUBSCRIPTION to revert that back to default?
>
> It seems there is no "reset to default" so is the user required to do
> this explicitly?
> ALTER SUBSCRIPTION mysub2 SET (conflict_log_destination=log);
>
> Maybe that's fine --- I was just looking for some examples/clarification.

Yeah this is the way, IMHO it looks fine to me.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Fri, Dec 19, 2025 at 10:40 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 9:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> > 2. Do we want to support multi destination then providing string like
> > 'conflict_log_destination = 'log,table,..' make more sense but then we
> > would have to store as a string in catalog and parse it everytime we
> > insert conflicts or alter subscription OTOH currently I have just
> > support single option log/table/both which make things much easy
> > because then in catalog we can store as a single char field and don't
> > need any parsing.  And since the input are taken as a string itself,
> > even if in future we want to support more options like  'log,table,..'
> > it would be backward compatible with old options.
>
> I feel, combination of options might be a good idea, similar to how
> 'log_destination' provides. But it can be done in future versions and
> the first draft can be a simple one.
>

Considering the future extension of storing conflict information in
multiple places, it would be good to follow log_destination. Yes, it
is more work now but I feel that will be future-proof.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Fri, Dec 19, 2025 at 11:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 11:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I didn't mean to suggest we should allow "not logging anywhere". I
> > only wanted to ask how the user is expected to revert the conflict
> > logging back to the default after they had set it to something else.
>
> Okay understood, thanks for the clarification.
>
> > e.g.
> >
> > CREATE SUBSCRIPTION mysub2 ... WITH(conflict_log_destination=table)
> > Now, how to ALTER SUBSCRIPTION to revert that back to default?
> >
> > It seems there is no "reset to default" so is the user required to do
> > this explicitly?
> > ALTER SUBSCRIPTION mysub2 SET (conflict_log_destination=log);
> >
> > Maybe that's fine --- I was just looking for some examples/clarification.
>
> Yeah this is the way, IMHO it looks fine to me.
>

How about considering log as default, so even if the user resets it
via "ALTER SUBSCRIPTION mysub2 SET (conflict_log_destination='');", we
send it to LOG as we are doing currently in HEAD? This means
conflict_log_destination='' or conflict_log_destination='log' means
the same.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 19, 2025 at 10:40 AM shveta malik <shveta.malik@gmail.com> wrote:

> > 1. What should be the name of the option 'conflict_log_destination' vs
> > 'conflict_log_format'
>
> I prefer conflcit_log_destination.
>
> > 2. Do we want to support multi destination then providing string like
> > 'conflict_log_destination = 'log,table,..' make more sense but then we
> > would have to store as a string in catalog and parse it everytime we
> > insert conflicts or alter subscription OTOH currently I have just
> > support single option log/table/both which make things much easy
> > because then in catalog we can store as a single char field and don't
> > need any parsing.  And since the input are taken as a string itself,
> > even if in future we want to support more options like  'log,table,..'
> > it would be backward compatible with old options.
>
> I feel, combination of options might be a good idea, similar to how
> 'log_destination' provides. But it can be done in future versions and
> the first draft can be a simple one.
>
> > 3. Do we want to support 'none' destinations? i.e. do not log to anywhere?
>
> IMO, conflict information is an important piece of information to
> diagnose data divergence and thus should be logged always.
>
> Let's wait for others' opinions.

Thanks Shveta for you opinion,

Here is what I propose considering balance between simplicity with
future scalability:

1. Retain 'conflict_log_destination' as the option name.
2. Current supported values include 'log', 'table', or 'all' (which
directs output to both locations).  But we will not support comma
separated values in the first version.
3. By treating this as a string, we can eventually support
comma-separated values like 'log, table, new_option'. This approach
maintains a simple design by avoiding immediate need of parsing the
comma separated options while ensuring extensibility.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Masahiko Sawada
Дата:
On Thu, Dec 18, 2025 at 10:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 10:40 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > > 1. What should be the name of the option 'conflict_log_destination' vs
> > > 'conflict_log_format'
> >
> > I prefer conflcit_log_destination.
> >
> > > 2. Do we want to support multi destination then providing string like
> > > 'conflict_log_destination = 'log,table,..' make more sense but then we
> > > would have to store as a string in catalog and parse it everytime we
> > > insert conflicts or alter subscription OTOH currently I have just
> > > support single option log/table/both which make things much easy
> > > because then in catalog we can store as a single char field and don't
> > > need any parsing.  And since the input are taken as a string itself,
> > > even if in future we want to support more options like  'log,table,..'
> > > it would be backward compatible with old options.
> >
> > I feel, combination of options might be a good idea, similar to how
> > 'log_destination' provides. But it can be done in future versions and
> > the first draft can be a simple one.
> >
> > > 3. Do we want to support 'none' destinations? i.e. do not log to anywhere?
> >
> > IMO, conflict information is an important piece of information to
> > diagnose data divergence and thus should be logged always.
> >
> > Let's wait for others' opinions.
>
> Thanks Shveta for you opinion,
>
> Here is what I propose considering balance between simplicity with
> future scalability:
>
> 1. Retain 'conflict_log_destination' as the option name.
> 2. Current supported values include 'log', 'table', or 'all' (which
> directs output to both locations).  But we will not support comma
> separated values in the first version.

If users set conflict_log_destination='table', we don't report
anything related to conflict to the server logs while all other errors
generated by apply workers go to the server logs? or do we write
ERRORs without the conflict details while writing full conflict logs
to the table? If we go with the former idea, monitoring tools would
not be able to  catch ERROR logs. Users can set
conflict_log_destination='all' in this case, but they might want to
avoid bloating the server logs by the detailed conflict information. I
wonder if there might be cases where monitoring tools want to detect
at least the fact that errors occur in the system.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Tue, 16 Dec 2025 at 09:54, vignesh C <vignesh21@gmail.com> wrote:
>
> On Sun, 14 Dec 2025 at 21:17, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sun, Dec 14, 2025 at 3:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > I was considering the interdependence between the subscription and the
> > > > > conflict log table (CLT). IMHO, it would be logical to establish the
> > > > > subscription as dependent on the CLT. This way, if someone attempts to
> > > > > drop the CLT, the system would recognize the dependency of the
> > > > > subscription and prevent the drop unless the subscription is removed
> > > > > first or the CASCADE option is used.
> > > > >
> > > > > However, while investigating this, I encountered an error [1] stating
> > > > > that global objects are not supported in this context. This indicates
> > > > > that global objects cannot be made dependent on local objects.
> > > > >
> > > >
> > > > What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> > > > objects. For example, consider following case:
> > > > postgres=# create table t1(c1 int primary key);
> > > > CREATE TABLE
> > > > postgres=# \d+ t1
> > > >                                            Table "public.t1"
> > > >  Column |  Type   | Collation | Nullable | Default | Storage |
> > > > Compression | Stats target | Description
> > > > --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> > > >  c1     | integer |           | not null |         | plain   |
> > > >     |              |
> > > > Indexes:
> > > >     "t1_pkey" PRIMARY KEY, btree (c1)
> > > > Publications:
> > > >     "pub1"
> > > > Not-null constraints:
> > > >     "t1_c1_not_null" NOT NULL "c1"
> > > > Access method: heap
> > > > postgres=# drop index t1_pkey;
> > > > ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> > > > t1 requires it
> > > > HINT:  You can drop constraint t1_pkey on table t1 instead.
> > > >
> > > > Here, the PK index is created as part for CREATE TABLE operation and
> > > > pk_index is not allowed to be dropped independently.
> > > >
> > > > > Although making an object dependent on global/shared objects is
> > > > > possible for certain types of shared objects [2], this is not our main
> > > > > objective.
> > > > >
> > > >
> > > > As per my understanding from the above example, we need something like
> > > > that only for shared object subscription and (internally created)
> > > > table.
> > >
> > > Yeah that seems to be exactly what we want, so I tried doing that by
> > > recording DEPENDENCY_INTERNAL dependency of CLT on subscription[1] and
> > > it is behaving as we want[2].  And while dropping the subscription or
> > > altering CLT we can delete internal dependency so that CLT get dropped
> > > automatically[3]
> > >
> > > I will send an updated patch after testing a few more scenarios and
> > > fixing other pending issues.
> > >
> > > [1]
> > > +       ObjectAddressSet(myself, RelationRelationId, relid);
> > > +       ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
> > > +       recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);
> > >
> > >
> > > [2]
> > > postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
> > > ERROR:  2BP01: cannot drop table myschema.conflict_log_history2
> > > because subscription sub requires it
> > > HINT:  You can drop subscription sub instead.
> > > LOCATION:  findDependentObjects, dependency.c:788
> > > postgres[670778]=#
> > >
> > > [3]
> > > ObjectAddressSet(object, SubscriptionRelationId, subid);
> > > performDeletion(&object, DROP_CASCADE
> > >                            PERFORM_DELETION_INTERNAL |
> > >                            PERFORM_DELETION_SKIP_ORIGINAL);
> > >
> > >
> >
> > Here is the patch which implements the dependency and fixes other
> > comments from Shveta.
>
> Thanks for the changes, the new implementation based on dependency
> creates a cycle while dumping:
> ./pg_dump -d postgres -f dump1.txt -p 5433
> pg_dump: warning: could not resolve dependency loop among these items:
> pg_dump: detail: TABLE conflict  (ID 225 OID 16397)
> pg_dump: detail: SUBSCRIPTION (ID 3484 OID 16396)
> pg_dump: detail: POST-DATA BOUNDARY  (ID 3491)
> pg_dump: detail: TABLE DATA t1  (ID 3485 OID 16384)
> pg_dump: detail: PRE-DATA BOUNDARY  (ID 3490)
>
> This can be seen with a simple subscription with conflict_log_table.
> This was working fine with the v11 version patch.

The attached v13 patch includes the fix for this issue. In addition,
it now raises an error when attempting to configure a conflict log
table that belongs to a temporary schema or is not a permanent
(persistent) relation.

Regards,
Vignesh

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Sat, Dec 20, 2025 at 3:17 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 16 Dec 2025 at 09:54, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Sun, 14 Dec 2025 at 21:17, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Sun, Dec 14, 2025 at 3:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > I was considering the interdependence between the subscription and the
> > > > > > conflict log table (CLT). IMHO, it would be logical to establish the
> > > > > > subscription as dependent on the CLT. This way, if someone attempts to
> > > > > > drop the CLT, the system would recognize the dependency of the
> > > > > > subscription and prevent the drop unless the subscription is removed
> > > > > > first or the CASCADE option is used.
> > > > > >
> > > > > > However, while investigating this, I encountered an error [1] stating
> > > > > > that global objects are not supported in this context. This indicates
> > > > > > that global objects cannot be made dependent on local objects.
> > > > > >
> > > > >
> > > > > What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> > > > > objects. For example, consider following case:
> > > > > postgres=# create table t1(c1 int primary key);
> > > > > CREATE TABLE
> > > > > postgres=# \d+ t1
> > > > >                                            Table "public.t1"
> > > > >  Column |  Type   | Collation | Nullable | Default | Storage |
> > > > > Compression | Stats target | Description
> > > > > --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> > > > >  c1     | integer |           | not null |         | plain   |
> > > > >     |              |
> > > > > Indexes:
> > > > >     "t1_pkey" PRIMARY KEY, btree (c1)
> > > > > Publications:
> > > > >     "pub1"
> > > > > Not-null constraints:
> > > > >     "t1_c1_not_null" NOT NULL "c1"
> > > > > Access method: heap
> > > > > postgres=# drop index t1_pkey;
> > > > > ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> > > > > t1 requires it
> > > > > HINT:  You can drop constraint t1_pkey on table t1 instead.
> > > > >
> > > > > Here, the PK index is created as part for CREATE TABLE operation and
> > > > > pk_index is not allowed to be dropped independently.
> > > > >
> > > > > > Although making an object dependent on global/shared objects is
> > > > > > possible for certain types of shared objects [2], this is not our main
> > > > > > objective.
> > > > > >
> > > > >
> > > > > As per my understanding from the above example, we need something like
> > > > > that only for shared object subscription and (internally created)
> > > > > table.
> > > >
> > > > Yeah that seems to be exactly what we want, so I tried doing that by
> > > > recording DEPENDENCY_INTERNAL dependency of CLT on subscription[1] and
> > > > it is behaving as we want[2].  And while dropping the subscription or
> > > > altering CLT we can delete internal dependency so that CLT get dropped
> > > > automatically[3]
> > > >
> > > > I will send an updated patch after testing a few more scenarios and
> > > > fixing other pending issues.
> > > >
> > > > [1]
> > > > +       ObjectAddressSet(myself, RelationRelationId, relid);
> > > > +       ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
> > > > +       recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);
> > > >
> > > >
> > > > [2]
> > > > postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
> > > > ERROR:  2BP01: cannot drop table myschema.conflict_log_history2
> > > > because subscription sub requires it
> > > > HINT:  You can drop subscription sub instead.
> > > > LOCATION:  findDependentObjects, dependency.c:788
> > > > postgres[670778]=#
> > > >
> > > > [3]
> > > > ObjectAddressSet(object, SubscriptionRelationId, subid);
> > > > performDeletion(&object, DROP_CASCADE
> > > >                            PERFORM_DELETION_INTERNAL |
> > > >                            PERFORM_DELETION_SKIP_ORIGINAL);
> > > >
> > > >
> > >
> > > Here is the patch which implements the dependency and fixes other
> > > comments from Shveta.
> >
> > Thanks for the changes, the new implementation based on dependency
> > creates a cycle while dumping:
> > ./pg_dump -d postgres -f dump1.txt -p 5433
> > pg_dump: warning: could not resolve dependency loop among these items:
> > pg_dump: detail: TABLE conflict  (ID 225 OID 16397)
> > pg_dump: detail: SUBSCRIPTION (ID 3484 OID 16396)
> > pg_dump: detail: POST-DATA BOUNDARY  (ID 3491)
> > pg_dump: detail: TABLE DATA t1  (ID 3485 OID 16384)
> > pg_dump: detail: PRE-DATA BOUNDARY  (ID 3490)
> >
> > This can be seen with a simple subscription with conflict_log_table.
> > This was working fine with the v11 version patch.
>
> The attached v13 patch includes the fix for this issue. In addition,
> it now raises an error when attempting to configure a conflict log
> table that belongs to a temporary schema or is not a permanent
> (persistent) relation.

I have updated the patch and here are changes done
1. Splitted into 2 patches, 0001- for catalog related changes
0002-inserting conflict into the conflict table, Vignesh need to
rebase the dump and upgrade related patch on this latest changes
2. Subscription option changed to conflict_log_destination=(log/table/all/'')
3. For internal processing we will use ConflictLogDest enum whereas
for taking input or storing into catalog we will use string [1].
4. As suggested by Sawada San, if conflict_log_destination is 'table'
we log the information about conflict but don't log the tuple
details[3]

Pending:
1. tap test for conflict insertion
2. Still need to work on caching related changes discussed at [2], so
currently we don't allow conflict log tables to be added to
publication at all and might change this behavior as discussed at [2]
and for that we will need to implement the caching.
3. Need to add conflict insertion test and doc changes.
4. Still need to check on the latest comments from Peter Smith.


[1]
typedef enum ConflictLogDest
{
CONFLICT_LOG_DEST_INVALID = 0,
CONFLICT_LOG_DEST_LOG, /* "log" (default) */
CONFLICT_LOG_DEST_TABLE, /* "table" */
CONFLICT_LOG_DEST_ALL /* "all" */
} ConflictLogDest;

/*
* Array mapping for converting internal enum to string.
*/
static const char *const ConflictLogDestLabels[] = {
[CONFLICT_LOG_DEST_LOG] = "log",
[CONFLICT_LOG_DEST_TABLE] = "table",
[CONFLICT_LOG_DEST_ALL] = "all"
};

[2] https://www.postgresql.org/message-id/CAA4eK1LNjWigHb5YKz2nBwcGQr18WnNZHv3Gyo8GNCshSkAb-A%40mail.gmail.com

[3]
/* Decide what detail to show in server logs. */
if (dest == CONFLICT_LOG_DEST_LOG || dest == CONFLICT_LOG_DEST_ALL)
{
/* Standard reporting with full internal details. */
ereport(elevel,
errcode_apply_conflict(type),
errmsg("conflict detected on relation \"%s.%s\": conflict=%s",
get_namespace_name(RelationGetNamespace(localrel)),
RelationGetRelationName(localrel),
ConflictTypeNames[type]),
errdetail_internal("%s", err_detail.data));
}
else
{
/*
* 'table' only: Report the error msg but omit raw tuple data from
* server logs since it's already captured in the internal table.
*/
ereport(elevel,
errcode_apply_conflict(type),
errmsg("conflict detected on relation \"%s.%s\": conflict=%s",
get_namespace_name(RelationGetNamespace(localrel)),
RelationGetRelationName(localrel),
ConflictTypeNames[type]),
errdetail("Conflict details logged to internal table with OID %u.",
MySubscription->conflictrelid));
}

--
Regards,
Dilip Kumar
Google

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Sat, 20 Dec 2025 at 16:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have updated the patch and here are changes done
> 1. Splitted into 2 patches, 0001- for catalog related changes
> 0002-inserting conflict into the conflict table, Vignesh need to
> rebase the dump and upgrade related patch on this latest changes
> 2. Subscription option changed to conflict_log_destination=(log/table/all/'')
> 3. For internal processing we will use ConflictLogDest enum whereas
> for taking input or storing into catalog we will use string [1].
> 4. As suggested by Sawada San, if conflict_log_destination is 'table'
> we log the information about conflict but don't log the tuple
> details[3]
>
> Pending:
> 2. Still need to work on caching related changes discussed at [2], so
> currently we don't allow conflict log tables to be added to
> publication at all and might change this behavior as discussed at [2]
> and for that we will need to implement the caching.

This point is addressed in the attached patch. A new shared index on
pg_subscription (subconflictlogrelid) is introduced and used to
efficiently determine whether a relation is a conflict log table,
avoiding full catalog scans. Additionally, a conflict log table can be
explicitly added to a TABLE publication and will be published when
specified directly. At the same time, such relations are excluded from
implicit publication paths (FOR ALL TABLES and schema publications).
The patch also exposes pg_relation_is_conflict_log_table() as a
SQL-visible helper, which is used by psql \d+ to filter out conflict
log tables from implicit publication listings. This avoids querying
pg_subscription directly, which is generally inaccessible to
non-superusers.

These changes are included in v14-003. There are no changes in v14-001
and v14-002; those versions are identical to the patch previously
shared by Dilip at [1].

[1] - https://www.postgresql.org/message-id/CAFiTN-sNg9ghLNkB2Kn0SwBGOub9acc99XZZU_d5NAcyW-yrEg%40mail.gmail.com

Regards,
Vignesh

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Sat, 20 Dec 2025 at 16:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Dec 20, 2025 at 3:17 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 16 Dec 2025 at 09:54, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Sun, 14 Dec 2025 at 21:17, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Sun, Dec 14, 2025 at 3:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Dec 12, 2025 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Dec 11, 2025 at 7:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > I was considering the interdependence between the subscription and the
> > > > > > > conflict log table (CLT). IMHO, it would be logical to establish the
> > > > > > > subscription as dependent on the CLT. This way, if someone attempts to
> > > > > > > drop the CLT, the system would recognize the dependency of the
> > > > > > > subscription and prevent the drop unless the subscription is removed
> > > > > > > first or the CASCADE option is used.
> > > > > > >
> > > > > > > However, while investigating this, I encountered an error [1] stating
> > > > > > > that global objects are not supported in this context. This indicates
> > > > > > > that global objects cannot be made dependent on local objects.
> > > > > > >
> > > > > >
> > > > > > What we need here is an equivalent of DEPENDENCY_INTERNAL for database
> > > > > > objects. For example, consider following case:
> > > > > > postgres=# create table t1(c1 int primary key);
> > > > > > CREATE TABLE
> > > > > > postgres=# \d+ t1
> > > > > >                                            Table "public.t1"
> > > > > >  Column |  Type   | Collation | Nullable | Default | Storage |
> > > > > > Compression | Stats target | Description
> > > > > > --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> > > > > >  c1     | integer |           | not null |         | plain   |
> > > > > >     |              |
> > > > > > Indexes:
> > > > > >     "t1_pkey" PRIMARY KEY, btree (c1)
> > > > > > Publications:
> > > > > >     "pub1"
> > > > > > Not-null constraints:
> > > > > >     "t1_c1_not_null" NOT NULL "c1"
> > > > > > Access method: heap
> > > > > > postgres=# drop index t1_pkey;
> > > > > > ERROR:  cannot drop index t1_pkey because constraint t1_pkey on table
> > > > > > t1 requires it
> > > > > > HINT:  You can drop constraint t1_pkey on table t1 instead.
> > > > > >
> > > > > > Here, the PK index is created as part for CREATE TABLE operation and
> > > > > > pk_index is not allowed to be dropped independently.
> > > > > >
> > > > > > > Although making an object dependent on global/shared objects is
> > > > > > > possible for certain types of shared objects [2], this is not our main
> > > > > > > objective.
> > > > > > >
> > > > > >
> > > > > > As per my understanding from the above example, we need something like
> > > > > > that only for shared object subscription and (internally created)
> > > > > > table.
> > > > >
> > > > > Yeah that seems to be exactly what we want, so I tried doing that by
> > > > > recording DEPENDENCY_INTERNAL dependency of CLT on subscription[1] and
> > > > > it is behaving as we want[2].  And while dropping the subscription or
> > > > > altering CLT we can delete internal dependency so that CLT get dropped
> > > > > automatically[3]
> > > > >
> > > > > I will send an updated patch after testing a few more scenarios and
> > > > > fixing other pending issues.
> > > > >
> > > > > [1]
> > > > > +       ObjectAddressSet(myself, RelationRelationId, relid);
> > > > > +       ObjectAddressSet(subaddr, SubscriptionRelationId, subid);
> > > > > +       recordDependencyOn(&myself, &subaddr, DEPENDENCY_INTERNAL);
> > > > >
> > > > >
> > > > > [2]
> > > > > postgres[670778]=# DROP TABLE myschema.conflict_log_history2;
> > > > > ERROR:  2BP01: cannot drop table myschema.conflict_log_history2
> > > > > because subscription sub requires it
> > > > > HINT:  You can drop subscription sub instead.
> > > > > LOCATION:  findDependentObjects, dependency.c:788
> > > > > postgres[670778]=#
> > > > >
> > > > > [3]
> > > > > ObjectAddressSet(object, SubscriptionRelationId, subid);
> > > > > performDeletion(&object, DROP_CASCADE
> > > > >                            PERFORM_DELETION_INTERNAL |
> > > > >                            PERFORM_DELETION_SKIP_ORIGINAL);
> > > > >
> > > > >
> > > >
> > > > Here is the patch which implements the dependency and fixes other
> > > > comments from Shveta.
> > >
> > > Thanks for the changes, the new implementation based on dependency
> > > creates a cycle while dumping:
> > > ./pg_dump -d postgres -f dump1.txt -p 5433
> > > pg_dump: warning: could not resolve dependency loop among these items:
> > > pg_dump: detail: TABLE conflict  (ID 225 OID 16397)
> > > pg_dump: detail: SUBSCRIPTION (ID 3484 OID 16396)
> > > pg_dump: detail: POST-DATA BOUNDARY  (ID 3491)
> > > pg_dump: detail: TABLE DATA t1  (ID 3485 OID 16384)
> > > pg_dump: detail: PRE-DATA BOUNDARY  (ID 3490)
> > >
> > > This can be seen with a simple subscription with conflict_log_table.
> > > This was working fine with the v11 version patch.
> >
> > The attached v13 patch includes the fix for this issue. In addition,
> > it now raises an error when attempting to configure a conflict log
> > table that belongs to a temporary schema or is not a permanent
> > (persistent) relation.
>
> I have updated the patch and here are changes done
> 1. Splitted into 2 patches, 0001- for catalog related changes
> 0002-inserting conflict into the conflict table, Vignesh need to
> rebase the dump and upgrade related patch on this latest changes
> 2. Subscription option changed to conflict_log_destination=(log/table/all/'')
> 3. For internal processing we will use ConflictLogDest enum whereas
> for taking input or storing into catalog we will use string [1].
> 4. As suggested by Sawada San, if conflict_log_destination is 'table'
> we log the information about conflict but don't log the tuple
> details[3]
>
> Pending:
> 1. tap test for conflict insertion
> 2. Still need to work on caching related changes discussed at [2], so
> currently we don't allow conflict log tables to be added to
> publication at all and might change this behavior as discussed at [2]
> and for that we will need to implement the caching.
> 3. Need to add conflict insertion test and doc changes.
> 4. Still need to check on the latest comments from Peter Smith.
>
>
> [1]
> typedef enum ConflictLogDest
> {
> CONFLICT_LOG_DEST_INVALID = 0,
> CONFLICT_LOG_DEST_LOG, /* "log" (default) */
> CONFLICT_LOG_DEST_TABLE, /* "table" */
> CONFLICT_LOG_DEST_ALL /* "all" */
> } ConflictLogDest;

Consider the following scenario. Initially, the subscription was
configured with conflict_log_destination set to a table. As conflicts
occurred, entries were generated and recorded in that table, for
example:
postgres=# SELECT * FROM conflict_log_table_16399;
 relid | schemaname | relname | conflict_type | remote_xid |
remote_commit_lsn |         remote_commit_ts         | remote_origin |
replica_identity | remote_tuple |
                local_conflicts

-------+------------+---------+---------------+------------+-------------------+----------------------------------+---------------+------------------+--------------+-------------------------
-------------------------------------------------------------------------
 16384 | public     | t1      | insert_exists |        765 |
0/0178A718        | 2025-12-22 12:06:57.417789+05:30 | pg_16399      |
                 | {"c1":1}     | {"{\"xid\":\"781\",\"com
mit_ts\":null,\"origin\":null,\"key\":{\"c1\":1},\"tuple\":{\"c1\":1}}"}
 16384 | public     | t1      | insert_exists |        765 |
0/0178A718        | 2025-12-22 12:06:57.417789+05:30 | pg_16399      |
                 | {"c1":1}     | {"{\"xid\":\"781\",\"com
mit_ts\":null,\"origin\":null,\"key\":{\"c1\":1},\"tuple\":{\"c1\":1}}"}
(2 rows)

Subsequently, the conflict log destination was changed from table to log:
ALTER SUBSCRIPTION sub1 SET (conflict_log_destination = 'log');

As a result, the conflict log table is dropped, and there is no longer
any way to access the previously recorded conflict entries. This
effectively causes the loss of historical conflict data.

It is unclear whether this behavior is desirable or expected. Should
we consider a way to preserve the historical conflict data in this
case?

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Sat, Dec 20, 2025 at 4:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have updated the patch and here are changes done

Thank You for the patch.  Few comments on 001 alone:


1)
postgres=# create subscription sub1 connection ...' publication pub1
WITH(conflict_log_destination = 'table');
ERROR:  could not generate conflict log table "conflict_log_table_16395"
DETAIL:  Conflict log tables cannot be created in a temporary namespace.
HINT:  Ensure your 'search_path' is set to permanent schema.

Based on such existing errors:
errmsg("cannot create relations in temporary schemas of other sessions")));
errmsg("cannot create temporary relation in non-temporary schema")));
errmsg("cannot create relations in temporary schemas of other sessions")));

Shall we tweak:
--temporary namespace --> temporary schema
--permanent --> non-temporary

2)
postgres=# drop schema shveta cascade;
NOTICE:  drop cascades to subscription sub1
ERROR:  global objects cannot be deleted by doDeletion

Is this expected? Is the user supposed to see this error?

3)
ConflictLogDestLabels enum starts from 0/INVALID while mapping
ConflictLogDestLabels has values starting from index 1. The index 0
has no value. Thus IMO, wherever we access ConflictLogDestLabels, we
should make a sanity check that index accessed is not
CONFLICT_LOG_DEST_INVALID i.e. opts.logdest !=
CONFLICT_LOG_DEST_INVALID

4)
I find 'Labels' in ConflictLogDestLabels slightly odd. There could be
other names for this variables such as ConflictLogDestValues,
ConflictLogDestStrings or ConflictLogDestNames.

See similar: ConflictTypeNames, SlotInvalidationCauses

5)
+ /*
+ * Strategy for logging replication conflicts:
+ * log - server log only,
+ * table - internal table only,
+ * all - both log and table.
+ */
+ text sublogdestination;

sublogdestination can be confused with regular log_destination. Shall
we rename to subconflictlogdest.

6)
Should the \dRs+ command display the 'Conflict Log Table:' at the end?
This would be similar to how \dRp+ shows 'Tables:', even though the
relation IDs can already be obtained from pg_publication_rel. I think
this would be a useful improvement.

7)
One observation, not sure if it needs any fix, please review and share thoughts.

--CLT created in default public schema present in serach_path
create subscription sub1 connection '..' publication pub1
WITH(conflict_log_destination = 'table');

--Change search path
create schema sch1;
SET search_path=sch1, "$user";

After this, if I create a new sub with destination as 'table', CLT is
generated in sch1. But if I do below:
alter subscription sub1 set (conflict_log_destination='table');

It does not move the table to sch1. This is because
conflict_log_destination is not changed; and as per current
implementation, alter-sub becomes no-op. But search_path is changed.
So what should be the behaviour here?

--let the table be in the old schema, which is currently not in
search_path (existing behaviour)?
--drop the table in the old schema and create a new one present in
search_path?

I could not find a similar case in postgres to compare the behaviour.

If we do
alter subscription sub1 set (conflict_log_destination='log');
alter subscription sub1 set (conflict_log_destination='table');

Then it moves the table to a new schema as internally setting
destination to 'log' drops the table.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Sat, 20 Dec 2025 at 16:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have updated the patch and here are changes done
> 1. Splitted into 2 patches, 0001- for catalog related changes
> 0002-inserting conflict into the conflict table, Vignesh need to
> rebase the dump and upgrade related patch on this latest changes
> 2. Subscription option changed to conflict_log_destination=(log/table/all/'')
> 3. For internal processing we will use ConflictLogDest enum whereas
> for taking input or storing into catalog we will use string [1].
> 4. As suggested by Sawada San, if conflict_log_destination is 'table'
> we log the information about conflict but don't log the tuple
> details[3]

Few comments:
1) when a conflict_log_destination is specified as log:
create subscription sub1 connection 'dbname=postgres host=localhost
port=5432' publication pub1 with ( conflict_log_destination='log');
postgres=# select subname, subconflictlogrelid,sublogdestination from
pg_subscription where subname = 'sub4';
 subname | subconflictlogrelid | sublogdestination
---------+---------------------+-------------------
 sub4    |                   0 | log
(1 row)

Currently it displays as 0, instead we can show as NULL in this case

2) can we include displaying of conflict log table also  in describe
subscriptions:
+               /* Conflict log destination is supported in v19 and higher */
+               if (pset.sversion >= 190000)
+               {
+                       appendPQExpBuffer(&buf,
+                                                         ",
sublogdestination AS \"%s\"\n",
+
gettext_noop("Conflict log destination"));
+               }

3) Can we include a pg_ prefix in the conflict table name to indicate
that it is an internally created table:
+/*
+ * Format the standardized internal conflict log table name for a subscription
+ *
+ * Use the OID to prevent collisions during rename operations.
+ */
+void
+GetConflictLogTableName(char *dest, Oid subid)
+{
+       snprintf(dest, NAMEDATALEN, "conflict_log_table_%u", subid);
+}
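
A minimal sketch of the suggested rename (the exact name is open for
discussion):

void
GetConflictLogTableName(char *dest, Oid subid)
{
        /* "pg_" prefix marks the table as internally created */
        snprintf(dest, NAMEDATALEN, "pg_conflict_log_table_%u", subid);
}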

4) Now that a dependency is associated between the table and the
subscription, can the table be dropped at all? If not, this check may
be unreachable:
+       conflictlogrel = table_open(conflictlogrelid, RowExclusiveLock);
+
+       /* Conflict log table is dropped or not accessible. */
+       if (conflictlogrel == NULL)
+               ereport(WARNING,
+                               (errcode(ERRCODE_UNDEFINED_TABLE),
+                                errmsg("conflict log table with OID
%u does not exist",
+                                               conflictlogrelid)));
+
+       return conflictlogrel;
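
Note also that table_open() raises an error rather than returning
NULL, so the NULL check above is dead code. If a missing table must be
tolerated here, something like try_table_open() would be needed
(sketch only):

        /* try_table_open() returns NULL instead of raising an error */
        conflictlogrel = try_table_open(conflictlogrelid, RowExclusiveLock);
        if (conflictlogrel == NULL)
                ereport(WARNING,
                        (errcode(ERRCODE_UNDEFINED_TABLE),
                         errmsg("conflict log table with OID %u does not exist",
                                conflictlogrelid)));

        return conflictlogrel;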

5) Should this code be changed to just prepare the conflict log tuple
here? Validation and insertion could then happen in start_apply when
elevel >= ERROR, avoiding the ValidateConflictLogTable call both here
and in the start_apply function. That is, change:
+               if (ValidateConflictLogTable(conflictlogrel))
+               {
+                       /*
+                        * Prepare the conflict log tuple. If the error level is below
+                        * ERROR, insert it immediately. Otherwise, defer the insertion to
+                        * a new transaction after the current one aborts, ensuring the
+                        * insertion of the log tuple is not rolled back.
+                        */
+                       prepare_conflict_log_tuple(estate,
+                                                  relinfo->ri_RelationDesc,
+                                                  conflictlogrel,
+                                                  type,
+                                                  searchslot,
+                                                  conflicttuples,
+                                                  remoteslot);
+                       if (elevel < ERROR)
+                               InsertConflictLogTuple(conflictlogrel);
+               }
+               else
+                       ereport(WARNING,
+                               errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                               errmsg("conflict log table \"%s.%s\" structure changed, skipping insertion",
+                                      get_namespace_name(RelationGetNamespace(conflictlogrel)),
+                                      RelationGetRelationName(conflictlogrel)));

to:

prepare_conflict_log_tuple(estate,
                           relinfo->ri_RelationDesc,
                           conflictlogrel,
                           type,
                           searchslot,
                           conflicttuples,
                           remoteslot);
if (elevel < ERROR)
{
        if (ValidateConflictLogTable(conflictlogrel))
                InsertConflictLogTuple(conflictlogrel);
        else
                ereport(WARNING,
                        errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                        errmsg("conflict log table \"%s.%s\" structure changed, skipping insertion",
                               get_namespace_name(RelationGetNamespace(conflictlogrel)),
                               RelationGetRelationName(conflictlogrel)));
}

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Sat, Dec 20, 2025 at 4:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Dec 20, 2025 at 3:17 PM vignesh C <vignesh21@gmail.com> wrote:

> I have updated the patch and here are the changes done:
> 1. Split into 2 patches: 0001 for catalog related changes,
> 0002 for inserting conflicts into the conflict table. Vignesh needs to
> rebase the dump and upgrade related patch on these latest changes.
> 2. Subscription option changed to conflict_log_destination=(log/table/all/'')
> 3. For internal processing we will use the ConflictLogDest enum, whereas
> for taking input or storing into the catalog we will use a string [1].
> 4. As suggested by Sawada-san, if conflict_log_destination is 'table'
> we log the information about the conflict but don't log the tuple
> details [3]
>
> Pending:
> 1. TAP test for conflict insertion

Done in V15
> 2. Still need to work on caching related changes discussed at [2], so
> currently we don't allow conflict log tables to be added to
> publication at all and might change this behavior as discussed at [2]
> and for that we will need to implement the caching.

Pending

> 3. Need to add conflict insertion test and doc changes.

Done

> 4. Still need to check on the latest comments from Peter Smith.

Done

While preparing to send the patch, I noticed some new comments from
Shveta and Vignesh, so I will analyze those in the next version.

V15-0004 is Vignesh's patch, attached as-is; I am going to review it
soon.


--
Regards,
Dilip Kumar
Google

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Mon, Dec 22, 2025 at 3:55 PM vignesh C <vignesh21@gmail.com> wrote:
>
>
> A few comments:
> 1) When conflict_log_destination is specified as 'log':
> create subscription sub1 connection 'dbname=postgres host=localhost
> port=5432' publication pub1 with ( conflict_log_destination='log');
> postgres=# select subname, subconflictlogrelid, sublogdestination from
> pg_subscription where subname = 'sub1';
>  subname | subconflictlogrelid | sublogdestination
> ---------+---------------------+-------------------
>  sub1    |                   0 | log
> (1 row)
>
> Currently it displays 0; instead we could show NULL in this case.

I also thought about it while reviewing, but I feel 0 makes more sense
since it is a 'relid'. This is how it is currently shown in other
catalogs; see 'reltoastrelid':

postgres=# select relname, reltoastrelid from  pg_class where relname='tab1';
 relname | reltoastrelid
---------+---------------
 tab1    |             0
(1 row)


>
> 3) Can we include a pg_ prefix in the conflict table name to indicate
> that it is an internally created table:
> +/*
> + * Format the standardized internal conflict log table name for a subscription
> + *
> + * Use the OID to prevent collisions during rename operations.
> + */
> +void
> +GetConflictLogTableName(char *dest, Oid subid)
> +{
> +       snprintf(dest, NAMEDATALEN, "conflict_log_table_%u", subid);
> +}
>

There is already a discussion about it in [1]

[1]: https://www.postgresql.org/message-id/CAA4eK1KE%3DtNHcN3Qp0FZVwDnt4rF2zwHy8NgAdG3oPqixdzOsA%40mail.gmail.com

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Mon, Dec 22, 2025 at 3:09 PM shveta malik <shveta.malik@gmail.com> wrote:

I think this one needs more thought; the others can be fixed.

> 2)
> postgres=# drop schema shveta cascade;
> NOTICE:  drop cascades to subscription sub1
> ERROR:  global objects cannot be deleted by doDeletion
>
> Is this expected? Is the user supposed to see this error?
>
See the code below. It says that if the object being dropped is the
outermost object (i.e. we are dropping the table directly), then
dropping it is disallowed because of the INTERNAL dependency. OTOH, if
the object is being dropped via a recursive drop (i.e. the table is
being dropped while dropping the schema), then the object it has an
INTERNAL dependency on is also added to the deletion list and later
dropped via doDeletion(), and that is where we get the error, since
the subscription is a global object. I thought maybe we could handle
an additional case: if the INTERNAL dependency is on a subscription,
disallow the drop irrespective of whether it is invoked directly or
via a recursive drop. But then it would break even the case where we
drop the table during subscription drop; we could handle that case too
via the 'flags' passed to findDependentObjects(), but that needs more
investigation.

Seeing this complexity makes me wonder whether it is really worth
maintaining this dependency. During subscription drop we have to call
performDeletion() externally anyway, because this dependency is local,
so all we gain is disallowing a direct drop of the conflict table.
ALTER TABLE is still allowed, so what are we really protecting by
blocking the drop? Perhaps we can simply document that if the user
drops the table, conflicts will no longer be inserted?

findDependentObjects()
{
...
     switch (foundDep->deptype)
     {
         ....
         case DEPENDENCY_INTERNAL:
            * 1. At the outermost recursion level, we must disallow the
            * DROP. However, if the owning object is listed in
            * pendingObjects, just release the caller's lock and return;
            * we'll eventually complete the DROP when we reach that entry
            * in the pending list.
     }
}

[1]
postgres[1333899]=# select * from pg_depend where objid > 16410;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16420 |        0 |       2615 |    16410 |           0 | n
    1259 | 16420 |        0 |       6100 |    16419 |           0 | i
(2 rows)

16420 -> conflict_log_table_16419
16419 -> subscription
16410 -> schema s1



--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Mon, Dec 22, 2025 at 9:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]
>
> Seeing this complexity makes me wonder whether it is really worth
> maintaining this dependency. [...] Perhaps we can simply document
> that if the user drops the table, conflicts will no longer be
> inserted?

One approach could be to use something similar to
PERFORM_DELETION_SKIP_EXTENSIONS in our case, but only for recursive
drops. The effect would be that 'DROP SCHEMA ... CASCADE' would
proceed without error, i.e., it would drop the tables as well without
including the subscription in the dependency list. But if we try to
drop a table directly (e.g., DROP TABLE CLT), it will still result in:
ERROR: cannot drop table because subscription sub1 requires it

The behavior will resemble a dependency somewhere between type 'n' and
type 'i'. That said, I’m not sure if this is worth the effort, even
though it prevents direct drop of table, it still does not prevent
table from being dropped as part of a schema drop.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Tue, Dec 23, 2025 at 10:55 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> One approach could be to use something similar to
> PERFORM_DELETION_SKIP_EXTENSIONS in our case, but only for recursive
> drops. The effect would be that 'DROP SCHEMA ... CASCADE' would
> proceed without error, i.e., it would drop the tables as well without
> including the subscription in the dependency list. But if we try to
> drop a table directly (e.g., DROP TABLE CLT), it will still result in:
> ERROR: cannot drop table because subscription sub1 requires it
>
> The behavior will resemble a dependency somewhere between type 'n' and
> type 'i'. That said, I’m not sure if this is worth the effort, even
> though it prevents direct drop of table, it still does not prevent
> table from being dropped as part of a schema drop.

Yeah, but that would be inconsistent behavior. Anyway, here is what I
got with what I was proposing yesterday [1]: drop schema and drop
table now give the same behavior, as expected, and drop subscription
internally drops the table as we would want. This needs more thought,
though, to see what else it might break.

postgres[1553010]=# CREATE SCHEMA s1;
postgres[1553010]=# SET search_path TO s1;
postgres[1553010]=# CREATE SUBSCRIPTION sub1 CONNECTION
'dbname=postgres port=5432' PUBLICATION pub WITH
(conflict_log_destination = table);
postgres[1553010]=# \d
                    List of relations
 Schema |           Name           | Type  |    Owner
--------+--------------------------+-------+-------------
 s1     | conflict_log_table_16428 | table | dilipkumarb
(1 row)

postgres[1553010]=# DROP SCHEMA s1;
ERROR:  2BP01: cannot drop table conflict_log_table_16428 because
subscription sub1 requires it
HINT:  You can drop subscription sub1 instead.
LOCATION:  findDependentObjects, dependency.c:843

postgres[1553010]=# DROP TABLE conflict_log_table_16428 ;
ERROR:  2BP01: cannot drop table conflict_log_table_16428 because
subscription sub1 requires it
HINT:  You can drop subscription sub1 instead.
LOCATION:  findDependentObjects, dependency.c:843

postgres[1553010]=# DROP SUBSCRIPTION sub1;
NOTICE:  00000: dropped replication slot
"pg_16428_sync_16385_7586930395971240479" on publisher
LOCATION:  ReplicationSlotDropAtPubNode, subscriptioncmds.c:2469
NOTICE:  00000: dropped replication slot "sub1" on publisher
LOCATION:  ReplicationSlotDropAtPubNode, subscriptioncmds.c:2469
DROP SUBSCRIPTION

[1]
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 7489bbd5fb3..14184d076d3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -662,6 +662,11 @@ findDependentObjects(const ObjectAddress *object,
                                 * However, no inconsistency can result: since we're at outer
                                 * level, there is no object depending on this one.
                                 */
+                               if (IsSharedRelation(otherObject.classId) && !(flags & PERFORM_DELETION_INTERNAL))
+                               {
+                                       owningObject = otherObject;
+                                       break;
+                               }
                                if (stack == NULL)
                                {
                                        if (pendingObjects &&



--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Sat, 20 Dec 2025 at 16:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Dec 20, 2025 at 3:17 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > > > [...]
> > >
> > > Thanks for the changes, the new implementation based on dependency
> > > creates a cycle while dumping:
> > > ./pg_dump -d postgres -f dump1.txt -p 5433
> > > pg_dump: warning: could not resolve dependency loop among these items:
> > > pg_dump: detail: TABLE conflict  (ID 225 OID 16397)
> > > pg_dump: detail: SUBSCRIPTION (ID 3484 OID 16396)
> > > pg_dump: detail: POST-DATA BOUNDARY  (ID 3491)
> > > pg_dump: detail: TABLE DATA t1  (ID 3485 OID 16384)
> > > pg_dump: detail: PRE-DATA BOUNDARY  (ID 3490)
> > >
> > > This can be seen with a simple subscription with conflict_log_table.
> > > This was working fine with the v11 version patch.
> >
> > The attached v13 patch includes the fix for this issue. In addition,
> > it now raises an error when attempting to configure a conflict log
> > table that belongs to a temporary schema or is not a permanent
> > (persistent) relation.
>
> I have updated the patch and here are the changes done:
> 1. Split into 2 patches: 0001 for catalog related changes,
> 0002 for inserting conflicts into the conflict table. Vignesh needs to
> rebase the dump and upgrade related patch on these latest changes.

Here is a rebased version of the dump/upgrade patch based on the v15
version posted at [1].

After replacing conflict_log_table with conflict_log_destination, we
no longer specify a fully qualified table name directly; the conflict
log behavior is controlled via conflict_log_destination (table, log,
or all). Since pg_dump resets search_path, it must explicitly set the
schema in which the conflict log table should be created or reused. To
handle this, pg_dump temporarily sets and then restores search_path
around the ALTER SUBSCRIPTION ... SET (conflict_log_destination ...)
command, ensuring the conflict log table is resolved in the intended
schema.

Additionally, in non-upgrade dump/restore scenarios the conflict log
table is not dumped, since outside of an upgrade it does not make
sense to link to the older conflict log table.
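
For example, the emitted commands would have roughly this shape (names
illustrative):

SET search_path = sch1;
ALTER SUBSCRIPTION sub1 SET (conflict_log_destination = 'table');
RESET search_path;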

v15-0001 to v15-0004 is the same as the patches posted at [1].
dump/upgrade changes are present in v15-0005 patch.

[1] - https://www.postgresql.org/message-id/CAFiTN-uKn7mix8BkOOmJQ2cF5yKdfQUg2mX_w9vEC4787VZ_xQ%40mail.gmail.com

Regards.
Vignesh

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
Peter Smith
Дата:
Hi Dilip.

Here are some review comments after a first pass of patch v15-0001.

======
Commit Message

1.
If user choose to log into the table the table will automatically created while
creating the subscription with internal name i.e.
conflict_log_table_$subid$.  The
table will be created in the current search path and table would be
automatically
dropped while dropping the subscription.

English:

/If user choose/
/the table the table/
/and table would/

======
src/backend/commands/subscriptioncmds.c

2.
+#define SUBOPT_CONFLICT_LOG_DESTINATION 0x00040000

For the values, you are using DEST instead of DESTINATION. You can do
the same here to keep the macro name a bit shorter.

~~~

parse_subscription_options:

3.
+ dest = GetLogDestination(val);
+
+ if (dest == CONFLICT_LOG_DEST_INVALID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized conflict_log_destination value: \"%s\"", val),
+ errhint("Valid values are \"log\", \"table\", and \"all\".")));

I don't think CONFLICT_LOG_DEST_INVALID should even exist as an enum
value. Instead, the validation and the ereport(ERROR) should all be
done within the GetLogDestination function, so that it only returns
valid values and errors otherwise.

~~~

CreateSubscription:

4.
+ /* Always set the destination, default will be log. */
+ values[Anum_pg_subscription_sublogdestination - 1] =
+ CStringGetTextDatum(ConflictLogDestLabels[opts.logdest]);
+
+ /*
+ * If the conflict log destination includes 'table', generate an internal
+ * name using the subscription OID and determine the target namespace based
+ * on the current search path. Store the namespace OID and the conflict log
+ * format in the pg_subscription catalog tuple., then  physically create
+ * the table.
+ */

4a.
When referring to these parameter values, you should always
consistently quote them. Currently, there is a mix of lots of formats.
(e.g. log (unquoted), 'table' (single-quoted), "log" (double-quoted)).

Pick one style, and make them all the same. Check for the same everywhere.

~

4b.
Typo "tuple.,"

~~~

5.
+ if (opts.logdest == CONFLICT_LOG_DEST_TABLE ||
+ opts.logdest == CONFLICT_LOG_DEST_ALL)

IIUC, you are effectively treating these parameter values like bits
that can be OR-ed together. And if in the future a "list" is
supported, then that's exactly what you will be doing. So, IMO, they
should be defined that way. See a review comment later in this post.

e.g. this condition would be written more like:
if ((opts.logdest & CONFLICT_LOG_DEST_TABLE) != 0)
or, using the macro
if (IsSet(opts.logdest, CONFLICT_LOG_DEST_TABLE))

~~~

AlterSubscription:

6.
+ if (opts.logdest != old_dest)
+ {
+ bool want_table =
+ (opts.logdest == CONFLICT_LOG_DEST_TABLE ||
+ opts.logdest == CONFLICT_LOG_DEST_ALL);
+ bool has_oldtable =
+ (old_dest == CONFLICT_LOG_DEST_TABLE ||
+ old_dest == CONFLICT_LOG_DEST_ALL);
+


This is more of the same kind of logic that convinces me the code
should be using bitmasks.

SUGGESTION
bool want_table = IsSet(opts.logdest, CONFLICT_LOG_DEST_TABLE);
bool has_oldtable = IsSet(olddest, CONFLICT_LOG_DEST_TABLE);

~~~

create_conflict_log_table:

7.
+/*
+ * Create conflict log table.
+ *
+ * The subscription owner becomes the owner of this table and has all
+ * privileges on it.
+ */
+static Oid
+create_conflict_log_table(Oid subid, char *subname, Oid namespaceId,
+   char *conflictrel)


I felt something like 'relname' is a better name for the char *
conflictrel param. It clearly is the name of the conflict relation
because of the name of the function.

~~~

8.
+ /* Add a comments for the conflict log table. */
+ snprintf(comment, sizeof(comment),
+ "Conflict log table for subscription \"%s\"", subname);
+ CreateComments(relid, RelationRelationId, 0, comment);
+

8a.
typo /Add a comments/Add a comment/

~

8b.
My (previous review) suggestion for adding a table comment/description
made more sense when the CLT was some arbitrary name chosen by the
user. But, now that the CLT is a name like "conflict_log_table_%u",
the idea for a comment seems redundant.

~~~

9.
+/*
+ * Format the standardized internal conflict log table name for a subscription
+ *
+ * Use the OID to prevent collisions during rename operations.
+ */
+void
+GetConflictLogTableName(char *dest, Oid subid)
+{
+ snprintf(dest, NAMEDATALEN, "conflict_log_table_%u", subid);
+}
+

9a.
To emphasise that this is an "internal" table, IMO there should be a
"pg_" prefix for this table name.

~

9b.
Since it is internal anyway, why not make the tablename descriptive to
clarify what that number means?
e.g. "pg_conflict_log_table_for_subid_%u"

BTW, since it is already a TABLE, then why is "table" even part of
this name? Why not just "pg_conflict_log_for_subid_%u"?
~~~

10.
+/*
+ * GetLogDestination
+ *
+ * Convert string to enum by comparing against standardized labels.
+ */
+ConflictLogDest
+GetLogDestination(const char *dest)
+{
+ /* Empty string or NULL defaults to LOG. */
+ if (dest == NULL || dest[0] == '\0')
+ return CONFLICT_LOG_DEST_LOG;
+
+ for (int i = CONFLICT_LOG_DEST_LOG; i <= CONFLICT_LOG_DEST_ALL; i++)
+ {
+ if (pg_strcasecmp(dest, ConflictLogDestLabels[i]) == 0)
+ return (ConflictLogDest) i;
+ }
+
+ /* Unrecognized string. */
+ return CONFLICT_LOG_DEST_INVALID;
+}

Mentioned previously: I think there should be no such thing as
CONFLICT_LOG_DEST_INVALID. I also think this function should be
responsible for the ereport(ERROR).


======
src/include/catalog/pg_subscription.h

11.
+ /*
+ * Strategy for logging replication conflicts:
+ * log - server log only,
+ * table - internal table only,
+ * all - both log and table.
+ */
+ text sublogdestination;
+

SUGGEST 'subconflictlogdest'

(see next review comment #12 for why)

~~~

12.
+ Oid conflictrelid; /* conflict log table Oid */
  char    *conninfo; /* Connection string to the publisher */
  char    *slotname; /* Name of the replication slot */
  char    *synccommit; /* Synchronous commit setting for worker */
  List    *publications; /* List of publication names to subscribe to */
  char    *origin; /* Only publish data originating from the
  * specified origin */
+ char    *logdestination; /* Conflict log destination */
 } Subscription;

These don't seem very good member names:

Maybe 'conflictrelid' -> 'conflictlogrelid' (because it's rel of the
log; not the conflict)
Maybe 'logdestination' -> 'conflictlogdest' (because in future there
might be other kinds of subscription logs)

======
src/include/replication/conflict.h

13.
+typedef enum ConflictLogDest
+{
+ CONFLICT_LOG_DEST_INVALID = 0,
+ CONFLICT_LOG_DEST_LOG, /* "log" (default) */
+ CONFLICT_LOG_DEST_TABLE, /* "table" */
+ CONFLICT_LOG_DEST_ALL /* "all" */
+} ConflictLogDest;
+

I didn't like this enum much.

Suggest removing CONFLICT_LOG_DEST_INVALID.
And use bits for the other values.
And you can still have a default enum if you want.

SUGGESTION
typedef enum ConflictLogDest
{
  CONFLICT_LOG_DEST_LOG = 0x001,
  CONFLICT_LOG_DEST_TABLE = 0x010,
  CONFLICT_LOG_DEST_DEFAULT = CONFLICT_LOG_DEST_LOG,
  CONFLICT_LOG_DEST_ALL = CONFLICT_LOG_DEST_LOG | CONFLICT_LOG_DEST_TABLE,
} ConflictLogDest;

BTW, since the largest value is only 0x11, the labels array won't
exceed length 0x11 + 1, so I guess you can still keep your same
designated initialiser for the dest labels.
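
To make that concrete, a sketch of the labels array with those bit
values (the largest index is 0x11 = 17, so the array stays small;
unused slots are just NULL):

static const char *const ConflictLogDestLabels[] = {
        [CONFLICT_LOG_DEST_LOG] = "log",
        [CONFLICT_LOG_DEST_TABLE] = "table",
        [CONFLICT_LOG_DEST_ALL] = "all",
};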

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Mon, Dec 22, 2025 at 4:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Done in V15

Thanks for the patches. A few comments on v15-002 for the part I have
reviewed so far:

1)
Defined twice:

+#define MAX_LOCAL_CONFLICT_INFO_ATTRS 5

+#define MAX_LOCAL_CONFLICT_INFO_ATTRS \
+ (sizeof(LocalConflictSchema) / sizeof(LocalConflictSchema[0]))


2)
GetConflictLogTableInfo:
+ *log_dest = GetLogDestination(MySubscription->logdestination);
+ conflictlogrelid = MySubscription->conflictrelid;
+
+ /* If destination is 'log' only, no table to open. */
+ if (*log_dest == CONFLICT_LOG_DEST_LOG)
+ return NULL;

We can get conflictlogrelid after the if-check for DEST_LOG.

3)
In ReportApplyConflict(), we form err_detail by calling
errdetail_apply_conflict(). But when dest is TABLE, we don't use
err_detail. Shall we skip creating it for the dest=TABLE case?

4)
ReportApplyConflict():
+ /*
+ * Get both the conflict log destination and the opened conflict log
+ * relation for insertion.
+ */
+ conflictlogrel = GetConflictLogTableInfo(&dest);
+

We can move it after errdetail_apply_conflict(), closer to where we
actually use it.

5)
start_apply:
+ /* Open conflict log table and insert the tuple. */
+ conflictlogrel = GetConflictLogTableInfo(&dest);
+ if (ValidateConflictLogTable(conflictlogrel))
+ InsertConflictLogTuple(conflictlogrel);

We can have Assert here too before we call Validate:
Assert(dest == CONFLICT_LOG_DEST_TABLE || dest == CONFLICT_LOG_DEST_ALL);

6)
start_apply:
+ if (ValidateConflictLogTable(conflictlogrel))
+ InsertConflictLogTuple(conflictlogrel);
+ MyLogicalRepWorker->conflict_log_tuple = NULL;

InsertConflictLogTuple() already sets conflict_log_tuple to NULL.
Above is not needed.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Tue, Dec 23, 2025 at 10:55 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> [...]
> One approach could be to use something similar to
> PERFORM_DELETION_SKIP_EXTENSIONS in our case, but only for recursive
> drops. The effect would be that 'DROP SCHEMA ... CASCADE' would
> proceed without error, i.e., it would drop the tables as well without
> including the subscription in the dependency list. But if we try to
> drop a table directly (e.g., DROP TABLE CLT), it will still result in:
> ERROR: cannot drop table because subscription sub1 requires it
>

I think this way of allowing dropping the conflict table without
caring for the parent object (subscription) is not a good idea. How
about creating a dedicated schema, say pg_conflict for the purpose of
storing conflict tables? This will be similar to the pg_toast schema
for toast tables. So, similar to that each database will have a
pg_conflict schema. It prevents the "orphan" problem where a user
accidentally drops the logging schema but the Subscription is still
trying to write to it. pg_dump needs to ignore all system schemas
EXCEPT pg_conflict. This ensures the history is preserved during
migrations while still protecting the tables from accidental user
deletion. About permissions, I think we need to set the schema
permissions so that USAGE is public (so users can SELECT from their
logs) but CREATE is restricted to the superuser/subscription owner. We
may need to think some more about permissions.

I also tried to reason out if we can allow storing the conflict table
in pg_catalog but here are a few reasons why it won't be a good idea.
I think by default, pg_dump completely ignores the pg_catalog schema.
It assumes pg_catalog contains static system definitions (like
pg_class, pg_proc, etc.) that are re-generated by the initdb process,
not user data. If we place a conflict table in pg_catalog, it will not
be backed up. If a user runs pg_dump/all to migrate to a new server,
their subscription definition will survive, but their entire history
of conflict logs will vanish. Also from the permissions angle, If a
user wants to write a custom PL/pgSQL function to "retry" conflicts,
they might need to DELETE rows from the conflict table after fixing
them. Granting DELETE permissions on a table inside pg_catalog is
non-standard and often frowned upon by security auditors. It blurs the
line between "System Internals" (immutable) and "User Data" (mutable).

So, in short a separate pg_conflict schema appears to be a better solution.

Thoughts?

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Tue, Dec 23, 2025 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]
> So, in short a separate pg_conflict schema appears to be a better solution.

Yeah, that makes sense. I haven't yet thought through all the cases
where it could be a problem, but meanwhile I tried prototyping this
and it behaves the way we want.

postgres[1651968]=# \x
Expanded display is on.
postgres[1651968]=# select * from pg_conflict.conflict_log_table_16406 ;
-[ RECORD 1 ]-----+-----------------------------------------------------------------------------------------------------------------------------------
relid             | 16385
schemaname        | public
relname           | test
conflict_type     | update_origin_differs
remote_xid        | 761
remote_commit_lsn | 0/01760BD8
remote_commit_ts  | 2025-12-23 11:08:30.583816+00
remote_origin     | pg_16406
replica_identity  | {"a":1}
remote_tuple      | {"a":1,"b":20}
local_conflicts   | {"{\"xid\":\"772\",\"commit_ts\":\"2025-12-23T11:08:25.568561+00:00\",\"origin\":null,\"key\":null,\"tuple\":{\"a\":1,\"b\":10}}"}

-- Case1: Alter is not allowed
postgres[1651968]=# ALTER TABLE pg_conflict.conflict_log_table_16406
ADD COLUMN a int;
ERROR:  42501: permission denied: "conflict_log_table_16406" is a system catalog
LOCATION:  RangeVarCallbackForAlterRelation, tablecmds.c:19634

-- Case2: drop is not allowed
postgres[1651968]=# drop table pg_conflict.conflict_log_table_16406;
ERROR:  42501: permission denied: "conflict_log_table_16406" is a system catalog
LOCATION:  RangeVarCallbackForDropRelation, tablecmds.c:1803

--Case3: Drop subscription drops it internally
postgres[1651968]=# DROP SUBSCRIPTION sub ;
NOTICE:  00000: dropped replication slot "sub" on publisher
LOCATION:  ReplicationSlotDropAtPubNode, subscriptioncmds.c:2470
DROP SUBSCRIPTION
postgres[1651968]=# \d pg_conflict.conflict_log_table_16406
Did not find any relation named "pg_conflict.conflict_log_table_16406".

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Peter Smith
Дата:
On Tue, Dec 23, 2025 at 5:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip.
>
> Here are some review comments after a first pass of patch v15-0001.
>

And, some more review comments for patch v15-0001.

======
src/backend/catalog/pg_subscription.c

1.
+ /* Always set the destination, default will be log. */
+ values[Anum_pg_subscription_sublogdestination - 1] =
+ CStringGetTextDatum(ConflictLogDestLabels[opts.logdest]);
+

None of the other values[] assignments here have a comment talking
about defaults, etc, so I don't think this needs one either.

======
src/backend/commands/subscriptioncmds.c

CreateSubscription:

2.
+ {
+ char    conflict_table_name[NAMEDATALEN];
+ Oid     namespaceId, logrelid;

In similar code in AlterSubscription, this was just called 'relname'.
Better to be consistent where possible. I think 'relname' would be
fine here too.

~~~

3.
+ else
+ {
+ /* Destination is "log"; no table is needed. */
+ values[Anum_pg_subscription_subconflictlogrelid - 1] =
+ ObjectIdGetDatum(InvalidOid);
+ }

I think it's better to say this using coded Asserts instead of just
assertions in comments.

e.g.

/* There is no conflict log table */
Assert(opts.logdest == CONFLICT_LOG_DEST_LOG);
values[...] = ObjectIdGetDatum(InvalidOid);

~~~

4.
+ if (isTempNamespace(namespaceId))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not generate conflict log table \"%s\"",
+ conflictrel),
+ errdetail("Conflict log tables cannot be created in a temporary namespace."),
+ errhint("Ensure your 'search_path' is set to permanent schema.")));
+
+ /* Report an error if the specified conflict log table already exists. */
+ if (OidIsValid(get_relname_relid(conflictrel, namespaceId)))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_TABLE),
+ errmsg("could not generate conflict log table \"%s.%s\"",
+ get_namespace_name(namespaceId), conflictrel),
+ errdetail("A table with the internally generated name already exists."),
+ errhint("Drop the existing table or change your 'search_path' to use a different schema.")));

I'm not sure about these messages:

4a.
"could not generate conflict log table".
- Why say "generate"?
- We don't need to say "conflict log table" -- that's already in the detail

SUGGESTION (something like)
"could not create relation \"%s\""

~

4b.
For the 2nd error, I think errmsg should look like below, same as any
other duplicate table error.
"relation \"%s.%s\" already exists"

~

4c.
+ errdetail("A table with the internally generated name already exists."),

I don't think this errdetail added anything useful. It already exists
-- that's all you need to know. Why does it matter that the name was
generated automatically?

~~~

GetLogDestination:

5.
+ for (int i = CONFLICT_LOG_DEST_LOG; i <= CONFLICT_LOG_DEST_ALL; i++)
+ {
+ if (pg_strcasecmp(dest, ConflictLogDestLabels[i]) == 0)
+ return (ConflictLogDest) i;
+ }
+
+ /* Unrecognized string. */
+ return CONFLICT_LOG_DEST_INVALID;

This code is making rash assumptions about the enum values being the
same as their ordinals.

IMO it should be written like:

if (strcmp(dest, "log") == 0)
return CONFLICT_LOG_DEST_LOG;

if (strcmp(dest, "table") == 0)
return CONFLICT_LOG_DEST_TABLE;

if (strcmp(dest, "all") == 0)
return CONFLICT_LOG_DEST_ALL;

/* Unrecognized dest. */
ereport(ERROR, ...);

~~~

IsConflictLogTable

6.
+bool
+IsConflictLogTable(Oid relid)
+{
+ Relation        rel;

If you enforce (as I've suggested previously) a naming convention that
the CLT must have a "pg_" prefix, then perhaps you can exit early from
this function without having to scan all the OIDs, just by first
checking that RelationGetRelationName(rel) starts with "pg_".
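
Something like this at the top of the function (prefix per whatever
naming convention ends up being adopted):

        char       *relname = get_rel_name(relid);

        if (relname == NULL ||
                strncmp(relname, "pg_conflict_log_", strlen("pg_conflict_log_")) != 0)
                return false;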

======
src/test/regress/sql/subscription.sql

7.
+-- fail - unrecognized format value

/format/parameter/

~~

8.
Some of these tests are grouped together like

"ALTER: State transitions"
and
"Ensure drop table is not allowed, and DROP SUBSCRIPTION reaps the table"
etc.

These group boundaries should be identified more clearly with more
substantial comments.
e.g
#-- ==================================
#-- ALTER - state transition tests
#-- ==================================

~~~

9.
The "pg_relation_is_publishable" seems misplaced because it is buried
among the drop/reap tests. Maybe it should come before all that.

======
src/tools/pgindent/typedefs.list

10.
What about "typedef enum ConflictLogDest"

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Tue, Dec 23, 2025 at 5:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> [...]
>
> -- Case1: Alter is not allowed
> postgres[1651968]=# ALTER TABLE pg_conflict.conflict_log_table_16406
> ADD COLUMN a int;
> ERROR:  42501: permission denied: "conflict_log_table_16406" is a system catalog
> LOCATION:  RangeVarCallbackForAlterRelation, tablecmds.c:19634
>

How was this achieved? Did you modify IsSystemClass to behave
similarly to IsToastClass?

I tried to analyze whether there are alternative approaches. The
possible options I see are:

1)
heap_create_with_catalog() provides the boolean argument use_user_acl,
which is meant to apply user-defined default privileges. In theory, we
could predefine default ACLs for our schema and then invoke
heap_create_with_catalog() with use_user_acl = true. But it’s not
clear how to do this purely from internal code. We would need to mimic
or reuse the logic behind SetDefaultACLsInSchemas.
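To make option 1 concrete, the user-level equivalent of predefining
default ACLs would be something like the following (SQL shown only for
illustration; internally we would have to create the matching
pg_default_acl entries from C):

-- Hypothetical illustration of option 1: default privileges registered
-- for the pg_conflict schema, which heap_create_with_catalog() could
-- then pick up when called with use_user_acl = true.
ALTER DEFAULT PRIVILEGES IN SCHEMA pg_conflict
    GRANT SELECT ON TABLES TO PUBLIC;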

2)
Another option is to create the table using heap_create_with_catalog()
with use_user_acl = false, and then explicitly update pg_class.relacl
for that table, similar to what ExecGrant_Relation does when
processing GRANT/REVOKE. But I couldn’t find any existing internal
code paths (outside of the GRANT/REVOKE implementation itself) that do
this kind of post-creation ACL manipulation.
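For comparison, the end state option 2 aims for is what a plain GRANT
would produce, e.g. (table name taken from the prototype above):

-- What option 2 would effectively do after table creation, expressed
-- as SQL; the internal code would update pg_class.relacl directly.
GRANT SELECT ON pg_conflict.conflict_log_table_16406 TO PUBLIC;

-- The resulting ACL can then be inspected in pg_class:
SELECT relname, relacl
FROM pg_class
WHERE relnamespace = 'pg_conflict'::regnamespace;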
~~

So overall, I feel changing IsSystemClass is the simpler way right
now. Setting the ACL before/after/during heap_create_with_catalog()
is tricky; at least, I could not find an easier way to do it, unless
I have missed something.
Thoughts on possible approaches?

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
vignesh C
Date:
On Fri, 19 Dec 2025 at 11:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 10:40 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Fri, Dec 19, 2025 at 9:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> >
> > > 2. Do we want to support multiple destinations? If so, providing a
> > > string like conflict_log_destination = 'log,table,..' makes more
> > > sense, but then we would have to store it as a string in the catalog
> > > and parse it every time we insert conflicts or alter the
> > > subscription. OTOH, currently I support just a single option
> > > (log/table/both), which makes things much easier, because then we can
> > > store it in the catalog as a single char field and don't need any
> > > parsing.  And since the input is taken as a string anyway, even if in
> > > the future we want to support more options like 'log,table,..', it
> > > would be backward compatible with the old options.
> >
> > I feel a combination of options might be a good idea, similar to what
> > 'log_destination' provides. But it can be done in future versions, and
> > the first draft can be a simple one.
> >
>
> Considering the future extension of storing conflict information in
> multiple places, it would be good to follow log_destination. Yes, it
> is more work now but I feel that will be future-proof.

The attached patch adds the ability to specify conflict_log_destination
as a combination of table, log, and all. This is implemented in the
v15-0006 patch; there is no change in the other patches, v15-0001 ...
v15-0005, which are the same as the patches attached at [1].

[1] - https://www.postgresql.org/message-id/CALDaNm1zR1L2oq-LqYEcc8-wTZYjfJsiaTC_jQ8pGwbm0fv%2B3Q%40mail.gmail.com
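For reference, with this patch the option would presumably be used
along these lines (a sketch; the exact syntax is whatever the v15-0006
patch implements):

-- Assumed usage of the combined destination option (per the patch
-- under discussion, not a released feature):
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=pub_host dbname=postgres'
    PUBLICATION pub1
    WITH (conflict_log_destination = 'log,table');

-- and later changed with:
ALTER SUBSCRIPTION sub1 SET (conflict_log_destination = 'table');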

Regards,
Vignesh

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Dec 24, 2025 at 4:02 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Dec 23, 2025 at 5:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Dec 23, 2025 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Dec 23, 2025 at 10:55 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 22, 2025 at 9:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Mon, Dec 22, 2025 at 3:09 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > I think this one needs more thought; the others can be fixed.
> > > > >
> > > > > > 2)
> > > > > > postgres=# drop schema shveta cascade;
> > > > > > NOTICE:  drop cascades to subscription sub1
> > > > > > ERROR:  global objects cannot be deleted by doDeletion
> > > > > >
> > > > > > Is this expected? Is the user supposed to see this error?
> > > > > >
> > > > > See the code below. It says that if the object being dropped is the
> > > > > outermost object (i.e., we are dropping the table directly), then
> > > > > dropping the object on which it has an INTERNAL dependency is
> > > > > disallowed. OTOH, if the object is being dropped via a recursive drop
> > > > > (i.e., the table is being dropped while dropping the schema), the
> > > > > object on which it has an INTERNAL dependency is also added to the
> > > > > deletion list and later dropped via doDeletion(), which is where we
> > > > > get the error, since a subscription is a global object.  I thought
> > > > > maybe we could handle an additional case: if the INTERNAL dependency
> > > > > is on a subscription, disallow dropping it regardless of whether the
> > > > > drop is invoked directly or via a recursive drop. But then it would
> > > > > cause a problem even when we try to drop the table during
> > > > > subscription drop; we could handle that case too via the 'flags'
> > > > > passed to findDependentObjects(), but it needs more investigation.
> > > > >
> > > > > Seeing this complexity makes me wonder whether it is really worth
> > > > > maintaining this dependency.  During subscription drop we have to
> > > > > call performDeletion() externally anyway, because this dependency is
> > > > > local, so all we gain is disallowing the conflict table drop.
> > > > > However, ALTER TABLE is still allowed, so what are we really
> > > > > protecting by blocking the table drop?  I think we could simply
> > > > > document that if the user drops the table, conflicts will no longer
> > > > > be inserted.
> > > > >
> > > > > findDependentObjects()
> > > > > {
> > > > > ...
> > > > >      switch (foundDep->deptype)
> > > > >      {
> > > > >          ....
> > > > >          case DEPENDENCY_INTERNAL:
> > > > >             /*
> > > > >              * 1. At the outermost recursion level, we must disallow the
> > > > >              * DROP. However, if the owning object is listed in
> > > > >              * pendingObjects, just release the caller's lock and return;
> > > > >              * we'll eventually complete the DROP when we reach that entry
> > > > >              * in the pending list.
> > > > >              */
> > > > >      }
> > > > > }
> > > > >
> > > > > [1]
> > > > > postgres[1333899]=# select * from pg_depend where objid > 16410;
> > > > >  classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
> > > > > ---------+-------+----------+------------+----------+-------------+---------
> > > > >     1259 | 16420 |        0 |       2615 |    16410 |           0 | n
> > > > >     1259 | 16420 |        0 |       6100 |    16419 |           0 | i
> > > > > (4 rows)
> > > > >
> > > > > 16420 -> conflict_log_table_16419
> > > > > 16419 -> subscription
> > > > > 16410 -> schema s1
> > > > >
> > > >
> > > > One approach could be to use something similar to
> > > > PERFORM_DELETION_SKIP_EXTENSIONS in our case, but only for recursive
> > > > drops. The effect would be that 'DROP SCHEMA ... CASCADE' would
> > > > proceed without error, i.e., it would drop the tables as well without
> > > > including the subscription in the dependency list. But if we try to
> > > > drop a table directly (e.g., DROP TABLE CLT), it will still result in:
> > > > ERROR: cannot drop table because subscription sub1 requires it
> > > >
> > >
> > > I think this way of allowing dropping the conflict table without
> > > caring for the parent object (subscription) is not a good idea. How
> > > about creating a dedicated schema, say pg_conflict for the purpose of
> > > storing conflict tables? This will be similar to the pg_toast schema
> > > for toast tables. So, similar to that each database will have a
> > > pg_conflict schema. It prevents the "orphan" problem where a user
> > > accidentally drops the logging schema but the Subscription is still
> > > trying to write to it. pg_dump needs to ignore all system schemas
> > > EXCEPT pg_conflict. This ensures the history is preserved during
> > > migrations while still protecting the tables from accidental user
> > > deletion. About permissions, I think we need to set the schema
> > > permissions so that USAGE is public (so users can SELECT from their
> > > logs) but CREATE is restricted to the superuser/subscription owner. We
> > > may need to think some more about permissions.
> > >
> > > I also tried to reason out if we can allow storing the conflict table
> > > in pg_catalog but here are a few reasons why it won't be a good idea.
> > > I think by default, pg_dump completely ignores the pg_catalog schema.
> > > It assumes pg_catalog contains static system definitions (like
> > > pg_class, pg_proc, etc.) that are re-generated by the initdb process,
> > > not user data. If we place a conflict table in pg_catalog, it will not
> > > be backed up. If a user runs pg_dump/pg_dumpall to migrate to a new
> > > server, their subscription definition will survive, but their entire
> > > history of conflict logs will vanish. Also, from the permissions angle, if a
> > > user wants to write a custom PL/pgSQL function to "retry" conflicts,
> > > they might need to DELETE rows from the conflict table after fixing
> > > them. Granting DELETE permissions on a table inside pg_catalog is
> > > non-standard and often frowned upon by security auditors. It blurs the
> > > line between "System Internals" (immutable) and "User Data" (mutable).
> > > So, in short a separate pg_conflict schema appears to be a better solution.
> >
> > Yeah, that makes sense.  Although I haven't thought through all the
> > cases where it could be a problem, in the meantime I tried prototyping
> > this, and it behaves the way we want.
> >
> > postgres[1651968]=# select * from pg_conflict.conflict_log_table_16406 ;
> > -[ RECORD 1 ]-----+-------------------------------------------------------
> > relid             | 16385
> > schemaname        | public
> > relname           | test
> > conflict_type     | update_origin_differs
> > remote_xid        | 761
> > remote_commit_lsn | 0/01760BD8
> > remote_commit_ts  | 2025-12-23 11:08:30.583816+00
> > remote_origin     | pg_16406
> > replica_identity  | {"a":1}
> > remote_tuple      | {"a":1,"b":20}
> > local_conflicts   | {"{\"xid\":\"772\",\"commit_ts\":\"2025-12-23T11:08:25.568561+00:00\",\"origin\":null,\"key\":null,\"tuple\":{\"a\":1,\"b\":10}}"}
> >
> > -- Case1: Alter is not allowed
> > postgres[1651968]=# ALTER TABLE pg_conflict.conflict_log_table_16406
> > ADD COLUMN a int;
> > ERROR:  42501: permission denied: "conflict_log_table_16406" is a system catalog
> > LOCATION:  RangeVarCallbackForAlterRelation, tablecmds.c:19634
> >
>
> How was this achieved? Did you modify IsSystemClass to behave
> similarly to IsToastClass?

Right

> I tried to analyze whether there are alternative approaches. The
> possible options I see are:
>
> 1)
> heap_create_with_catalog() provides the boolean argument use_user_acl,
> which is meant to apply user-defined default privileges. In theory, we
> could predefine default ACLs for our schema and then invoke
> heap_create_with_catalog() with use_user_acl = true. But it’s not
> clear how to do this purely from internal code. We would need to mimic
> or reuse the logic behind SetDefaultACLsInSchemas.
> 2)
> Another option is to create the table using heap_create_with_catalog()
> with use_user_acl = false, and then explicitly update pg_class.relacl
> for that table, similar to what ExecGrant_Relation does when
> processing GRANT/REVOKE. But I couldn’t find any existing internal
> code paths (outside of the GRANT/REVOKE implementation itself) that do
> this kind of post-creation ACL manipulation.

I haven't analyzed these options yet; I will do that, but not before
Jan 3rd, as I will be away from my laptop for a week.

> So overall, I feel changing IsSystemClass is the simpler way right
> now. Setting the ACL before/after/during heap_create_with_catalog()
> is tricky; at least, I could not find an easier way to do it, unless
> I have missed something.
> Thoughts on possible approaches?

Here are the patches, changed to use IsSystemClass().  Based on this,
many other things changed: we no longer need to check for the temp
schema, and the caller of create_conflict_log_table() no longer needs
to find the creation schema, so it doesn't need to generate the
relname; that part has also moved into create_conflict_log_table().
I have fixed most of the comments given by Peter and Shveta, although
some of them are still open, e.g. the name of the conflict log table.
As of now I have kept it as conflict_log_table_<subid>; other options
are:

1. pg_conflict_<subid>
2. conflict_log_<subid>
3. sub_conflict_log_<subid>

I prefer 3, considering it conveys that this table holds subscription
conflict logs.  Thoughts?
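Whichever name wins, users would locate their subscription's conflict
table along these lines (hypothetical query, assuming naming option 3):

-- Find the conflict log table for subscription 'sub1', assuming the
-- sub_conflict_log_<subid> naming scheme and the pg_conflict schema.
SELECT format('pg_conflict.sub_conflict_log_%s', oid) AS conflict_table
FROM pg_subscription
WHERE subname = 'sub1';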

Vignesh, your patches have to be rebased on the new version.

--
Regards,
Dilip Kumar
Google

Attachments