Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
Currently we log conflicts to the server's log file and update conflict
statistics, but this approach has limitations: 1) Difficult to query
and analyze: parsing plain text log files for conflict details is
inefficient. 2) Lack of structured data: key conflict attributes
(table, operation, old/new data, LSN, etc.) are not readily available
in a structured, queryable format. 3) Difficult for external monitoring
tools or custom resolution scripts to consume conflict data directly.

This proposal aims to address these limitations by introducing a
conflict log history table, providing a structured and queryable
record of all logical replication conflicts.  Whether to log into the
conflict log history table, the server logs, or both should be a
configurable option.
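
To make the option concrete, the user interface could look something
like this (the option name conflict_log_destination and its values are
purely illustrative, nothing here is settled):

CREATE SUBSCRIPTION sub1
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub1
    WITH (conflict_log_destination = 'table,log');  -- or 'table', or 'log'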

This proposal has two main design questions:
===================================

1. How do we store conflicting tuples from different tables?
Using a JSON column to store the row data seems like the most flexible
solution, as it can accommodate different table schemas.

2. Should this be a system table or a user table?
a) System Table: Storing this in a system catalog is simple, but
catalogs aren't designed for ever-growing data. While pg_largeobject
is an exception, this is not what we generally do IMHO.
b) User Table: This offers more flexibility. We could allow a user to
specify the table name during CREATE SUBSCRIPTION.  Then we choose to
either create the table internally or let the user create the table
with a predefined schema.

A potential drawback is that a user might drop or alter the table.
However, we could mitigate this risk by simply logging a WARNING if
the table is configured but an insertion fails.
I am currently working on a POC patch for the same, but will post that
once we have some thoughts on design choices.

The schema for the conflict log history table may look like this,
although there is room for discussion on it.

Note:  I think these fields are self-explanatory, so I haven't
explained them here.

conflict_log_table (
    logid                SERIAL PRIMARY KEY,
    subid                OID,
    schema_id            OID,
    table_id             OID,
    conflict_type        TEXT NOT NULL,
    operation_type       TEXT NOT NULL,
    replication_origin   TEXT,
    remote_commit_ts     TIMESTAMPTZ,
    local_commit_ts      TIMESTAMPTZ,
    ri_key               JSON,
    remote_tuple         JSON,
    local_tuple          JSON
);
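
With such a schema, conflicts become analyzable with plain SQL. For
example (assuming a table like the above exists; the column values and
the conflict_type string are illustrative):

-- conflicts per table and type in the last day
SELECT table_id::regclass AS tab, conflict_type, count(*)
FROM conflict_log_table
WHERE local_commit_ts > now() - interval '1 day'
GROUP BY 1, 2
ORDER BY 3 DESC;

-- compare a disputed column between the local and remote versions
SELECT local_tuple->>'balance'  AS local_balance,
       remote_tuple->>'balance' AS remote_balance
FROM conflict_log_table
WHERE conflict_type = 'update_origin_differs';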

Credit:  Thanks to Amit Kapila for discussing this offlist and
providing some valuable suggestions.

-- 
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Currently we log conflicts to the server's log file and updates, this
> approach has limitations, 1) Difficult to query and analyze, parsing
> plain text log files for conflict details is inefficient. 2) Lack of
> structured data, key conflict attributes (table, operation, old/new
> data, LSN, etc.) are not readily available in a structured, queryable
> format. 3) Difficult for external monitoring tools or custom
> resolution scripts to consume conflict data directly.
>
> This proposal aims to address these limitations by introducing a
> conflict log history table, providing a structured, and queryable
> record of all logical replication conflicts.  This should be a
> configurable option whether to log into the conflict log history
> table, server logs or both.
>

+1 for the idea.

> This proposal has two main design questions:
> ===================================
>
> 1. How do we store conflicting tuples from different tables?
> Using a JSON column to store the row data seems like the most flexible
> solution, as it can accommodate different table schemas.

Yes, that is one option. I have not looked into the details myself, but
you could also explore 'anyarray', used in pg_statistic to store 'Column
data values of the appropriate kind'.

> 2. Should this be a system table or a user table?
> a) System Table: Storing this in a system catalog is simple, but
> catalogs aren't designed for ever-growing data. While pg_large_object
> is an exception, this is not what we generally do IMHO.
> b) User Table: This offers more flexibility. We could allow a user to
> specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> either create the table internally or let the user create the table
> with a predefined schema.
>
> A potential drawback is that a user might drop or alter the table.
> However, we could mitigate this risk by simply logging a WARNING if
> the table is configured but an insertion fails.

I believe it makes more sense for this to be a catalog table rather
than a user table. I wanted to check if we already have a large
catalog table of this kind, and I think pg_statistic could be an
example of a sizable catalog table. To get a rough idea of how size
scales with data, I ran a quick experiment: I created 1000 tables,
each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
I inserted 1000 rows into each table and ran ANALYZE to collect
statistics. Here’s what I observed on a fresh database before and
after:

Before:
pg_statistic row count: 412
Table size: ~256 kB

After:
pg_statistic row count: 6,412
Table size: ~5.3 MB

Although it isn’t an exact comparison, this gives us some insight into
how the statistics catalog table size grows with the number of rows.
It doesn’t seem excessively large with 6k rows, given the fact that
pg_statistic itself is a complex table having many 'anyarray'-type
columns.
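
An illustrative way to reproduce such an experiment (table names and
the JSON payloads here are made up for the sketch):

DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format(
            'CREATE TABLE t%s (j1 json, j2 json, t text, a int, b int)', i);
        EXECUTE format(
            'INSERT INTO t%s SELECT ''{}''::json, ''{}''::json, ''x'', g, g
             FROM generate_series(1, 1000) g', i);
    END LOOP;
END $$;
ANALYZE;
SELECT count(*) FROM pg_statistic;
SELECT pg_size_pretty(pg_total_relation_size('pg_statistic'));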

That said, irrespective of what we decide, it would be ideal to offer
users an option for automatic purging, perhaps via a retention-period
parameter like conflict_stats_retention_period (defaulting to, say, 30
days), or a manual purge API such as purge_conflict_stats('older than
date'). I wasn't able to find any such purge mechanism for PostgreSQL
stats tables, but Oracle does provide such purging options for some of
its statistics tables (not related to conflicts), see [1], [2].
And to manage it better, the table could be range partitioned on timestamp.
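
Range partitioning on the commit timestamp would make such purging
cheap with the existing declarative partitioning machinery; a sketch
(table and partition names are illustrative only):

CREATE TABLE conflict_log_history (
    conflict_type    TEXT NOT NULL,
    remote_commit_ts TIMESTAMPTZ NOT NULL
    -- ... remaining columns ...
) PARTITION BY RANGE (remote_commit_ts);

CREATE TABLE conflict_log_history_2025_08
    PARTITION OF conflict_log_history
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

-- purging a month that falls outside the retention period is then just:
DROP TABLE conflict_log_history_2025_08;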


> I am currently working on a POC patch for the same, but will post that
> once we have some thoughts on design choices.
>
> Schema for the conflict log history table may look like this, although
> there is a room for discussion on this.
>
> Note:  I think these fields are self explanatory so I haven't
> explained them here.
>
> conflict_log_table (
>     logid  SERIAL PRIMARY KEY,
>     subid                OID,
>     schema_id          OID,
>     table_id            OID,
>     conflict_type        TEXT NOT NULL,
>     operation_type       TEXT NOT NULL,

I feel operation_type is not needed when we already have
conflict_type; the conflict_type value alone is enough to tell us the
operation type.

>     replication_origin   TEXT,
>     remote_commit_ts TIMESTAMPTZ,
>     local_commit_ts TIMESTAMPTZ,
>     ri_key                    JSON,
>     remote_tuple         JSON,
>     local_tuple          JSON,
> );
>
> Credit:  Thanks to Amit Kapila for discussing this offlist and
> providing some valuable suggestions.
>

[1] https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_STATS.html#GUID-8E6413D5-F827-4F57-9FAD-7EC56362A98C
[2] https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_STATS.html#GUID-A04AE1C0-5DE1-4AFC-91F8-D35D41DF98A2

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Currently we log conflicts to the server's log file and updates, this
> > approach has limitations, 1) Difficult to query and analyze, parsing
> > plain text log files for conflict details is inefficient. 2) Lack of
> > structured data, key conflict attributes (table, operation, old/new
> > data, LSN, etc.) are not readily available in a structured, queryable
> > format. 3) Difficult for external monitoring tools or custom
> > resolution scripts to consume conflict data directly.
> >
> > This proposal aims to address these limitations by introducing a
> > conflict log history table, providing a structured, and queryable
> > record of all logical replication conflicts.  This should be a
> > configurable option whether to log into the conflict log history
> > table, server logs or both.
> >
>
> +1 for the idea.
>
> > This proposal has two main design questions:
> > ===================================
> >
> > 1. How do we store conflicting tuples from different tables?
> > Using a JSON column to store the row data seems like the most flexible
> > solution, as it can accommodate different table schemas.
>
> Yes, that is one option. I have not looked into details myself, but
> you can also explore 'anyarray' used in pg_statistics to store 'Column
> data values of the appropriate kind'.
>
> > 2. Should this be a system table or a user table?
> > a) System Table: Storing this in a system catalog is simple, but
> > catalogs aren't designed for ever-growing data. While pg_large_object
> > is an exception, this is not what we generally do IMHO.
> > b) User Table: This offers more flexibility. We could allow a user to
> > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > either create the table internally or let the user create the table
> > with a predefined schema.
> >
> > A potential drawback is that a user might drop or alter the table.
> > However, we could mitigate this risk by simply logging a WARNING if
> > the table is configured but an insertion fails.
>
> I believe it makes more sense for this to be a catalog table rather
> than a user table. I wanted to check if we already have a large
> catalog table of this kind, and I think pg_statistic could be an
> example of a sizable catalog table. To get a rough idea of how size
> scales with data, I ran a quick experiment: I created 1000 tables,
> each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> I inserted 1000 rows into each table and ran ANALYZE to collect
> statistics. Here’s what I observed on a fresh database before and
> after:
>
> Before:
> pg_statistic row count: 412
> Table size: ~256 kB
>
> After:
> pg_statistic row count: 6,412
> Table size: ~5.3 MB
>
> Although it isn’t an exact comparison, this gives us some insight into
> how the statistics catalog table size grows with the number of rows.
> It doesn’t seem excessively large with 6k rows, given the fact that
> pg_statistic itself is a complex table having many 'anyarray'-type
> columns.
>
> That said, irrespective of what we decide, it would be ideal to offer
> users an option for automatic purging, perhaps via a retention period
> parameter like conflict_stats_retention_period (say default to 30
> days), or a manual purge API such as purge_conflict_stats('older than
> date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> stats tables, but Oracle does provide such purging options for some of
> their statistics tables (not related to conflicts), see [1], [2].
> And to manage it better, it could be range partitioned on timestamp.
>

It seems BDR also has one such conflict-log table, which is a catalog
table and is also partitioned on time. It has a default retention
period of 30 days. See 'bdr.conflict_history' mentioned under
'catalogs' in [1].

[1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:

Thanks Shveta for your opinion on the design.

> > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >

> > > This proposal aims to address these limitations by introducing a
> > > conflict log history table, providing a structured, and queryable
> > > record of all logical replication conflicts.  This should be a
> > > configurable option whether to log into the conflict log history
> > > table, server logs or both.
> > >
> >
> > +1 for the idea.

Thanks

> >
> > > This proposal has two main design questions:
> > > ===================================
> > >
> > > 1. How do we store conflicting tuples from different tables?
> > > Using a JSON column to store the row data seems like the most flexible
> > > solution, as it can accommodate different table schemas.
> >
> > Yes, that is one option. I have not looked into details myself, but
> > you can also explore 'anyarray' used in pg_statistics to store 'Column
> > data values of the appropriate kind'.

I think conversion from row to JSON and from JSON back to row is
convenient, and other extensions like pgactive/BDR also expose the
conflicting data as JSON.  But we can explore these alternative
options as well, thanks.
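
For the row/JSON direction, existing built-ins already cover both
conversions. For example (some_table is a placeholder for any table
with a matching row type):

-- row -> json
SELECT row_to_json(t) FROM some_table AS t;

-- json -> row, given a matching row type
SELECT *
FROM json_populate_record(NULL::some_table, '{"id": 1, "val": "x"}');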

> > > 2. Should this be a system table or a user table?
> > > a) System Table: Storing this in a system catalog is simple, but
> > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > is an exception, this is not what we generally do IMHO.
> > > b) User Table: This offers more flexibility. We could allow a user to
> > > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > > either create the table internally or let the user create the table
> > > with a predefined schema.
> > >
> > > A potential drawback is that a user might drop or alter the table.
> > > However, we could mitigate this risk by simply logging a WARNING if
> > > the table is configured but an insertion fails.
> >
> > I believe it makes more sense for this to be a catalog table rather
> > than a user table. I wanted to check if we already have a large
> > catalog table of this kind, and I think pg_statistic could be an
> > example of a sizable catalog table. To get a rough idea of how size
> > scales with data, I ran a quick experiment: I created 1000 tables,
> > each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> > I inserted 1000 rows into each table and ran ANALYZE to collect
> > statistics. Here’s what I observed on a fresh database before and
> > after:
> >
> > Before:
> > pg_statistic row count: 412
> > Table size: ~256 kB
> >
> > After:
> > pg_statistic row count: 6,412
> > Table size: ~5.3 MB
> >
> > Although it isn’t an exact comparison, this gives us some insight into
> > how the statistics catalog table size grows with the number of rows.
> > It doesn’t seem excessively large with 6k rows, given the fact that
> > pg_statistic itself is a complex table having many 'anyarray'-type
> > columns.

Yeah, that's a good analysis. Apart from this, pg_largeobject is also
a catalog that grows with each large object, and its growth rate can
be very high because it stores the large object data itself in the
catalog.

> >
> > That said, irrespective of what we decide, it would be ideal to offer
> > users an option for automatic purging, perhaps via a retention period
> > parameter like conflict_stats_retention_period (say default to 30
> > days), or a manual purge API such as purge_conflict_stats('older than
> > date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> > stats tables, but Oracle does provide such purging options for some of
> > their statistics tables (not related to conflicts), see [1], [2].
> > And to manage it better, it could be range partitioned on timestamp.

Yeah, timestamp-based partitioning to ease purging is an interesting
suggestion.

> It seems BDR also has one such conflict-log table which is a catalog
> table and is also partitioned on time. It has a default retention
> period of 30 days. See 'bdr.conflict_history' mentioned under
> 'catalogs' in [1]
>
> [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views

Actually, bdr is an extension, and this table lives in the extension's
namespace (bdr.conflict_history), so it is not really a catalog but an
extension-managed table.  So logically, for PostgreSQL, it is a user
table, but yes, it is created and managed by the extension.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks Shveta for your opinion on the design.
>
> > > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
>
> > > > This proposal aims to address these limitations by introducing a
> > > > conflict log history table, providing a structured, and queryable
> > > > record of all logical replication conflicts.  This should be a
> > > > configurable option whether to log into the conflict log history
> > > > table, server logs or both.
> > > >
> > >
> > > +1 for the idea.
>
> Thanks
>
> > >
> > > > This proposal has two main design questions:
> > > > ===================================
> > > >
> > > > 1. How do we store conflicting tuples from different tables?
> > > > Using a JSON column to store the row data seems like the most flexible
> > > > solution, as it can accommodate different table schemas.
> > >
> > > Yes, that is one option. I have not looked into details myself, but
> > > you can also explore 'anyarray' used in pg_statistics to store 'Column
> > > data values of the appropriate kind'.
>
> I think conversion from row to json and json to row is convenient and
> also other extensions like pgactive/bdr also provide as JSON.

Okay. Agreed.

> But we
> can explore this alternative options as well, thanks
>
> > > > 2. Should this be a system table or a user table?
> > > > a) System Table: Storing this in a system catalog is simple, but
> > > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > > is an exception, this is not what we generally do IMHO.
> > > > b) User Table: This offers more flexibility. We could allow a user to
> > > > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > > > either create the table internally or let the user create the table
> > > > with a predefined schema.
> > > >
> > > > A potential drawback is that a user might drop or alter the table.
> > > > However, we could mitigate this risk by simply logging a WARNING if
> > > > the table is configured but an insertion fails.
> > >
> > > I believe it makes more sense for this to be a catalog table rather
> > > than a user table. I wanted to check if we already have a large
> > > catalog table of this kind, and I think pg_statistic could be an
> > > example of a sizable catalog table. To get a rough idea of how size
> > > scales with data, I ran a quick experiment: I created 1000 tables,
> > > each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> > > I inserted 1000 rows into each table and ran ANALYZE to collect
> > > statistics. Here’s what I observed on a fresh database before and
> > > after:
> > >
> > > Before:
> > > pg_statistic row count: 412
> > > Table size: ~256 kB
> > >
> > > After:
> > > pg_statistic row count: 6,412
> > > Table size: ~5.3 MB
> > >
> > > Although it isn’t an exact comparison, this gives us some insight into
> > > how the statistics catalog table size grows with the number of rows.
> > > It doesn’t seem excessively large with 6k rows, given the fact that
> > > pg_statistic itself is a complex table having many 'anyarray'-type
> > > columns.
>
> Yeah that's good analysis, apart from this pg_largeobject is also a
> catalog which grows with each large object and growth rate for that
> will be very high because it stores large object data in catalog.
>
> > >
> > > That said, irrespective of what we decide, it would be ideal to offer
> > > users an option for automatic purging, perhaps via a retention period
> > > parameter like conflict_stats_retention_period (say default to 30
> > > days), or a manual purge API such as purge_conflict_stats('older than
> > > date'). I wasn’t able to find any such purge mechanism for PostgreSQL
> > > stats tables, but Oracle does provide such purging options for some of
> > > their statistics tables (not related to conflicts), see [1], [2].
> > > And to manage it better, it could be range partitioned on timestamp.
>
> Yeah that's an interesting suggestion to timestamp based partitioning
> it for purging.
>
> > It seems BDR also has one such conflict-log table which is a catalog
> > table and is also partitioned on time. It has a default retention
> > period of 30 days. See 'bdr.conflict_history' mentioned under
> > 'catalogs' in [1]
> >
> > [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views
>
> Actually bdr is an extension and this table is under extension
> namespace (bdr.conflict_history) so this is not really a catalog but
> its a extension managed table.

Yes, right. Sorry for the confusion.

> So logically for PostgreSQL its an
> user table but yeah this is created and managed by the extension.
>

Any idea if the user can alter/drop or perform any DML on it? I could
not find any details on this part.

> --
> Regards,
> Dilip Kumar
> Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > So logically for PostgreSQL its an
> > user table but yeah this is created and managed by the extension.
> >
>
> Any idea if the user can alter/drop or perform any DML on it? I could
> not find any details on this part.

In my experience, for such extension-managed tables that we want to
behave like a catalog, users are generally granted only SELECT
permission.  So although it is not a catalog, accessibility-wise it is
like a catalog for non-admin users.  IMHO, even if we choose to create
a user table for the conflict log history, we can control the
permissions similarly.  What's your opinion on this?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > So logically for PostgreSQL its an
> > > user table but yeah this is created and managed by the extension.
> > >
> >
> > Any idea if the user can alter/drop or perform any DML on it? I could
> > not find any details on this part.
>
> In my experience, for such extension managed tables where we want them
> to behave like catalog, generally users are just granted with SELECT
> permission.  So although it is not a catalog but for accessibility
> wise for non admin users it is like a catalog.  IMHO, even if we
> choose to create a user table for conflict log history we can also
> control the permissions similarly.
>

Yes, it can be done; technically there is nothing preventing us from
doing it. But in my experience, I have never seen any system-maintained
statistics table implemented as a user table rather than a catalog
table. Extensions are a different case; they typically manage their own
tables, which are not part of the system catalog. But if any such
stats-related functionality is part of the core database, it generally
makes more sense to implement it as a catalog table (provided there
are no major obstacles to doing so). I am curious to know what others
think here.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > So logically for PostgreSQL its an
> > > user table but yeah this is created and managed by the extension.
> > >
> >
> > Any idea if the user can alter/drop or perform any DML on it? I could
> > not find any details on this part.
>
> In my experience, for such extension managed tables where we want them
> to behave like catalog, generally users are just granted with SELECT
> permission.  So although it is not a catalog but for accessibility
> wise for non admin users it is like a catalog.  IMHO, even if we
> choose to create a user table for conflict log history we can also
> control the permissions similarly.  What's your opinion on this?
>

Yes, I think it is important to control permissions on this table even
if it is a user table. How about giving SELECT, DELETE, and TRUNCATE
permissions to the subscription owner, assuming we create one such
table per subscription?
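
If we auto-create one table per subscription, the grants could be set
up along these lines (the table and role names are illustrative only):

REVOKE ALL ON pg_conflict_log_sub1 FROM PUBLIC;
GRANT SELECT, DELETE, TRUNCATE ON pg_conflict_log_sub1 TO sub1_owner;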

It should be a user table for the following reasons: (a) It is an
ever-growing table by definition, and we need some level of user
control to manage it (like removing the old data); (b) We may want
some sort of partitioning strategy to manage it; even if we decide to
handle that ourselves now, in the future we should allow the user to
specify it as well; (c) We may also want the user to specify exactly
what information she wants stored, considering that in the future we
want resolutions to be stored in it too. See a somewhat similar
proposal to store errors during copy by Tom [1]; (d) In a nearby
thread, we are discussing storing errors during COPY in a user table
[2], and we have some similarity with that proposal as well.

If we agree on this, then the next thing to consider is whether we
allow users to create such a table or do it ourselves. In the long
term we may want both, but for simplicity we can auto-create it
ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
decide to let the user create it, then we can consider the idea of
TYPED tables as discussed in emails [3][4].

For user tables, we need to consider how to avoid replicating these
tables for publications that use the FOR ALL TABLES specifier. One
idea is to use the EXCLUDE table functionality being discussed in
thread [5], but that would also be a bit tricky, especially if we
decide to create such a table automatically. One naive idea is that
internally we skip sending changes from this table for FOR ALL TABLES
publications, and we don't allow creating a publication for this
table. OTOH, if we allow the user to create and specify this table, we
can ask her to specify it with the EXCLUDE syntax in the publication.
This needs more thought.
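
For illustration, with an EXCLUDE-style clause along the lines of what
thread [5] discusses, the setup might look like this (hypothetical
syntax and table name, not something that exists today):

CREATE PUBLICATION pub_all FOR ALL TABLES EXCEPT TABLE pg_conflict_log_sub1;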

[1] - https://www.postgresql.org/message-id/flat/752672.1699474336%40sss.pgh.pa.us#b8450be5645c4252d7d02cf7aca1fc7b
[2] - https://www.postgresql.org/message-id/CACJufxH_OJpVra%3D0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/28c420cf-f25d-44f1-89fd-04ef0b2dd3db%40dunslane.net
[4] - https://www.postgresql.org/message-id/CADrsxdYG%2B%2BK%3DiKjRm35u03q-Nb0tQPJaqjxnA2mGt5O%3DDht7sw%40mail.gmail.com
[5] - https://www.postgresql.org/message-id/CANhcyEW%2BuJB_bvQLEaZCgoRTc1%3Di%2BQnrPPHxZ2%3D0SBSCyj9pkg%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Alastair Turner
Date:
On Wed, 13 Aug 2025 at 11:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > So logically for PostgreSQL its an
> > > user table but yeah this is created and managed by the extension.
> > >
> >
> > Any idea if the user can alter/drop or perform any DML on it? I could
> > not find any details on this part.
>
> In my experience, for such extension managed tables where we want them
> to behave like catalog, generally users are just granted with SELECT
> permission.  So although it is not a catalog but for accessibility
> wise for non admin users it is like a catalog.  IMHO, even if we
> choose to create a user table for conflict log history we can also
> control the permissions similarly.  What's your opinion on this?
>

Yes, I think it is important to control permissions on this table even
if it is a user table. How about giving SELECT, DELETE, TRUNCATE
permissions to subscription owner assuming we create one such table
per subscription?

It should be a user table due to following reasons (a) It is an ever
growing table by definition and we need some level of user control to
manage it (like remove the old data); (b) We may want some sort of
partitioning streategy to manage it, even though, we decide to do it
ourselves now but in future, we should allow user to also specify it;
(c) We may also want user to specify what exact information she wants
to get stored considering in future we want resolutions to also be
stored in it. See a somewhat similar proposal to store errors during
copy by Tom [1]; (d) In a near-by thread, we are discussing storing
errors during copy in user table [2] and we have some similarity with
that proposal as well.

If we agree on this then the next thing to consider is whether we
allow users to create such a table or do it ourselves. In the long
term, we may want both but for simplicity, we can auto-create
ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
decide to let user create it then we can consider the idea of TYPED
tables as discussed in emails [3][4].

Having it be a user table, and specifying the table per subscription, sounds good. This is very similar to how the load error tables for CloudBerry behave, for instance. To have both options for table creation, CREATE ... IF NOT EXISTS semantics work well: if the option on CREATE SUBSCRIPTION specifies an existing table of the right type, use it; otherwise create one with the name supplied. This would also give the user control over whether to have one table per subscription, one central table, or anything in between. Rather than constraining permissions on the table, the CREATE SUBSCRIPTION command could create a dependency relationship between the table and the subscription. This would prevent removal of the table, even by a superuser.
 
For user tables, we need to consider how to avoid replicating these
tables for publications that use FOR ALL TABLES specifier. One idea is
to use EXCLUDE table functionality as being discussed in thread [5]
but that would also be a bit tricky especially if we decide to create
such a table automatically. One naive idea is that internally we skip
sending changes from this table for "FOR ALL TABLES" publication, and
we shouldn't allow creating publication for this table. OTOH, if we
allow the user to create and specify this table, we can ask her to
specify with EXCLUDE syntax in publication. This needs more thoughts.

If a dependency relationship is established between the error table and the subscription, could this be used as a basis for filtering the error tables from FOR ALL TABLES subscriptions?

Regards

Alastair 

Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Thu, Aug 14, 2025 at 4:26 PM Alastair Turner <minion@decodable.me> wrote:
>
> On Wed, 13 Aug 2025 at 11:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> >
>> > On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
>> > >
>> > > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> > > >
>> > > > So logically for PostgreSQL its an
>> > > > user table but yeah this is created and managed by the extension.
>> > > >
>> > >
>> > > Any idea if the user can alter/drop or perform any DML on it? I could
>> > > not find any details on this part.
>> >
>> > In my experience, for such extension managed tables where we want them
>> > to behave like catalog, generally users are just granted with SELECT
>> > permission.  So although it is not a catalog but for accessibility
>> > wise for non admin users it is like a catalog.  IMHO, even if we
>> > choose to create a user table for conflict log history we can also
>> > control the permissions similarly.  What's your opinion on this?
>> >
>>
>> Yes, I think it is important to control permissions on this table even
>> if it is a user table. How about giving SELECT, DELETE, TRUNCATE
>> permissions to subscription owner assuming we create one such table
>> per subscription?
>>
>> It should be a user table due to following reasons (a) It is an ever
>> growing table by definition and we need some level of user control to
>> manage it (like remove the old data); (b) We may want some sort of
>> partitioning strategy to manage it, even though, we decide to do it
>> ourselves now but in future, we should allow user to also specify it;
>> (c) We may also want user to specify what exact information she wants
>> to get stored considering in future we want resolutions to also be
>> stored in it. See a somewhat similar proposal to store errors during
>> copy by Tom [1]; (d) In a near-by thread, we are discussing storing
>> errors during copy in user table [2] and we have some similarity with
>> that proposal as well.
>>
>> If we agree on this then the next thing to consider is whether we
>> allow users to create such a table or do it ourselves. In the long
>> term, we may want both but for simplicity, we can auto-create
>> ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
>> decide to let user create it then we can consider the idea of TYPED
>> tables as discussed in emails [3][4].
>
>
> Having it be a user table, and specifying the table per subscription sounds good. This is very similar to how the load error tables for CloudBerry behave, for instance. To have both options for table creation, CREATE ... IF NOT EXISTS semantics work well - if the option on CREATE SUBSCRIPTION specifies an existing table of the right type use it, or create one with the name supplied. This would also give the user control over whether to have one table per subscription, one central table or anything in between.
>

Sounds reasonable. I think for the first version we can let such a
table be created automatically with some option(s) on the
subscription. Then, in subsequent versions, we can extend the
functionality to allow existing tables.

>
> Rather than constraining permissions on the table, the CREATE SUBSCRIPTION command could create a dependency relationship between the table and the subscription. This would prevent removal of the table, even by a superuser.
>

Okay, that makes sense. But, we still probably want to disallow users
from inserting or updating rows in the conflict table.

>>
>> For user tables, we need to consider how to avoid replicating these
>> tables for publications that use FOR ALL TABLES specifier. One idea is
>> to use EXCLUDE table functionality as being discussed in thread [5]
>> but that would also be a bit tricky especially if we decide to create
>> such a table automatically. One naive idea is that internally we skip
>> sending changes from this table for "FOR ALL TABLES" publication, and
>> we shouldn't allow creating publication for this table. OTOH, if we
>> allow the user to create and specify this table, we can ask her to
>> specify with EXCLUDE syntax in publication. This needs more thoughts.
>
>
> If a dependency relationship is established between the error table and the subscription, could this be used as a basis for filtering the error tables from FOR ALL TABLES subscriptions?
>

Yeah, that is worth considering.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Aug 13, 2025 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 10:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > So logically for PostgreSQL its an
> > > > user table but yeah this is created and managed by the extension.
> > > >
> > >
> > > Any idea if the user can alter/drop or perform any DML on it? I could
> > > not find any details on this part.
> >
> > In my experience, for such extension managed tables where we want them
> > to behave like catalog, generally users are just granted with SELECT
> > permission.  So although it is not a catalog but for accessibility
> > wise for non admin users it is like a catalog.  IMHO, even if we
> > choose to create a user table for conflict log history we can also
> > control the permissions similarly.  What's your opinion on this?
> >
>
> Yes, I think it is important to control permissions on this table even
> if it is a user table. How about giving SELECT, DELETE, TRUNCATE
> permissions to subscription owner assuming we create one such table
> per subscription?

Right, we need to control the permissions.  I am not sure whether we
want a per-subscription table or a common one. Earlier I was thinking
of a single table, but per subscription is not a bad idea, especially
for managing the permissions.  And there cannot be such a huge number
of subscriptions that we need to worry about creating many conflict
log history tables; besides, we will only create such tables when
users pass that subscription option.


> It should be a user table due to following reasons (a) It is an ever
> growing table by definition and we need some level of user control to
> manage it (like remove the old data); (b) We may want some sort of
> partitioning strategy to manage it, even though, we decide to do it
> ourselves now but in future, we should allow user to also specify it;

Maybe we can partition by range on the date the entry is inserted.
That way it would be easy for users to get rid of older partitions.
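
For illustration, the retention story could look roughly like this (a sketch only; the table and column names here are assumptions, the actual schema is not settled in this thread):

```sql
-- Hypothetical sketch: range-partition the conflict history by insertion time.
CREATE TABLE conflict_log_history (
    subid          oid,
    conflict_type  text,
    logged_at      timestamptz NOT NULL
) PARTITION BY RANGE (logged_at);

CREATE TABLE conflict_log_history_2025_08
    PARTITION OF conflict_log_history
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

-- Removing old conflict data then becomes a cheap metadata operation:
DROP TABLE conflict_log_history_2025_08;
```

Dropping a whole partition avoids the bloat and vacuum cost of bulk DELETEs on an ever-growing table.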

> (c) We may also want user to specify what exact information she wants
> to get stored considering in future we want resolutions to also be
> stored in it. See a somewhat similar proposal to store errors during
> copy by Tom [1]; (d) In a near-by thread, we are discussing storing
> errors during copy in user table [2] and we have some similarity with
> that proposal as well.

Right, we may consider that as well.

> If we agree on this then the next thing to consider is whether we
> allow users to create such a table or do it ourselves. In the long
> term, we may want both but for simplicity, we can auto-create
> ourselves during CREATE SUBSCRIPTION with some option. BTW, if we
> decide to let user create it then we can consider the idea of TYPED
> tables as discussed in emails [3][4].

Yeah that's an interesting option.

>
> For user tables, we need to consider how to avoid replicating these
> tables for publications that use FOR ALL TABLES specifier. One idea is
> to use EXCLUDE table functionality as being discussed in thread [5]
> but that would also be a bit tricky especially if we decide to create
> such a table automatically. One naive idea is that internally we skip
> sending changes from this table for "FOR ALL TABLES" publication, and
> we shouldn't allow creating publication for this table. OTOH, if we
> allow the user to create and specify this table, we can ask her to
> specify with EXCLUDE syntax in publication. This needs more thoughts.

Yes, this needs more thought; I will think more on this point and respond.

Yet another question is about table names, whether we keep some
standard name like conflict_log_history_$subid or let users pass the
name.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Fri, Aug 15, 2025 at 2:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Yet another question is about table names, whether we keep some
> standard name like conflict_log_history_$subid or let users pass the
> name.
>

It would be good if we let the user specify the table_name and, if
she didn't specify one, use an internally generated name. I think it
will be somewhat similar to slot_name. However, in this case, there is
one challenge: how can we decide whether the schema of the
user-provided table_name is correct or not? Do we compare it with the
standard schema we are planning to use?

One idea to keep things simple for the first version is that we allow
users to specify the table_name for storing conflicts, but the table
should be created internally, and if a table with the same name
already exists, we can give an ERROR. Then we can later extend the
functionality to even allow storing conflicts in pre-created tables
with more checks about their schema.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 15, 2025 at 2:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Yet another question is about table names, whether we keep some
> > standard name like conflict_log_history_$subid or let users pass the
> > name.
> >
>
> It would be good if we can let the user specify the table_name and if
> she didn't specify then use an internally generated name. I think it
> will be somewhat similar to slot_name. However, in this case, there is
> one challenge which is how can we decide whether the schema of the
> user provided table_name is correct or not? Do we compare it with the
> standard schema we are planning to use?

Ideally we can do that; if you see in this thread [1], there is a patch
[2] which first tries to validate the table schema and, if the table
doesn't exist, creates it on its own.  And that seems fine to me.
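
For the validation half, the check could read the user-supplied table's shape out of the catalogs and compare it against the expected column list (a sketch; the table name and expected schema here are assumptions):

```sql
-- List the live columns and their types for a user-provided table,
-- to compare against the expected conflict-table schema:
SELECT a.attname, a.atttypid::regtype AS coltype
FROM pg_attribute a
WHERE a.attrelid = 'public.my_conflict_table'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;
```

If the result doesn't match the expected (name, type) list, CREATE SUBSCRIPTION could ERROR out.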

> One idea to keep things simple for the first version is that we allow
> users to specify the table_name for storing conflicts but the table
> should be created internally and if the same name table already
> exists, we can give an ERROR. Then we can later extend the
> functionality to even allow storing conflicts in pre-created tables
> with more checks about its schema.

That's fair too.  I am wondering what namespace we should create this
user table in. If we are creating it internally, I assume the user
should provide a schema-qualified name, right?


[1] https://www.postgresql.org/message-id/flat/752672.1699474336%40sss.pgh.pa.us#b8450be5645c4252d7d02cf7aca1fc7b
[2] https://www.postgresql.org/message-id/attachment/152792/v8-0001-Add-a-new-COPY-option-SAVE_ERROR.patch


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Wed, Aug 20, 2025 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > One idea to keep things simple for the first version is that we allow
> > users to specify the table_name for storing conflicts but the table
> > should be created internally and if the same name table already
> > exists, we can give an ERROR. Then we can later extend the
> > functionality to even allow storing conflicts in pre-created tables
> > with more checks about its schema.
>
> That's fair too.  I am wondering what namespace we should create this
> user table in. If we are creating internally, I assume the user should
> provide a schema qualified name right?
>

Yeah, but if it is not provided then we should create it based on
search_path, similar to what we do when a user creates a table from
psql.
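
For illustration, this is the usual unqualified-name behaviour (the schema name below is just an example):

```sql
CREATE SCHEMA app;
SET search_path TO app, public;

-- An unqualified CREATE TABLE lands in the first existing schema on search_path:
CREATE TABLE conflict_demo (id int);

SELECT relnamespace::regnamespace AS schema
FROM pg_class
WHERE relname = 'conflict_demo';   -- app
```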

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Aug 20, 2025 at 5:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 20, 2025 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > One idea to keep things simple for the first version is that we allow
> > > users to specify the table_name for storing conflicts but the table
> > > should be created internally and if the same name table already
> > > exists, we can give an ERROR. Then we can later extend the
> > > functionality to even allow storing conflicts in pre-created tables
> > > with more checks about its schema.
> >
> > That's fair too.  I am wondering what namespace we should create this
> > user table in. If we are creating internally, I assume the user should
> > provide a schema qualified name right?
> >
>
> Yeah, but if not provided then we should create it based on
> search_path similar to what we do when user created the table from
> psql.

Yeah that makes sense.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Aug 21, 2025 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Aug 20, 2025 at 5:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 20, 2025 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Aug 18, 2025 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > > One idea to keep things simple for the first version is that we allow
> > > > users to specify the table_name for storing conflicts but the table
> > > > should be created internally and if the same name table already
> > > > exists, we can give an ERROR. Then we can later extend the
> > > > functionality to even allow storing conflicts in pre-created tables
> > > > with more checks about its schema.
> > >
> > > That's fair too.  I am wondering what namespace we should create this
> > > user table in. If we are creating internally, I assume the user should
> > > provide a schema qualified name right?
> > >
> >
> > Yeah, but if not provided then we should create it based on
> > search_path similar to what we do when user created the table from
> > psql.

While working on the patch, I see there are some open questions

1. We decided to pass the conflict history table name during
subscription creation. And it makes sense to create this table when
the CREATE SUBSCRIPTION command is executed. A potential concern is
that the subscription owner will also own this table and have full
control over it, including the ability to drop or alter its schema.
This might not be an issue: if an INSERT into the conflict table
fails, we can check the table's existence and schema, and if they are
not as expected, the conflict log history option can be disabled and
re-enabled later via ALTER SUBSCRIPTION.
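
The disable/re-enable flow might look like this (purely hypothetical syntax; the option name conflict_log_table is invented here for illustration and nothing of the sort has been agreed in this thread):

```sql
-- Hypothetical: turn the conflict history option off after a failure ...
ALTER SUBSCRIPTION my_sub SET (conflict_log_table = none);

-- ... and back on once the table has been repaired or recreated:
ALTER SUBSCRIPTION my_sub SET (conflict_log_table = 'public.my_sub_conflicts');
```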

2. A further challenge is how to exclude these tables from publishing
changes. If we support a subscription-level log history table and the
user publishes ALL TABLES, the output plugin uses
is_publishable_relation() to check if a table is publishable. However,
applying the same logic here would require checking each subscription
on the node to see if the table is designated as a conflict log
history table for any subscription, which could be costly.

3. One last thing: should we consider dropping this table when we
drop the subscription? I think this makes sense, as we are internally
creating it while creating the subscription.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Alastair Turner
Date:
Hi Dilip

Thanks for working on this, I think it will make conflict detection a lot more useful. 

On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
While working on the patch, I see there are some open questions

1. We decided to pass the conflict history table name during
subscription creation. And it makes sense to create this table when
the CREATE SUBSCRIPTION command is executed. A potential concern is
that the subscription owner will also own this table, having full
control over it, including the ability to drop or alter its schema. 
...

Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If the subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists.
 
2. A further challenge is how to exclude these tables from publishing
changes. If we support a subscription-level log history table and the
user publishes ALL TABLES, the output plugin uses
is_publishable_relation() to check if a table is publishable. However,
applying the same logic here would require checking each subscription
on the node to see if the table is designated as a conflict log
history table for any subscription, which could be costly.

 Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
 
3. And one last thing is about should we consider dropping this table
when we drop the subscription, I think this makes sense as we are
internally creating it while creating the subscription.

Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.

Regards
Alastair

Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sun, Sep 7, 2025 at 1:42 PM Alastair Turner <minion@decodable.me> wrote:
>
> Hi Dilip
>
> Thanks for working on this, I think it will make conflict detection a lot more useful.

Thanks for the suggestions, please find my reply inline.

> On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
>>
>> While working on the patch, I see there are some open questions
>>
>> 1. We decided to pass the conflict history table name during
>> subscription creation. And it makes sense to create this table when
>> the CREATE SUBSCRIPTION command is executed. A potential concern is
>> that the subscription owner will also own this table, having full
>> control over it, including the ability to drop or alter its schema.

>
> Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If the subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists.

Yeah, a typed table can be useful here, but my only concern is when we
create this type.  One option is to create a catalog relation, say
"conflict_log_history"; that will create a type, and then for each
subscription that needs a conflict history table we can create the
table as being of the "conflict_log_history" type. But this might not
be the best option, as we would be creating a catalog just to get the
type.  The second option is to create a type while creating the table
itself, but then the problem remains the same: the subscription owner
gets control over altering the schema of the type itself.  So the goal
is for this type to be created such that it cannot be altered, and
IMHO option 1 is more suitable, i.e., creating conflict_log_history as
a catalog, with the per-subscription table created as this type.

>>
>> 2. A further challenge is how to exclude these tables from publishing
>> changes. If we support a subscription-level log history table and the
>> user publishes ALL TABLES, the output plugin uses
>> is_publishable_relation() to check if a table is publishable. However,
>> applying the same logic here would require checking each subscription
>> on the node to see if the table is designated as a conflict log
>> history table for any subscription, which could be costly.
>
>
>  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
+1

>
>>
>> 3. And one last thing is about should we consider dropping this table
>> when we drop the subscription, I think this makes sense as we are
>> internally creating it while creating the subscription.
>
>
> Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.

Thanks for the input; I would like to hear opinions from others here
as well.  I agree that implicitly getting rid of the conflict history
might be problematic, but we also need to consider that we would only
be dropping it when the whole subscription is dropped.  I am not sure
users will still be interested in the conflict history after the
subscription is dropped; if they are, they need to take care of
preserving it themselves, don't they?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Sep 8, 2025 at 12:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Sep 7, 2025 at 1:42 PM Alastair Turner <minion@decodable.me> wrote:
> >
> > Hi Dilip
> >
> > Thanks for working on this, I think it will make conflict detection a lot more useful.
>
> Thanks for the suggestions, please find my reply inline.
>
> > On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
> >>
> >> While working on the patch, I see there are some open questions
> >>
> >> 1. We decided to pass the conflict history table name during
> >> subscription creation. And it makes sense to create this table when
> >> the CREATE SUBSCRIPTION command is executed. A potential concern is
> >> that the subscription owner will also own this table, having full
> >> control over it, including the ability to drop or alter its schema.
>
> >
> > > Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If the subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists.
>
> Yeah type table can be useful here, but only concern is when do we
> create this type.
>

How about having this as a built-in type?

>  One option is whenever we can create a catalog
> relation say "conflict_log_history" that will create a type and then
> for each subscription if we need to create the conflict history table
> we can create it as "conflict_log_history" type, but this might not be
> a best option as we are creating catalog just for using this type.
> Second option is to create a type while creating a table itself but
> then again the problem remains the same as subscription owners get
> control over altering the schema of the type itself.  So the goal is
> we want this type to be created such that it can not be altered so
> IMHO option1 is more suitable i.e. creating conflict_log_history as
> catalog and per subscription table can be created as this type.
>

I think having it as a catalog table has drawbacks, like who will
clean this ever-growing table. The one thing that is not clear from
Alastair's response is that he said to make the subscription a
dependency of the table; if we do so, then won't it be difficult to
even drop the subscription, and doesn't that sound like the reverse of
what we want?

> >>
> >> 2. A further challenge is how to exclude these tables from publishing
> >> changes. If we support a subscription-level log history table and the
> >> user publishes ALL TABLES, the output plugin uses
> >> is_publishable_relation() to check if a table is publishable. However,
> >> applying the same logic here would require checking each subscription
> >> on the node to see if the table is designated as a conflict log
> >> history table for any subscription, which could be costly.
> >
> >
> > >  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
> +1
>
> >
> >>
> >> 3. And one last thing is about should we consider dropping this table
> >> when we drop the subscription, I think this makes sense as we are
> >> internally creating it while creating the subscription.
> >
> >
> > > Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.
>
> Thanks for the input, I would like to hear opinions from others as
> well here.
>

But OTOH, there could be users who want such a table to be dropped.
One possibility is that if the user provided us a pre-created table
then we leave it to the user to remove the table; otherwise, we can
remove it with DROP SUBSCRIPTION. BTW, did we decide whether we want a
conflict-table-per-subscription or one table for all subscriptions? If
the latter, then I guess the problem would be that it has to be a
shared table across databases.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Sep 10, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 8, 2025 at 12:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sun, Sep 7, 2025 at 1:42 PM Alastair Turner <minion@decodable.me> wrote:
> > >
> > > Hi Dilip
> > >
> > > Thanks for working on this, I think it will make conflict detection a lot more useful.
> >
> > Thanks for the suggestions, please find my reply inline.
> >
> > > On Sat, 6 Sept 2025, 10:38 Dilip Kumar, <dilipbalaut@gmail.com> wrote:
> > >>
> > >> While working on the patch, I see there are some open questions
> > >>
> > >> 1. We decided to pass the conflict history table name during
> > >> subscription creation. And it makes sense to create this table when
> > >> the CREATE SUBSCRIPTION command is executed. A potential concern is
> > >> that the subscription owner will also own this table, having full
> > >> control over it, including the ability to drop or alter its schema.
> >
> > >
> > > Typed tables and the dependency framework can address this concern. The schema of a typed table cannot be changed. If the subscription is marked as a dependency of the log table, the table cannot be dropped while the subscription exists.
> >
> > Yeah type table can be useful here, but only concern is when do we
> > create this type.
> >
>
> How about having this as a built-in type?

Here we would have to create a built-in composite type, which I think
means typcategory => 'C', and if we create this type it should be
supplied with a "typrelid", which means there should be a backing
catalog table. At least that's what I think.
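
For reference, the catalog relationship being discussed can be seen with any composite type (the type name below is just an example):

```sql
-- A composite type's pg_type row has typtype = 'c', and its typrelid
-- points at a pg_class entry of relkind 'c' (the backing relation):
CREATE TYPE demo_composite AS (a int, b text);

SELECT t.typtype, c.relkind
FROM pg_type t
JOIN pg_class c ON c.oid = t.typrelid
WHERE t.typname = 'demo_composite';   -- typtype 'c', relkind 'c'
```

So the open question is whether a built-in type can come with only such a relkind-'c' backing entry, rather than a full catalog table.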

> >  One option is whenever we can create a catalog
> > relation say "conflict_log_history" that will create a type and then
> > for each subscription if we need to create the conflict history table
> > we can create it as "conflict_log_history" type, but this might not be
> > a best option as we are creating catalog just for using this type.
> > Second option is to create a type while creating a table itself but
> > then again the problem remains the same as subscription owners get
> > control over altering the schema of the type itself.  So the goal is
> > we want this type to be created such that it can not be altered so
> > IMHO option1 is more suitable i.e. creating conflict_log_history as
> > catalog and per subscription table can be created as this type.
> >
>
> I think having it as a catalog table has drawbacks like who will clean
> this ever growing table.

No, I didn't mean an ever-growing catalog table; I was suggesting
creating a catalog table just to get a built-in type, and then
creating an actual log history table of this built-in type for each
subscription while creating the subscription.  So this catalog table
would exist, but nothing would ever be inserted into it; whenever the
user supplies a conflict log history table name while creating a
subscription, at that time we would create an actual table whose type
is the catalog table's type.  I agree that creating a catalog table
for this purpose might not be worth it, but I have not yet figured out
how to create a built-in table type without creating the actual table.

> The one thing is not clear from Alastair's
> response is that he said to make subscription as a dependency of
> table, if we do so, then won't it be difficult to even drop
> subscription and also doesn't that sound reverse of what we want.

I assume he means the subscription will be dependent on the log table;
that means we cannot drop the log table, as the subscription depends
on it.

> > >>
> > >> 2. A further challenge is how to exclude these tables from publishing
> > >> changes. If we support a subscription-level log history table and the
> > >> user publishes ALL TABLES, the output plugin uses
> > >> is_publishable_relation() to check if a table is publishable. However,
> > >> applying the same logic here would require checking each subscription
> > >> on the node to see if the table is designated as a conflict log
> > >> history table for any subscription, which could be costly.
> > >
> > >
> > >  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
> > +1
> >
> > >
> > >>
> > >> 3. And one last thing is about should we consider dropping this table
> > >> when we drop the subscription, I think this makes sense as we are
> > >> internally creating it while creating the subscription.
> > >
> > >
> > > Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.
> >
> > Thanks for the input, I would like to hear opinions from others as
> > well here.
> >
>
> But OTOH, there could be users who want such a table to be dropped.
> One possibility is that if we user provided us a pre-created table
> then we leave it to user to remove the table, otherwise, we can remove
> with drop subscription.

Thanks, that makes sense.

> BTW, did we decide that we want a
> conflict-table-per-subscription or one table for all subscriptions, if
> later, then I guess the problem would be that it has to be a shared
> table across databases.

Right, and I don't think there is an option to create a user-defined
shared table.  And I don't think there is any issue creating a
per-subscription conflict log history table, except that the
subscription owner must have permission to create the table in the
database while creating the subscription. But I think this is
expected: the user can either obtain the needed privilege or disable
the conflict log history table option.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Alastair Turner
Date:


On Wed, 10 Sept 2025 at 11:15, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 10, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
... 
>
> How about having this as a built-in type?

Here we will have to create a built-in type of type table which is I
think typcategory => 'C' and if we create this type it should be
supplied with the "typrelid" that means there should be a backing
catalog table. At least thats what I think.
A composite type can be used for building a table; it's not necessary to create a table when creating the type. In user SQL:

CREATE TYPE conflict_log_type AS (
  conflictid         UUID,
  subid              OID,
  tableid            OID,
  conflicttype       TEXT,
  operationtype      TEXT,
  replication_origin TEXT,
  remote_commit_ts   TIMESTAMPTZ,
  local_commit_ts    TIMESTAMPTZ,
  ri_key             JSON,
  remote_tuple       JSON,
  local_tuple        JSON
);

CREATE TABLE my_subscription_conflicts OF conflict_log_type;
 
...

> The one thing that is not clear from Alastair's
> response is that he said to make subscription as a dependency of
> table, if we do so, then won't it be difficult to even drop
> subscription and also doesn't that sound reverse of what we want.

I assume he means the subscription will be dependent on the log table,
which means we cannot drop the log table while the subscription
depends on it.
 
Yes, that's what I was proposing.
 
> > >>
> > >> 2. A further challenge is how to exclude these tables from publishing
> > >> changes. If we support a subscription-level log history table and the
> > >> user publishes ALL TABLES, the output plugin uses
> > >> is_publishable_relation() to check if a table is publishable. However,
> > >> applying the same logic here would require checking each subscription
> > >> on the node to see if the table is designated as a conflict log
> > >> history table for any subscription, which could be costly.
> > >
> > >
> > >  Checking the type of a table and/or whether a subscription object depends on it in a certain way would be a far less costly operation to add to is_publishable_relation()
> > +1
> >
> > >
> > >>
> > >> 3. And one last thing is about should we consider dropping this table
> > >> when we drop the subscription, I think this makes sense as we are
> > >> internally creating it while creating the subscription.
> > >
> > >
> > > Having to clean up the log table explicitly is likely to annoy users far less than having the conflict data destroyed as a side effect of another operation. I would strongly suggest leaving the table in place when the subscription is dropped.
> >
> > Thanks for the input, I would like to hear opinions from others as
> > well here.
> >
>
> But OTOH, there could be users who want such a table to be dropped.
> One possibility is that if we user provided us a pre-created table
> then we leave it to user to remove the table, otherwise, we can remove
> with drop subscription.

Thanks, that makes sense.

> BTW, did we decide that we want a
> conflict-table-per-subscription or one table for all subscriptions, if
> later, then I guess the problem would be that it has to be a shared
> table across databases.

Right, and I don't think there is an option to create a user-defined
shared table.  And I don't see any issue with creating a
per-subscription conflict log history table, except that the
subscription owner must have permission to create the table in the
database while creating the subscription. But I think this is
expected: the user can either obtain sufficient privileges or disable
the conflict log history table option.

Since subscriptions are created in a particular database, it seems reasonable that conflict log tables would also be created in a particular database.

Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Wed, Sep 10, 2025 at 4:32 PM Alastair Turner <minion@decodable.me> wrote:
>
>> Here we will have to create a built-in type of type table which is I
>> think typcategory => 'C' and if we create this type it should be
>> supplied with the "typrelid" that means there should be a backing
>> catalog table. At least thats what I think.
>
> A compound type can be used for building a table, it's not necessary to create a table when creating the type. In
user SQL:
>
> CREATE TYPE conflict_log_type AS (
>   conflictid UUID,
>   subid OID,
>   tableid OID,
>   conflicttype TEXT,
>   operationtype TEXT,
>   replication_origin   TEXT,
>   remote_commit_ts TIMESTAMPTZ,
>   local_commit_ts TIMESTAMPTZ,
>   ri_key                    JSON,
>   remote_tuple         JSON,
>   local_tuple          JSON
> );
>
> CREATE TABLE my_subscription_conflicts OF conflict_log_type;

The problem is that if you CREATE TYPE just before creating the table,
the subscription owner gets full control over the type as well,
meaning they can alter the type itself.  So logically this TYPE should
be a built-in type, so that subscription owners cannot ALTER the type
but still have permission to create a table from it.  But the problem
is that whenever you create such a type it needs a corresponding relid
in pg_class; in fact, you can just create a type as per your example
and see[1] that it gets a corresponding entry in pg_class.

So the problem is: if we create a user-defined type, it will be
created under the subscription owner, which defeats the purpose of
disallowing alteration of the type; OTOH, if we create a built-in
type, it needs a corresponding entry in pg_class.

So what's your proposal, create this type while creating a
subscription or as a built-in type, or anything else?


[1]
postgres[1948123]=# CREATE TYPE conflict_log_type AS (conflictid UUID);
postgres[1948123]=# select oid, typrelid, typcategory from pg_type
where typname='conflict_log_type';

  oid  | typrelid | typcategory
-------+----------+-------------
 16386 |    16384 | C
(1 row)

postgres[1948123]=# select relname from pg_class where oid=16384;
      relname
-------------------
 conflict_log_type


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Bharath Rupireddy
Дата:
Hi,

On Tue, Aug 5, 2025 at 5:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Currently we log conflicts to the server's log file and updates, this
> approach has limitations, 1) Difficult to query and analyze, parsing
> plain text log files for conflict details is inefficient. 2) Lack of
> structured data, key conflict attributes (table, operation, old/new
> data, LSN, etc.) are not readily available in a structured, queryable
> format. 3) Difficult for external monitoring tools or custom
> resolution scripts to consume conflict data directly.
>
> This proposal aims to address these limitations by introducing a
> conflict log history table, providing a structured, and queryable
> record of all logical replication conflicts.  This should be a
> configurable option whether to log into the conflict log history
> table, server logs or both.

+1 for the overall idea. Having an option to separate out the
conflicts helps analyze the data correctness issues and understand the
behavior of conflicts.

Parsing server log files for analysis and debugging is a typical
requirement, usually met with tools like log_fdw, or by capturing
server logs in CSV format for parsing, text search, and analysis.

> This proposal has two main design questions:
> ===================================
>
> 1. How do we store conflicting tuples from different tables?
> Using a JSON column to store the row data seems like the most flexible
> solution, as it can accommodate different table schemas.

How good is storing conflicts on the table? Is it okay to generate WAL
traffic? Is it okay to physically replicate this log table to all
replicas? Is it okay to logically replicate this log table to all
subscribers and logical decoding clients? How does this table get
truncated? If truncation gets delayed, won't it unnecessarily fill up
storage?

> 2. Should this be a system table or a user table?
> a) System Table: Storing this in a system catalog is simple, but
> catalogs aren't designed for ever-growing data. While pg_large_object
> is an exception, this is not what we generally do IMHO.
> b) User Table: This offers more flexibility. We could allow a user to
> specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> either create the table internally or let the user create the table
> with a predefined schema.

-1 for the system table for sure.

> A potential drawback is that a user might drop or alter the table.
> However, we could mitigate this risk by simply logging a WARNING if
> the table is configured but an insertion fails.
> I am currently working on a POC patch for the same, but will post that
> once we have some thoughts on design choices.

How about streaming the conflicts in a fixed format to a separate log
file, other than the regular postgres server log file?  All the
rules/settings that apply to regular postgres server log files would
also apply to the conflicts log file (rotation, GUCs, CSV/JSON/TEXT
format, etc.). This way there's no additional WAL, and we don't have
to worry about drop/alter, truncate, delete, update/insert, the
permission model, physical replication, logical replication, storage
space, etc.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Thu, Sep 11, 2025 at 12:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Aug 5, 2025 at 5:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Currently we log conflicts to the server's log file and updates, this
> > approach has limitations, 1) Difficult to query and analyze, parsing
> > plain text log files for conflict details is inefficient. 2) Lack of
> > structured data, key conflict attributes (table, operation, old/new
> > data, LSN, etc.) are not readily available in a structured, queryable
> > format. 3) Difficult for external monitoring tools or custom
> > resolution scripts to consume conflict data directly.
> >
> > This proposal aims to address these limitations by introducing a
> > conflict log history table, providing a structured, and queryable
> > record of all logical replication conflicts.  This should be a
> > configurable option whether to log into the conflict log history
> > table, server logs or both.
>
> +1 for the overall idea. Having an option to separate out the
> conflicts helps analyze the data correctness issues and understand the
> behavior of conflicts.
>
> Parsing server logs file for analysis and debugging is a typical
> requirement differently met with tools like log_fdw or capture server
> logs in CSV format for parsing or do text search and analyze etc.
>
> > This proposal has two main design questions:
> > ===================================
> >
> > 1. How do we store conflicting tuples from different tables?
> > Using a JSON column to store the row data seems like the most flexible
> > solution, as it can accommodate different table schemas.
>
> How good is storing conflicts on the table? Is it okay to generate WAL
> traffic?
>

Yes, I think so. One would like to query conflicts and the resolutions
for those conflicts at a later point to ensure consistency. BTW, if
you are worried about WAL traffic, please note that conflicts
shouldn't be very frequent events, so the additional WAL should be
okay. OTOH, if conflicts are frequent, the performance won't be great
anyway, as that means there is a kind of ERROR which we have to deal
with by having a resolution for it.

> Is it okay to physically replicate this log table to all
> replicas?
>

Yes, that should be okay as we want the conflict_tables to be present
after failover.

> Is it okay to logically replicate this log table to all
> subscribers and logical decoding clients?
>

I think we should avoid this.

> How does this table get
> truncated? If truncation gets delayed, won't it unnecessarily fill up
> storage?
>

I think it should be the user's responsibility to clean this table, as
they know best when the data in the table is obsolete. Eventually, we
can also have some policies, via options or some other way, to get it
truncated. IIRC, we also discussed having these as partitioned tables
so that it is easy to discard data. However, for the initial version,
we may want something simpler.
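
The partitioning idea could look roughly like this (table and column names are illustrative, not from any patch); dropping a partition discards a range of old conflict data without any DELETE traffic:

```sql
-- Hypothetical sketch: a time-partitioned conflict log history table.
CREATE TABLE conflict_log_history (
    relid            oid,
    conflict_type    text,
    remote_commit_ts timestamptz NOT NULL,
    remote_tuple     json,
    local_tuple      json
) PARTITION BY RANGE (remote_commit_ts);

CREATE TABLE conflict_log_2025_09 PARTITION OF conflict_log_history
    FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');

-- Discarding a month of conflict data is then a cheap metadata operation:
DROP TABLE conflict_log_2025_09;
```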

> > 2. Should this be a system table or a user table?
> > a) System Table: Storing this in a system catalog is simple, but
> > catalogs aren't designed for ever-growing data. While pg_large_object
> > is an exception, this is not what we generally do IMHO.
> > b) User Table: This offers more flexibility. We could allow a user to
> > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > either create the table internally or let the user create the table
> > with a predefined schema.
>
> -1 for the system table for sure.
>
> > A potential drawback is that a user might drop or alter the table.
> > However, we could mitigate this risk by simply logging a WARNING if
> > the table is configured but an insertion fails.
> > I am currently working on a POC patch for the same, but will post that
> > once we have some thoughts on design choices.
>
> How about streaming the conflicts in fixed format to a separate log
> file other than regular postgres server log file?
>

I would prefer this info to be stored in tables as it would be easy to
query them. If we use separate LOGs then we should provide some views
to query the LOG.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Sep 11, 2025 at 8:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 11, 2025 at 12:53 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Aug 5, 2025 at 5:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > Currently we log conflicts to the server's log file and updates, this
> > > approach has limitations, 1) Difficult to query and analyze, parsing
> > > plain text log files for conflict details is inefficient. 2) Lack of
> > > structured data, key conflict attributes (table, operation, old/new
> > > data, LSN, etc.) are not readily available in a structured, queryable
> > > format. 3) Difficult for external monitoring tools or custom
> > > resolution scripts to consume conflict data directly.
> > >
> > > This proposal aims to address these limitations by introducing a
> > > conflict log history table, providing a structured, and queryable
> > > record of all logical replication conflicts.  This should be a
> > > configurable option whether to log into the conflict log history
> > > table, server logs or both.
> >
> > +1 for the overall idea. Having an option to separate out the
> > conflicts helps analyze the data correctness issues and understand the
> > behavior of conflicts.
> >
> > Parsing server logs file for analysis and debugging is a typical
> > requirement differently met with tools like log_fdw or capture server
> > logs in CSV format for parsing or do text search and analyze etc.
> >
> > > This proposal has two main design questions:
> > > ===================================
> > >
> > > 1. How do we store conflicting tuples from different tables?
> > > Using a JSON column to store the row data seems like the most flexible
> > > solution, as it can accommodate different table schemas.
> >
> > How good is storing conflicts on the table? Is it okay to generate WAL
> > traffic?
> >
>
> Yesh, I think so. One would like to query conflicts and resolutions
> for those conflicts at a later point to ensure consistency. BTW, if
> you are worried about WAL traffic, please note conflicts shouldn't be
> a very often event, so additional WAL should be okay. OTOH, if the
> conflicts are frequent, anyway, the performance won't be that great as
> that means there is a kind of ERROR which we have to deal by having
> resolution for it.
>
> > Is it okay to physically replicate this log table to all
> > replicas?
> >
>
> Yes, that should be okay as we want the conflict_tables to be present
> after failover.
>
> > Is it okay to logically replicate this log table to all
> > subscribers and logical decoding clients?
> >
>
> I think we should avoid this.
>
> > How does this table get
> > truncated? If truncation gets delayed, won't it unnecessarily fill up
> > storage?
> >
>
> I think it should be users responsibility to clean this table as they
> better know when the data in the table is obsolete. Eventually, we can
> also have some policies via options or some other way to get it
> truncated. IIRC, we also discussed having these as partition tables so
> that it is easy to discard data. However, for initial version, we may
> want something simpler.
>
> > > 2. Should this be a system table or a user table?
> > > a) System Table: Storing this in a system catalog is simple, but
> > > catalogs aren't designed for ever-growing data. While pg_large_object
> > > is an exception, this is not what we generally do IMHO.
> > > b) User Table: This offers more flexibility. We could allow a user to
> > > specify the table name during CREATE SUBSCRIPTION.  Then we choose to
> > > either create the table internally or let the user create the table
> > > with a predefined schema.
> >
> > -1 for the system table for sure.
> >
> > > A potential drawback is that a user might drop or alter the table.
> > > However, we could mitigate this risk by simply logging a WARNING if
> > > the table is configured but an insertion fails.
> > > I am currently working on a POC patch for the same, but will post that
> > > once we have some thoughts on design choices.
> >
> > How about streaming the conflicts in fixed format to a separate log
> > file other than regular postgres server log file?
> >
>
> I would prefer this info to be stored in tables as it would be easy to
> query them. If we use separate LOGs then we should provide some views
> to query the LOG.

I was looking into another thread that provides an error table for
COPY [1]; it requires the user to pre-create the error table. Inside
the COPY command we then validate the table; validation in that
context is a one-time process checking for: (1) table existence, (2)
ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
matching column names and data types. This approach avoids concerns
about the user's DROP or ALTER permissions.

Our requirement for the logical replication conflict log table
differs, as we must validate the target table upon every conflict
insertion, not just at subscription creation. A more robust
alternative is to perform validation and acquire a lock on the
conflict table whenever the subscription worker starts. This prevents
modifications (like ALTER or DROP) while the worker is active. When
the worker restarts, we can re-validate the table and automatically
disable the conflict logging feature if validation fails; it can then
be re-enabled via ALTER SUBSCRIPTION by setting the option again.
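
The re-enable step might look like this (the conflict_log_table option name follows the WIP patch and is not final):

```sql
-- If validation failed and conflict logging was auto-disabled,
-- point the subscription back at a (re)validated table.
ALTER SUBSCRIPTION sub
    SET (conflict_log_table = 'myschema.conflict_log_history');
```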

And if we want, in the first version we can expect the user to create
the table as per the expected schema and supply it. This avoids the
need to handle excluding the table from publishing, as that will be
the user's responsibility. Then, in follow-up patches, we can also
allow creating the table internally if it doesn't exist, and find a
solution to avoid it being published when ALL TABLES are published.

Thoughts?

[1] https://www.postgresql.org/message-id/CACJufxEo-rsH5v__S3guUhDdXjakC7m7N5wj%3DmOB5rPiySBoQg%40mail.gmail.com

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Bharath Rupireddy
Дата:
Hi,

On Wed, Sep 10, 2025 at 8:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > How about streaming the conflicts in fixed format to a separate log
> > file other than regular postgres server log file?
>
> I would prefer this info to be stored in tables as it would be easy to
> query them. If we use separate LOGs then we should provide some views
> to query the LOG.

Providing views to query the conflicts LOG is easier than having
tables (probably we must provide both: logging conflicts to tables and
to separate LOG files). However, wanting the conflict logs after
failover is what makes me think the table approach is better. I'm open
to more thoughts here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

От
Bharath Rupireddy
Дата:
Hi,

On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I was looking into another thread where we provide an error table for
> COPY [1], it requires the user to pre-create the error table. And
> inside the COPY command we will validate the table, validation in that
> context is a one-time process checking for: (1) table existence, (2)
> ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> matching column names and data types. This approach avoids concerns
> about the user's DROP or ALTER permissions.
>
> Our requirement for the logical replication conflict log table
> differs, as we must validate the target table upon every conflict
> insertion, not just at subscription creation. A more robust
> alternative is to perform validation and acquire a lock on the
> conflict table whenever the subscription worker starts. This prevents
> modifications (like ALTER or DROP) while the worker is active. When
> the worker gets restarted, we can re-validate the table and
> automatically disable the conflict logging feature if validation
> fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> option again.

Having to worry about ALTER/DROP and adding code to protect against
them seems like overkill.

> And if we want in first version we can expect user to create the table
> as per the expected schema and supply it, this will avoid the need of
> handling how to avoid it from publishing as it will be user's
> responsibility and then in top up patches we can also allow to create
> the table internally if tables doesn't exist and then we can find out
> solution to avoid it from being publish when ALL TABLES are published.

This looks much more simple to start with.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for the feedback, Bharath.

> On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I was looking into another thread where we provide an error table for
> > COPY [1], it requires the user to pre-create the error table. And
> > inside the COPY command we will validate the table, validation in that
> > context is a one-time process checking for: (1) table existence, (2)
> > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > matching column names and data types. This approach avoids concerns
> > about the user's DROP or ALTER permissions.
> >
> > Our requirement for the logical replication conflict log table
> > differs, as we must validate the target table upon every conflict
> > insertion, not just at subscription creation. A more robust
> > alternative is to perform validation and acquire a lock on the
> > conflict table whenever the subscription worker starts. This prevents
> > modifications (like ALTER or DROP) while the worker is active. When
> > the worker gets restarted, we can re-validate the table and
> > automatically disable the conflict logging feature if validation
> > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > option again.
>
> Having to worry about ALTER/DROP and adding code to protect seems like
> an overkill.

IMHO, if we can eventually control that, it is a good goal to have, so
that we can avoid failures during conflict insertion.  We may argue
it's the user's responsibility not to alter the table and we can just
check validity during CREATE/ALTER SUBSCRIPTION.

> > And if we want in first version we can expect user to create the table
> > as per the expected schema and supply it, this will avoid the need of
> > handling how to avoid it from publishing as it will be user's
> > responsibility and then in top up patches we can also allow to create
> > the table internally if tables doesn't exist and then we can find out
> > solution to avoid it from being publish when ALL TABLES are published.
>
> This looks much more simple to start with.

Right.

PFA the attached WIP patches: 0001 allows a user-created table to be
supplied as the conflict history table, which we validate during
CREATE/ALTER SUBSCRIPTION; 0002 adds an option to internally create
the table if it does not exist.

TODO:
- Patches are still WIP and need more testing for different failure cases
- Need to explore an option to create a built-in type (I will start a
separate thread for the same)
- Need to add test cases
- Need to explore options to avoid the table getting published, but maybe
we only need to avoid this when we internally create the table?

Here is some basic test I tried:

psql -d postgres -c "CREATE TABLE test(a int, b int, primary key(a));"
psql -d postgres -p 5433 -c "CREATE SCHEMA myschema"
psql -d postgres -p 5433 -c "CREATE TABLE test(a int, b int, primary key(a));"
psql -d postgres -p 5433 -c "GRANT INSERT, UPDATE, SELECT, DELETE ON test TO dk"
psql -d postgres -c "CREATE PUBLICATION pub FOR ALL TABLES ;"

psql -d postgres -p 5433 -c "CREATE SUBSCRIPTION sub CONNECTION
'dbname=postgres port=5432' PUBLICATION pub
WITH(conflict_log_table=myschema.conflict_log_history)";
psql -d postgres -p 5432 -c "INSERT INTO test VALUES(1,2);"
psql -d postgres -p 5433 -c "UPDATE test SET b=10 WHERE a=1;"
psql -d postgres -p 5432 -c "UPDATE test SET b=20 WHERE a=1;"

postgres[1202034]=# select * from myschema.conflict_log_history ;
-[ RECORD 1 ]-----+------------------------------
relid             | 16385
local_xid         | 763
remote_xid        | 757
local_lsn         | 0/00000000
remote_commit_lsn | 0/0174AB30
local_commit_ts   | 2025-09-14 06:45:00.828874+00
remote_commit_ts  | 2025-09-14 06:45:05.845614+00
table_schema      | public
table_name        | test
conflict_type     | update_origin_differs
local_origin      |
remote_origin     | pg_16396
key_tuple         | {"a":1,"b":20}
local_tuple       | {"a":1,"b":10}
remote_tuple      | {"a":1,"b":20}


--
Regards,
Dilip Kumar
Google

Вложения

Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Sun, Sep 14, 2025 at 12:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for the feedback Bharath
>
> > On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I was looking into another thread where we provide an error table for
> > > COPY [1], it requires the user to pre-create the error table. And
> > > inside the COPY command we will validate the table, validation in that
> > > context is a one-time process checking for: (1) table existence, (2)
> > > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > > matching column names and data types. This approach avoids concerns
> > > about the user's DROP or ALTER permissions.
> > >
> > > Our requirement for the logical replication conflict log table
> > > differs, as we must validate the target table upon every conflict
> > > insertion, not just at subscription creation. A more robust
> > > alternative is to perform validation and acquire a lock on the
> > > conflict table whenever the subscription worker starts. This prevents
> > > modifications (like ALTER or DROP) while the worker is active. When
> > > the worker gets restarted, we can re-validate the table and
> > > automatically disable the conflict logging feature if validation
> > > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > > option again.
> >
> > Having to worry about ALTER/DROP and adding code to protect seems like
> > an overkill.
>
> IMHO eventually if we can control that I feel this is a good goal to
> have.  So that we can avoid failure during conflict insertion.  We may
> argue its user's responsibility to not alter the table and we can just
> check the validity during create/alter subscription.
>

If we compare the conflict_history_table with the slot that gets
created with a subscription, one can say the same thing about slots:
users can drop the slots and the whole replication will stop. I think
this table will be created with the same privileges as the owner of
the subscription, who is either a superuser or a user with the
privileges of the pg_create_subscription role, so we can rely on such
users.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Sep 18, 2025 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Sep 14, 2025 at 12:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Thanks for the feedback Bharath
> >
> > > On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I was looking into another thread where we provide an error table for
> > > > COPY [1], it requires the user to pre-create the error table. And
> > > > inside the COPY command we will validate the table, validation in that
> > > > context is a one-time process checking for: (1) table existence, (2)
> > > > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > > > matching column names and data types. This approach avoids concerns
> > > > about the user's DROP or ALTER permissions.
> > > >
> > > > Our requirement for the logical replication conflict log table
> > > > differs, as we must validate the target table upon every conflict
> > > > insertion, not just at subscription creation. A more robust
> > > > alternative is to perform validation and acquire a lock on the
> > > > conflict table whenever the subscription worker starts. This prevents
> > > > modifications (like ALTER or DROP) while the worker is active. When
> > > > the worker gets restarted, we can re-validate the table and
> > > > automatically disable the conflict logging feature if validation
> > > > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > > > option again.
> > >
> > > Having to worry about ALTER/DROP and adding code to protect seems like
> > > an overkill.
> >
> > IMHO eventually if we can control that I feel this is a good goal to
> > have.  So that we can avoid failure during conflict insertion.  We may
> > argue its user's responsibility to not alter the table and we can just
> > check the validity during create/alter subscription.
> >
>
> If we compare conflict_history_table with the slot that gets created
> with subscription, one can say the same thing about slots. Users can
> drop the slots and whole replication will stop. I think this table
> will be created with the same privileges as the owner of a
> subscription which can be either a superuser or a user with the
> privileges of the pg_create_subscription role, so we can rely on such
> users.

Yeah, that's a valid point.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Masahiko Sawada
Дата:
On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Sep 14, 2025 at 12:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sat, Sep 13, 2025 at 6:16 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Thanks for the feedback Bharath
> >
> > > On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I was looking into another thread where we provide an error table for
> > > > COPY [1], it requires the user to pre-create the error table. And
> > > > inside the COPY command we will validate the table, validation in that
> > > > context is a one-time process checking for: (1) table existence, (2)
> > > > ability to acquire a sufficient lock, (3) INSERT privileges, and (4)
> > > > matching column names and data types. This approach avoids concerns
> > > > about the user's DROP or ALTER permissions.
> > > >
> > > > Our requirement for the logical replication conflict log table
> > > > differs, as we must validate the target table upon every conflict
> > > > insertion, not just at subscription creation. A more robust
> > > > alternative is to perform validation and acquire a lock on the
> > > > conflict table whenever the subscription worker starts. This prevents
> > > > modifications (like ALTER or DROP) while the worker is active. When
> > > > the worker gets restarted, we can re-validate the table and
> > > > automatically disable the conflict logging feature if validation
> > > > fails.  And this can be enabled by ALTER SUBSCRIPTION by setting the
> > > > option again.
> > >
> > > Having to worry about ALTER/DROP and adding code to protect seems like
> > > an overkill.
> >
> > IMHO eventually if we can control that I feel this is a good goal to
> > have.  So that we can avoid failure during conflict insertion.  We may
> > argue its user's responsibility to not alter the table and we can just
> > check the validity during create/alter subscription.
> >
>
> If we compare conflict_history_table with the slot that gets created
> with subscription, one can say the same thing about slots. Users can
> drop the slots and whole replication will stop. I think this table
> will be created with the same privileges as the owner of a
> subscription which can be either a superuser or a user with the
> privileges of the pg_create_subscription role, so we can rely on such
> users.

We might want to consider which role inserts the conflict info into
the history table. For example, if any table created by a user can be
used as the history table for a subscription and the conflict info
insertion is performed by the subscription owner, we would end up
having the same security issue that was addressed by the run_as_owner
subscription option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > If we compare conflict_history_table with the slot that gets created
> > with subscription, one can say the same thing about slots. Users can
> > drop the slots and whole replication will stop. I think this table
> > will be created with the same privileges as the owner of a
> > subscription which can be either a superuser or a user with the
> > privileges of the pg_create_subscription role, so we can rely on such
> > users.
>
> We might want to consider which role inserts the conflict info into
> the history table. For example, if any table created by a user can be
> used as the history table for a subscription and the conflict info
> insertion is performed by the subscription owner, we would end up
> having the same security issue that was addressed by the run_as_owner
> subscription option.
>

Yeah, I don't think we want to open that door. For user-created
tables, we should perform actions with the table owner's privileges.
In such a case, if one wants to create a subscription with the
run_as_owner option, she should grant DML permissions to the
subscription owner. OTOH, if we create this table internally (via the
subscription owner) then, irrespective of run_as_owner, we will always
insert as the subscription owner.

AFAIR, one open point for internally created tables is whether we
should skip changes to the conflict_history table while replicating
changes; the table would otherwise be considered part of any FOR ALL
TABLES publication, if one is defined. Ideally, these should behave as
catalog tables, so one option is to mark them as 'user_catalog_table';
the other option is to have some hard-coded checks during replication.
The first option has the advantage that it won't write the additional
WAL for these tables that is otherwise required under
wal_level=logical. What other options do we have?
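To make the first option concrete, a minimal sketch using the existing
user_catalog_table storage parameter (the table name and column list
here are hypothetical, not a settled schema):

```sql
-- Hypothetical conflict history table marked as a user catalog table,
-- so logical decoding treats it like a catalog relation.
CREATE TABLE my_conflict_table (
    relid         oid,
    conflict_type text,
    local_tuple   jsonb,
    remote_tuple  jsonb,
    logged_at     timestamptz DEFAULT now()
) WITH (user_catalog_table = true);
```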

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Masahiko Sawada
Date:
On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > If we compare conflict_history_table with the slot that gets created
> > > with subscription, one can say the same thing about slots. Users can
> > > drop the slots and whole replication will stop. I think this table
> > > will be created with the same privileges as the owner of a
> > > subscription which can be either a superuser or a user with the
> > > privileges of the pg_create_subscription role, so we can rely on such
> > > users.
> >
> > We might want to consider which role inserts the conflict info into
> > the history table. For example, if any table created by a user can be
> > used as the history table for a subscription and the conflict info
> > insertion is performed by the subscription owner, we would end up
> > having the same security issue that was addressed by the run_as_owner
> > subscription option.
> >
>
> Yeah, I don't think we want to open that door. For user created
> tables, we should perform actions with table_owner's privilege. In
> such a case, if one wants to create a subscription with run_as_owner
> option, she should give DML operation permissions to the subscription
> owner. OTOH, if we create this table internally (via subscription
> owner) then irrespective of run_as_owner, we will always insert as
> subscription_owner.

Agreed.

>
> AFAIR, one open point for internally created tables is whether we
> should skip changes to conflict_history table while replicating
> changes? The table will be considered under for ALL TABLES
> publications, if defined? Ideally, these should behave as catalog
> tables, so one option is to mark them as 'user_catalog_table', or the
> other option is we have some hard-code checks during replication. The
> first option has the advantage that it won't write additional WAL for
> these tables which is otherwise required under wal_level=logical. What
> other options do we have?

I think conflict history information is subscriber-local information,
so it doesn't have to be replicated to another subscriber. It could
also be problematic in cross-major-version replication cases if we
break the compatibility of the history table definition. I would
expect the history table to work as a catalog table in terms of
logical decoding/replication, and it would probably make sense to
reuse the user_catalog_table option for that purpose. If we have a
history table for each subscription that wants to record the conflict
history (I believe so), it would be hard to go with the second option
(having hard-coded checks).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > AFAIR, one open point for internally created tables is whether we
> > should skip changes to conflict_history table while replicating
> > changes? The table will be considered under for ALL TABLES
> > publications, if defined? Ideally, these should behave as catalog
> > tables, so one option is to mark them as 'user_catalog_table', or the
> > other option is we have some hard-code checks during replication. The
> > first option has the advantage that it won't write additional WAL for
> > these tables which is otherwise required under wal_level=logical. What
> > other options do we have?
>
> I think conflict history information is subscriber local information
> so doesn't have to be replicated to another subscriber. Also it could
> be problematic in cross-major-version replication cases if we break
> the compatibility of history table definition.
>

Right, this is another reason not to replicate it.

> I would expect that the
> history table works as a catalog table in terms of logical
> decoding/replication. It would probably make sense to reuse the
> user_catalog_table option for that purpose. If we have a history table
> for each subscription that wants to record the conflict history (I
> believe so), it would be hard to go with the second option (having
> hard-code checks).
>

Agreed. Let's wait and see what Dilip or others have to say on this.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > If we compare conflict_history_table with the slot that gets created
> > > > with subscription, one can say the same thing about slots. Users can
> > > > drop the slots and whole replication will stop. I think this table
> > > > will be created with the same privileges as the owner of a
> > > > subscription which can be either a superuser or a user with the
> > > > privileges of the pg_create_subscription role, so we can rely on such
> > > > users.
> > >
> > > We might want to consider which role inserts the conflict info into
> > > the history table. For example, if any table created by a user can be
> > > used as the history table for a subscription and the conflict info
> > > insertion is performed by the subscription owner, we would end up
> > > having the same security issue that was addressed by the run_as_owner
> > > subscription option.
> > >
> >
> > Yeah, I don't think we want to open that door. For user created
> > tables, we should perform actions with table_owner's privilege. In
> > such a case, if one wants to create a subscription with run_as_owner
> > option, she should give DML operation permissions to the subscription
> > owner. OTOH, if we create this table internally (via subscription
> > owner) then irrespective of run_as_owner, we will always insert as
> > subscription_owner.
>
> Agreed.

Yeah that makes sense to me as well.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Wed, Sep 24, 2025 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > AFAIR, one open point for internally created tables is whether we
> > > should skip changes to conflict_history table while replicating
> > > changes? The table will be considered under for ALL TABLES
> > > publications, if defined? Ideally, these should behave as catalog
> > > tables, so one option is to mark them as 'user_catalog_table', or the
> > > other option is we have some hard-code checks during replication. The
> > > first option has the advantage that it won't write additional WAL for
> > > these tables which is otherwise required under wal_level=logical. What
> > > other options do we have?
> >
> > I think conflict history information is subscriber local information
> > so doesn't have to be replicated to another subscriber. Also it could
> > be problematic in cross-major-version replication cases if we break
> > the compatibility of history table definition.
> >
>
> Right, this is another reason not to replicate it.
>
> > I would expect that the
> > history table works as a catalog table in terms of logical
> > decoding/replication. It would probably make sense to reuse the
> > user_catalog_table option for that purpose. If we have a history table
> > for each subscription that wants to record the conflict history (I
> > believe so), it would be hard to go with the second option (having
> > hard-code checks).
> >
>
> Agreed. Let's wait and see what Dilip or others have to say on this.

Yeah, I think it makes sense to create these as 'user_catalog_table'
tables when we create them internally. However, IMHO when a user
provides their own table, we should not enforce that it be created as
a 'user_catalog_table' table. Or do you think we should enforce that
property?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Masahiko Sawada
Date:
On Wed, Sep 24, 2025 at 4:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Sep 23, 2025 at 11:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Sat, Sep 20, 2025 at 4:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > >
> > > > AFAIR, one open point for internally created tables is whether we
> > > > should skip changes to conflict_history table while replicating
> > > > changes? The table will be considered under for ALL TABLES
> > > > publications, if defined? Ideally, these should behave as catalog
> > > > tables, so one option is to mark them as 'user_catalog_table', or the
> > > > other option is we have some hard-code checks during replication. The
> > > > first option has the advantage that it won't write additional WAL for
> > > > these tables which is otherwise required under wal_level=logical. What
> > > > other options do we have?
> > >
> > > I think conflict history information is subscriber local information
> > > so doesn't have to be replicated to another subscriber. Also it could
> > > be problematic in cross-major-version replication cases if we break
> > > the compatibility of history table definition.
> > >
> >
> > Right, this is another reason not to replicate it.
> >
> > > I would expect that the
> > > history table works as a catalog table in terms of logical
> > > decoding/replication. It would probably make sense to reuse the
> > > user_catalog_table option for that purpose. If we have a history table
> > > for each subscription that wants to record the conflict history (I
> > > believe so), it would be hard to go with the second option (having
> > > hard-code checks).
> > >
> >
> > Agreed. Let's wait and see what Dilip or others have to say on this.
>
> Yeah I think this makes sense to create as 'user_catalog_table' tables
> when we internally create them.  However, IMHO when a user provides
> its own table, I believe we should not enforce the restriction for
> that table to be created as a 'user_catalog_table' table, or do you
> think we should enforce that property?

I think that's the user's responsibility, so I would not enforce that
property for user-provided tables.

BTW, what is the main use case for supporting user-provided tables as
the history table? I think we basically don't want the history table
to be updated by any process other than the apply workers, so it would
make more sense for such a table to be created internally and tied to
the subscription. I'm less convinced that user-provided tables have
enough upside to warrant the complexity.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Sat, Sep 20, 2025 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > If we compare conflict_history_table with the slot that gets created
> > > with subscription, one can say the same thing about slots. Users can
> > > drop the slots and whole replication will stop. I think this table
> > > will be created with the same privileges as the owner of a
> > > subscription which can be either a superuser or a user with the
> > > privileges of the pg_create_subscription role, so we can rely on such
> > > users.
> >
> > We might want to consider which role inserts the conflict info into
> > the history table. For example, if any table created by a user can be
> > used as the history table for a subscription and the conflict info
> > insertion is performed by the subscription owner, we would end up
> > having the same security issue that was addressed by the run_as_owner
> > subscription option.
> >
>
> Yeah, I don't think we want to open that door. For user created
> tables, we should perform actions with table_owner's privilege. In
> such a case, if one wants to create a subscription with run_as_owner
> option, she should give DML operation permissions to the subscription
> owner. OTOH, if we create this table internally (via subscription
> owner) then irrespective of run_as_owner, we will always insert as
> subscription_owner.
>
> AFAIR, one open point for internally created tables is whether we
> should skip changes to conflict_history table while replicating
> changes? The table will be considered under for ALL TABLES
> publications, if defined? Ideally, these should behave as catalog
> tables, so one option is to mark them as 'user_catalog_table', or the
> other option is we have some hard-code checks during replication. The
> first option has the advantage that it won't write additional WAL for
> these tables which is otherwise required under wal_level=logical. What
> other options do we have?

I was doing more analysis and testing of 'user_catalog_table'. What I
found is that when a table is marked as 'user_catalog_table', extra
information is logged, i.e. the CID[1], so that these tables can also
be scanned during decoding using a historical snapshot, like catalog
tables. I have also checked the code and tested that
'user_catalog_table' tables do get streamed under the FOR ALL TABLES
option. Am I missing something, or are we thinking of changing the
behavior of user_catalog_table so that such tables do not get decoded?
That would change the existing behaviour, so it might not be a good
option. Yet another idea is to invent a purpose-specific option,
called say 'conflict_history_purpose', but maybe that doesn't justify
a new option IMHO.

[1]
/*
* For logical decode we need combo CIDs to properly decode the
* catalog
*/
if (RelationIsAccessibleInLogicalDecoding(relation))
log_heap_new_cid(relation, &tp);
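A quick way to reproduce the observation that user_catalog_table does
not exclude a table from FOR ALL TABLES (a sketch of the kind of test,
not the exact commands I ran):

```sql
-- On the publisher:
CREATE TABLE hist (id int) WITH (user_catalog_table = true);
CREATE PUBLICATION pub_all FOR ALL TABLES;

-- 'hist' still shows up, i.e. the option does not make the table
-- non-publishable:
SELECT tablename FROM pg_publication_tables WHERE pubname = 'pub_all';
```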


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Thu, Sep 25, 2025 at 11:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 20, 2025 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 11:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Sep 18, 2025 at 1:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > If we compare conflict_history_table with the slot that gets created
> > > > with subscription, one can say the same thing about slots. Users can
> > > > drop the slots and whole replication will stop. I think this table
> > > > will be created with the same privileges as the owner of a
> > > > subscription which can be either a superuser or a user with the
> > > > privileges of the pg_create_subscription role, so we can rely on such
> > > > users.
> > >
> > > We might want to consider which role inserts the conflict info into
> > > the history table. For example, if any table created by a user can be
> > > used as the history table for a subscription and the conflict info
> > > insertion is performed by the subscription owner, we would end up
> > > having the same security issue that was addressed by the run_as_owner
> > > subscription option.
> > >
> >
> > Yeah, I don't think we want to open that door. For user created
> > tables, we should perform actions with table_owner's privilege. In
> > such a case, if one wants to create a subscription with run_as_owner
> > option, she should give DML operation permissions to the subscription
> > owner. OTOH, if we create this table internally (via subscription
> > owner) then irrespective of run_as_owner, we will always insert as
> > subscription_owner.
> >
> > AFAIR, one open point for internally created tables is whether we
> > should skip changes to conflict_history table while replicating
> > changes? The table will be considered under for ALL TABLES
> > publications, if defined? Ideally, these should behave as catalog
> > tables, so one option is to mark them as 'user_catalog_table', or the
> > other option is we have some hard-code checks during replication. The
> > first option has the advantage that it won't write additional WAL for
> > these tables which is otherwise required under wal_level=logical. What
> > other options do we have?
>
> I was doing more analysis and testing for 'use_catalog_table', so what
> I found is when a table is marked as  'use_catalog_table', it will log
> extra information i.e. CID[1] so that these tables can be used for
> scanning as well during decoding like catalog tables using historical
> snapshot.  And I have checked the code and tested as well
> 'use_catalog_table' does get streamed with ALL TABLE options.  Am I
> missing something or are we thinking of changing the behavior of
> use_catalog_table so that they do not get decoded, but I think that
> will change the existing behaviour so might not be a good option, yet
> another idea is to invent some other option for which purpose called
> 'conflict_history_purpose' but maybe that doesn't justify the purpose
> of the new option IMHO.
>
> [1]
> /*
> * For logical decode we need combo CIDs to properly decode the
> * catalog
> */
> if (RelationIsAccessibleInLogicalDecoding(relation))
> log_heap_new_cid(relation, &tp);
>

Meanwhile, I am also exploring the option of doing a CREATE TYPE in
initialize_data_directory() during initdb. Basically, we would create
this type in template1 so that it is available in all databases, which
would simplify the table creation whether we create the table
internally or allow the user to create it. And in
is_publishable_class() we could check the table's row type and avoid
publishing those tables.
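A rough sketch of this idea (the type name and columns are
hypothetical, not an agreed design):

```sql
-- Done once in template1 during initdb, so every new database has it:
CREATE TYPE pg_conflict_log_entry AS (
    relid         oid,
    conflict_type text,
    local_tuple   jsonb,
    remote_tuple  jsonb
);

-- Whether created internally or by the user, the conflict history
-- table would then be declared OF that type, so is_publishable_class()
-- could recognize it by its row type and skip it:
CREATE TABLE my_conflict_table OF pg_conflict_log_entry;
```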


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > [1]
> > /*
> > * For logical decode we need combo CIDs to properly decode the
> > * catalog
> > */
> > if (RelationIsAccessibleInLogicalDecoding(relation))
> > log_heap_new_cid(relation, &tp);
> >
>
> Meanwhile I am also exploring the option where we can just CREATE TYPE
> in initialize_data_directory() during initdb, basically we will create
> this type in template1 so that it will be available in all the
> databases, and that would simplify the table creation whether we
> create internally or we allow user to create it.  And while checking
> is_publishable_class we can check the type and avoid publishing those
> tables.
>

Based on my off-list discussion with Amit, one option could be to set
the HEAP_INSERT_NO_LOGICAL option while inserting tuples into the
conflict history table. For that we cannot use the SPI interface for
the insert; instead we would have to call heap_insert() directly in
order to pass this option. Since we do not want to create any triggers
etc. on this table, a direct insert should be fine, but if we plan to
make this a partitioned table in the future then a direct heap insert
might not work.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > [1]
> > > /*
> > > * For logical decode we need combo CIDs to properly decode the
> > > * catalog
> > > */
> > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > log_heap_new_cid(relation, &tp);
> > >
> >
> > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > in initialize_data_directory() during initdb, basically we will create
> > this type in template1 so that it will be available in all the
> > databases, and that would simplify the table creation whether we
> > create internally or we allow user to create it.  And while checking
> > is_publishable_class we can check the type and avoid publishing those
> > tables.
> >
>
> Based on my off list discussion with Amit, one option could be to set
> HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> history table, for that we can not use SPI interface to insert instead
> we will have to directly call the heap_insert() to add this option.
> Since we do not want to create any trigger etc on this table, direct
> insert should be fine, but if we plan to create this table as
> partitioned table in future then direct heap insert might not work.

Upon further reflection, I realized that while this approach avoids
streaming inserts to the conflict log history table, it still requires
that table to exist on the subscriber node upon subscription creation,
which isn't ideal.

We have two main options to address this:

Option1:
When pg_get_publication_tables() is called with the 'alltables'
option, we can scan all subscriptions and explicitly ignore (filter
out) all conflict history tables. This would not be very costly, as
the scan happens only when pg_get_publication_tables() is called,
which is only during CREATE SUBSCRIPTION / ALTER SUBSCRIPTION on the
remote node.

Option2:
Alternatively, we could introduce a table creation option, like a
'non-publishable' flag, to prevent a table from being streamed
entirely. I believe this would be a valuable, independent feature for
users who want to create certain tables without including them in
logical replication.

I prefer option2, as I feel it can add value independently of this patch.
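For Option1, assuming the conflict table's OID were recorded in
pg_subscription (in a hypothetical column, say 'subconflictrelid'),
the filter would conceptually amount to:

```sql
-- Conceptual filter for pg_get_publication_tables(): exclude any
-- relation registered as a conflict history table by a local
-- subscription. 'subconflictrelid' is a hypothetical column, used
-- here only for illustration.
SELECT c.oid
FROM pg_class c
WHERE c.oid NOT IN (SELECT subconflictrelid FROM pg_subscription);
```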


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > > [1]
> > > > /*
> > > > * For logical decode we need combo CIDs to properly decode the
> > > > * catalog
> > > > */
> > > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > > log_heap_new_cid(relation, &tp);
> > > >
> > >
> > > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > > in initialize_data_directory() during initdb, basically we will create
> > > this type in template1 so that it will be available in all the
> > > databases, and that would simplify the table creation whether we
> > > create internally or we allow user to create it.  And while checking
> > > is_publishable_class we can check the type and avoid publishing those
> > > tables.
> > >
> >
> > Based on my off list discussion with Amit, one option could be to set
> > HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> > history table, for that we can not use SPI interface to insert instead
> > we will have to directly call the heap_insert() to add this option.
> > Since we do not want to create any trigger etc on this table, direct
> > insert should be fine, but if we plan to create this table as
> > partitioned table in future then direct heap insert might not work.
>
> Upon further reflection, I realized that while this approach avoids
> streaming inserts to the conflict log history table, it still requires
> that table to exist on the subscriber node upon subscription creation,
> which isn't ideal.
>

I am not able to understand what exact problem you are seeing here. I
was thinking that during the CREATE SUBSCRIPTION command, a new table
with the user-provided name would be created, similar to how we create
a slot. The difference is that we create the slot on the
remote/publisher node, whereas this table would be created locally.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Sat, Sep 27, 2025 at 8:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I am not able to understand what exact problem you are seeing here. I
> was thinking that during the CREATE SUBSCRIPTION command, a new table
> with user provided name will be created similar to how we create a
> slot. The difference would be that we create a slot on the
> remote/publisher node but this table will be created locally.
>
That's not an issue. The problem we are discussing is that the
conflict history table created on the subscriber node should not be
published when that subscriber node in turn creates a publication with
the FOR ALL TABLES option. We found an option of inserting into this
table with the HEAP_INSERT_NO_LOGICAL flag so that those inserts will
not be decoded, but what about another node subscribing from this
publisher? It would expect this table to exist, because when ALL
TABLES are published the subscriber node expects all user tables to be
present there, even if their changes are not published. Consider the
example below:

Node1:
CREATE PUBLICATION pub_node1..

Node2:
CREATE SUBSCRIPTION sub.. PUBLICATION pub_node1
WITH(conflict_history_table='my_conflict_table');
CREATE PUBLICATION pub_node2 FOR ALL TABLES;

Node3:
CREATE SUBSCRIPTION sub1.. PUBLICATION pub_node2; -- this will expect
'my_conflict_table' to exist here, because when it calls
pg_get_publication_tables() on Node2 it will get 'my_conflict_table'
along with the other user tables.

As a solution, I wanted this table to be skipped when
pg_get_publication_tables() is called.
Option1: If a table is listed as the conflict history table in any of
the subscriptions on Node2, we ignore it.
Option2: Provide a new table option to mark a table as non-publishable
when the ALL TABLES option is used; I think this option could be
useful independently as well.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Amit Kapila
Date:
On Sat, Sep 27, 2025 at 9:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Sep 27, 2025 at 8:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I am not able to understand what exact problem you are seeing here. I
> > was thinking that during the CREATE SUBSCRIPTION command, a new table
> > with user provided name will be created similar to how we create a
> > slot. The difference would be that we create a slot on the
> > remote/publisher node but this table will be created locally.
> >
> That's not an issue, the problem here we are discussing is the
> conflict history table which is created on the subscriber node should
> not be published when this node subscription node create another
> publisher with ALL TABLE option.  So we found a option for inserting
> into this table with HEAP_INSERT_NO_LOGICAL flag so that those insert
> will not be decoded, but what about another not subscribing from this
> publisher, they should have this table because when ALL TABLES are
> published subscriber node expect all user table to present there even
> if its changes are not published.  Consider below example
>
> Node1:
> CREATE PUBLICATION pub_node1..
>
> Node2:
> CREATE SUBSCRIPTION sub.. PUBLICATION pub_node1
> WITH(conflict_history_table='my_conflict_table');
> CREATE PUBLICATION pub_node2 FOR ALL TABLE;
>
> Node3:
> CREATE SUBSCRIPTION sub1.. PUBLICATION pub_node2; --this will expect
> 'my_conflict_table' to exist here because when it will call
> pg_get_publication_tables() from Node2 it will also get the
> 'my_conflict_table' along with other user tables.
>
> And as a solution I wanted to avoid this table to be avoided when
> pg_get_publication_tables() is being called.
> Option1: We can see if table name is listed as conflict history table
> in any of the subscribers on Node2 we will ignore this.
> Option2: Provide a new table option to mark table as non publishable
> table when ALL TABLE option is provided, I think this option can be
> useful independently as well.
>

I agree that option-2 is useful and, IIUC, we are already working on
something similar in thread [1]. However, it is better to use option-1
here because we are using a non-user-specified mechanism to skip
changes during replication, so following the same approach at other
times is preferable. Once we have that other feature [1], we can
probably optimize this code to use it without taking input from the
user. The other reason for not going with option-2 in the way you are
proposing is that it doesn't seem like a good idea to have multiple
ways to specify skipping tables from publishing. I find the approach
being discussed in thread [1] more generic and better than a new
table-level option.

[1] - https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com
--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sun, Sep 28, 2025 at 2:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

> I agree that option-2 is useful and IIUC, we are already working on
> something similar in thread [1]. However, it is better to use option-1
> here because we are using non-user specified mechanism to skip changes
> during replication, so following the same during other times is
> preferable. Once we have that other feature [1], we can probably
> optimize this code to use it without taking input from the user. The
> other reason of not going with the option-2 in the way you are
> proposing is that it doesn't seem like a good idea to have multiple
> ways to specify skipping tables from publishing. I find the approach
> being discussed in thread [1] a generic and better than a new
> table-level option.
>
> [1] - https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com

I understand the current discussion revolves around using an EXCEPT
clause (for tables/schemas/columns) during publication creation.  But
what we want is to mark certain tables as permanently excluded from
publication, because we cannot expect users to explicitly exclude
them while creating a publication.

So, I propose we add a "non-publishable" property to tables
themselves. This is a valuable option for users who are certain
that particular tables should never be replicated.

By marking a table as non-publishable, we save users the effort of
repeatedly listing it in the EXCEPT option for every new publication.
Both methods have merit, but the proposed table property addresses the
need for a permanent, system-wide exclusion.

See the test below, done with a quick hack, to illustrate what I am
referring to.

postgres[2730657]=# CREATE TABLE test(a int) WITH
(NON_PUBLISHABLE_TABLE = true);
CREATE TABLE
postgres[2730657]=# CREATE PUBLICATION pub FOR ALL TABLES ;
CREATE PUBLICATION
postgres[2730657]=# select pg_get_publication_tables('pub');
 pg_get_publication_tables
---------------------------
(0 rows)


But I agree this is an additional table option which might need
consensus, so meanwhile we can proceed with option-2. I will prepare
patches with option-2 and, as an add-on patch, I will propose
option-1. The option-1 patch can be discussed in a separate thread as
well.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Sun, Sep 28, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Sep 28, 2025 at 2:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > I agree that option-2 is useful and IIUC, we are already working on
> > something similar in thread [1]. However, it is better to use option-1
> > here because we are using non-user specified mechanism to skip changes
> > during replication, so following the same during other times is
> > preferable. Once we have that other feature [1], we can probably
> > optimize this code to use it without taking input from the user. The
> > other reason of not going with the option-2 in the way you are
> > proposing is that it doesn't seem like a good idea to have multiple
> > ways to specify skipping tables from publishing. I find the approach
> > being discussed in thread [1] a generic and better than a new
> > table-level option.
> >
> > [1] - https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com
>
> I understand the current discussion revolves around using an EXCEPT
> clause (for tables/schemas/columns) during publication creation.  But
> what we want is to mark some table which will be excluded permanently
> from publication, because we can not expect users to explicitly
> exclude them while creating publication.
>
> So, I propose we add a "non-publishable" property to tables
> themselves. This is a more valuable option for users who are certain
> that certain tables should never be replicated.
>
> By marking a table as non-publishable, we save users the effort of
> repeatedly listing it in the EXCEPT option for every new publication.
> Both methods have merit, but the proposed table property addresses the
> need for a permanent, system-wide exclusion.
>
> See below test with a quick hack, what I am referring to.
>
> postgres[2730657]=# CREATE TABLE test(a int) WITH
> (NON_PUBLISHABLE_TABLE = true);
> CREATE TABLE
> postgres[2730657]=# CREATE PUBLICATION pub FOR ALL TABLES ;
> CREATE PUBLICATION
> postgres[2730657]=# select pg_get_publication_tables('pub');
>  pg_get_publication_tables
> ---------------------------
> (0 rows)
>
>
> But I agree this is an additional table option which might need
> consensus, so meanwhile we can proceed with option2, I will prepare
> patches with option-2 and as a add on patch I will propose option-1.
> And this option-1 patch can be discussed in a separate thread as well.

So here is the patch set using option-2. With this, when the ALL
TABLES option is used and pg_get_publication_tables() is called, the
relid is checked against the conflict history tables of the
subscribers, and those tables are not added to the list.  I will start
a separate thread to propose the patch I sent in the previous email.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Mon, Sep 29, 2025 at 3:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Sep 28, 2025 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sun, Sep 28, 2025 at 2:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > I agree that option-2 is useful and IIUC, we are already working on
> > > something similar in thread [1]. However, it is better to use option-1
> > > here because we are using non-user specified mechanism to skip changes
> > > during replication, so following the same during other times is
> > > preferable. Once we have that other feature [1], we can probably
> > > optimize this code to use it without taking input from the user. The
> > > other reason of not going with the option-2 in the way you are
> > > proposing is that it doesn't seem like a good idea to have multiple
> > > ways to specify skipping tables from publishing. I find the approach
> > > being discussed in thread [1] a generic and better than a new
> > > table-level option.
> > >
> > > [1] -
https://www.postgresql.org/message-id/CANhcyEVt2CBnG7MOktaPPV4rYapHR-VHe5%3DqoziTZh1L9SVc6w%40mail.gmail.com
> >
> > I understand the current discussion revolves around using an EXCEPT
> > clause (for tables/schemas/columns) during publication creation.  But
> > what we want is to mark some table which will be excluded permanently
> > from publication, because we can not expect users to explicitly
> > exclude them while creating publication.
> >
> > So, I propose we add a "non-publishable" property to tables
> > themselves. This is a more valuable option for users who are certain
> > that certain tables should never be replicated.
> >
> > By marking a table as non-publishable, we save users the effort of
> > repeatedly listing it in the EXCEPT option for every new publication.
> > Both methods have merit, but the proposed table property addresses the
> > need for a permanent, system-wide exclusion.
> >
> > See below test with a quick hack, what I am referring to.
> >
> > postgres[2730657]=# CREATE TABLE test(a int) WITH
> > (NON_PUBLISHABLE_TABLE = true);
> > CREATE TABLE
> > postgres[2730657]=# CREATE PUBLICATION pub FOR ALL TABLES ;
> > CREATE PUBLICATION
> > postgres[2730657]=# select pg_get_publication_tables('pub');
> >  pg_get_publication_tables
> > ---------------------------
> > (0 rows)
> >
> >
> > But I agree this is an additional table option which might need
> > consensus, so meanwhile we can proceed with option2, I will prepare
> > patches with option-2 and as a add on patch I will propose option-1.
> > And this option-1 patch can be discussed in a separate thread as well.
>
> So here is the patch set using option-2, with this when alltable
> option is used and we get pg_get_publication_tables(), this will check
> the relid against the conflict history tables in the subscribers and
> those tables will not be added to the list.  I will start a separate
> thread for proposing the patch I sent in previous email.
>

I have started going through this thread. Is it possible to rebase the
patches and post?

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 11, 2025 at 3:49 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 29, 2025 at 3:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> I have started going through this thread. Is it possible to rebase the
> patches and post?

Thanks Shveta, I will post the rebased patch by tomorrow.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 4:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Sep 25, 2025 at 11:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > > [1]
> > > > /*
> > > > * For logical decode we need combo CIDs to properly decode the
> > > > * catalog
> > > > */
> > > > if (RelationIsAccessibleInLogicalDecoding(relation))
> > > > log_heap_new_cid(relation, &tp);
> > > >
> > >
> > > Meanwhile I am also exploring the option where we can just CREATE TYPE
> > > in initialize_data_directory() during initdb, basically we will create
> > > this type in template1 so that it will be available in all the
> > > databases, and that would simplify the table creation whether we
> > > create internally or we allow user to create it.  And while checking
> > > is_publishable_class we can check the type and avoid publishing those
> > > tables.
> > >
> >
> > Based on my off list discussion with Amit, one option could be to set
> > HEAP_INSERT_NO_LOGICAL option while inserting tuple into conflict
> > history table, for that we can not use SPI interface to insert instead
> > we will have to directly call the heap_insert() to add this option.
> > Since we do not want to create any trigger etc on this table, direct
> > insert should be fine, but if we plan to create this table as
> > partitioned table in future then direct heap insert might not work.
>
> Upon further reflection, I realized that while this approach avoids
> streaming inserts to the conflict log history table, it still requires
> that table to exist on the subscriber node upon subscription creation,
> which isn't ideal.
>
> We have two main options to address this:
>
> Option1:
> When calling pg_get_publication_tables(), if the 'alltables' option is
> used, we can scan all subscriptions and explicitly ignore (filter out)
> all conflict history tables.  This will not be very costly as this
> will scan the subscriber when pg_get_publication_tables() is called,
> which is only called during create subscription/alter subscription on
> the remote node.
>
> Option2:
> Alternatively, we could introduce a table creation option, like a
> 'non-publishable' flag, to prevent a table from being streamed
> entirely. I believe this would be a valuable, independent feature for
> users who want to create certain tables without including them in
> logical replication.
>
> I prefer option2, as I feel this can add value independent of this patch.
>

I agree that marking tables with a flag to easily exclude them during
publishing would be cleaner. In the current patch, for an ALL-TABLES
publication, we scan pg_subscription for each table in pg_class to
check its subconflicttable and decide whether to ignore it. But since
this only happens during create/alter subscription and refresh
publication, the overhead should be acceptable.

Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
enhancement but since we already have the EXCEPT list built in a
separate thread, that might be sufficient for now. IMO, such
conflict-tables should be marked internally (for example, with a
‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
identified within the system, without requiring users to explicitly
specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
see what others think on this.
For the time being, the current implementation looks fine, considering
it runs only during a few publication-related DDL operations.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >

> I agree that marking tables with a flag to easily exclude them during
> publishing would be cleaner. In the current patch, for an ALL-TABLES
> publication, we scan pg_subscription for each table in pg_class to
> check its subconflicttable and decide whether to ignore it. But since
> this only happens during create/alter subscription and refresh
> publication, the overhead should be acceptable.

Thanks for your opinion.

> Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> enhancement but since we already have the EXCEPT list built in a
> separate thread, that might be sufficient for now. IMO, such
> conflict-tables should be marked internally (for example, with a
> ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> identified within the system, without requiring users to explicitly
> specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> see what others think on this.
> For the time being, the current implementation looks fine, considering
> it runs only during a few publication-related DDL operations.

+1

Here is the rebased patch; apart from rebasing, the changes are:
1) Dropped the conflict history table during drop subscription
2) Added test cases for testing the conflict history table behavior
with CREATE/ALTER/DROP subscription

TODO:
1) Need more thoughts on the table schema: whether we need to capture
more items, or whether we should drop some fields if we think they
are not necessary.
2) Logical replication test for generating conflict and capturing in
conflict history table.
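To anchor the schema discussion in TODO 1, here is a minimal sketch of what the conflict history table could look like, assembled from the fields mentioned so far in this thread (relid, conflict type, key/local/remote tuples as JSON, local_ts, LSN). All column names and types below are assumptions for illustration, not the patch's actual definition.

```sql
-- Hypothetical schema sketch; every name and type is a placeholder.
CREATE TABLE my_conflict_table (
    relid          oid,          -- table on which the conflict occurred
    conflict_type  text,         -- e.g. insert_exists, update_missing
    remote_xid     xid,          -- remote transaction that conflicted
    remote_lsn     pg_lsn,       -- commit LSN of the remote transaction
    key_tuple      jsonb,        -- key of the conflicting row
    local_tuple    jsonb,        -- existing local row, if any
    remote_tuple   jsonb,        -- incoming remote row
    local_ts       timestamptz,  -- local commit timestamp, if applicable
    logged_at      timestamptz DEFAULT now()
);
```

Using jsonb for the tuple columns follows the proposal's choice of a JSON representation so that rows from tables with different schemas can share one log table.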

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 12, 2025 at 2:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
>
> > I agree that marking tables with a flag to easily exclude them during
> > publishing would be cleaner. In the current patch, for an ALL-TABLES
> > publication, we scan pg_subscription for each table in pg_class to
> > check its subconflicttable and decide whether to ignore it. But since
> > this only happens during create/alter subscription and refresh
> > publication, the overhead should be acceptable.
>
> Thanks for your opinion.
>
> > Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> > enhancement but since we already have the EXCEPT list built in a
> > separate thread, that might be sufficient for now. IMO, such
> > conflict-tables should be marked internally (for example, with a
> > ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> > identified within the system, without requiring users to explicitly
> > specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> > see what others think on this.
> > For the time being, the current implementation looks fine, considering
> > it runs only during a few publication-related DDL operations.
>
> +1
>
> Here is the rebased patch, changes apart from rebasing it
> 1) Dropped the conflict history table during drop subscription
> 2) Added test cases for testing the conflict history table behavior
> with CREATE/ALTER/DROP subscription

Thanks.

> TODO:
> 1) Need more thoughts on the table schema whether we need to capture
> more items or shall we drop some fields if we think those are not
> necessary.

Yes, this needs some more thoughts. I will review.

I feel that since the design is somewhat agreed upon, we may move on
to code correction/completion. I have not looked at the rebased patch
yet, but here are a few comments based on the old version.

Few observations related to publication.
------------------------------

(In the below comments, clt/CLT implies Conflict Log Table)

1)
'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

2)
'\d+ clt'   shows all-tables publication name. I feel we should not
show that for clt.

3)
I am able to create a publication for the clt table; should that be allowed?

create subscription sub1 connection '...' publication pub1
WITH(conflict_log_table='clt');
create publication pub3 for table clt;

4)
Is there a reason we have not made the '!IsConflictHistoryRelid' check
part of is_publishable_class() itself? If we do so, other code paths
will also always see the clt as non-publishable (and that will solve a
few of the above issues, I think). IIUC, there is no place where we
want to mark the CLT as publishable, or is there?

5) Also, I feel we can add some documentation now to help others to
understand/review the patch better without going through the long
thread.


Few observations related to conflict-logging:
------------------------------
1)
I found that for conflicts which ultimately result in an error, we do
not insert any conflict record into the clt.

a)
Example: insert_exists, update_Exists
create table tab1 (i int primary key, j int);
sub: insert into tab1 values(30,10);
pub: insert into tab1 values(30,10);
ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
No record in clt.

sub:
<some pre-data needed>
update tab1 set i=40 where i = 30;
pub: update tab1 set i=40 where i = 20;
ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
No record in clt.

b)
Another question related to this: these conflicts (which result in an
error) keep happening until the user resolves them, skips them, or
'disable_on_error' is set. Are we then going to insert these records
multiple times? We do count these in 'confl_insert_exists' and
'confl_update_exists' every time, so it makes sense to log them each
time in the clt as well. Thoughts?

2)
For conflicts where the row on the subscriber is missing, local_ts is
incorrectly inserted as '2000-01-01 05:30:00+05:30'. Should it be NULL
or something indicating that it is not applicable for this
conflict type?

Example: delete_missing, update_missing
pub:
 insert into tab1 values(10,10);
 insert into tab1 values(20,10);
 sub:  delete from tab1 where i=10;
 pub:  delete from tab1 where i=10;


thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 12, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 2:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> >
> > > I agree that marking tables with a flag to easily exclude them during
> > > publishing would be cleaner. In the current patch, for an ALL-TABLES
> > > publication, we scan pg_subscription for each table in pg_class to
> > > check its subconflicttable and decide whether to ignore it. But since
> > > this only happens during create/alter subscription and refresh
> > > publication, the overhead should be acceptable.
> >
> > Thanks for your opinion.
> >
> > > Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good
> > > enhancement but since we already have the EXCEPT list built in a
> > > separate thread, that might be sufficient for now. IMO, such
> > > conflict-tables should be marked internally (for example, with a
> > > ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily
> > > identified within the system, without requiring users to explicitly
> > > specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to
> > > see what others think on this.
> > > For the time being, the current implementation looks fine, considering
> > > it runs only during a few publication-related DDL operations.
> >
> > +1
> >
> > Here is the rebased patch, changes apart from rebasing it
> > 1) Dropped the conflict history table during drop subscription
> > 2) Added test cases for testing the conflict history table behavior
> > with CREATE/ALTER/DROP subscription
>
> Thanks.
>
> > TODO:
> > 1) Need more thoughts on the table schema whether we need to capture
> > more items or shall we drop some fields if we think those are not
> > necessary.
>
> Yes, this needs some more thoughts. I will review.
>
> I feel since design is somewhat agreed upon, we may handle
> code-correction/completion. I have not looked at the rebased patch
> yet, but here are a few comments based on old-version.
>
> Few observations related to publication.
> ------------------------------
>
> (In the below comments, clt/CLT implies Conflict Log Table)
>
> 1)
> 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> 2)
> '\d+ clt'   shows all-tables publication name. I feel we should not
> show that for clt.
>
> 3)
> I am able to create a publication for clt table, should it be allowed?
>
> create subscription sub1 connection '...' publication pub1
> WITH(conflict_log_table='clt');
> create publication pub3 for table clt;
>
> 4)
> Is there a reason we have not made '!IsConflictHistoryRelid' check as
> part of is_publishable_class() itself? If we do so, other code-logics
> will also get clt as non-publishable always (and will solve a few of
> the above issues I think). IIUC, there is no place where we want to
> mark CLT as publishable or is there any?
>
> 5) Also, I feel we can add some documentation now to help others to
> understand/review the patch better without going through the long
> thread.
>
>
> Few observations related to conflict-logging:
> ------------------------------
> 1)
> I found that for the conflicts which ultimately result in Error, we do
> not insert any conflict-record in clt.
>
> a)
> Example: insert_exists, update_Exists
> create table tab1 (i int primary key, j int);
> sub: insert into tab1 values(30,10);
> pub: insert into tab1 values(30,10);
> ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> No record in clt.
>
> sub:
> <some pre-data needed>
> update tab1 set i=40 where i = 30;
> pub: update tab1 set i=40 where i = 20;
> ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> No record in clt.
>
> b)
> Another question related to this is, since these conflicts (which
> results in error) keep on happening until user resolves these or skips
> these or 'disable_on_error' is set. Then are we going to insert these
> multiple times? We do count these in 'confl_insert_exists' and
> 'confl_update_exists' everytime, so it makes sense to log those each
> time in clt as well. Thoughts?
>
> 2)
> Conflicts where row on sub is missing, local_ts incorrectly inserted.
> It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> indicating that it is not applicable for this conflict-type?
>
> Example: delete_missing, update_missing
> pub:
>  insert into tab1 values(10,10);
>  insert into tab1 values(20,10);
>  sub:  delete from tab1 where i=10;
>  pub:  delete from tab1 where i=10;
>

3)
We also need to think how we are going to display the info in case of
multiple_unique_conflicts as there could be multiple local and remote
tuples conflicting for one single operation. Example:

create table conf_tab (a int primary key, b int unique, c int unique);

sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);

pub: insert into conf_tab values (2,3,4);

ERROR:  conflict detected on relation "public.conf_tab":
conflict=multiple_unique_conflicts
DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_b_key", modified locally
in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_c_key", modified locally
in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
CONTEXT:  processing remote data for replication origin "pg_16392"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 781, finished at 0/017FDDA0

Currently in clt, we have singular terms such as 'key_tuple',
'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
But it does not look reasonable to have multiple rows inserted for a
single conflict raised. I will think more about this.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > Few observations related to publication.
> > ------------------------------

Thanks, Shveta, for testing and sharing your thoughts.  IMHO for
conflict log tables it should be good enough if we restrict them when
the ALL TABLES option is used; I don't think we need to put extra
effort into completely restricting them even if users want to
explicitly list them in a publication.

> >
> > (In the below comments, clt/CLT implies Conflict Log Table)
> >
> > 1)
> > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

This function is used while publishing every single change, and I
don't think we want to add the cost of checking each subscription to
identify whether the table is listed as a CLT.

> > 2)
> > '\d+ clt'   shows all-tables publication name. I feel we should not
> > show that for clt.

I think we should fix this.

> > 3)
> > I am able to create a publication for clt table, should it be allowed?

I believe we should not do any specific handling to restrict this, but
I am open to opinions.

> > create subscription sub1 connection '...' publication pub1
> > WITH(conflict_log_table='clt');
> > create publication pub3 for table clt;
> >
> > 4)
> > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > part of is_publishable_class() itself? If we do so, other code-logics
> > will also get clt as non-publishable always (and will solve a few of
> > the above issues I think). IIUC, there is no place where we want to
> > mark CLT as publishable or is there any?

IMHO the main reason is performance.

> > 5) Also, I feel we can add some documentation now to help others to
> > understand/review the patch better without going through the long
> > thread.

Makes sense; I will do that in the next version.

> >
> > Few observations related to conflict-logging:
> > ------------------------------
> > 1)
> > I found that for the conflicts which ultimately result in Error, we do
> > not insert any conflict-record in clt.
> >
> > a)
> > Example: insert_exists, update_Exists
> > create table tab1 (i int primary key, j int);
> > sub: insert into tab1 values(30,10);
> > pub: insert into tab1 values(30,10);
> > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > No record in clt.
> >
> > sub:
> > <some pre-data needed>
> > update tab1 set i=40 where i = 30;
> > pub: update tab1 set i=40 where i = 20;
> > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > No record in clt.

Yeah, that's interesting; we need to think about how to commit this
record when the outer transaction is aborted, as we do not have
autonomous transactions, which are generally used for this kind of
logging.  But we can explore more options, like inserting into the
conflict log table outside the outer transaction.

> > b)
> > Another question related to this is, since these conflicts (which
> > results in error) keep on happening until user resolves these or skips
> > these or 'disable_on_error' is set. Then are we going to insert these
> > multiple times? We do count these in 'confl_insert_exists' and
> > 'confl_update_exists' everytime, so it makes sense to log those each
> > time in clt as well. Thoughts?

I think it makes sense to insert a record every time we see the
conflict, but it would be good to have opinions from others as well.

> > 2)
> > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > indicating that it is not applicable for this conflict-type?
> >
> > Example: delete_missing, update_missing
> > pub:
> >  insert into tab1 values(10,10);
> >  insert into tab1 values(20,10);
> >  sub:  delete from tab1 where i=10;
> >  pub:  delete from tab1 where i=10;

Sure, I will test this.

>
> 3)
> We also need to think how we are going to display the info in case of
> multiple_unique_conflicts as there could be multiple local and remote
> tuples conflicting for one single operation. Example:
>
> create table conf_tab (a int primary key, b int unique, c int unique);
>
> sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
>
> pub: insert into conf_tab values (2,3,4);
>
> ERROR:  conflict detected on relation "public.conf_tab":
> conflict=multiple_unique_conflicts
> DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> Key already exists in unique index "conf_tab_b_key", modified locally
> in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> Key already exists in unique index "conf_tab_c_key", modified locally
> in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> CONTEXT:  processing remote data for replication origin "pg_16392"
> during message type "INSERT" for replication target relation
> "public.conf_tab" in transaction 781, finished at 0/017FDDA0
>
> Currently in clt, we have singular terms such as 'key_tuple',
> 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> But it does not look reasonable to have multiple rows inserted for a
> single conflict raised. I will think more about this.

Currently I am inserting multiple records into the conflict history
table, one per conflicting tuple, the same way each tuple is logged,
but I couldn't find any better way to do this. Another option is to
use an array of tuples instead of a single tuple, but I am not sure;
that might make things more complicated for an external tool to
process.  But you are right, this needs more discussion.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Few observations related to publication.
> > > ------------------------------
>
> Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> conflict log tables it should be good enough if we restrict it when
> ALL TABLE options are used, I don't think we need to put extra effort
> to completely restrict it even if users want to explicitly list it
> into the publication.
>
> > >
> > > (In the below comments, clt/CLT implies Conflict Log Table)
> > >
> > > 1)
> > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.

After putting more thought into this, I have changed it to return
false for clt, as this is just an exposed SQL function and is not
called by the pgoutput layer.

> > > 2)
> > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > show that for clt.
>
Fixed

>
> > > 3)
> > > I am able to create a publication for clt table, should it be allowed?
>
> I believe we should not do any specific handling to restrict this but
> I am open for the opinions.

I am restricting this as well; let's see what others think.


>
> > > 5) Also, I feel we can add some documentation now to help others to
> > > understand/review the patch better without going through the long
> > > thread.
>
> Make sense, I will do that in the next version.
I have done that, but I have not compiled the docs as I don't
currently have the setup, so I have added it as a WIP patch.


> > > 2)
> > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > indicating that it is not applicable for this conflict-type?
> > >
> > > Example: delete_missing, update_missing
> > > pub:
> > >  insert into tab1 values(10,10);
> > >  insert into tab1 values(20,10);
> > >  sub:  delete from tab1 where i=10;
> > >  pub:  delete from tab1 where i=10;
>
> Sure I will test this.

I have fixed this.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Few observations related to publication.
> > > ------------------------------
>
> Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> conflict log tables it should be good enough if we restrict it when
> ALL TABLE options are used, I don't think we need to put extra effort
> to completely restrict it even if users want to explicitly list it
> into the publication.
>
> > >
> > > (In the below comments, clt/CLT implies Conflict Log Table)
> > >
> > > 1)
> > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> This function is used while publishing every single change and I don't
> think we want to add a cost to check each subscription to identify
> whether the table is listed as CLT.
>
> > > 2)
> > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > show that for clt.
>
> I think we should fix this.
>
> > > 3)
> > > I am able to create a publication for clt table, should it be allowed?
>
> I believe we should not do any specific handling to restrict this but
> I am open for the opinions.
>
> > > create subscription sub1 connection '...' publication pub1
> > > WITH(conflict_log_table='clt');
> > > create publication pub3 for table clt;
> > >
> > > 4)
> > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > part of is_publishable_class() itself? If we do so, other code-logics
> > > will also get clt as non-publishable always (and will solve a few of
> > > the above issues I think). IIUC, there is no place where we want to
> > > mark CLT as publishable or is there any?
>
> IMHO the main reason is performance.
>
> > > 5) Also, I feel we can add some documentation now to help others to
> > > understand/review the patch better without going through the long
> > > thread.
>
> Make sense, I will do that in the next version.
>
> > >
> > > Few observations related to conflict-logging:
> > > ------------------------------
> > > 1)
> > > I found that for the conflicts which ultimately result in Error, we do
> > > not insert any conflict-record in clt.
> > >
> > > a)
> > > Example: insert_exists, update_Exists
> > > create table tab1 (i int primary key, j int);
> > > sub: insert into tab1 values(30,10);
> > > pub: insert into tab1 values(30,10);
> > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > No record in clt.
> > >
> > > sub:
> > > <some pre-data needed>
> > > update tab1 set i=40 where i = 30;
> > > pub: update tab1 set i=40 where i = 20;
> > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > No record in clt.
>
> Yeah that interesting need to put thought on how to commit this record
> when an outer transaction is aborted as we do not have autonomous
> transactions which are generally used for this kind of logging.

Right

> But
> we can explore more options like inserting into conflict log tables
> outside the outer transaction.

Yes, that seems the way to me. I could not find any such existing
reference/usage in code though.

>
> > > b)
> > > Another question related to this is, since these conflicts (which
> > > results in error) keep on happening until user resolves these or skips
> > > these or 'disable_on_error' is set. Then are we going to insert these
> > > multiple times? We do count these in 'confl_insert_exists' and
> > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > time in clt as well. Thoughts?
>
> I think it make sense to insert every time we see the conflict, but it
> would be good to have opinion from others as well.
>
> > > 2)
> > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > indicating that it is not applicable for this conflict-type?
> > >
> > > Example: delete_missing, update_missing
> > > pub:
> > >  insert into tab1 values(10,10);
> > >  insert into tab1 values(20,10);
> > >  sub:  delete from tab1 where i=10;
> > >  pub:  delete from tab1 where i=10;
>
> Sure I will test this.
>
> >
> > 3)
> > We also need to think how we are going to display the info in case of
> > multiple_unique_conflicts as there could be multiple local and remote
> > tuples conflicting for one single operation. Example:
> >
> > create table conf_tab (a int primary key, b int unique, c int unique);
> >
> > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> >
> > pub: insert into conf_tab values (2,3,4);
> >
> > ERROR:  conflict detected on relation "public.conf_tab":
> > conflict=multiple_unique_conflicts
> > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_b_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_c_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > CONTEXT:  processing remote data for replication origin "pg_16392"
> > during message type "INSERT" for replication target relation
> > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> >
> > Currently in clt, we have singular terms such as 'key_tuple',
> > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > But it does not look reasonable to have multiple rows inserted for a
> > single conflict raised. I will think more about this.
>
> Currently I am inserting multiple records in the conflict history
> table, one record per conflicting tuple that is logged, but I couldn't
> find any better way to handle this. Another option is to use an array
> of tuples instead of a single tuple, but I am not sure whether that
> would make things more complicated for any external tool to process.

It’s arguable and hard to say what the correct behaviour should be.
I’m slightly leaning toward having a single row per conflict. IMO,
overall the confl_* counters in pg_stat_subscription_stats should
align with the number of entries in the conflict history table, which
implies one row even for multiple_unique_conflicts. But I also
understand that this approach could make things complicated for
external tools. For now, we can proceed with logging multiple rows for
a single multiple_unique_conflicts occurrence and wait to hear others’
opinions.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Mon, Nov 17, 2025 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > Few observations related to publication.
> > > > ------------------------------
> >
> > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > conflict log tables it should be good enough if we restrict it when
> > ALL TABLE options are used, I don't think we need to put extra effort
> > to completely restrict it even if users want to explicitly list it
> > into the publication.
> >
> > > >
> > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > >
> > > > 1)
> > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> After putting more thought into this, I have changed it to return
> false for clt, as this is just an exposed SQL function and is not
> called by the pgoutput layer.
>
> > > > 2)
> > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > show that for clt.
> >
> Fixed
>
> >
> > > > 3)
> > > > I am able to create a publication for clt table, should it be allowed?
> >
> > I believe we should not do any specific handling to restrict this but
> > I am open for the opinions.
>
> I am restricting this as well; let's see what others think.
>
>
> >
> > > > 5) Also, I feel we can add some documentation now to help others to
> > > > understand/review the patch better without going through the long
> > > > thread.
> >
> > Make sense, I will do that in the next version.
> I have done that, but I have not compiled the docs as I don't
> currently have the setup, so I have added it as a WIP patch.
>
>
> > > > 2)
> > > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > > indicating that it is not applicable for this conflict-type?
> > > >
> > > > Example: delete_missing, update_missing
> > > > pub:
> > > >  insert into tab1 values(10,10);
> > > >  insert into tab1 values(20,10);
> > > >  sub:  delete from tab1 where i=10;
> > > >  pub:  delete from tab1 where i=10;
> >
> > Sure I will test this.
>
> I have fixed this.

Thanks for the patch.  Some feedback about the clt:

1)
local_origin is always NULL in my tests for all conflict types I tried.

2)
Do we need 'key_tuple' as such or replica_identity is enough/better?
I see 'key_tuple' inserted as {"i":10,"j":null} for delete_missing
case where query was 'delete from tab1 where i=10'; here 'i' is PK;
which seems okay.
But it is '{"i":20,"j":200}' for update_origin_differ case where query
was 'update tab1 set j=200 where i =20'. Here too RI is 'i' alone. I
feel 'j' should not be part of the key but let me know if I have
misunderstood. IMO, 'j' being part of remote_tuple should be good
enough.

3)
Do we need to have a timestamp column as well to say when conflict was
recorded? Or local_commit_ts, remote_commit_ts are sufficient?
Thoughts

4)
Also, it makes sense if we have 'conflict_type' next to 'relid'. I
feel relid and conflict_type are primary columns and rest are related
details.

5)
Do we need table_schema, table_name when we have relid already? If we
want to retain these, we can name them as schemaname and relname to be
consistent with all other stats tables. IMO, then the order can be:
relid, schemaname, relname, conflict_type and then the rest of the
details.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
Peter Smith
Date:
Hi Dilip.

I started to look at this thread. Here are some comments for patch v4-0001.


=====
GENERAL

1.
There's some inconsistency in how this new table is called at different times:
a) "conflict table"
b) "conflict log table"
c) "conflict log history table"
d) "conflict history"

My preference was (b). Making this consistent will have impacts on
many macros, variables, comments, function names, etc.

~~~

2.
What about enhancements to description \dRs+ so the subscription
conflict log table is displayed?

~~~

3.
What about enhancements to the tab-complete code?

======
src/backend/commands/subscriptioncmds.c

4.
 #define SUBOPT_MAX_RETENTION_DURATION 0x00008000
 #define SUBOPT_LSN 0x00010000
 #define SUBOPT_ORIGIN 0x00020000
+#define SUBOPT_CONFLICT_TABLE 0x00030000

Bug? Shouldn't that be 0x00040000.

~~~

5.
+ char    *conflicttable;
  XLogRecPtr lsn;
 } SubOpts;

IMO 'conflicttable' looks too much like 'conflictable', which may
cause some confusion on first reading.

~~~

6.
+static void CreateConflictLogTable(Oid namespaceId, char *conflictrel);
+static void DropConflictLogTable(Oid namespaceId, char *conflictrel);

AFAIK it is more conventional for the static functions to be
snake_case and the extern functions to use CamelCase. So these would
be:
- create_conflict_log_table
- drop_conflict_log_table

~~~

CreateSubscription:

7.
+ /* If conflict log table name is given than create the table. */
+ if (opts.conflicttable)
+ CreateConflictLogTable(conflict_table_nspid, conflict_table);
+

typo: /If conflict/If a conflict/

typo: "than"

~~~

AlterSubscription:

8.
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN |
+   SUBOPT_CONFLICT_TABLE);

The line wrapping doesn't seem necessary.

~~~

9.
+ replaces[Anum_pg_subscription_subconflictnspid - 1] = true;
+ replaces[Anum_pg_subscription_subconflicttable - 1] = true;
+
+ CreateConflictLogTable(nspid, relname);
+ }
+

What are the rules regarding replacing one log table with a different
log table for the same subscription? I didn't see anything about this
scenario, nor any test cases.

~~~

CreateConflictLogTable:

10.
+ /*
+ * Check if table with same name already present, if so report an error
+ * as currently we do not support user created table as conflict log
+ * table.
+ */

Is the comment about "user-created table" strictly correct? e.g. Won't
you encounter the same problem if there are 2 subscriptions trying to
set the same-named conflict log table?

SUGGESTION
Report an error if the specified conflict log table already exists.

~~~

DropConflictLogTable:

11.
+ /*
+ * Drop conflict log table if exist, use if exists ensures the command
+ * won't error if the table is already gone.
+ */

The reason for EXISTS was already mentioned in the function comment.

SUGGESTION
Drop the conflict log table if it exists.

======
src/backend/replication/logical/conflict.c

12.
+static Datum TupleTableSlotToJsonDatum(TupleTableSlot *slot);
+
+static void InsertConflictLog(Relation rel,
+   TransactionId local_xid,
+   TimestampTz local_ts,
+   ConflictType conflict_type,
+   RepOriginId origin_id,
+   TupleTableSlot *searchslot,
+   TupleTableSlot *localslot,
+   TupleTableSlot *remoteslot);

Same as earlier comment #6 -- isn't it conventional to use snake_case
for the static function names?

~~~

TupleTableSlotToJsonDatum:

13.
+ * This would be a new internal helper function for logical replication
+ * Needs to handle various data types and potentially TOASTed data

What's this comment about? Something doesn't look quite right.

~~~

InsertConflictLog:

14.
+ /* TODO: proper error code */
+ relid = get_relname_relid(relname, nspid);
+ if (!OidIsValid(relid))
+ elog(ERROR, "conflict log history table does not exists");
+ conflictrel = table_open(relid, RowExclusiveLock);
+ if (conflictrel == NULL)
+ elog(ERROR, "could not open conflict log history table");

14a.
What's the TODO comment for? Are you going to replace these elogs?

~

14b.
Typo: "does not exists"

~

14c.
An unnecessary double-blank line follows this code fragment.

~~~

15.
+ /* Populate the values and nulls arrays */
+ attno = 0;
+ values[attno] = ObjectIdGetDatum(RelationGetRelid(rel));
+ attno++;
+
+ if (TransactionIdIsValid(local_xid))
+ values[attno] = TransactionIdGetDatum(local_xid);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (TransactionIdIsValid(remote_xid))
+ values[attno] = TransactionIdGetDatum(remote_xid);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ values[attno] = LSNGetDatum(remote_final_lsn);
+ attno++;
+
+ if (local_ts > 0)
+ values[attno] = TimestampTzGetDatum(local_ts);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (remote_commit_ts > 0)
+ values[attno] = TimestampTzGetDatum(remote_commit_ts);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ values[attno] =
+ CStringGetTextDatum(get_namespace_name(RelationGetNamespace(rel)));
+ attno++;
+
+ values[attno] = CStringGetTextDatum(RelationGetRelationName(rel));
+ attno++;
+
+ values[attno] = CStringGetTextDatum(ConflictTypeNames[conflict_type]);
+ attno++;
+
+ if (origin_id != InvalidRepOriginId)
+ replorigin_by_oid(origin_id, true, &origin);
+
+ if (origin != NULL)
+ values[attno] = CStringGetTextDatum(origin);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (replorigin_session_origin != InvalidRepOriginId)
+ replorigin_by_oid(replorigin_session_origin, true, &remote_origin);
+
+ if (remote_origin != NULL)
+ values[attno] = CStringGetTextDatum(remote_origin);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (searchslot != NULL)
+ values[attno] = TupleTableSlotToJsonDatum(searchslot);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (localslot != NULL)
+ values[attno] = TupleTableSlotToJsonDatum(localslot);
+ else
+ nulls[attno] = true;
+ attno++;
+
+ if (remoteslot != NULL)
+ values[attno] = TupleTableSlotToJsonDatum(remoteslot);
+ else
+ nulls[attno] = true;
+

15a.
It might be simpler to just post-increment that 'attno' in all the
assignments and save a dozen lines of code:
e.g. values[attno++] = ...

~

15b.
Also, put a sanity Assert check at the end, like:
Assert(attno + 1 == MAX_CONFLICT_ATTR_NUM);


======
src/backend/utils/cache/lsyscache.c

16.
+ if (isnull)
+ {
+ ReleaseSysCache(tup);
+ return NULL;
+ }
+
+ *nspid = subform->subconflictnspid;
+ relname = pstrdup(TextDatumGetCString(datum));
+
+ ReleaseSysCache(tup);
+
+ return relname;

It would be tidier to have a single release/return by coding this
slightly differently.

SUGGESTION:

char *relname = NULL;
...
if (!isnull)
{
  *nspid = subform->subconflictnspid;
  relname = pstrdup(TextDatumGetCString(datum));
}

ReleaseSysCache(tup);
return relname;

======
src/include/catalog/pg_subscription.h

17.
+ Oid subconflictnspid; /* Namespace Oid in which the conflict history
+ * table is created. */

Would it be better to make these 2 new member names more alike, since
they go together. e.g.
confl_table_nspid
confl_table_name

======
src/include/replication/conflict.h

18.
+#define MAX_CONFLICT_ATTR_NUM 15

I felt this doesn't really belong here. Just define it atop/within the
function InsertConflictLog()

~~~

19.
 extern void InitConflictIndexes(ResultRelInfo *relInfo);
+
 #endif

Spurious whitespace change not needed for this patch.

======
src/test/regress/sql/subscription.sql

20.
How about adding some more test scenarios:
e.g.1. ALTER the conflict log table of some subscription that already has one
e.g.2. Have multiple subscriptions that specify the same conflict log table

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From:
Peter Smith
Date:
Here are some comments for the patch v4-0002.

======
GENERAL

1.
The patch should include test cases:

- to confirm an error happens when attempting to publish clt
- to confirm \dt+ clt is not showing the ALL TABLES publication
- to confirm that SQL function pg_relation_is_publishable gives the
expected result
- etc.

======
Commit Message

2.
When all table option is used with publication don't publish the
conflict history tables.

~

Maybe reword that using uppercase for keywords, like:

SUGGESTION
A conflict log table will not be published by a FOR ALL TABLES publication.

======
src/backend/catalog/pg_publication.c

check_publication_add_relation:

3.
+ /* Can't be created as conflict log table */
+ if (IsConflictLogRelid(RelationGetRelid(targetrel)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add relation \"%s\" to publication",
+ RelationGetRelationName(targetrel)),
+ errdetail("This operation is not supported for conflict log tables.")));

3a.
Typo in comment.

SUGGESTION
Can't be a conflict log table

~

3b.
I was wondering if this check should be moved to the bottom of the function.

I think IsConflictLogRelid() is the most inefficient of all these
conditions, so it is better to give the other ones a chance to fail
quickly before needing to check for clt.

~~~

pg_relation_is_publishable:

4.
 /*
- * SQL-callable variant of the above
+ * SQL-callable variant of the above and this should not be a conflict log rel
  *
  * This returns null when the relation does not exist.  This is intended to be
  * used for example in psql to avoid gratuitous errors when there are

I felt this new comment should be in the code, instead of in the
function comment.

SUGGESTION
/* subscription conflict log tables are not published */
result = is_publishable_class(relid, (Form_pg_class) GETSTRUCT(tuple)) &&
  !IsConflictLogRelid(relid);

~~~

5.
It seemed strange that function
pg_relation_is_publishable(PG_FUNCTION_ARGS) is checking
IsConflictLogRelid, but function is_publishable_relation(Relation rel)
is not.

~~~

GetAllPublicationRelations:

6.
+ /* conflict history tables are not published. */
  if (is_publishable_class(relid, relForm) &&
+ !IsConflictLogRelid(relid) &&
  !(relForm->relispartition && pubviaroot))
  result = lappend_oid(result, relid);

Inconsistent "history table" terminology.

Maybe this comment should be identical to the other one above. e.g.
/* subscription conflict log tables are not published */

======
src/backend/commands/subscriptioncmds.c

IsConflictLogRelid:

8.
+/*
+ * Is relation used as a conflict log table
+ *
+ * Scan all the subscription and check whether the relation is used as
+ * conflict log table.
+ */

typo: "all the subscription"

Also, the 2nd sentence repeats the purpose of the function;  I don't
think you need to say it twice.

SUGGESTION
Check if the specified relation is used as a conflict log table by any
subscription.

~~~

9.
+ if (relname == NULL)
+ continue;
+ if (relid == get_relname_relid(relname, nspid))
+ {
+ found = true;
+ break;
+ }

It seemed unnecessary to separate out the 'continue' like that.

In passing, consider renaming that generic 'found' to be the proper
meaning of the boolean.

SUGGESTION
if (relname && relid == get_relname_relid(relname, nspid))
{
  is_clt = true;
  break;
}

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From:
Peter Smith
Date:
Hi Dilip,

FYI, patch v4-0003 (docs) needs rebasing due to ada78cd.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Tue, Nov 18, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch.  Some feedback about the clt:
>
> 1)
> local_origin is always NULL in my tests for all conflict types I tried.

You need to set the replication origin as shown below:
On subscriber side:
---------------------------
SELECT pg_replication_origin_create('my_remote_source_2');
SELECT pg_replication_origin_session_setup('my_remote_source_2');
UPDATE test SET b=200 where a=1;

On remote:
---------------
UPDATE test SET b=300 where a=1; -- conflicting operation with local node

On subscriber
------------------
postgres[1514377]=# select local_origin, remote_origin from
myschema.conflict_log_history2 ;
    local_origin    | remote_origin
--------------------+---------------------
 my_remote_source_2 | pg_16396

> 2)
> Do we need 'key_tuple' as such or replica_identity is enough/better?
> I see 'key_tuple' inserted as {"i":10,"j":null} for delete_missing
> case where query was 'delete from tab1 where i=10'; here 'i' is PK;
> which seems okay.
> But it is '{"i":20,"j":200}' for update_origin_differ case where query
> was 'update tab1 set j=200 where i =20'. Here too RI is 'i' alone. I
> feel 'j' should not be part of the key but let me know if I have
> misunderstood. IMO, 'j' being part of remote_tuple should be good
> enough.

Yeah, we should display the replica identity only. I assumed that in
ReportApplyConflict() the searchslot would only have the RI tuple, but
it is sending a remote tuple in the searchslot, so we might need to
extract the RI from that slot. I will work on this.

> 3)
> Do we need to have a timestamp column as well to say when conflict was
> recorded? Or local_commit_ts, remote_commit_ts are sufficient?
> Thoughts

You mean we can record the current timestamp while inserting? I am
not sure it will add more meaningful information than
remote_commit_ts, but let's see what others think.

> 4)
> Also, it makes sense if we have 'conflict_type' next to 'relid'. I
> feel relid and conflict_type are primary columns and rest are related
> details.

Sure

> 5)
> Do we need table_schema, table_name when we have relid already? If we
> want to retain these, we can name them as schemaname and relname to be
> consistent with all other stats tables. IMO, then the order can be:
> relid, schemaname, relname, conflict_type and then the rest of the
> details.

Yeah, this makes the table denormalized, since we can fetch this
information by joining with pg_class, but I think it might be better
for readability. Let's see what others think; for now I will reorder
as suggested.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From:
shveta malik
Date:
On Wed, Nov 19, 2025 at 3:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Thanks for the patch.  Some feedback about the clt:
> >
> > 1)
> > local_origin is always NULL in my tests for all conflict types I tried.
>
> You need to set the replication origin as shown below
> On subscriber side:
> ---------------------------
> SELECT pg_replication_origin_create('my_remote_source_2');
> SELECT pg_replication_origin_session_setup('my_remote_source_2');
> UPDATE test SET b=200 where a=1;
>
> On remote:
> ---------------
> UPDATE test SET b=300 where a=1; -- conflicting operation with local node
>
> On subscriber
> ------------------
> postgres[1514377]=# select local_origin, remote_origin from
> myschema.conflict_log_history2 ;
>     local_origin    | remote_origin
> --------------------+---------------------
>  my_remote_source_2 | pg_16396

Okay, I see, thanks!

>
> > 2)
> > Do we need 'key_tuple' as such or replica_identity is enough/better?
> > I see 'key_tuple' inserted as {"i":10,"j":null} for delete_missing
> > case where query was 'delete from tab1 where i=10'; here 'i' is PK;
> > which seems okay.
> > But it is '{"i":20,"j":200}' for update_origin_differ case where query
> > was 'update tab1 set j=200 where i =20'. Here too RI is 'i' alone. I
> > feel 'j' should not be part of the key but let me know if I have
> > misunderstood. IMO, 'j' being part of remote_tuple should be good
> > enough.
>
> Yeah, we should display the replica identity only. I assumed that in
> ReportApplyConflict() the searchslot would only have the RI tuple, but
> it is sending a remote tuple in the searchslot, so we might need to
> extract the RI from that slot. I will work on this.

Yeah, we have extracted it already in
errdetail_apply_conflict()->build_tuple_value_details(). See how it
dumps it in the log:

LOG:  conflict detected on relation "public.tab1":
conflict=update_origin_differs
DETAIL:  Updating the row that was modified locally in transaction 768
at 2025-11-18 12:09:19.658502+05:30.
        Existing local row (20, 100); remote row (20, 200); replica
identity (i)=(20).

We somehow need to reuse it.

>
> > 3)
> > Do we need to have a timestamp column as well to say when conflict was
> > recorded? Or local_commit_ts, remote_commit_ts are sufficient?
> > Thoughts
>
> You mean we can record the current timestamp while inserting? I am
> not sure it will add more meaningful information than
> remote_commit_ts, but let's see what others think.
>

On rethinking, we can skip it. The commit-ts of both sides are enough.

> > 4)
> > Also, it makes sense if we have 'conflict_type' next to 'relid'. I
> > feel relid and conflict_type are primary columns and rest are related
> > details.
>
> Sure
>
> > 5)
> > Do we need table_schema, table_name when we have relid already? If we
> > want to retain these, we can name them as schemaname and relname to be
> > consistent with all other stats tables. IMO, then the order can be:
> > relid, schemaname, relname, conflict_type and then the rest of the
> > details.
>
> Yeah, this makes the table denormalized, since we can fetch this
> information by joining with pg_class, but I think it might be better
> for readability. Let's see what others think; for now I will reorder
> as suggested.
>

Okay, works for me if we want to keep these. I see that most of the
other statistics tables (pg_stat_all_indexes, pg_statio_all_tables,
pg_statio_all_sequences, etc.) that maintain a relid also retain the
names.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From:
Dilip Kumar
Date:
On Wed, Nov 19, 2025 at 7:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip.
>
> I started to look at this thread. Here are some comments for patch v4-0001.

Thanks Peter for your review; I have worked on most of the comments for 0001.
>
> =====
> GENERAL
>
> 1.
> There's some inconsistency in how this new table is called at different times:
> a) "conflict table"
> b) "conflict log table"
> c) "conflict log history table"
> d) "conflict history"
>
> My preference was (b). Making this consistent will have impacts on
> many macros, variables, comments, function names, etc.

Yeah, my preference is also (b), so I have used it everywhere.

> ~~~
>
> 2.
> What about enhancements to description \dRs+ so the subscription
> conflict log table is displayed?

Done. I have displayed the conflict log table name; I am not sure
whether we should display the complete schema-qualified name, in which
case we might need to join with pg_namespace.

> ~~~
>
> 3.
> What about enhancements to the tab-complete code?

Done

> ======
> src/backend/commands/subscriptioncmds.c
>
> 4.
>  #define SUBOPT_MAX_RETENTION_DURATION 0x00008000
>  #define SUBOPT_LSN 0x00010000
>  #define SUBOPT_ORIGIN 0x00020000
> +#define SUBOPT_CONFLICT_TABLE 0x00030000
>
> Bug? Shouldn't that be 0x00040000.

Yeah, fixed.

> ~~~
>
> 5.
> + char    *conflicttable;
>   XLogRecPtr lsn;
>  } SubOpts;
>
> IMO 'conflicttable' looks too much like 'conflictable', which may
> cause some confusion on first reading.

Changed to conflictlogtable

> ~~~
>
> 6.
> +static void CreateConflictLogTable(Oid namespaceId, char *conflictrel);
> +static void DropConflictLogTable(Oid namespaceId, char *conflictrel);
>
> AFAIK it is more conventional for the static functions to be
> snake_case and the extern functions to use CamelCase. So these would
> be:
> - create_conflict_log_table
> - drop_conflict_log_table

Done

> ~~~
>
> CreateSubscription:
>
> 7.
> + /* If conflict log table name is given than create the table. */
> + if (opts.conflicttable)
> + CreateConflictLogTable(conflict_table_nspid, conflict_table);
> +
>
> typo: /If conflict/If a conflict/
>
> typo: "than"

Fixed

> ~~~
>
> AlterSubscription:
>
> 8.
> -   SUBOPT_ORIGIN);
> +   SUBOPT_ORIGIN |
> +   SUBOPT_CONFLICT_TABLE);
>
> The line wrapping doesn't seem necessary.

Without wrapping, the line crosses the 80-character limit.

> ~~~
>
> 9.
> + replaces[Anum_pg_subscription_subconflictnspid - 1] = true;
> + replaces[Anum_pg_subscription_subconflicttable - 1] = true;
> +
> + CreateConflictLogTable(nspid, relname);
> + }
> +
>
> What are the rules regarding replacing one log table with a different
> log table for the same subscription? I didn't see anything about this
> scenario, nor any test cases.

Added a test and updated the code as well: if we set a different log
table, we drop the old one and create the new table; if you set the
same table, only a NOTICE is issued and the table is not created
again.

> ~~~
>
> CreateConflictLogTable:
>
> 10.
> + /*
> + * Check if table with same name already present, if so report an error
> + * as currently we do not support user created table as conflict log
> + * table.
> + */
>
> Is the comment about "user-created table" strictly correct? e.g. Won't
> you encounter the same problem if there are 2 subscriptions trying to
> set the same-named conflict log table?
>
> SUGGESTION
> Report an error if the specified conflict log table already exists.

Done

> ~~~
>
> DropConflictLogTable:
>
> 11.
> + /*
> + * Drop conflict log table if exist, use if exists ensures the command
> + * won't error if the table is already gone.
> + */
>
> The reason for EXISTS was already mentioned in the function comment.
>
> SUGGESTION
> Drop the conflict log table if it exists.

Done

> ======
> src/backend/replication/logical/conflict.c
>
> 12.
> +static Datum TupleTableSlotToJsonDatum(TupleTableSlot *slot);
> +
> +static void InsertConflictLog(Relation rel,
> +   TransactionId local_xid,
> +   TimestampTz local_ts,
> +   ConflictType conflict_type,
> +   RepOriginId origin_id,
> +   TupleTableSlot *searchslot,
> +   TupleTableSlot *localslot,
> +   TupleTableSlot *remoteslot);
>
> Same as earlier comment #6 -- isn't it conventional to use snake_case
> for the static function names?

Done

> ~~~
>
> TupleTableSlotToJsonDatum:
>
> 13.
> + * This would be a new internal helper function for logical replication
> + * Needs to handle various data types and potentially TOASTed data
>
> What's this comment about? Something doesn't look quite right.

Hmm, that's bad, fixed.

> ~~~
>
> InsertConflictLog:
>
> 14.
> + /* TODO: proper error code */
> + relid = get_relname_relid(relname, nspid);
> + if (!OidIsValid(relid))
> + elog(ERROR, "conflict log history table does not exists");
> + conflictrel = table_open(relid, RowExclusiveLock);
> + if (conflictrel == NULL)
> + elog(ERROR, "could not open conflict log history table");
>
> 14a.
> What's the TODO comment for? Are you going to replace these elogs?

replaced with ereport
> ~
>
> 14b.
> Typo: "does not exists"

fixed

> ~
>
> 14c.
> An unnecessary double-blank line follows this code fragment.

fixed

> ~~~
>
> 15.
> + /* Populate the values and nulls arrays */
> + attno = 0;
> + values[attno] = ObjectIdGetDatum(RelationGetRelid(rel));
> + attno++;
> +
> + if (TransactionIdIsValid(local_xid))
> + values[attno] = TransactionIdGetDatum(local_xid);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (TransactionIdIsValid(remote_xid))
> + values[attno] = TransactionIdGetDatum(remote_xid);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + values[attno] = LSNGetDatum(remote_final_lsn);
> + attno++;
> +
> + if (local_ts > 0)
> + values[attno] = TimestampTzGetDatum(local_ts);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (remote_commit_ts > 0)
> + values[attno] = TimestampTzGetDatum(remote_commit_ts);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + values[attno] =
> + CStringGetTextDatum(get_namespace_name(RelationGetNamespace(rel)));
> + attno++;
> +
> + values[attno] = CStringGetTextDatum(RelationGetRelationName(rel));
> + attno++;
> +
> + values[attno] = CStringGetTextDatum(ConflictTypeNames[conflict_type]);
> + attno++;
> +
> + if (origin_id != InvalidRepOriginId)
> + replorigin_by_oid(origin_id, true, &origin);
> +
> + if (origin != NULL)
> + values[attno] = CStringGetTextDatum(origin);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (replorigin_session_origin != InvalidRepOriginId)
> + replorigin_by_oid(replorigin_session_origin, true, &remote_origin);
> +
> + if (remote_origin != NULL)
> + values[attno] = CStringGetTextDatum(remote_origin);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (searchslot != NULL)
> + values[attno] = TupleTableSlotToJsonDatum(searchslot);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (localslot != NULL)
> + values[attno] = TupleTableSlotToJsonDatum(localslot);
> + else
> + nulls[attno] = true;
> + attno++;
> +
> + if (remoteslot != NULL)
> + values[attno] = TupleTableSlotToJsonDatum(remoteslot);
> + else
> + nulls[attno] = true;
> +
>
> 15a.
> It might be simpler to just post-increment that 'attno' in all the
> assignments and save a dozen lines of code:
> e.g. values[attno++] = ...

Yeah done that

> ~
>
> 15b.
> Also, put a sanity Assert check at the end, like:
> Assert(attno + 1 == MAX_CONFLICT_ATTR_NUM);

Done
>
> ======
> src/backend/utils/cache/lsyscache.c
>
> 16.
> + if (isnull)
> + {
> + ReleaseSysCache(tup);
> + return NULL;
> + }
> +
> + *nspid = subform->subconflictnspid;
> + relname = pstrdup(TextDatumGetCString(datum));
> +
> + ReleaseSysCache(tup);
> +
> + return relname;
>
> It would be tidier to have a single release/return by coding this
> slightly differently.
>
> SUGGESTION:
>
> char *relname = NULL;
> ...
> if (!isnull)
> {
>   *nspid = subform->subconflictnspid;
>   relname = pstrdup(TextDatumGetCString(datum));
> }
>
> ReleaseSysCache(tup);
> return relname;

Right, changed it.

> ======
> src/include/catalog/pg_subscription.h
>
> 17.
> + Oid subconflictnspid; /* Namespace Oid in which the conflict history
> + * table is created. */
>
> Would it be better to make these 2 new member names more alike, since
> they go together. e.g.
> confl_table_nspid
> confl_table_name

In pg_subscription.h all fields follow the same convention without "_",
so I have changed them to

subconflictlognspid
subconflictlogtable


> ======
> src/include/replication/conflict.h
>
> 18.
> +#define MAX_CONFLICT_ATTR_NUM 15
>
> I felt this doesn't really belong here. Just define it atop/within the
> function InsertConflictLog()

Done
> ~~~
>
> 19.
>  extern void InitConflictIndexes(ResultRelInfo *relInfo);
> +
>  #endif
>
> Spurious whitespace change not needed for this patch.

Fixed

> ======
> src/test/regress/sql/subscription.sql
>
> 20.
> How about adding some more test scenarios:
> e.g.1. ALTER the conflict log table of some subscription that already has one
> e.g.2. Have multiple subscriptions that specify the same conflict log table

Added

Pending:
1) fixed review comments of 0002 and 0003
2) Need to add replica identity tuple instead of full tuple - reported by Shveta
3) Keeping the logs in case of outer transaction failure by moving log
insertion outside the main transaction - reported by Shveta

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Thanks for addressing all my previous review comments of v4.

Here are some more comments for the latest patch v5-0001.

======
GENERAL

1.
There are still a couple of places remaining where this new table was
not consistently called a "Conflict Log Table" (e.g. search for
"history")

e.g. Subject: [PATCH v5] Add configurable conflict log history table
for Logical Replication
e.g. + /* Insert conflict details to log history table. */
e.g. +-- CONFLICT LOG HISTORY TABLE TESTS

~~~

2.
Is automatically dropping the log tables always what the user might
want to happen? Maybe someone wants them lying around afterwards for
later analysis -- I don't really know the answer; just wondering if
this is (a) good to be tidy or (b) bad to remove user flexibility. Or
maybe the answer is leave it but make sure to add more documentation
to say "if you are going to want to do some post analysis then be sure
to copy this table data before it gets automatically dropped".

======
Commit message.

3.
User-Defined Table: The conflict log is stored in a user-managed table
rather than a system catalog.

~

I felt "User-defined" makes it sound like the user does CREATE TABLE
themselves and has some control over the schema. Maybe say
"User-Managed Table:" instead?

======
src/backend/commands/subscriptioncmds.c

4.
 #define SUBOPT_LSN 0x00010000
 #define SUBOPT_ORIGIN 0x00020000
+#define SUBOPT_CONFLICT_LOG_TABLE 0x00040000

Whitespace alignment.

~~~

AlterSubscription:

5.
+ values[Anum_pg_subscription_subconflictlognspid - 1] =
+ ObjectIdGetDatum(nspid);
+ values[Anum_pg_subscription_subconflictlogtable - 1] =
+ CStringGetTextDatum(relname);
+
+ replaces[Anum_pg_subscription_subconflictlognspid - 1] = true;
+ replaces[Anum_pg_subscription_subconflictlogtable - 1] = true;

Something feels back-to-front, because if the same clt is being
re-used (like the NOTICE part that follows) then why do you need to
reassign and say replaces[] = true here?

~~~

6.
+ /*
+ * If the subscription already has the conflict log table
+ * set to the exact same name and namespace currently being
+ * specified, and that table exists, just give notice and
+ * skip creation.
+ */

Is there a simpler way to say the same thing?

SUGGESTION
If the subscription already uses this conflict log table and it
exists, just issue a notice.

~~~

7.
+ ereport(NOTICE,
+ (errmsg("skipping table creation because \"%s.%s\" is already set as
conflict log table",
+ nspname, relname)));

I wasn't sure you need to say "skipping table creation because"... it
seems kind of internal details. How about just:

\"%s.%s\" is already in use as the conflict log table for this subscription

~~~

8.
+ /*
+ * Drop the existing conflict log table if we are
+ * setting a new table.
+ */

The comment didn't feel right by implying there is something to drop.

SUGGESTION
Create the conflict log table after dropping any pre-existing one.

~~~

drop_conflict_log_table:

9.
+ /* Drop the conflict log table if it exist. */

typo: /exist./exists./

======
src/backend/replication/logical/conflict.c

10.
+static Datum
+tuple_table_slot_to_json_datum(TupleTableSlot *slot)
+{
+ HeapTuple tuple = ExecCopySlotHeapTuple(slot);
+ Datum datum = heap_copy_tuple_as_datum(tuple, slot->tts_tupleDescriptor);
+ Datum json;
+
+ if (TupIsNull(slot))
+ return 0;
+
+ json = DirectFunctionCall1(row_to_json, datum);
+ heap_freetuple(tuple);
+
+ return json;
+}

Bug? Shouldn't that TupIsNull(slot) check *precede* using that slot
for the tuple/datum assignments?

~~~

insert_conflict_log:

11.
+ Datum values[MAX_CONFLICT_ATTR_NUM];
+ bool nulls[MAX_CONFLICT_ATTR_NUM];
+ Oid nspid;
+ Oid relid;
+ Relation conflictrel = NULL;
+ int attno;
+ int options = HEAP_INSERT_NO_LOGICAL;
+ char    *relname;
+ char    *origin = NULL;
+ char    *remote_origin = NULL;
+ HeapTuple tup;

I felt some of these var names can be confusing:

11A.
e.g. "conflictlogrel" (instead of 'conflictrel') would emphasise this
is the rel of the log file, not the rel that encountered a conflict.

~

11B.
Similarly, maybe 'relname' could be 'conflictlogtable', which is also
what it was called elsewhere.

~

11C.
AFAICT, the 'relid' is really the relid of the conflict log. So, maybe
name it 'confliglogreid', otherwise it seems confusing when there is
already a parameter called 'rel' that is unrelated to this 'relid'.

~~~

12.
+ if (searchslot != NULL)
+ values[attno++] = tuple_table_slot_to_json_datum(searchslot);
+ else
+ nulls[attno++] = true;
+
+ if (localslot != NULL)
+ values[attno++] = tuple_table_slot_to_json_datum(localslot);
+ else
+ nulls[attno++] = true;
+
+ if (remoteslot != NULL)
+ values[attno++] = tuple_table_slot_to_json_datum(remoteslot);
+ else
+ nulls[attno++] = true;

That function tuple_table_slot_to_json_datum() has potential to return
0. Is that something that needs checking, so you can assign nulls[] =
true?

======
src/backend/replication/logical/worker.c

13.
+char *
+get_subscription_conflict_log_table(Oid subid, Oid *nspid)
+{
+ HeapTuple tup;
+ Datum datum;
+ bool isnull;
+ char    *relname = NULL;
+ Form_pg_subscription subform;
+
+ tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
+
+ if (!HeapTupleIsValid(tup))
+ return NULL;
+
+ subform = (Form_pg_subscription) GETSTRUCT(tup);
+
+ /* Get conflict log table name. */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subconflictlogtable,
+ &isnull);
+ if (!isnull)
+ {
+ *nspid = subform->subconflictlognspid;
+ relname = pstrdup(TextDatumGetCString(datum));
+ }
+
+ ReleaseSysCache(tup);
+ return relname;
+}

You could consider assigning *nspid = InvalidOid when 'isnull' is
true, so then you don't have to rely on the caller pre-assigning a
default sane value. YMMV.

======
src/bin/psql/tab-complete.in.c

14.
- COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
+ COMPLETE_WITH("binary", "connect", "conflict_log_table",
"copy_data", "create_slot",

'conflict_log_table' comes before 'connect' alphabetically.

======
src/test/regress/sql/subscription.sql

15.
+-- ok - change the conlfict log table name for existing subscription
already had old table
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'public.regress_conflict_log3');
+SELECT subname, subconflictlogtable, subconflictlognspid = (SELECT
oid FROM pg_namespace WHERE nspname = 'public') AS is_public_schema
+FROM pg_subscription WHERE subname = 'regress_conflict_test2';
+

typos in comment.
- /conlfict/conflict/
- /for existing subscription already had old table/for an existing
subscription that already had one/

~~~

16.
+-- check new table should be created and old should be dropped

SUGGESTION
check the new table was created and the old table was dropped

~~~

17.
+-- ok (NOTICE) - try to set the conflict log table which is used by
same subscription
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'public.regress_conflict_log3');
+
+-- fail - try to use the conflict log table being used by some other
subscription
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'public.regress_conflict_log1');

Make those 2 comment more alike:

SUGGESTIONS
-- ok (NOTICE) - set conflict_log_table to one already used by this subscription
...
-- fail - set conflict_log_table to one already used by a different subscription

~~~

18.
Missing tests for describe \dRs+.

e.g. there are already dozens of \dRs+ examples where there is no clt
assigned, but I did not see any tests where the clt *is* assigned.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Nov 20, 2025 at 5:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
I was working on these pending items, and there is something where I
got stuck; I am exploring this more but would like to share the problem.

> 2) Need to add replica identity tuple instead of full tuple - reported by Shveta
I have worked on fixing this along with the other comments by Peter;
now only the RI tuple is inserted as part of the key_tuple. IMHO let's
keep the name key_tuple, as it will use the primary key or a unique
key if no explicit replica identity is set. Thoughts?

postgres[3048044]=# select * from myschema.conflict_log_history2;
-[ RECORD 1 ]-----+------------------------------
relid             | 16385
schemaname        | public
relname           | test
conflict_type     | update_origin_differs
local_xid         | 765
remote_xid        | 759
remote_commit_lsn | 0/0174F2E8
local_commit_ts   | 2025-11-24 06:16:50.468263+00
remote_commit_ts  | 2025-11-24 06:16:55.483507+00
local_origin      |
remote_origin     | pg_16396
key_tuple         | {"a":1}
local_tuple       | {"a":1,"b":10}
remote_tuple      | {"a":1,"b":20}

Now pending work status
1) fixed review comments of 0002 and 0003 - Pending
2) Need to add replica identity tuple instead of full tuple -- Done
3) Keeping the logs in case of outer transaction failure by moving log
insertion outside the main transaction - reported by Shveta - Pending
4) Run pgindent -- planning to do it after we complete the first level
of review - Pending
5) Subscription test cases for logging the actual conflicts - Pending



--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip.

Here are a couple of review comments for v6-0001.

======
GENERAL.

1.
Firstly, here is one of my "what if" ideas...

The current patch is described as making a "structured, queryable
record of all logical replication conflicts".

What if we go bigger than that? What if this were made a more generic
"structured, queryable record of logical replication activity"?

AFAIK, there don't have to be too many logic changes to achieve this.
e.g. I'm imagining it is mostly:

* Rename the subscription parameter "conflict_log_table" to
"log_table" or similar.
* Remove/modify the "conflict_" name part from many of the variables
and function names.
* Add another 'type' column to the log table -- e.g. everything this
patch writes can be type="CONFL", or type='c', or whatever.
* Maybe tweak/add some of the other columns for more generic future use

Anyway, it might be worth considering this now, before everything
becomes set in stone with a conflict-only focus, making it too
difficult to add more potential/unknown log table enhancements later.

Thoughts?

======
src/backend/replication/logical/conflict.c

2.
+#include "funcapi.h"
+#include "funcapi.h"

double include of the same header.

~~~

3.
+static Datum tuple_table_slot_to_ri_json_datum(EState *estate,
+    Relation localrel,
+    Oid replica_index,
+    TupleTableSlot *slot);
+
+static void insert_conflict_log(EState *estate, Relation rel,
+ TransactionId local_xid,
+ TimestampTz local_ts,
+ ConflictType conflict_type,
+ RepOriginId origin_id,
+ TupleTableSlot *searchslot,
+ TupleTableSlot *localslot,
+ TupleTableSlot *remoteslot);

There were no spaces between any of the other static declarations, so
why is this one different?

~~~

insert_conflict_log:

4.
+#define MAX_CONFLICT_ATTR_NUM 15
+ Datum values[MAX_CONFLICT_ATTR_NUM];
+ bool nulls[MAX_CONFLICT_ATTR_NUM];
+ Oid nspid;
+ Oid confliglogreid;
+ Relation conflictlogrel = NULL;
+ int attno;
+ int options = HEAP_INSERT_NO_LOGICAL;
+ char    *conflictlogtable;
+ char    *origin = NULL;
+ char    *remote_origin = NULL;
+ HeapTuple tup;

Typo: Oops. Looks like that typo originated from my previous review
comment, and you took it as-is.

/confliglogreid/confliglogrelid/

~~~

5.
+ if (searchslot != NULL && !TupIsNull(searchslot))
  {
- tableslot = table_slot_create(localrel, &estate->es_tupleTable);
- tableslot = ExecCopySlot(tableslot, slot);
+ Oid replica_index = GetRelationIdentityOrPK(rel);
+
+ /*
+ * If the table has a valid replica identity index, build the index
+ * json datum from key value. Otherwise, construct it from the complete
+ * tuple in REPLICA IDENTITY FULL cases.
+ */
+ if (OidIsValid(replica_index))
+ values[attno++] = tuple_table_slot_to_ri_json_datum(estate, rel,
+ replica_index,
+ searchslot);
+ else
+ values[attno++] = tuple_table_slot_to_json_datum(searchslot);
  }
+ else
+ nulls[attno++] = true;

- /*
- * Initialize ecxt_scantuple for potential use in FormIndexDatum when
- * index expressions are present.
- */
- GetPerTupleExprContext(estate)->ecxt_scantuple = tableslot;
+ if (localslot != NULL && !TupIsNull(localslot))
+ values[attno++] = tuple_table_slot_to_json_datum(localslot);
+ else
+ nulls[attno++] = true;

- /*
- * The values/nulls arrays passed to BuildIndexValueDescription should be
- * the results of FormIndexDatum, which are the "raw" input to the index
- * AM.
- */
- FormIndexDatum(BuildIndexInfo(indexDesc), tableslot, estate, values, isnull);
+ if (remoteslot != NULL && !TupIsNull(remoteslot))
+ values[attno++] = tuple_table_slot_to_json_datum(remoteslot);
+ else
+ nulls[attno++] = true;

AFAIK, the TupIsNull() already includes the NULL check anyway, so you
don't need to double up those. I saw at least 3 conditions above where
the code could be simpler. e.g.

BEFORE
+ if (remoteslot != NULL && !TupIsNull(remoteslot))

SUGGESTION
if (!TupIsNull(remoteslot))

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 25, 2025 at 9:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip.
>
> Here are a couple of review comments for v6-0001.
>
> ======
> GENERAL.
>
> 1.
> Firstly, here is one of my "what if" ideas...
>
> The current patch is described as making a "structured, queryable
> record of all logical replication conflicts".
>
> What if we go bigger than that? What if this were made a more generic
> "structured, queryable record of logical replication activity"?
>
> AFAIK, there don't have to be too many logic changes to achieve this.
> e.g. I'm imagining it is mostly:
>
> * Rename the subscription parameter "conflict_log_table" to
> "log_table" or similar.
> * Remove/modify the "conflict_" name part from many of the variables
> and function names.
> * Add another 'type' column to the log table -- e.g. everything this
> patch writes can be type="CONFL", or type='c', or whatever.
> * Maybe tweak/add some of the other columns for more generic future use
>
> Anyway, it might be worth considering this now, before everything
> becomes set in stone with a conflict-only focus, making it too
> difficult to add more potential/unknown log table enhancements later.
>
> Thoughts?

Yeah, that's an interesting thought for sure, but honestly I believe a
conflict log table dedicated to storing conflict and conflict
resolution related data is the standard followed across databases that
provide an active-active setup, e.g. Oracle GoldenGate, BDR, pgactive,
so IMHO, to keep the feature clean and focused, we should follow the
same.

I will work on other review comments and post the patch soon.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>

On a separate note, I've been considering how to manage conflict log
insertions when an error causes the outer transaction to abort, which
seems to be non-trivial.

Here is what I have in mind:
======================
First, prepare_conflict_log() would be executed from
ReportApplyConflict(). This function would handle all preliminary
work, such as preparing the tuple for the conflict log table. Second,
insert_conflict_log() would be executed. If the error level in
ReportApplyConflict() is LOG, the insertion would occur directly.
Otherwise, the log information would be stored in a global variable
and inserted in a separate transaction once we exit start_apply() due
to the error.

@shveta malik @Amit Kapila let me know what you think?  Or do you
think it can be simplified?


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> On a separate note, I've been considering how to manage conflict log
> insertions when an error causes the outer transaction to abort, which
> seems to be a non-trivial.
>
> Here is what I have in mind:
> ======================
> First, prepare_conflict_log() would be executed from
> ReportApplyConflict(). This function would handle all preliminary
> work, such as preparing the tuple for the conflict log table. Second,
> insert_conflict_log() would be executed. If the error level in
> ReportApplyConflict() is LOG, the insertion would occur directly.
> Otherwise, the log information would be stored in a global variable
> and inserted in a separate transaction once we exit start_apply() due
> to the error.
>
> @shveta malik @Amit Kapila let me know what you think?  Or do you
> think it can be simplified?

While digging more into this, I am wondering why
CT_MULTIPLE_UNIQUE_CONFLICTS is reported as an ERROR while all other
conflicts are reported as LOG?

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
>
> On a separate note, I've been considering how to manage conflict log
> insertions when an error causes the outer transaction to abort, which
> seems to be a non-trivial.
>
> Here is what I have in mind:
> ======================
> First, prepare_conflict_log() would be executed from
> ReportApplyConflict(). This function would handle all preliminary
> work, such as preparing the tuple for the conflict log table. Second,
> insert_conflict_log() would be executed. If the error level in
> ReportApplyConflict() is LOG, the insertion would occur directly.
> Otherwise, the log information would be stored in a global variable
> and inserted in a separate transaction once we exit start_apply() due
> to the error.
>
> @shveta malik @Amit Kapila let me know what you think?  Or do you
> think it can be simplified?
>

I could not think of a better way. This idea works for me. I had
doubts about whether it would be okay to start a new transaction in the
catch-block (if we plan to do it in start_apply), but then I found a
few other functions doing it (see do_autovacuum, perform_work_item,
_SPI_commit). So IMO, we should be good.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Nov 26, 2025 at 2:05 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> >
> > On a separate note, I've been considering how to manage conflict log
> > insertions when an error causes the outer transaction to abort, which
> > seems to be a non-trivial.
> >
> > Here is what I have in mind:
> > ======================
> > First, prepare_conflict_log() would be executed from
> > ReportApplyConflict(). This function would handle all preliminary
> > work, such as preparing the tuple for the conflict log table. Second,
> > insert_conflict_log() would be executed. If the error level in
> > ReportApplyConflict() is LOG, the insertion would occur directly.
> > Otherwise, the log information would be stored in a global variable
> > and inserted in a separate transaction once we exit start_apply() due
> > to the error.
> >
> > @shveta malik @Amit Kapila let me know what you think?  Or do you
> > think it can be simplified?
> >
>
> I could not think of a better way. This idea works for me. I had
> doubts if it will be okay to start a new transaction in catch-block
> (if we plan to do it in start_apply's), but then I found few other
> functions doing it (see do_autovacuum, perform_work_item,
> _SPI_commit). So IMO, we should be good.
>

On re-reading, I think you were not suggesting handling it in the
CATCH block. Where exactly would it happen once we exit start_apply?
But since the situation will arise only in case of ERROR, I thought
handling it in the catch-block could be one option.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Nov 26, 2025 at 4:15 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Nov 26, 2025 at 2:05 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Nov 25, 2025 at 4:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 25, 2025 at 1:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > >
> > > On a separate note, I've been considering how to manage conflict log
> > > insertions when an error causes the outer transaction to abort, which
> > > seems to be a non-trivial.
> > >
> > > Here is what I have in mind:
> > > ======================
> > > First, prepare_conflict_log() would be executed from
> > > ReportApplyConflict(). This function would handle all preliminary
> > > work, such as preparing the tuple for the conflict log table. Second,
> > > insert_conflict_log() would be executed. If the error level in
> > > ReportApplyConflict() is LOG, the insertion would occur directly.
> > > Otherwise, the log information would be stored in a global variable
> > > and inserted in a separate transaction once we exit start_apply() due
> > > to the error.
> > >
> > > @shveta malik @Amit Kapila let me know what you think?  Or do you
> > > think it can be simplified?
> > >
> >
> > I could not think of a better way. This idea works for me. I had
> > doubts if it will be okay to start a new transaction in catch-block
> > (if we plan to do it in start_apply's), but then I found few other
> > functions doing it (see do_autovacuum, perform_work_item,
> > _SPI_commit). So IMO, we should be good.
> >
>
> On re-reading, I think you were not suggesting to handle it in the
> CATCH block. Where exactly once we exit start_apply?
> But since the situation will arise only in case of ERROR, I thought
> handling in catch-block could be one option.

Yeah, it makes sense to handle it in the catch block; I have done that
in the attached patch and also handled the other comments by Peter.

Now pending work status
1) fixed review comments of 0002 and 0003 - Pending
2) Need to add replica identity tuple instead of full tuple -- Done
3) Keeping the logs in case of outer transaction failure by moving log
insertion outside the main transaction - reported by Shveta - Done
(might need more validation and testing)
4) Run pgindent -- planning to do it after we complete the first level
of review - Pending
5) Subscription test cases for logging the actual conflicts - Pending

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date:
Hi Dilip. Some review comments for v7-0001.

======
src/backend/replication/logical/conflict.c

1.
+ /* Insert conflict details to conflict log table. */
+ if (conflictlogrel)
+ {
+ /*
+ * Prepare the conflict log tuple. If the error level is below
+ * ERROR, insert it immediately. Otherwise, defer the insertion to
+ * a new transaction after the current one aborts, ensuring the log
+ * tuple is not rolled back.
+ */
+ conflictlogtuple = prepare_conflict_log_tuple(estate,
+ relinfo->ri_RelationDesc,
+ conflictlogrel,
+ conflicttuple->xmin,
+ conflicttuple->ts, type,
+ conflicttuple->origin,
+ searchslot, conflicttuple->slot,
+ remoteslot);
+ if (elevel < ERROR)
+ {
+ InsertConflictLogTuple(conflictlogrel, conflictlogtuple);
+ heap_freetuple(conflictlogtuple);
+ }
+ else
+ MyLogicalRepWorker->conflict_log_tuple = conflictlogtuple;
+
+ table_close(conflictlogrel, AccessExclusiveLock);
+ }
+ }
+

IMO, some refactoring would help simplify conflictlogtuple processing. e.g.

i)   You don't need any separate 'conflictlogtuple' var
- Use MyLogicalRepWorker->conflict_log_tuple always for this purpose
ii)  prepare_conflict_log_tuple()
- Change this to a void; it will always side-effect
MyLogicalRepWorker->conflict_log_tuple
- Assert MyLogicalRepWorker->conflict_log_tuple must be NULL on entry
iii) InsertConflictLogTuple()
- The 2nd param is not needed if you always use
MyLogicalRepWorker->conflict_log_tuple
- Asserts MyLogicalRepWorker->conflict_log_tuple is not NULL, then writes it
- BTW, I felt that heap_freetuple could also be done here too
- Finally, sets to MyLogicalRepWorker->conflict_log_tuple to NULL
(ready for the next conflict)

~~~

InsertConflictLogTuple:

2.
+/*
+ * InsertConflictLogTuple
+ *
+ * Persistently records the input conflict log tuple into the conflict log
+ * table. It uses HEAP_INSERT_NO_LOGICAL to explicitly block logical decoding
+ * of the tuple inserted into the conflict log table.
+ */
+void
+InsertConflictLogTuple(Relation conflictlogrel, HeapTuple tup)
+{
+ int options = HEAP_INSERT_NO_LOGICAL;
+
+ heap_insert(conflictlogrel, tup, GetCurrentCommandId(true), options, NULL);
+}

See the above review comment (iii), for some suggested changes to this function.

~~~

prepare_conflict_log_tuple:

3.
+ * The caller is responsible for explicitly freeing the returned heap tuple
+ * after inserting.
+ */
+static HeapTuple
+prepare_conflict_log_tuple(EState *estate, Relation rel,

As per the above review comment (iii), I thought the Insert function
could handle the freeing.

~~~

4.
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ tup = heap_form_tuple(RelationGetDescr(conflictlogrel), values, nulls);
+ MemoryContextSwitchTo(oldctx);

- return index_value;
+ return tup;

Per the above comment (ii), change this to assign to
MyLogicalRepWorker->conflict_log_tuple.

======
src/backend/replication/logical/worker.c

start_apply:

5.
+ /*
+ * Insert the pending conflict log tuple under a new transaction.
+ */

/Insert the/Insert any/

~~~

6.
+ InsertConflictLogTuple(conflictlogrel,
+    MyLogicalRepWorker->conflict_log_tuple);
+ heap_freetuple(MyLogicalRepWorker->conflict_log_tuple);
+ MyLogicalRepWorker->conflict_log_tuple = NULL;

Per earlier review comment (iii), remove the 2nd param to
InsertConflictLogTuple, and those other 2 statements can also be
handled within InsertConflictLogTuple.

======
src/include/replication/worker_internal.h

7.
+ /* Store conflict log tuple to be inserted before worker exit. */
+ HeapTuple conflict_log_tuple;
+

Per my above suggestions, this member comment becomes something more
like "A conflict log tuple which is prepared but not yet written. */

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date
On Thu, Nov 27, 2025 at 6:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Dilip. Some review comments for v7-0001.
>
> ======
> src/backend/replication/logical/conflict.c
>
> 1.
> + /* Insert conflict details to conflict log table. */
> + if (conflictlogrel)
> + {
> + /*
> + * Prepare the conflict log tuple. If the error level is below
> + * ERROR, insert it immediately. Otherwise, defer the insertion to
> + * a new transaction after the current one aborts, ensuring the log
> + * tuple is not rolled back.
> + */
> + conflictlogtuple = prepare_conflict_log_tuple(estate,
> + relinfo->ri_RelationDesc,
> + conflictlogrel,
> + conflicttuple->xmin,
> + conflicttuple->ts, type,
> + conflicttuple->origin,
> + searchslot, conflicttuple->slot,
> + remoteslot);
> + if (elevel < ERROR)
> + {
> + InsertConflictLogTuple(conflictlogrel, conflictlogtuple);
> + heap_freetuple(conflictlogtuple);
> + }
> + else
> + MyLogicalRepWorker->conflict_log_tuple = conflictlogtuple;
> +
> + table_close(conflictlogrel, AccessExclusiveLock);
> + }
> + }
> +
>
> IMO, some refactoring would help simplify conflictlogtuple processing. e.g.
>
> i)   You don't need any separate 'conflictlogtuple' var
> - Use MyLogicalRepWorker->conflict_log_tuple always for this purpose
> ii)  prepare_conflict_log_tuple()
> - Change this to a void; it will always side-effect
> MyLogicalRepWorker->conflict_log_tuple
> - Assert MyLogicalRepWorker->conflict_log_tuple must be NULL on entry
> iii) InsertConflictLogTuple()
> - The 2nd param is not needed if you always use
> MyLogicalRepWorker->conflict_log_tuple
> - Asserts MyLogicalRepWorker->conflict_log_tuple is not NULL, then writes it
> - BTW, I felt that heap_freetuple could also be done here too
> - Finally, sets to MyLogicalRepWorker->conflict_log_tuple to NULL
> (ready for the next conflict)
>
> ~~~
>
> InsertConflictLogTuple:
>
> 2.
> +/*
> + * InsertConflictLogTuple
> + *
> + * Persistently records the input conflict log tuple into the conflict log
> + * table. It uses HEAP_INSERT_NO_LOGICAL to explicitly block logical decoding
> + * of the tuple inserted into the conflict log table.
> + */
> +void
> +InsertConflictLogTuple(Relation conflictlogrel, HeapTuple tup)
> +{
> + int options = HEAP_INSERT_NO_LOGICAL;
> +
> + heap_insert(conflictlogrel, tup, GetCurrentCommandId(true), options, NULL);
> +}
>
> See the above review comment (iii), for some suggested changes to this function.
>
> ~~~
>
> prepare_conflict_log_tuple:
>
> 3.
> + * The caller is responsible for explicitly freeing the returned heap tuple
> + * after inserting.
> + */
> +static HeapTuple
> +prepare_conflict_log_tuple(EState *estate, Relation rel,
>
> As per the above review comment (iii), I thought the Insert function
> could handle the freeing.
>
> ~~~
>
> 4.
> + oldctx = MemoryContextSwitchTo(ApplyContext);
> + tup = heap_form_tuple(RelationGetDescr(conflictlogrel), values, nulls);
> + MemoryContextSwitchTo(oldctx);
>
> - return index_value;
> + return tup;
>
> Per the above comment (ii), change this to assign to
> MyLogicalRepWorker->conflict_log_tuple.
>
> ======
> src/backend/replication/logical/worker.c
>
> start_apply:
>
> 5.
> + /*
> + * Insert the pending conflict log tuple under a new transaction.
> + */
>
> /Insert the/Insert any/
>
> ~~~
>
> 6.
> + InsertConflictLogTuple(conflictlogrel,
> +    MyLogicalRepWorker->conflict_log_tuple);
> + heap_freetuple(MyLogicalRepWorker->conflict_log_tuple);
> + MyLogicalRepWorker->conflict_log_tuple = NULL;
>
> Per earlier review comment (iii), remove the 2nd param to
> InsertConflictLogTuple, and those other 2 statements can also be
> handled within InsertConflictLogTuple.
>
> ======
> src/include/replication/worker_internal.h
>
> 7.
> + /* Store conflict log tuple to be inserted before worker exit. */
> + HeapTuple conflict_log_tuple;
> +
>
> Per my above suggestions, this member comment becomes something more
> like "A conflict log tuple which is prepared but not yet written. */
>

I have fixed all these comments and also the comments of 0002, now I
feel we can actually merge 0001 and 0002, so I have merged both of
them.

Now pending work status
1) fixed review comments of 0003
2) Run pgindent -- planning to do it after we complete the first level
of review
3) Subscription TAP test for logging the actual conflicts

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Peter Smith
Date
Hi Dilip.

Some review comments for v8-0001.

======
Commit message

1.
When the patches 0001 and 0002 got merged, I think the commit message
should have been updated also to say something along the lines of:

When ALL TABLES or ALL TABLES IN SCHEMA is used, the publication
won't publish the clt.

======
src/backend/catalog/pg_publication.c

check_publication_add_relation:

2.
+ /* Can't be conflict log table */
+ if (IsConflictLogRelid(RelationGetRelid(targetrel)))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add relation \"%s\" to publication",
+ RelationGetRelationName(targetrel)),
+ errdetail("This operation is not supported for conflict log tables.")));

Should it also show the schema name of the clt in the message?

======
src/backend/commands/subscriptioncmds.c

3.
+/*
+ * Check if the specified relation is used as a conflict log table by any
+ * subscription.
+ */
+bool
+IsConflictLogRelid(Oid relid)

Most places refer to the clt. Wondering if this function ought to be
called 'IsConflictLogTable'.

======
src/backend/replication/logical/conflict.c

InsertConflictLogTuple:

4.
+ /* A valid tuple must be prepared and store into MyLogicalRepWorker. */

typo: /store into/stored in/

~~~

prepare_conflict_log_tuple:

5.
- index_close(indexDesc, NoLock);
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ tup = heap_form_tuple(RelationGetDescr(conflictlogrel), values, nulls);
+ MemoryContextSwitchTo(oldctx);

- return index_value;
+ /* Store conflict_log_tuple into the worker slot for inserting it later. */
+ MyLogicalRepWorker->conflict_log_tuple = tup;

5a.
I don't think you need the 'tup' variable. Just assign directly to
MyLogicalRepWorker->conflict_log_tuple.

~

5b.
"worker slot" -- I don't think this is a "slot".

======
src/backend/replication/logical/worker.c

6.
+ /* Open conflict log table. */
+ conflictlogrel = GetConflictLogTableRel();
+ InsertConflictLogTuple(conflictlogrel);
+ MyLogicalRepWorker->conflict_log_tuple = NULL;
+ table_close(conflictlogrel, AccessExclusiveLock);

Maybe that comment should say:
/* Open conflict log table and write the tuple. */


======
src/include/replication/conflict.h

7.
+ /* A conflict log tuple which is prepared but not yet inserted. */
+ HeapTuple conflict_log_tuple;
+

typo: /which/that/  (sorry, this one is my bad from a previous review comment)


======
src/test/regress/expected/subscription.out

8.
+-- ok - change the conflict log table name for an existing
subscription that already had one
+CREATE SCHEMA clt;
+ALTER SUBSCRIPTION regress_conflict_test2 SET (conflict_log_table =
'clt.regress_conflict_log3');
+SELECT subname, subconflictlogtable, subconflictlognspid = (SELECT
oid FROM pg_namespace WHERE nspname = 'public') AS is_public_schema
+FROM pg_subscription WHERE subname = 'regress_conflict_test2';
+        subname         |  subconflictlogtable  | is_public_schema
+------------------------+-----------------------+------------------
+ regress_conflict_test2 | regress_conflict_log3 | f
+(1 row)
+
+\dRs+
+

                    List of subscriptions
+          Name          |           Owner           | Enabled |
Publication | Binary | Streaming | Two-phase commit | Disable on error
| Origin | Password required | Run as owner? | Failover | Retain dead
tuples | Max retention duration | Retention active | Synchronous
commit |          Conninfo           |  Skip LSN  |  Conflict log
table

+------------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------+------------------+--------------------+-----------------------------+------------+-----------------------
+ regress_conflict_test1 | regress_subscription_user | f       |
{testpub}   | f      | parallel  | d                | f
| any    | t                 | f             | f        | f
      |                      0 | f                | off
| dbname=regress_doesnotexist | 0/00000000 | regress_conflict_log1
+ regress_conflict_test2 | regress_subscription_user | f       |
{testpub}   | f      | parallel  | d                | f
| any    | t                 | f             | f        | f
      |                      0 | f                | off
| dbname=regress_doesnotexist | 0/00000000 | regress_conflict_log3
+(2 rows)

~

After going to the trouble of specifying the CLT on a different
schema, that information is lost by the \dRs+. How about also showing
the CLT schema name (at least when it is not "public") in the \dRs+
output.

~~~

9.
+-- ok - conflict_log_table should not be published with ALL TABLE
+CREATE PUBLICATION pub FOR TABLES IN SCHEMA clt;
+SELECT * FROM pg_publication_tables WHERE pubname = 'pub';
+ pubname | schemaname | tablename | attnames | rowfilter
+---------+------------+-----------+----------+-----------
+(0 rows)

Perhaps you should repeat this same test but using FOR ALL TABLES,
instead of only FOR TABLES IN SCHEMA.

======
src/test/regress/sql/subscription.sql

10.
In one of the tests, you could call the function
pg_relation_is_publishable(clt) to verify that it returns false.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

From
vignesh C
Date
On Thu, 27 Nov 2025 at 17:50, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 6:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have fixed all these comments and also the comments of 0002, now I
> feel we can actually merge 0001 and 0002, so I have merged both of
> them.

I just started to have a look at the patch; while using it I found the
lock level used is not correct. I felt the reason is that the table is
opened with RowExclusiveLock but closed with AccessExclusiveLock:

+       /* If conflict log table is not set for the subscription just return. */
+       conflictlogtable = get_subscription_conflict_log_table(
+
MyLogicalRepWorker->subid, &nspid);
+       if (conflictlogtable == NULL)
+       {
+               pfree(conflictlogtable);
+               return NULL;
+       }
+
+       conflictlogrelid = get_relname_relid(conflictlogtable, nspid);
+       if (OidIsValid(conflictlogrelid))
+               conflictlogrel = table_open(conflictlogrelid, RowExclusiveLock);

....
+                       if (elevel < ERROR)
+                               InsertConflictLogTuple(conflictlogrel);
+
+                       table_close(conflictlogrel, AccessExclusiveLock);
....

2025-11-28 12:17:55.631 IST [504133] WARNING:  you don't own a lock of
type AccessExclusiveLock
2025-11-28 12:17:55.631 IST [504133] CONTEXT:  processing remote data
for replication origin "pg_16402" during message type "INSERT" for
replication target relation "public.t1" in transaction 761, finished
at 0/01789AB8
2025-11-28 12:17:58.033 IST [504133] WARNING:  you don't own a lock of
type AccessExclusiveLock
2025-11-28 12:17:58.033 IST [504133] ERROR:  conflict detected on
relation "public.t1": conflict=insert_exists
2025-11-28 12:17:58.033 IST [504133] DETAIL:  Key already exists in
unique index "t1_pkey", modified in transaction 766.
        Key (c1)=(1); existing local row (1, 1); remote row (1, 1).
2025-11-28 12:17:58.033 IST [504133] CONTEXT:  processing remote data
for replication origin "pg_16402" during message type "INSERT" for
replication target relation "public.t1" in transaction 761, finished
at 0/01789AB8

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date
On Thu, Nov 27, 2025 at 5:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
>
> I have fixed all these comments and also the comments of 0002, now I
> feel we can actually merge 0001 and 0002, so I have merged both of
> them.
>
> Now pending work status
> 1) fixed review comments of 0003
> 2) Run pgindent -- planning to do it after we complete the first level
> of review
> 3) Subscription TAP test for logging the actual conflicts
>

Thanks for the patch. A few observations:

1)
It seems, as per LOG, 'key' and 'replica-identity' are different when
it comes to insert_exists, update_exists and
multiple_unique_conflicts, while I believe in CLT, key is
replica-identity i.e. there are no 2 separate terms. Please see below:

a)
Update_Exists:
2025-11-28 14:08:56.179 IST [60383] ERROR:  conflict detected on
relation "public.tab1": conflict=update_exists
2025-11-28 14:08:56.179 IST [60383] DETAIL:  Key already exists in
unique index "tab1_pkey", modified locally in transaction 790 at
2025-11-28 14:07:17.578887+05:30.
Key (i)=(40); existing local row (40, 10); remote row (40, 200);
replica identity (i)=(20).

postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple
from clt where conflict_type='update_exists';
 conflict_type | key_tuple |   local_tuple   |   remote_tuple
---------------+-----------+-----------------+------------------
 update_exists | {"i":20}  | {"i":40,"j":10} | {"i":40,"j":200}

b)
insert_Exists:
ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
DETAIL:  Key already exists in unique index "tab1_pkey", modified
locally in transaction 767 at 2025-11-28 13:59:22.431097+05:30.
Key (i)=(30); existing local row (30, 10); remote row (30, 10).

postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple from clt;
 conflict_type  | key_tuple |   local_tuple   |  remote_tuple
----------------+-----------+-----------------+-----------------
 insert_exists  |               | {"i":30,"j":10} | {"i":30,"j":10}

case a) has key_tuple same as replica-identity of LOG
case b) does not have replica-identity and thus key_tuple is NULL.

Does that mean we need to maintain both key_tuple and RI separately in
CLT? Thoughts?


2)
For multiple_unique_conflict (testcase is same as I shared earlier),
it asserts here:
CONTEXT:  processing remote data for replication origin "pg_16390"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 778, finished at 0/017E6DE8
TRAP: failed Assert("MyLogicalRepWorker->conflict_log_tuple == NULL"),
File: "conflict.c", Line: 749, PID: 60627

I have not checked it, but maybe
'MyLogicalRepWorker->conflict_log_tuple' is left over from the
previous few tests I tried?

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date
On Tue, Nov 18, 2025 at 3:40 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > 3)
> > > We also need to think how we are going to display the info in case of
> > > multiple_unique_conflicts as there could be multiple local and remote
> > > tuples conflicting for one single operation. Example:
> > >
> > > create table conf_tab (a int primary key, b int unique, c int unique);
> > >
> > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > >
> > > pub: insert into conf_tab values (2,3,4);
> > >
> > > ERROR:  conflict detected on relation "public.conf_tab":
> > > conflict=multiple_unique_conflicts
> > > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > CONTEXT:  processing remote data for replication origin "pg_16392"
> > > during message type "INSERT" for replication target relation
> > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > >
> > > Currently in clt, we have singular terms such as 'key_tuple',
> > > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > > But it does not look reasonable to have multiple rows inserted for a
> > > single conflict raised. I will think more about this.
> >
> > Currently I am inserting multiple records in the conflict history
> > table, the same as each tuple is logged, but couldn't find any better
> > way for this.
> >

The biggest drawback of this approach is data bloat. The incoming data
row will be stored multiple times.

> > Another option is to use an array of tuples instead of a
> > single tuple but not sure this might make things more complicated to
> > process by any external tool.
>
> It’s arguable and hard to say what the correct behaviour should be.
> I’m slightly leaning toward having a single row per conflict.
>

Yeah, it is better to either have a single row per conflict or have
two tables conflict_history and conflict_history_details to avoid the
data bloat pointed out above. For example, the two-table approach could be:

1. The Header Table (Incoming Data)
This stores the data that tried to be applied.
SQL
CREATE TABLE conflict_header (
    conflict_id     SERIAL PRIMARY KEY,
    source_tx_id    VARCHAR(100),    -- Transaction ID from source
    table_name      VARCHAR(100),
    operation       CHAR(1),         -- 'I' for Insert
    incoming_data   JSONB,           -- Store the incoming row as JSON
...
);

2. The Detail Table (Existing Conflicting Data)
This stores the actual rows currently in the database that caused the
violations.
CREATE TABLE conflict_details (
    detail_id       SERIAL PRIMARY KEY,
    conflict_id     INT REFERENCES conflict_header(conflict_id),
    constraint_name/key_tuple VARCHAR(100),
    conflicting_row_data JSONB       -- The existing row in the DB
that blocked the insert
);

Please don't consider these exact columns; you can use something on
the lines of what is proposed in the patch. This is just to show how
the conflict data can be rearranged. Now, one argument against this is
that users need to use JOIN to query data but still better than
bloating the table. The idea to store in a single table could be
changed to have columns like violated_constraints TEXT[],      --
e.g., ['uk_email', 'uk_phone'], error_details   JSONB  -- e.g.,
[{"const": "uk_email", "val": "a@b.com"}, ...]. If we want to store
multiple conflicting tuples in a single column, we need to ensure it
is queryable via a JSONB column. The point in favour of a single JSONB
column to combine multiple conflicting tuples is that we need this
combination only for one kind of conflict.

Both the approaches have their pros and cons. I feel we should dig a
bit deeper for both by laying out details for each method and see what
others think.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date
On Fri, Nov 28, 2025 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 3:40 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > 3)
> > > > We also need to think how we are going to display the info in case of
> > > > multiple_unique_conflicts as there could be multiple local and remote
> > > > tuples conflicting for one single operation. Example:
> > > >
> > > > create table conf_tab (a int primary key, b int unique, c int unique);
> > > >
> > > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > > >
> > > > pub: insert into conf_tab values (2,3,4);
> > > >
> > > > ERROR:  conflict detected on relation "public.conf_tab":
> > > > conflict=multiple_unique_conflicts
> > > > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > > CONTEXT:  processing remote data for replication origin "pg_16392"
> > > > during message type "INSERT" for replication target relation
> > > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > > >
> > > > Currently in clt, we have singular terms such as 'key_tuple',
> > > > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > > > But it does not look reasonable to have multiple rows inserted for a
> > > > single conflict raised. I will think more about this.
> > >
> > > Currently I am inserting multiple records in the conflict history
> > > table, the same as each tuple is logged, but couldn't find any better
> > > way for this.
> > >
>
> The biggest drawback of this approach is data bloat. The incoming data
> row will be stored multiple times.
>
> > > Another option is to use an array of tuples instead of a
> > > single tuple but not sure this might make things more complicated to
> > > process by any external tool.
> >
> > It’s arguable and hard to say what the correct behaviour should be.
> > I’m slightly leaning toward having a single row per conflict.
> >
>
> Yeah, it is better to either have a single row per conflict or have
> two tables conflict_history and conflict_history_details to avoid data
> bloat as pointed above. For example, two-table approach could be:
>
> 1. The Header Table (Incoming Data)
> This stores the data that tried to be applied.
> SQL
> CREATE TABLE conflict_header (
>     conflict_id     SERIAL PRIMARY KEY,
>     source_tx_id    VARCHAR(100),    -- Transaction ID from source
>     table_name      VARCHAR(100),
>     operation       CHAR(1),         -- 'I' for Insert
>     incoming_data   JSONB,           -- Store the incoming row as JSON
> ...
> );
>
> 2. The Detail Table (Existing Conflicting Data)
> This stores the actual rows currently in the database that caused the
> violations.
> CREATE TABLE conflict_details (
>     detail_id       SERIAL PRIMARY KEY,
>     conflict_id     INT REFERENCES conflict_header(conflict_id),
>     constraint_name/key_tuple VARCHAR(100),
>     conflicting_row_data JSONB       -- The existing row in the DB
> that blocked the insert
> );
>
> Please don't consider these exact columns; you can use something on
> the lines of what is proposed in the patch. This is just to show how
> the conflict data can be rearranged. Now, one argument against this is
> that users need to use JOIN to query data but still better than
> bloating the table. The idea to store in a single table could be
> changed to have columns like violated_constraints TEXT[],      --
> e.g., ['uk_email', 'uk_phone'], error_details   JSONB  -- e.g.,
> [{"const": "uk_email", "val": "a@b.com"}, ...]. If we want to store
> multiple conflicting tuples in a single column, we need to ensure it
> is queryable via a JSONB column. The point in favour of a single JSONB
> column to combine multiple conflicting tuples is that we need this
> combination only for one kind of conflict.
>
> Both the approaches have their pros and cons. I feel we should dig a
> bit deeper for both by laying out details for each method and see what
> others think.

The specific scenario we are discussing, where a single row from the
publisher attempts to apply an operation that causes conflicts across
multiple unique keys, with each of those unique key violations
conflicting with a different local row on the subscriber, is very
rare.  IMHO this low-frequency scenario does not justify
overcomplicating the design with an array field or a multi-level
table.

Consider the infrequency of the root causes:
- How often does a table have more than 3 to 4 unique keys?
- How frequently would each of these keys conflict with a unique row
on the subscriber side?

If resolving this occasional, synthetic conflict requires inserting
two or three rows instead of a single one, that is an acceptable
trade-off considering how rarely it can occur.  Anyway, this is my
opinion and I am open to opinions from others.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date
On Fri, Nov 28, 2025 at 12:24 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 27 Nov 2025 at 17:50, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 27, 2025 at 6:30 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I have fixed all these comments and also the comments of 0002, now I
> > feel we can actually merge 0001 and 0002, so I have merged both of
> > them.
>
> I just started to have a look at the patch, while using I found lock
> level used is not correct:
> I felt the reason is that table is opened with RowExclusiveLock but
> closed in AccessExclusiveLock:
>
> +       /* If conflict log table is not set for the subscription just return. */
> +       conflictlogtable = get_subscription_conflict_log_table(
> +
> MyLogicalRepWorker->subid, &nspid);
> +       if (conflictlogtable == NULL)
> +       {
> +               pfree(conflictlogtable);
> +               return NULL;
> +       }
> +
> +       conflictlogrelid = get_relname_relid(conflictlogtable, nspid);
> +       if (OidIsValid(conflictlogrelid))
> +               conflictlogrel = table_open(conflictlogrelid, RowExclusiveLock);
>
> ....
> +                       if (elevel < ERROR)
> +                               InsertConflictLogTuple(conflictlogrel);
> +
> +                       table_close(conflictlogrel, AccessExclusiveLock);
> ....
>
> 2025-11-28 12:17:55.631 IST [504133] WARNING:  you don't own a lock of
> type AccessExclusiveLock
> 2025-11-28 12:17:55.631 IST [504133] CONTEXT:  processing remote data
> for replication origin "pg_16402" during message type "INSERT" for
> replication target relation "public.t1" in transaction 761, finished
> at 0/01789AB8
> 2025-11-28 12:17:58.033 IST [504133] WARNING:  you don't own a lock of
> type AccessExclusiveLock
> 2025-11-28 12:17:58.033 IST [504133] ERROR:  conflict detected on
> relation "public.t1": conflict=insert_exists
> 2025-11-28 12:17:58.033 IST [504133] DETAIL:  Key already exists in
> unique index "t1_pkey", modified in transaction 766.
>         Key (c1)=(1); existing local row (1, 1); remote row (1, 1).
> 2025-11-28 12:17:58.033 IST [504133] CONTEXT:  processing remote data
> for replication origin "pg_16402" during message type "INSERT" for
> replication target relation "public.t1" in transaction 761, finished
> at 0/01789AB8

Thanks, I will fix this.


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date
On Fri, Nov 28, 2025 at 2:32 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 5:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > I have fixed all these comments and also the comments of 0002, now I
> > feel we can actually merge 0001 and 0002, so I have merged both of
> > them.
> >
> > Now pending work status
> > 1) fixed review comments of 0003
> > 2) Run pgindent -- planning to do it after we complete the first level
> > of review
> > 3) Subscription TAP test for logging the actual conflicts
> >
>
> Thanks  for the patch. A few observations:
>
> 1)
> It seems, as per LOG, 'key' and 'replica-identity' are different when
> it comes to insert_exists, update_exists and
> multiple_unique_conflicts, while I believe in CLT, key is
> replica-identity i.e. there are no 2 separate terms. Please see below:
>
> a)
> Update_Exists:
> 2025-11-28 14:08:56.179 IST [60383] ERROR:  conflict detected on
> relation "public.tab1": conflict=update_exists
> 2025-11-28 14:08:56.179 IST [60383] DETAIL:  Key already exists in
> unique index "tab1_pkey", modified locally in transaction 790 at
> 2025-11-28 14:07:17.578887+05:30.
> Key (i)=(40); existing local row (40, 10); remote row (40, 200);
> replica identity (i)=(20).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple
> from clt where conflict_type='update_exists';
>  conflict_type | key_tuple |   local_tuple   |   remote_tuple
> ---------------+-----------+-----------------+------------------
>  update_exists | {"i":20}  | {"i":40,"j":10} | {"i":40,"j":200}
>
> b)
> insert_Exists:
> ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> DETAIL:  Key already exists in unique index "tab1_pkey", modified
> locally in transaction 767 at 2025-11-28 13:59:22.431097+05:30.
> Key (i)=(30); existing local row (30, 10); remote row (30, 10).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple from clt;
>  conflict_type  | key_tuple |   local_tuple   |  remote_tuple
> ----------------+-----------+-----------------+-----------------
>  insert_exists  |               | {"i":30,"j":10} | {"i":30,"j":10}
>
> case a) has key_tuple same as replica-identity of LOG
> case b) does not have replica-identity and thus key_tuple is NULL.
>
> Does that mean we need to maintain both key_tuple and RI separately in
> CLT? Thoughts?

Maybe we should then have separate columns for both the key tuple and
the replica identity, matching what we log. What do others think about
this case?
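As an illustration only (the table and column names here are hypothetical, not the patch's actual schema), a conflict log table carrying both values separately might look like:

```sql
-- Hypothetical sketch of a CLT schema with key_tuple and replica
-- identity stored in separate columns; names are illustrative only.
CREATE TABLE clt (
    relid            oid,          -- conflicting relation
    conflict_type    text,         -- e.g. 'update_exists'
    key_tuple        jsonb,        -- key of the violated unique index
    replica_identity jsonb,        -- RI of the remote tuple, if any
    local_tuple      jsonb,
    remote_tuple     jsonb,
    local_commit_ts  timestamptz,
    remote_commit_ts timestamptz
);
```

With that split, the update_exists case above would store {"i":20} as the replica identity and {"i":40} as the key tuple, instead of conflating the two.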

> 2)
> For multiple_unique_conflict (testcase is same as I shared earlier),
> it asserts here:
> CONTEXT:  processing remote data for replication origin "pg_16390"
> during message type "INSERT" for replication target relation
> "public.conf_tab" in transaction 778, finished at 0/017E6DE8
> TRAP: failed Assert("MyLogicalRepWorker->conflict_log_tuple == NULL"),
> File: "conflict.c", Line: 749, PID: 60627
>
> I have not checked it, but maybe
> 'MyLogicalRepWorker->conflict_log_tuple' is left over from the
> previous few tests I tried?

Yeah, prepare_conflict_log_tuple() is called in a loop, and when there
are multiple tuples we need to collect all of them before inserting at
worker exit, so the current code has a bug. I will see how we can fix
it; I think this also depends on the other discussion we are having
about how to insert multiple unique conflicts.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Few observations related to publication.
> > > ------------------------------
>
> Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> conflict log tables it should be good enough if we restrict it when
> ALL TABLE options are used, I don't think we need to put extra effort
> to completely restrict it even if users want to explicitly list it
> into the publication.
>
> > >
> > > (In the below comments, clt/CLT implies Conflict Log Table)
> > >
> > > 1)
> > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
>
> This function is used while publishing every single change and I don't
> think we want to add a cost to check each subscription to identify
> whether the table is listed as CLT.
>
> > > 2)
> > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > show that for clt.
>
> I think we should fix this.
>
> > > 3)
> > > I am able to create a publication for clt table, should it be allowed?
>
> I believe we should not do any specific handling to restrict this but
> I am open for the opinions.
>
> > > create subscription sub1 connection '...' publication pub1
> > > WITH(conflict_log_table='clt');
> > > create publication pub3 for table clt;
> > >
> > > 4)
> > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > part of is_publishable_class() itself? If we do so, other code-logics
> > > will also get clt as non-publishable always (and will solve a few of
> > > the above issues I think). IIUC, there is no place where we want to
> > > mark CLT as publishable or is there any?
>
> IMHO the main reason is performance.
>
> > > 5) Also, I feel we can add some documentation now to help others to
> > > understand/review the patch better without going through the long
> > > thread.
>
> Make sense, I will do that in the next version.
>
> > >
> > > Few observations related to conflict-logging:
> > > ------------------------------
> > > 1)
> > > I found that for the conflicts which ultimately result in Error, we do
> > > not insert any conflict-record in clt.
> > >
> > > a)
> > > Example: insert_exists, update_Exists
> > > create table tab1 (i int primary key, j int);
> > > sub: insert into tab1 values(30,10);
> > > pub: insert into tab1 values(30,10);
> > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > No record in clt.
> > >
> > > sub:
> > > <some pre-data needed>
> > > update tab1 set i=40 where i = 30;
> > > pub: update tab1 set i=40 where i = 20;
> > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > No record in clt.
>
> Yeah that interesting need to put thought on how to commit this record
> when an outer transaction is aborted as we do not have autonomous
> transactions which are generally used for this kind of logging.  But
> we can explore more options like inserting into conflict log tables
> outside the outer transaction.
>
> > > b)
> > > Another question related to this is, since these conflicts (which
> > > results in error) keep on happening until user resolves these or skips
> > > these or 'disable_on_error' is set. Then are we going to insert these
> > > multiple times? We do count these in 'confl_insert_exists' and
> > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > time in clt as well. Thoughts?
>
> I think it make sense to insert every time we see the conflict, but it
> would be good to have opinion from others as well.

Since there is a concern that multiple rows for
multiple_unique_conflicts can cause data bloat, it made me rethink:
this case is actually more prone to causing data bloat if it is not
resolved in time, as it seems a far more frequent scenario. So shall
we keep inserting the record, or insert it once and avoid inserting it
again based on the LSN?  Thoughts?

>
> > > 2)
> > > Conflicts where row on sub is missing, local_ts incorrectly inserted.
> > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something
> > > indicating that it is not applicable for this conflict-type?
> > >
> > > Example: delete_missing, update_missing
> > > pub:
> > >  insert into tab1 values(10,10);
> > >  insert into tab1 values(20,10);
> > >  sub:  delete from tab1 where i=10;
> > >  pub:  delete from tab1 where i=10;
>
> Sure I will test this.
>
> >
> > 3)
> > We also need to think how we are going to display the info in case of
> > multiple_unique_conflicts as there could be multiple local and remote
> > tuples conflicting for one single operation. Example:
> >
> > create table conf_tab (a int primary key, b int unique, c int unique);
> >
> > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> >
> > pub: insert into conf_tab values (2,3,4);
> >
> > ERROR:  conflict detected on relation "public.conf_tab":
> > conflict=multiple_unique_conflicts
> > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_b_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > Key already exists in unique index "conf_tab_c_key", modified locally
> > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > CONTEXT:  processing remote data for replication origin "pg_16392"
> > during message type "INSERT" for replication target relation
> > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> >
> > Currently in clt, we have singular terms such as 'key_tuple',
> > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > But it does not look reasonable to have multiple rows inserted for a
> > single conflict raised. I will think more about this.
>
> Currently I am inserting multiple records in the conflict history
> table, the same as each tuple is logged, but couldn't find any better
> way for this. Another option is to use an array of tuples instead of a
> single tuple but not sure this might make things more complicated to
> process by any external tool.  But you are right, this needs more
> discussion.
>
> --
> Regards,
> Dilip Kumar
> Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > Few observations related to publication.
> > > > ------------------------------
> >
> > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > conflict log tables it should be good enough if we restrict it when
> > ALL TABLE options are used, I don't think we need to put extra effort
> > to completely restrict it even if users want to explicitly list it
> > into the publication.
> >
> > > >
> > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > >
> > > > 1)
> > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
> >
> > This function is used while publishing every single change and I don't
> > think we want to add a cost to check each subscription to identify
> > whether the table is listed as CLT.
> >
> > > > 2)
> > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > show that for clt.
> >
> > I think we should fix this.
> >
> > > > 3)
> > > > I am able to create a publication for clt table, should it be allowed?
> >
> > I believe we should not do any specific handling to restrict this but
> > I am open for the opinions.
> >
> > > > create subscription sub1 connection '...' publication pub1
> > > > WITH(conflict_log_table='clt');
> > > > create publication pub3 for table clt;
> > > >
> > > > 4)
> > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > will also get clt as non-publishable always (and will solve a few of
> > > > the above issues I think). IIUC, there is no place where we want to
> > > > mark CLT as publishable or is there any?
> >
> > IMHO the main reason is performance.
> >
> > > > 5) Also, I feel we can add some documentation now to help others to
> > > > understand/review the patch better without going through the long
> > > > thread.
> >
> > Make sense, I will do that in the next version.
> >
> > > >
> > > > Few observations related to conflict-logging:
> > > > ------------------------------
> > > > 1)
> > > > I found that for the conflicts which ultimately result in Error, we do
> > > > not insert any conflict-record in clt.
> > > >
> > > > a)
> > > > Example: insert_exists, update_Exists
> > > > create table tab1 (i int primary key, j int);
> > > > sub: insert into tab1 values(30,10);
> > > > pub: insert into tab1 values(30,10);
> > > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > > No record in clt.
> > > >
> > > > sub:
> > > > <some pre-data needed>
> > > > update tab1 set i=40 where i = 30;
> > > > pub: update tab1 set i=40 where i = 20;
> > > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > > No record in clt.
> >
> > Yeah that interesting need to put thought on how to commit this record
> > when an outer transaction is aborted as we do not have autonomous
> > transactions which are generally used for this kind of logging.  But
> > we can explore more options like inserting into conflict log tables
> > outside the outer transaction.
> >
> > > > b)
> > > > Another question related to this is, since these conflicts (which
> > > > results in error) keep on happening until user resolves these or skips
> > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > time in clt as well. Thoughts?
> >
> > I think it make sense to insert every time we see the conflict, but it
> > would be good to have opinion from others as well.
>
> Since there is a concern that multiple rows for
> multiple_unique_conflicts can cause data-bloat, it made me rethink
> that this is actually more prone to causing data-bloat if it is not
> resolved on time, as it seems a far more frequent scenario. So shall
> we keep inserting the record or insert it once and avoid inserting it
> again based on lsn?  Thoughts?

I agree, this is the real problem related to bloat, so maybe if the
same tuple already exists we can avoid inserting it again, although I
haven't put thought into how we would distinguish between a new
conflict on the same row and the same conflict being inserted multiple
times due to a worker restart.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > > Few observations related to publication.
> > > > > ------------------------------
> > >
> > > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > > conflict log tables it should be good enough if we restrict it when
> > > ALL TABLE options are used, I don't think we need to put extra effort
> > > to completely restrict it even if users want to explicitly list it
> > > into the publication.
> > >
> > > > >
> > > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > > >
> > > > > 1)
> > > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table.
> > >
> > > This function is used while publishing every single change and I don't
> > > think we want to add a cost to check each subscription to identify
> > > whether the table is listed as CLT.
> > >
> > > > > 2)
> > > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > > show that for clt.
> > >
> > > I think we should fix this.
> > >
> > > > > 3)
> > > > > I am able to create a publication for clt table, should it be allowed?
> > >
> > > I believe we should not do any specific handling to restrict this but
> > > I am open for the opinions.
> > >
> > > > > create subscription sub1 connection '...' publication pub1
> > > > > WITH(conflict_log_table='clt');
> > > > > create publication pub3 for table clt;
> > > > >
> > > > > 4)
> > > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > > will also get clt as non-publishable always (and will solve a few of
> > > > > the above issues I think). IIUC, there is no place where we want to
> > > > > mark CLT as publishable or is there any?
> > >
> > > IMHO the main reason is performance.
> > >
> > > > > 5) Also, I feel we can add some documentation now to help others to
> > > > > understand/review the patch better without going through the long
> > > > > thread.
> > >
> > > Make sense, I will do that in the next version.
> > >
> > > > >
> > > > > Few observations related to conflict-logging:
> > > > > ------------------------------
> > > > > 1)
> > > > > I found that for the conflicts which ultimately result in Error, we do
> > > > > not insert any conflict-record in clt.
> > > > >
> > > > > a)
> > > > > Example: insert_exists, update_Exists
> > > > > create table tab1 (i int primary key, j int);
> > > > > sub: insert into tab1 values(30,10);
> > > > > pub: insert into tab1 values(30,10);
> > > > > ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> > > > > No record in clt.
> > > > >
> > > > > sub:
> > > > > <some pre-data needed>
> > > > > update tab1 set i=40 where i = 30;
> > > > > pub: update tab1 set i=40 where i = 20;
> > > > > ERROR:  conflict detected on relation "public.tab1": conflict=update_exists
> > > > > No record in clt.
> > >
> > > Yeah that interesting need to put thought on how to commit this record
> > > when an outer transaction is aborted as we do not have autonomous
> > > transactions which are generally used for this kind of logging.  But
> > > we can explore more options like inserting into conflict log tables
> > > outside the outer transaction.
> > >
> > > > > b)
> > > > > Another question related to this is, since these conflicts (which
> > > > > results in error) keep on happening until user resolves these or skips
> > > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > > time in clt as well. Thoughts?
> > >
> > > I think it make sense to insert every time we see the conflict, but it
> > > would be good to have opinion from others as well.
> >
> > Since there is a concern that multiple rows for
> > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > that this is actually more prone to causing data-bloat if it is not
> > resolved on time, as it seems a far more frequent scenario. So shall
> > we keep inserting the record or insert it once and avoid inserting it
> > again based on lsn?  Thoughts?
>
> I agree, this is the real problem related to bloat so maybe we can see
> if the same tuple exists we can avoid inserting it again, although I
> haven't put thought on how to we distinguish between the new conflict
> on the same row vs the same conflict being inserted multiple times due
> to worker restart.
>

If there is consensus on this approach, IMO it appears safe to rely
on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
the given 'conflict_type' before we insert a new record.
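A duplicate check along these lines could gate the insert (purely illustrative; 'clt' and its column names are assumptions, not the patch's actual schema, and the literal values are taken from the examples earlier in the thread):

```sql
-- Hypothetical sketch: skip the insert if this exact conflict was
-- already logged for the same origin, commit LSN, and conflict type.
INSERT INTO clt (conflict_type, remote_origin, remote_commit_lsn, remote_tuple)
SELECT 'insert_exists', 'pg_16402', '0/01789AB8'::pg_lsn,
       '{"i":30,"j":10}'::jsonb
WHERE NOT EXISTS (
    SELECT 1 FROM clt
    WHERE conflict_type     = 'insert_exists'
      AND remote_origin     = 'pg_16402'
      AND remote_commit_lsn = '0/01789AB8'::pg_lsn
);
```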

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 1, 2025 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > Since there is a concern that multiple rows for
> > > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > > that this is actually more prone to causing data-bloat if it is not
> > > resolved on time, as it seems a far more frequent scenario. So shall
> > > we keep inserting the record or insert it once and avoid inserting it
> > > again based on lsn?  Thoughts?
> >
> > I agree, this is the real problem related to bloat so maybe we can see
> > if the same tuple exists we can avoid inserting it again, although I
> > haven't put thought on how to we distinguish between the new conflict
> > on the same row vs the same conflict being inserted multiple times due
> > to worker restart.
> >
>
> If there is consensus on this approach, IMO, it appears safe to rely
> on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
> the given 'conflict_type' before we insert a new record.
>

What happens if, as part of multiple_unique_conflict, only some of the
rows conflict in the next apply round (say the user has removed a few
conflicting rows in the meantime)? I think the ideal way for users to
avoid such multiple occurrences is to configure the subscription with
disable_on_error. I think we should log errors again on retry, and it
is better to keep this consistent with what we print in the LOG,
because we may want to give users an option in the future for where to
log conflicts (in the conflict history table, the LOG, or both).

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Mon, Dec 1, 2025 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Dec 1, 2025 at 1:57 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > Since there is a concern that multiple rows for
> > > > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > > > that this is actually more prone to causing data-bloat if it is not
> > > > resolved on time, as it seems a far more frequent scenario. So shall
> > > > we keep inserting the record or insert it once and avoid inserting it
> > > > again based on lsn?  Thoughts?
> > >
> > > I agree, this is the real problem related to bloat so maybe we can see
> > > if the same tuple exists we can avoid inserting it again, although I
> > > haven't put thought on how to we distinguish between the new conflict
> > > on the same row vs the same conflict being inserted multiple times due
> > > to worker restart.
> > >
> >
> > If there is consensus on this approach, IMO, it appears safe to rely
> > on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
> > the given 'conflict_type' before we insert a new record.
> >
>
> What happens if as part of multiple_unique_conflict, in the next apply
> round only some of the rows conflict (say in the meantime user has
> removed a few conflicting rows)? I think the ideal way for users to
> avoid such multiple occurrences is to configure subscription with
> disable_on_error. I think we should LOG errors again on retry and it
> is better to keep it consistent with what we print in LOG because we
> may want to give an option to users in future where to LOG (in
> conflict_history_table, LOG, or both) the conflicts.
>

Yeah, that makes sense, because if the user tried to fix the conflict
and it still didn't get fixed, then from next time onward the user
would have no way to know that the conflict reoccurred. It also makes
sense to maintain consistency with the LOGs.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Fri, Nov 28, 2025 at 6:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Some review comments for v8-0001.

Thanks Peter, yes these all make sense and I will fix them in the next
version along with the other comments from Vignesh, Shveta, and Amit,
except for one comment:

> 9.
> +-- ok - conflict_log_table should not be published with ALL TABLE
> +CREATE PUBLICATION pub FOR TABLES IN SCHEMA clt;
> +SELECT * FROM pg_publication_tables WHERE pubname = 'pub';
> + pubname | schemaname | tablename | attnames | rowfilter
> +---------+------------+-----------+----------+-----------
> +(0 rows)
>
> Perhaps you should repeat this same test but using FOR ALL TABLES,
> instead of only FOR TABLES IN SCHEMA

I will have to see how we can safely do this in testing without side
effects on the concurrent tests. Generally we run publication.sql and
subscription.sql concurrently in the regression test, so if we do FOR
ALL TABLES they can affect each other. One option is to not run these
two tests concurrently; I think we can do that, as there is no real
concurrency we are testing by running them together. Any thoughts on
this?


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Fri, Nov 28, 2025 at 2:32 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 5:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > I have fixed all these comments and also the comments of 0002, now I
> > feel we can actually merge 0001 and 0002, so I have merged both of
> > them.
> >
> > Now pending work status
> > 1) fixed review comments of 0003
> > 2) Run pgindent -- planning to do it after we complete the first level
> > of review
> > 3) Subscription TAP test for logging the actual conflicts
> >
>
> Thanks  for the patch. A few observations:
>
> 1)
> It seems, as per LOG, 'key' and 'replica-identity' are different when
> it comes to insert_exists, update_exists and
> multiple_unique_conflicts, while I believe in CLT, key is
> replica-identity i.e. there are no 2 separate terms. Please see below:
>
> a)
> Update_Exists:
> 2025-11-28 14:08:56.179 IST [60383] ERROR:  conflict detected on
> relation "public.tab1": conflict=update_exists
> 2025-11-28 14:08:56.179 IST [60383] DETAIL:  Key already exists in
> unique index "tab1_pkey", modified locally in transaction 790 at
> 2025-11-28 14:07:17.578887+05:30.
> Key (i)=(40); existing local row (40, 10); remote row (40, 200);
> replica identity (i)=(20).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple
> from clt where conflict_type='update_exists';
>  conflict_type | key_tuple |   local_tuple   |   remote_tuple
> ---------------+-----------+-----------------+------------------
>  update_exists | {"i":20}  | {"i":40,"j":10} | {"i":40,"j":200}
>
> b)
> insert_Exists:
> ERROR:  conflict detected on relation "public.tab1": conflict=insert_exists
> DETAIL:  Key already exists in unique index "tab1_pkey", modified
> locally in transaction 767 at 2025-11-28 13:59:22.431097+05:30.
> Key (i)=(30); existing local row (30, 10); remote row (30, 10).
>
> postgres=# select conflict_type, key_tuple,local_tuple,remote_tuple from clt;
>  conflict_type  | key_tuple |   local_tuple   |  remote_tuple
> ----------------+-----------+-----------------+-----------------
>  insert_exists  |               | {"i":30,"j":10} | {"i":30,"j":10}
>
> case a) has key_tuple same as replica-identity of LOG
> case b) does not have replica-identity and thus key_tuple is NULL.
>
> Does that mean we need to maintain both key_tuple and RI separately in
> CLT? Thoughts?
>

Yeah, it could be useful to display the RI values separately. What
should the column name be? A few options could be: remote_val_for_ri,
remote_value_ri, or something else. I think it may also be useful to
display the conflicting index, but OTOH it would be difficult to
decide in the first version what other information could be required,
so it is better to stick with what is displayed in the LOG.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Wed, Nov 19, 2025 at 3:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
>
> > 3)
> > Do we need to have a timestamp column as well to say when conflict was
> > recorded? Or local_commit_ts, remote_commit_ts are sufficient?
> > Thoughts
>
> You mean we can record the timestamp now while inserting, not sure if
> it will add some more meaningful information than remote_commit_ts,
> but let's see what others think.
>

local_commit_ts and remote_commit_ts sound sufficient, as one can
establish the provenance of the information from those two. The
key/schema values displayed in this table could change later, but the
information about a particular row is anchored to the times shown by
those two columns.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> The specific scenario we are discussing is when a single row from the
> publisher attempts to apply an operation that causes a conflict across
> multiple unique keys, with each of those unique key violations
> conflicting with a different local row on the subscriber, is very
> rare.  IMHO this low-frequency scenario does not justify
> overcomplicating the design with an array field or a multi-level
> table.
>

I did some analysis and searching on the internet to answer your
following two questions.

> Consider the infrequency of the root causes:
> - How often does a table have more than 3 to 4 unique keys?

It is extremely common; in fact, it is considered industry "best
practice" for modern database design.
One can find this pattern in almost every enterprise system (e.g.
banking apps, CRMs). It relies on distinguishing between Technical
Identity (for the database) and Business Identity (for the real
world).

1. The Design Pattern: Surrogate vs. Natural Keys
Primary Key (Surrogate Key): Usually a meaningless number (e.g.,
10452) or a UUID. It is used strictly for the database to join tables
efficiently. It never changes.
Unique Key (Natural Key): A real-world value (e.g., john@email.com or
SSN-123). This is how humans or external systems identify the row. It
can change (e.g., someone updates their email).

2. Common Real-World Use Cases
A. User Management (The most classic example)
Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table.
Unique Key 1: email (Varchar). Prevents two people from registering
with the same email.
Unique Key 2: username (Varchar). Ensures unique display names.
Why? If a user changes their email address, you only update one field
in one table. If you used email as the Primary Key, you would have to
update millions of rows in the ORDERS table that reference that email.

B. Inventory / E-Commerce
Primary Key: product_id (Integer). Used internally by the code.
Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC).
Why? Companies often re-organize their SKU formats. If the SKU was the
Primary Key, a format change would require a massive database
migration.

C. Government / HR Systems
Primary Key: employee_id (Integer).
Unique Key: National_ID (SSN, Aadhaar, Passport Number).
Why? Privacy and security. You do not want to expose a National ID in
every URL or API call (e.g., api/employee/552 is safer than
api/employee/SSN-123).

> - How frequently would each of these keys conflict with a unique row
> on the subscriber side?
>

It can occur with medium-to-high probability in the following cases.
(a) In bi-directional replication systems; for example, if two users
create the same "User Profile" on two different servers at the same
time, the row will conflict on every unique field (ID, Email, SSN)
simultaneously. (b) The chances of bloat are high on retrying to fix
the error, as mentioned by Shveta. Say, if the Ops team fixes errors
by just "trying again" without checking the full row, you will hit the
ID error, fix it, then immediately hit the Email error. (c) The
chances are medium during the initial data load; if a user is loading
data from a legacy system with "dirty" data, rows often violate
multiple rules (e.g., a duplicate user with both a reused ID and a
reused Email).

> If resolving this occasional, synthetic conflict requires inserting
> two or three rows instead of a single one, this is an acceptable
> trade-off considering how rare it can occur.
>

As per the above analysis and the retry point Shveta raises, I don't
think we can ignore the possibility of data bloat, especially for this
multiple_unique_key conflict. We can consider logging multiple local
conflicting rows as a JSON array.
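To illustrate the JSON-array idea (a sketch only; the field names inside the array are assumptions, not part of the patch), the three conflicts from the earlier conf_tab example could collapse into a single logged value like:

```sql
-- Hypothetical single-row representation of one multiple_unique_conflicts
-- event: one array element per violated unique index, instead of three
-- separate CLT rows.
SELECT '[
  {"index": "conf_tab_pkey",  "key": {"a": 2},
   "local_tuple": {"a": 2, "b": 2, "c": 2}},
  {"index": "conf_tab_b_key", "key": {"b": 3},
   "local_tuple": {"a": 3, "b": 3, "c": 3}},
  {"index": "conf_tab_c_key", "key": {"c": 4},
   "local_tuple": {"a": 4, "b": 4, "c": 4}}
]'::jsonb AS local_conflicts;
```

An external tool could then unnest the array with jsonb_array_elements() to inspect each violated index individually.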

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > The specific scenario we are discussing is when a single row from the
> > publisher attempts to apply an operation that causes a conflict across
> > multiple unique keys, with each of those unique key violations
> > conflicting with a different local row on the subscriber, is very
> > rare.  IMHO this low-frequency scenario does not justify
> > overcomplicating the design with an array field or a multi-level
> > table.
> >
>
> I did some analysis and search on the internet to answer your
> following two questions.
>
> > Consider the infrequency of the root causes:
> > - How often does a table have more than 3 to 4 unique keys?
>
> It is extremely common—in fact, it is considered the industry "best
> practice" for modern database design.
>
> One can find this pattern in almost every enterprise system (e.g.
> banking apps, CRMs). It relies on distinguishing between Technical
> Identity (for the database) and Business Identity (for the real
> world).
>
> 1. The Design Pattern: Surrogate vs. Natural Keys
> Primary Key (Surrogate Key): Usually a meaningless number (e.g.,
> 10452) or a UUID. It is used strictly for the database to join tables
> efficiently. It never changes.
> Unique Key (Natural Key): A real-world value (e.g., john@email.com or
> SSN-123). This is how humans or external systems identify the row. It
> can change (e.g., someone updates their email).
>
> 2. Common Real-World Use Cases
> A. User Management (The most classic example)
> Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table.
> Unique Key 1: email (Varchar). Prevents two people from registering
> with the same email.
> Unique Key 2: username (Varchar). Ensures unique display names.
> Why? If a user changes their email address, you only update one field
> in one table. If you used email as the Primary Key, you would have to
> update millions of rows in the ORDERS table that reference that email.
>
> B. Inventory / E-Commerce
> Primary Key: product_id (Integer). Used internally by the code.
> Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC).
> Why? Companies often re-organize their SKU formats. If the SKU was the
> Primary Key, a format change would require a massive database
> migration.
>
> C. Government / HR Systems
> Primary Key: employee_id (Integer).
> Unique Key: National_ID (SSN, Aadhaar, Passport Number).
> Why? Privacy and security. You do not want to expose a National ID in
> every URL or API call (e.g., api/employee/552 is safer than
> api/employee/SSN-123).
>
> > - How frequently would each of these keys conflict with a unique row
> > on the subscriber side?
> >
>
> It can occur with medium-to-high probability in following cases. (a)
> In Bi-Directional replication systems; for example, If two users
> create the same "User Profile" on two different servers at the same
> time, the row will conflict on every unique field (ID, Email, SSN)
> simultaneously. (b) The chances of bloat are high, on retrying to fix
> the error as mentioned by Shveta. Say, if Ops team fixes errors by
> just "trying again" without checking the full row, you will hit the ID
> error, fix it, then immediately hit the Email error. (c) The chances
> are medium during initial data-load; If a user is loading data from a
> legacy system with "dirty" data, rows often violate multiple rules
> (e.g., a duplicate user with both a reused ID and a reused Email).
>
> > If resolving this occasional, synthetic conflict requires inserting
> > two or three rows instead of a single one, this is an acceptable
> > trade-off considering how rare it can occur.
> >
>
> As per above analysis and the re-try point Shveta raises, I don't
> think we can ignore the possibility of data-bloat especially for this
> multiple_unique_key conflict. We can consider logging multiple local
> conflicting rows as JSON Array.

Okay, I will try to store the multiple local rows as a JSON array in the next version.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > The specific scenario we are discussing is when a single row from the
> > > publisher attempts to apply an operation that causes a conflict across
> > > multiple unique keys, with each of those unique key violations
> > > conflicting with a different local row on the subscriber, is very
> > > rare.  IMHO this low-frequency scenario does not justify
> > > overcomplicating the design with an array field or a multi-level
> > > table.
> > >
> >
> > I did some analysis and search on the internet to answer your
> > following two questions.
> >
> > > Consider the infrequency of the root causes:
> > > - How often does a table have more than 3 to 4 unique keys?
> >
> > It is extremely common—in fact, it is considered the industry "best
> > practice" for modern database design.
> >
> > One can find this pattern in almost every enterprise system (e.g.
> > banking apps, CRMs). It relies on distinguishing between Technical
> > Identity (for the database) and Business Identity (for the real
> > world).
> >
> > 1. The Design Pattern: Surrogate vs. Natural Keys
> > Primary Key (Surrogate Key): Usually a meaningless number (e.g.,
> > 10452) or a UUID. It is used strictly for the database to join tables
> > efficiently. It never changes.
> > Unique Key (Natural Key): A real-world value (e.g., john@email.com or
> > SSN-123). This is how humans or external systems identify the row. It
> > can change (e.g., someone updates their email).
> >
> > 2. Common Real-World Use Cases
> > A. User Management (The most classic example)
> > Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table.
> > Unique Key 1: email (Varchar). Prevents two people from registering
> > with the same email.
> > Unique Key 2: username (Varchar). Ensures unique display names.
> > Why? If a user changes their email address, you only update one field
> > in one table. If you used email as the Primary Key, you would have to
> > update millions of rows in the ORDERS table that reference that email.
> >
> > B. Inventory / E-Commerce
> > Primary Key: product_id (Integer). Used internally by the code.
> > Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC).
> > Why? Companies often re-organize their SKU formats. If the SKU was the
> > Primary Key, a format change would require a massive database
> > migration.
> >
> > C. Government / HR Systems
> > Primary Key: employee_id (Integer).
> > Unique Key: National_ID (SSN, Aadhaar, Passport Number).
> > Why? Privacy and security. You do not want to expose a National ID in
> > every URL or API call (e.g., api/employee/552 is safer than
> > api/employee/SSN-123).
> >
> > > - How frequently would each of these keys conflict with a unique row
> > > on the subscriber side?
> > >
> >
> > It can occur with medium-to-high probability in following cases. (a)
> > In Bi-Directional replication systems; for example, If two users
> > create the same "User Profile" on two different servers at the same
> > time, the row will conflict on every unique field (ID, Email, SSN)
> > simultaneously. (b) The chances of bloat are high, on retrying to fix
> > the error as mentioned by Shveta. Say, if Ops team fixes errors by
> > just "trying again" without checking the full row, you will hit the ID
> > error, fix it, then immediately hit the Email error. (c) The chances
> > are medium during initial data-load; If a user is loading data from a
> > legacy system with "dirty" data, rows often violate multiple rules
> > (e.g., a duplicate user with both a reused ID and a reused Email).
> >
> > > If resolving this occasional, synthetic conflict requires inserting
> > > two or three rows instead of a single one, this is an acceptable
> > > trade-off considering how rare it can occur.
> > >
> >
> > As per above analysis and the re-try point Shveta raises, I don't
> > think we can ignore the possibility of data-bloat especially for this
> > multiple_unique_key conflict. We can consider logging multiple local
> > conflicting rows as JSON Array.
>
> Okay, I will try to make multiple local rows as JSON Array in the next version.
>
Just to clarify so that we are on the same page: along with the local
tuple, the other local fields such as local_xid, local_commit_ts, and
local_origin will also be converted into arrays.  I hope that makes
sense?

So we would change the table like this. I'm not sure whether it makes
sense to keep all the local array fields together in the table, or to
place each next to its respective remote field, as we currently do
with remote_xid and local_xid, etc.

      Column       |            Type            | Collation | Nullable | Default
-------------------+----------------------------+-----------+----------+---------
 relid             | oid                        |           |          |
 schemaname        | text                       |           |          |
 relname           | text                       |           |          |
 conflict_type     | text                       |           |          |
 local_xid         | xid[]                      |           |          |
 remote_xid        | xid                        |           |          |
 remote_commit_lsn | pg_lsn                     |           |          |
 local_commit_ts   | timestamp with time zone[] |           |          |
 remote_commit_ts  | timestamp with time zone   |           |          |
 local_origin      | text[]                     |           |          |
 remote_origin     | text                       |           |          |
 key_tuple         | json                       |           |          |
 local_tuple       | json[]                     |           |          |
 remote_tuple      | json                       |           |          |

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Amit Kapila
Date:
On Tue, Dec 2, 2025 at 12:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > Okay, I will try to make multiple local rows as JSON Array in the next version.
> >
> Just to clarify so that we are on the same page, along with the local
> tuple the other local fields like local_xid, local_commit_ts,
> local_origin will also be converted into the array.  Hope that makes
> sense?
>

Yes, what about key_tuple or RI?

> So we will change the table like this, not sure if this makes sense to
> keep all local array fields nearby in the table, or let it be near the
> respective remote field, like we are doing now remote_xid and local
> xid together etc.
>

It is better to keep the array fields together at the end. I think
that would be easier to read via the CLI. Also, interleaving
fixed-width and variable-width columns may take more space due to
padding/alignment, and access will similarly be slower in the
interleaved case.

Having said that, can we consider an alternative: storing all the
local conflict info together in a single JSONB column (which can hold
an array of objects)? For example, the multiple conflicting tuples
could be stored as:

[
{ "xid": "1001", "commit_ts": "2023-10-27 10:00:00", "origin":
"node_A", "tuple": { "id": 1, "email": "a@b.com" } },
{ "xid": "1005", "commit_ts": "2023-10-27 10:01:00", "origin":
"node_B", "tuple": { "id": 2, "phone": "555-0199" } }
]

To access JSON array columns, I think one needs to use the unnest
function, whereas JSONB could be accessed with something like: "SELECT
* FROM conflicts WHERE local_conflicts @> '[{"xid": "1001"}]'".
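A rough sketch of both access patterns, assuming a conflicts table whose local_conflicts column is jsonb:

```sql
-- Containment search; indexable with a GIN index on the column.
SELECT *
FROM conflicts
WHERE local_conflicts @> '[{"xid": "1001"}]';

CREATE INDEX ON conflicts USING gin (local_conflicts);

-- Expanding the array into one row per local conflict when needed.
SELECT elem->>'xid' AS xid, elem->'tuple' AS tuple
FROM conflicts, jsonb_array_elements(local_conflicts) AS elem;
```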

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 12:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > >
> > > Okay, I will try to make multiple local rows as JSON Array in the next version.
> > >
> > Just to clarify so that we are on the same page, along with the local
> > tuple the other local fields like local_xid, local_commit_ts,
> > local_origin will also be converted into the array.  Hope that makes
> > sense?
> >
>
> Yes, what about key_tuple or RI?
>
> > So we will change the table like this, not sure if this makes sense to
> > keep all local array fields nearby in the table, or let it be near the
> > respective remote field, like we are doing now remote_xid and local
> > xid together etc.
> >
>
> It is better to keep the array fields together at the end. I think it
> would be better to read via CLI. Also, it may take more space due to
> padding/alignment if we store fixed-width and variable-width columns
> interleaved and similarly the access will also be slower for
> interleaved cases.
>
> Having said that, can we consider an alternative way to store all
> local_conflict_info together as a JSONB column (that can be used to
> store an array of objects). For example, the multiple conflicting
> tuple information can be stored as:
>
> [
> { "xid": "1001", "commit_ts": "2023-10-27 10:00:00", "origin":
> "node_A", "tuple": { "id": 1, "email": "a@b.com" } },
> { "xid": "1005", "commit_ts": "2023-10-27 10:01:00", "origin":
> "node_B", "tuple": { "id": 2, "phone": "555-0199" } }
> ]
>
> To access JSON array columns, I think one needs to use the unnest
> function, whereas JSONB could be accessed with something like: "SELECT
> * FROM conflicts WHERE local_conflicts @> '[{"xid": "1001"}]".

Yeah, we can do that as well; that's probably a better idea than
creating a separate array field for each local element.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Tue, Dec 2, 2025 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 2, 2025 at 12:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > >
> > > > Okay, I will try to make multiple local rows as JSON Array in the next version.
> > > >
> > > Just to clarify so that we are on the same page, along with the local
> > > tuple the other local fields like local_xid, local_commit_ts,
> > > local_origin will also be converted into the array.  Hope that makes
> > > sense?
> > >
> >
> > Yes, what about key_tuple or RI?
> >
> > > So we will change the table like this, not sure if this makes sense to
> > > keep all local array fields nearby in the table, or let it be near the
> > > respective remote field, like we are doing now remote_xid and local
> > > xid together etc.
> > >
> >
> > It is better to keep the array fields together at the end. I think it
> > would be better to read via CLI. Also, it may take more space due to
> > padding/alignment if we store fixed-width and variable-width columns
> > interleaved and similarly the access will also be slower for
> > interleaved cases.
> >
> > Having said that, can we consider an alternative way to store all
> > local_conflict_info together as a JSONB column (that can be used to
> > store an array of objects). For example, the multiple conflicting
> > tuple information can be stored as:
> >
> > [
> > { "xid": "1001", "commit_ts": "2023-10-27 10:00:00", "origin":
> > "node_A", "tuple": { "id": 1, "email": "a@b.com" } },
> > { "xid": "1005", "commit_ts": "2023-10-27 10:01:00", "origin":
> > "node_B", "tuple": { "id": 2, "phone": "555-0199" } }
> > ]
> >
> > To access JSON array columns, I think one needs to use the unnest
> > function, whereas JSONB could be accessed with something like: "SELECT
> > * FROM conflicts WHERE local_conflicts @> '[{"xid": "1001"}]".
>
> Yeah we can do that as well, maybe that's a better idea compared to
> creating separate array fields for each local element.

So I tried a POC with this approach and tested it with one of the
test cases given by Shveta; the conflict log table entry now looks
like this.  You can see that the local_conflicts field is an array
of JSON values, and each entry of the array is formed from (xid,
commit_ts, origin, json tuple).  I will send the updated patch by
tomorrow after doing some more cleanup and testing.

relid             | 16391
schemaname        | public
relname           | conf_tab
conflict_type     | multiple_unique_conflicts
remote_xid        | 761
remote_commit_lsn | 0/01761400
remote_commit_ts  | 2025-12-02 15:02:07.045935+00
remote_origin     | pg_16406
key_tuple         |
remote_tuple      | {"a":2,"b":3,"c":4}
local_conflicts   |

{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"

773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T
15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
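For reference, with the current json[] column the entries can be read back roughly like this (the table name conflict_history is assumed):

```sql
-- Sketch: one output row per local conflicting tuple.
SELECT relname,
       conflict_type,
       lc->>'xid'       AS local_xid,
       lc->>'commit_ts' AS local_commit_ts,
       lc->'tuple'      AS local_tuple
FROM conflict_history,
     unnest(local_conflicts) AS lc;
```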


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Tue, Dec 2, 2025 at 8:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Dec 2, 2025 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Dec 2, 2025 at 12:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, Dec 2, 2025 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Okay, I will try to make multiple local rows as JSON Array in the next version.
> > > > >
> > > > Just to clarify so that we are on the same page, along with the local
> > > > tuple the other local fields like local_xid, local_commit_ts,
> > > > local_origin will also be converted into the array.  Hope that makes
> > > > sense?
> > > >
> > >
> > > Yes, what about key_tuple or RI?
> > >
> > > > So we will change the table like this, not sure if this makes sense to
> > > > keep all local array fields nearby in the table, or let it be near the
> > > > respective remote field, like we are doing now remote_xid and local
> > > > xid together etc.
> > > >
> > >
> > > It is better to keep the array fields together at the end. I think it
> > > would be better to read via CLI. Also, it may take more space due to
> > > padding/alignment if we store fixed-width and variable-width columns
> > > interleaved and similarly the access will also be slower for
> > > interleaved cases.
> > >
> > > Having said that, can we consider an alternative way to store all
> > > local_conflict_info together as a JSONB column (that can be used to
> > > store an array of objects). For example, the multiple conflicting
> > > tuple information can be stored as:
> > >
> > > [
> > > { "xid": "1001", "commit_ts": "2023-10-27 10:00:00", "origin":
> > > "node_A", "tuple": { "id": 1, "email": "a@b.com" } },
> > > { "xid": "1005", "commit_ts": "2023-10-27 10:01:00", "origin":
> > > "node_B", "tuple": { "id": 2, "phone": "555-0199" } }
> > > ]
> > >
> > > To access JSON array columns, I think one needs to use the unnest
> > > function, whereas JSONB could be accessed with something like: "SELECT
> > > * FROM conflicts WHERE local_conflicts @> '[{"xid": "1001"}]".
> >
> > Yeah we can do that as well, maybe that's a better idea compared to
> > creating separate array fields for each local element.
>
> So I tried the POC idea with this approach and tested with one of the
> test cases given by Shveta, and now the conflict log table entry looks
> like this.  So we can see the local conflicts field which is an array
> of JSON and each entry of the array is formed using (xid, commit_ts,
> origin, json tuple).  I will send the updated patch by tomorrow after
> doing some more cleanup and testing.
>
> relid             | 16391
> schemaname        | public
> relname           | conf_tab
> conflict_type     | multiple_unique_conflicts
> remote_xid        | 761
> remote_commit_lsn | 0/01761400
> remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> remote_origin     | pg_16406
> key_tuple         |
> remote_tuple      | {"a":2,"b":3,"c":4}
> local_conflicts   |
>
{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"
>
773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T
> 15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
>

Thanks, it looks good. For the benefit of others, could you include a
brief note, perhaps in the commit message for now, describing how to
access or read this array column? We can remove it later.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > relid             | 16391
> > schemaname        | public
> > relname           | conf_tab
> > conflict_type     | multiple_unique_conflicts
> > remote_xid        | 761
> > remote_commit_lsn | 0/01761400
> > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > remote_origin     | pg_16406
> > key_tuple         |
> > remote_tuple      | {"a":2,"b":3,"c":4}
> > local_conflicts   |
> >
{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"
> >
773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T
> > 15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> >
>
> Thanks, it looks good. For the benefit of others, could you include a
> brief note, perhaps in the commit message for now, describing how to
> access or read this array column? We can remove it later.

Thanks.  Okay, for now I have noted in the commit message how to
fetch the data from the JSON array field.  In the next version I will
add a test that stores a conflict in the conflict log history table
and fetches it back.

--
Regards,
Dilip Kumar
Google

Attachments

Re: Proposal: Conflict log history table for Logical Replication

From
Masahiko Sawada
Date:
On Wed, Dec 3, 2025 at 3:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > relid             | 16391
> > > schemaname        | public
> > > relname           | conf_tab
> > > conflict_type     | multiple_unique_conflicts
> > > remote_xid        | 761
> > > remote_commit_lsn | 0/01761400
> > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > remote_origin     | pg_16406
> > > key_tuple         |
> > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > local_conflicts   |
> > >
{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"
> > >
773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T
> > > 15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > >
> >
> > Thanks, it looks good. For the benefit of others, could you include a
> > brief note, perhaps in the commit message for now, describing how to
> > access or read this array column? We can remove it later.
>
> Thanks, okay, temporarily I have added in a commit message how we can
> fetch the data from the JSON array field.  In next version I will add
> a test to get the conflict stored in conflict log history table and
> fetch from it.
>

I've reviewed the v9 patch and here are some comments:

The patch uses SPI for creating and dropping the conflict history
table, but I'm not sure that is okay, because the statements are then
affected by GUC parameters such as default_tablespace and
default_toast_compression, etc. Also, hooks and event triggers could
fire during the creation and removal. Is that intentional behavior?
I'm concerned that it would make investigation harder if an issue
happened in a user environment.

---
+   /* build and execute the CREATE TABLE query. */
+   appendStringInfo(&querybuf,
+                    "CREATE TABLE %s.%s ("
+                    "relid Oid,"
+                    "schemaname TEXT,"
+                    "relname TEXT,"
+                    "conflict_type TEXT,"
+                    "remote_xid xid,"
+                    "remote_commit_lsn pg_lsn,"
+                    "remote_commit_ts TIMESTAMPTZ,"
+                    "remote_origin TEXT,"
+                    "key_tuple     JSON,"
+                    "remote_tuple  JSON,"
+                    "local_conflicts JSON[])",
+                    quote_identifier(get_namespace_name(namespaceId)),
+                    quote_identifier(conflictrel));

If we want to use SPI for history table creation, we should use
qualified names in all the places including data types.

---
The patch doesn't create the dependency between the subscription and
the conflict history table. So users can entirely drop the schema
(with CASCADE option) where the history table is created. And once
dropping the schema along with the history table, ALTER SUBSCRIPTION
... SET (conflict_history_table = '') seems not to work (I got a
SEGV).

---
Currently the history table can be created in the pg_temp namespace,
but that should not be allowed.

---
I think the conflict history table should not be transferred to the
new cluster by pg_upgrade, since the table definition could be
different across major versions.

I got the following log when the publisher disables track_commit_timestamp:

local_conflicts   |
{"{\"xid\":\"790\",\"commit_ts\":\"1999-12-31T16:00:00-08:00\",\"origin\":\"\",\"tuple\":{\"c\":1}}"}

I think we can omit commit_ts when it's not available.

---
I think we should keep the history table name case-sensitive:

postgres(1:351685)=# create subscription sub connection
'dbname=postgres port=5551' publication pub with (conflict_log_table =
'LOGTABLE');
CREATE SUBSCRIPTION
postgres(1:351685)=# \d
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | test     | table | masahiko
 public | logtable | table | masahiko
(2 rows)

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Proposal: Conflict log history table for Logical Replication

From
Dilip Kumar
Date:
On Thu, Dec 4, 2025 at 7:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 3:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > relid             | 16391
> > > > schemaname        | public
> > > > relname           | conf_tab
> > > > conflict_type     | multiple_unique_conflicts
> > > > remote_xid        | 761
> > > > remote_commit_lsn | 0/01761400
> > > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > > remote_origin     | pg_16406
> > > > key_tuple         |
> > > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > > local_conflicts   |
> > > >
{"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"
> > > >
773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T
> > > > 15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > > >
> > >
> > > Thanks, it looks good. For the benefit of others, could you include a
> > > brief note, perhaps in the commit message for now, describing how to
> > > access or read this array column? We can remove it later.
> >
> > Thanks, okay, temporarily I have added in a commit message how we can
> > fetch the data from the JSON array field.  In next version I will add
> > a test to get the conflict stored in conflict log history table and
> > fetch from it.
> >
>
> I've reviewed the v9 patch and here are some comments:

Thanks for reviewing this and your valuable comments.

> The patch utilizes SPI for creating and dropping the conflict history
> table, but I'm really not sure if it's okay because it's actually
> affected by some GUC parameters such as default_tablespace and
> default_toast_compression etc. Also, probably some hooks and event
> triggers could be fired during the creation and removal. Is it
> intentional behavior? I'm concerned that it would make investigation
> harder if an issue happened in the user environment.

Hmm, interesting point. We could control the values of those default
parameters while creating the table via SPI, but I don't see any
reason not to use heap_create_with_catalog() directly, so that may be
a better choice than SPI: then we don't need to worry about event
triggers, utility hooks, etc.  That said, I don't see any specific
issue with SPI unless the user intentionally wants to create trouble
during creation of this table.  What do others think about it?

> ---
> +   /* build and execute the CREATE TABLE query. */
> +   appendStringInfo(&querybuf,
> +                    "CREATE TABLE %s.%s ("
> +                    "relid Oid,"
> +                    "schemaname TEXT,"
> +                    "relname TEXT,"
> +                    "conflict_type TEXT,"
> +                    "remote_xid xid,"
> +                    "remote_commit_lsn pg_lsn,"
> +                    "remote_commit_ts TIMESTAMPTZ,"
> +                    "remote_origin TEXT,"
> +                    "key_tuple     JSON,"
> +                    "remote_tuple  JSON,"
> +                    "local_conflicts JSON[])",
> +                    quote_identifier(get_namespace_name(namespaceId)),
> +                    quote_identifier(conflictrel));
>
> If we want to use SPI for history table creation, we should use
> qualified names in all the places including data types.

That's true; that way we can avoid interference from any user-created types of the same name.
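Something along these lines, schema-qualifying each built-in type (a sketch of the idea, not the final patch text; myschema is illustrative):

```sql
-- Sketch: schema-qualified data types so user-created types with the
-- same names cannot shadow the built-in ones during SPI execution.
CREATE TABLE myschema.conflict_history (
    relid             pg_catalog.oid,
    schemaname        pg_catalog.text,
    relname           pg_catalog.text,
    conflict_type     pg_catalog.text,
    remote_xid        pg_catalog.xid,
    remote_commit_lsn pg_catalog.pg_lsn,
    remote_commit_ts  pg_catalog.timestamptz,
    remote_origin     pg_catalog.text,
    key_tuple         pg_catalog.json,
    remote_tuple      pg_catalog.json,
    local_conflicts   pg_catalog.json[]
);
```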

> ---
> The patch doesn't create the dependency between the subscription and
> the conflict history table. So users can entirely drop the schema
> (with CASCADE option) where the history table is created.

In the initial discussion we thought that, since the table is created
under the subscription owner's privileges, only that user can drop it,
and if the user intentionally drops the table the conflicts will
simply not be recorded, which is acceptable.  But now I think it would
be a good idea to maintain a dependency on the subscription so that
users cannot drop the table without dropping the subscription.

> And once
> dropping the schema along with the history table, ALTER SUBSCRIPTION
> ... SET (conflict_history_table = '') seems not to work (I got a
> SEGV).

I will check this, thanks

> ---
> We can create the history table in pg_temp namespace but it should not
> be allowed.

Right, will check this and also add the test for the same.

> ---
> I think the conflict history table should not be transferred to the
> new cluster when pg_upgrade since the table definition could be
> different across major versions.

Let me think more about this with respect to how other objects, such
as subscriptions, behave during pg_upgrade.

> I got the following log when the publisher disables track_commit_timestamp:
>
> local_conflicts   |
> {"{\"xid\":\"790\",\"commit_ts\":\"1999-12-31T16:00:00-08:00\",\"origin\":\"\",\"tuple\":{\"c\":1}}"}
>
> I think we can omit commit_ts when it's omitted.

+1

> ---
> I think we should keep the history table name case-sensitive:

Yeah, we can do that; it looks good to me. What do others think about it?


--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

From
shveta malik
Date:
On Wed, Dec 3, 2025 at 4:57 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> > Thanks, it looks good. For the benefit of others, could you include a
> > brief note, perhaps in the commit message for now, describing how to
> > access or read this array column? We can remove it later.
>
> Thanks, okay, temporarily I have added in a commit message how we can
> fetch the data from the JSON array field.  In next version I will add
> a test to get the conflict stored in conflict log history table and
> fetch from it.
>

Thanks, I have not looked at the patch in detail yet, but a few things:

1)
Assert is hit here:
 LOG:  logical replication apply worker for subscription "sub1" has started
TRAP: failed Assert("slot != NULL"), File: "conflict.c", Line: 669, PID: 137604

Steps: create table tab1 (i int primary key, j int);
Pub: insert into tab1 values(10,10); insert into tab1 values(20,10);
Sub:  delete from tab1 where i=10;
Pub:  delete from tab1 where i=10;

2)
I see that key_tuple still points to RI and there is no RI field
added. It seems that discussion at [1] is missed in this patch.

[1]: https://www.postgresql.org/message-id/CAA4eK1L3umixUUik7Ef1eU%3Dx-JMb8iXD7rWWExBMP4dmOGTS9A%40mail.gmail.com

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Peter Smith
Дата:
Hi. Some review comments for v9-0001.

======
Commit message.

1.
Note: A single remote tuple may conflict with multiple local conflict
when conflict type
is CT_MULTIPLE_UNIQUE_CONFLICTS, so for handling this case we create a
single row in
conflict log table with respect to each remote conflict row even if it
conflicts with
multiple local rows and we store the multiple conflict tuples as a
single JSON array
element in format as
[ { "xid": "1001", "commit_ts": "...", "origin": "...", "tuple": {...} }, ... ]
We can extract the elements from local tuple as given in below example

~

Something seems broken/confused with this description:

1a.
"A single remote tuple may conflict with multiple local conflict"
Should that say "... with multiple local tuples" ?

~

1b.
There is a mixture of terminology here, "row" vs "tuple", which
doesn't seem correct.

~

1c.
"We can extract the elements from local tuple"
Should that say "... elements of the local tuples from the CLT row ..."
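
For context, reading the elements back out of such a JSON array column
might look like the query below. This is illustrative only: the table
name "conflict_log" and its local_conflicts JSON[] column are
assumptions based on the schema discussed earlier in this thread, not
the patch's actual definitions.

```sql
-- Hypothetical table/column names; unnest() expands the JSON[] array
-- so each local conflict becomes one output row.
SELECT elem->>'xid'       AS local_xid,
       elem->>'commit_ts' AS local_commit_ts,
       elem->'tuple'      AS local_tuple
FROM conflict_log, unnest(local_conflicts) AS elem
WHERE conflict_type = 'multiple_unique_conflicts';
```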

======
src/backend/replication/logical/conflict.c

2.
+
+#define N_LOCAL_CONFLICT_INFO_ATTRS 4

I felt it would be better to put this where it is used. e.g. IMO put
it within the build_conflict_tupledesc().

~~~

InsertConflictLogTuple:

3.
+ /* A valid tuple must be prepared and store in MyLogicalRepWorker. */

Typo still here: /store in/stored in/

~~~

4.
+static TupleDesc
+build_conflict_tupledesc(void)
+{
+ TupleDesc tupdesc;
+
+ tupdesc = CreateTemplateTupleDesc(N_LOCAL_CONFLICT_INFO_ATTRS);
+
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid",
+ XIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "commit_ts",
+ TIMESTAMPTZOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "origin",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "tuple",
+ JSONOID, -1, 0);

If you had some incrementing attno instead of hard-wiring the
(1,2,3,4) then you'd be able to add a sanity check like Assert(attno +
1 ==  N_LOCAL_CONFLICT_INFO_ATTRS); that can safeguard against future
mistakes in case something changes without updating the constant.

~~~

build_local_conflicts_json_array:

5.
+ /* Process local conflict tuple list and prepare a array of JSON. */
+ foreach(lc, conflicttuples)
  {
- tableslot = table_slot_create(localrel, &estate->es_tupleTable);
- tableslot = ExecCopySlot(tableslot, slot);
+ ConflictTupleInfo *conflicttuple = (ConflictTupleInfo *) lfirst(lc);

5a.
typo in comment: /a array/an array/

~

5b.
SUGGESTION
foreach_ptr(ConflictTupleInfo, conflicttuple, conflicttuples)
{

~~~

6.
+ i = 0;
+ foreach(lc, json_datums)
+ {
+ json_datum_array[i] = (Datum) lfirst(lc);
+ json_null_array[i] = false;
+ i++;
+ }

6a.
The loop seemed to be unnecessarily complicated since you already know
the size. Isn't it the same as below?

SUGGESTION
for (int i = 0; i < num_conflicts; i++)
{
  json_datum_array[i] = (Datum) list_nth(json_datums, i);
  json_null_array[i] = false;
}

6b.
Also, there is probably no need to do json_null_array[i] = false; at
every iteration here, because you could have just used palloc0 for the
whole array in the first place.

======
src/test/regress/expected/subscription.out

7.
+-- check if the table exists and has the correct schema (15 columns)
+SELECT count(*) FROM pg_attribute WHERE attrelid =
'public.regress_conflict_log1'::regclass AND attnum > 0;
+ count
+-------
+    11
+(1 row)
+

That comment is wrong; there aren't 15 columns anymore.

~~~

8.
(mentioned in a previous review)

I felt that \dRs should display the CLT's schema name in the "Conflict
log table" field -- at least when it's not "public". Otherwise, it
won't be easy for the user to know it.

I did not see a test case for this.

~~~

9.
(mentioned in a previous review)

You could have another test case to explicitly call the function
pg_relation_is_publishable(clt) to verify it returns false for a CLT
table.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Thu, Dec 4, 2025 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Dec 4, 2025 at 7:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
>
> > The patch utilizes SPI for creating and dropping the conflict history
> > table, but I'm really not sure if it's okay because it's actually
> > affected by some GUC parameters such as default_tablespace and
> > default_toast_compression etc. Also, probably some hooks and event
> > triggers could be fired during the creation and removal. Is it
> > intentional behavior? I'm concerned that it would make investigation
> > harder if an issue happened in the user environment.
>
> Hmm, interesting point, well we can control the value of default
> parameters while creating the table using SPI, but I don't see any
> reason to not use heap_create_with_catalog() directly, so maybe that's
> a better choice than using SPI because then we don't need to bother
> about any event triggers/utility hooks etc.  Although I don't see any
> specific issue with that, unless the user intentionally wants to
> create trouble while creating this table.  What do others think about
> it?
>
> > ---
> > +   /* build and execute the CREATE TABLE query. */
> > +   appendStringInfo(&querybuf,
> > +                    "CREATE TABLE %s.%s ("
> > +                    "relid Oid,"
> > +                    "schemaname TEXT,"
> > +                    "relname TEXT,"
> > +                    "conflict_type TEXT,"
> > +                    "remote_xid xid,"
> > +                    "remote_commit_lsn pg_lsn,"
> > +                    "remote_commit_ts TIMESTAMPTZ,"
> > +                    "remote_origin TEXT,"
> > +                    "key_tuple     JSON,"
> > +                    "remote_tuple  JSON,"
> > +                    "local_conflicts JSON[])",
> > +                    quote_identifier(get_namespace_name(namespaceId)),
> > +                    quote_identifier(conflictrel));
> >
> > If we want to use SPI for history table creation, we should use
> > qualified names in all the places including data types.
>
> That's true, so that we can avoid interference of any user created types.
>
> > ---
> > The patch doesn't create the dependency between the subscription and
> > the conflict history table. So users can entirely drop the schema
> > (with CASCADE option) where the history table is created.
>
> I think as part of the initial discussion we thought that, since it is
> created under the subscription owner's privileges, only that user can
> drop the table; and if the user intentionally drops it, conflicts will
> simply no longer be recorded there, which is acceptable. But now I
> think it would be a good idea to maintain a dependency on the
> subscription so that users cannot drop the table without dropping the
> subscription.
>

Yeah, it seems reasonable to maintain its dependency on the
subscription in this model. BTW, for this it would be easier to record
the dependency if we use heap_create_with_catalog(), as we do for
create_toast_table(). The other places where we use the SPI interface
to execute statements are places where we need to execute multiple SQL
statements or non-CREATE TABLE statements. So, for this patch's
purpose, I feel heap_create_with_catalog() suits better.

I was also thinking about whether it is a good idea to create one
global conflict table and let all subscriptions use it. However, it
has disadvantages: whenever a user drops any subscription, we need to
DELETE all conflict rows for that subscription, creating the need for
vacuum. We would also somehow need to ensure that conflicts from one
subscription owner are not visible to other subscription owners via
some RLS policy. So, a catalog table per subscription (aka the current
way) appears better.

Also, shall we give the user the option of choosing where she wants to
see conflict/resolution information? One idea to achieve this is to
provide subscription options like (a) conflict_resolution_format: the
values could be log and table for now; in the future, one could extend
it to other options like xml, json, etc. (b) conflict_log_table: here
the user can specify the conflict table name; this can be optional,
such that if the user omits it and conflict_resolution_format is
table, we will use an internally generated table name like
pg_conflicts_<subscription_id>.
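
To make the idea concrete, the hypothetical syntax (both option names
here are proposals under discussion, not implemented) could look like:

```sql
-- Hypothetical options; neither exists in any released version.
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=pubhost dbname=postgres'
    PUBLICATION pub1
    WITH (conflict_resolution_format = 'table',
          conflict_log_table = 'my_conflict_log');
```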

> > And once
> > dropping the schema along with the history table, ALTER SUBSCRIPTION
> > ... SET (conflict_history_table = '') seems not to work (I got a
> > SEGV).
>
> I will check this, thanks
>
> > ---
> > We can create the history table in pg_temp namespace but it should not
> > be allowed.
>
> Right, will check this and also add the test for the same.
>
> > ---
> > I think the conflict history table should not be transferred to the
> > new cluster when pg_upgrade since the table definition could be
> > different across major versions.
>
> Let me think more on this with respect to behaviour of other factors
> like subscriptions etc.
>

Can we deal with different table schemas across versions via
pg_dump/restore during upgrade?

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > relid             | 16391
> > > schemaname        | public
> > > relname           | conf_tab
> > > conflict_type     | multiple_unique_conflicts
> > > remote_xid        | 761
> > > remote_commit_lsn | 0/01761400
> > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > remote_origin     | pg_16406
> > > key_tuple         |
> > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > local_conflicts   |
> > > {"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > >
> >
> > Thanks, it looks good. For the benefit of others, could you include a
> > brief note, perhaps in the commit message for now, describing how to
> > access or read this array column? We can remove it later.
>
> Thanks, okay, temporarily I have added in a commit message how we can
> fetch the data from the JSON array field.  In next version I will add
> a test to get the conflict stored in conflict log history table and
> fetch from it.

I noticed that the table structure can get changed by the time the
conflict record is prepared. In ReportApplyConflict(), the code
currently prepares the conflict log tuple before deciding whether the
insertion will be immediate or deferred:
+       /* Insert conflict details to conflict log table. */
+       if (conflictlogrel)
+       {
+               /*
+                * Prepare the conflict log tuple. If the error level is below ERROR,
+                * insert it immediately. Otherwise, defer the insertion to a new
+                * transaction after the current one aborts, ensuring the insertion of
+                * the log tuple is not rolled back.
+                */
+               prepare_conflict_log_tuple(estate,
+                                          relinfo->ri_RelationDesc,
+                                          conflictlogrel,
+                                          type,
+                                          searchslot,
+                                          conflicttuples,
+                                          remoteslot);
+               if (elevel < ERROR)
+                       InsertConflictLogTuple(conflictlogrel);
+
+               table_close(conflictlogrel, RowExclusiveLock);
+       }

If the conflict history table definition is changed just before
prepare_conflict_log_tuple, the tuple creation will crash:
Program received signal SIGSEGV, Segmentation fault.
0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
../../../../src/include/varatt.h:419
419 return VARATT_IS_4B_U(PTR) &&
(gdb) bt
#0  0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
../../../../src/include/varatt.h:419
#1  0x00005a342e01e5ed in heap_compute_data_size
(tupleDesc=0x7ab405e5dda8, values=0x7ffd7af3ad20,
isnull=0x7ffd7af3ad15) at heaptuple.c:239
#2  0x00005a342e0200dd in heap_form_tuple
(tupleDescriptor=0x7ab405e5dda8, values=0x7ffd7af3ad20,
isnull=0x7ffd7af3ad15) at heaptuple.c:1158
#3  0x00005a342e55e8c2 in prepare_conflict_log_tuple
(estate=0x5a3467944530, rel=0x7ab405e594e8,
conflictlogrel=0x7ab405e5da88, conflict_type=CT_INSERT_EXISTS,
searchslot=0x0,
    conflicttuples=0x5a3467942da0, remoteslot=0x5a346792e498) at conflict.c:936
#4  0x00005a342e55cea6 in ReportApplyConflict (estate=0x5a3467944530,
relinfo=0x5a346792e778, elevel=21, type=CT_INSERT_EXISTS,
searchslot=0x0, remoteslot=0x5a346792e498,
    conflicttuples=0x5a3467942da0) at conflict.c:168
#5  0x00005a342e348c35 in CheckAndReportConflict
(resultRelInfo=0x5a346792e778, estate=0x5a3467944530,
type=CT_INSERT_EXISTS, recheckIndexes=0x5a3467942648, searchslot=0x0,
    remoteslot=0x5a346792e498) at execReplication.c:793

This can be reproduced by the following steps:
CREATE PUBLICATION pub;
CREATE SUBSCRIPTION sub ... WITH (conflict_log_table = 'conflict');
ALTER TABLE conflict RENAME TO conflict1;
CREATE TABLE conflict(c1 varchar, c2 varchar);
-- Cause a conflict, this will crash while trying to prepare the
conflicting tuple

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

От
vignesh C
Дата:
On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > relid             | 16391
> > > schemaname        | public
> > > relname           | conf_tab
> > > conflict_type     | multiple_unique_conflicts
> > > remote_xid        | 761
> > > remote_commit_lsn | 0/01761400
> > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > remote_origin     | pg_16406
> > > key_tuple         |
> > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > local_conflicts   |
> > > {"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > >
> >
> > Thanks, it looks good. For the benefit of others, could you include a
> > brief note, perhaps in the commit message for now, describing how to
> > access or read this array column? We can remove it later.
>
> Thanks, okay, temporarily I have added in a commit message how we can
> fetch the data from the JSON array field.  In next version I will add
> a test to get the conflict stored in conflict log history table and
> fetch from it.

Few comments:
1) Currently pg_dump is not dumping conflict_log_table option, I felt
it should be included while dumping.

2) Is there a way to unset the conflict log table after we create the
subscription with conflict_log_table option

3) Any reason why this table should not be allowed to add to a publication:
+       /* Can't be conflict log table */
+       if (IsConflictLogTable(RelationGetRelid(targetrel)))
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("cannot add relation \"%s.%s\" to publication",
+                                               get_namespace_name(RelationGetNamespace(targetrel)),
+                                               RelationGetRelationName(targetrel)),
+                                errdetail("This operation is not supported for conflict log tables.")));

Is the reason that the same table could be a conflict table on the
subscriber, so this prevents corruption on the subscriber?

4) I did not find any documentation for this feature, can we include
documentation in create_subscription.sgml, alter_subscription.sgml and
logical_replication.sgml

Regards,
Vignesh



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Thu, Dec 4, 2025 at 8:05 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > relid             | 16391
> > > > schemaname        | public
> > > > relname           | conf_tab
> > > > conflict_type     | multiple_unique_conflicts
> > > > remote_xid        | 761
> > > > remote_commit_lsn | 0/01761400
> > > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > > remote_origin     | pg_16406
> > > > key_tuple         |
> > > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > > local_conflicts   |
> > > > {"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > > >
> > >
> > > Thanks, it looks good. For the benefit of others, could you include a
> > > brief note, perhaps in the commit message for now, describing how to
> > > access or read this array column? We can remove it later.
> >
> > Thanks, okay, temporarily I have added in a commit message how we can
> > fetch the data from the JSON array field.  In next version I will add
> > a test to get the conflict stored in conflict log history table and
> > fetch from it.
>
> I noticed that the table structure can get changed by the time the
> conflict record is prepared. In ReportApplyConflict(), the code
> currently prepares the conflict log tuple before deciding whether the
> insertion will be immediate or deferred:
> +       /* Insert conflict details to conflict log table. */
> +       if (conflictlogrel)
> +       {
> +               /*
> +                * Prepare the conflict log tuple. If the error level is below ERROR,
> +                * insert it immediately. Otherwise, defer the insertion to a new
> +                * transaction after the current one aborts, ensuring the insertion of
> +                * the log tuple is not rolled back.
> +                */
> +               prepare_conflict_log_tuple(estate,
> +                                          relinfo->ri_RelationDesc,
> +                                          conflictlogrel,
> +                                          type,
> +                                          searchslot,
> +                                          conflicttuples,
> +                                          remoteslot);
> +               if (elevel < ERROR)
> +                       InsertConflictLogTuple(conflictlogrel);
> +
> +               table_close(conflictlogrel, RowExclusiveLock);
> +       }
>
> If the conflict history table definition is changed just before
> prepare_conflict_log_tuple, the tuple creation will crash:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
> ../../../../src/include/varatt.h:419
> 419 return VARATT_IS_4B_U(PTR) &&
> (gdb) bt
> #0  0x00005a342e01df4f in VARATT_CAN_MAKE_SHORT (PTR=0x4000) at
> ../../../../src/include/varatt.h:419
> #1  0x00005a342e01e5ed in heap_compute_data_size
> (tupleDesc=0x7ab405e5dda8, values=0x7ffd7af3ad20,
> isnull=0x7ffd7af3ad15) at heaptuple.c:239
> #2  0x00005a342e0200dd in heap_form_tuple
> (tupleDescriptor=0x7ab405e5dda8, values=0x7ffd7af3ad20,
> isnull=0x7ffd7af3ad15) at heaptuple.c:1158
> #3  0x00005a342e55e8c2 in prepare_conflict_log_tuple
> (estate=0x5a3467944530, rel=0x7ab405e594e8,
> conflictlogrel=0x7ab405e5da88, conflict_type=CT_INSERT_EXISTS,
> searchslot=0x0,
>     conflicttuples=0x5a3467942da0, remoteslot=0x5a346792e498) at conflict.c:936
> #4  0x00005a342e55cea6 in ReportApplyConflict (estate=0x5a3467944530,
> relinfo=0x5a346792e778, elevel=21, type=CT_INSERT_EXISTS,
> searchslot=0x0, remoteslot=0x5a346792e498,
>     conflicttuples=0x5a3467942da0) at conflict.c:168
> #5  0x00005a342e348c35 in CheckAndReportConflict
> (resultRelInfo=0x5a346792e778, estate=0x5a3467944530,
> type=CT_INSERT_EXISTS, recheckIndexes=0x5a3467942648, searchslot=0x0,
>     remoteslot=0x5a346792e498) at execReplication.c:793
>
> This can be reproduced by the following steps:
> CREATE PUBLICATION pub;
> CREATE SUBSCRIPTION sub ... WITH (conflict_log_table = 'conflict');
> ALTER TABLE conflict RENAME TO conflict1;
> CREATE TABLE conflict(c1 varchar, c2 varchar);
> -- Cause a conflict, this will crash while trying to prepare the
> conflicting tuple

Yeah, while it is allowed to drop or alter the conflict log table, it
should not segfault; IMHO an error is acceptable, as per the initial
discussion. I will look into this and tighten up the logic so that it
throws an error whenever it cannot insert into the conflict log table.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 5, 2025 at 9:24 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 3 Dec 2025 at 16:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > relid             | 16391
> > > > schemaname        | public
> > > > relname           | conf_tab
> > > > conflict_type     | multiple_unique_conflicts
> > > > remote_xid        | 761
> > > > remote_commit_lsn | 0/01761400
> > > > remote_commit_ts  | 2025-12-02 15:02:07.045935+00
> > > > remote_origin     | pg_16406
> > > > key_tuple         |
> > > > remote_tuple      | {"a":2,"b":3,"c":4}
> > > > local_conflicts   |
> > > > {"{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":2,\"b\":2,\"c\":2}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":3,\"b\":3,\"c\":3}}","{\"xid\":\"773\",\"commit_ts\":\"2025-12-02T15:02:00.640253+00:00\",\"origin\":\"\",\"tuple\":{\"a\":4,\"b\":4,\"c\":4}}"}
> > > >
> > >
> > > Thanks, it looks good. For the benefit of others, could you include a
> > > brief note, perhaps in the commit message for now, describing how to
> > > access or read this array column? We can remove it later.
> >
> > Thanks, okay, temporarily I have added in a commit message how we can
> > fetch the data from the JSON array field.  In next version I will add
> > a test to get the conflict stored in conflict log history table and
> > fetch from it.
>
> Few comments:
> 1) Currently pg_dump is not dumping conflict_log_table option, I felt
> it should be included while dumping.

Yeah, we should.

> 2) Is there a way to unset the conflict log table after we create the
> subscription with conflict_log_table option

IMHO we can use ALTER SUBSCRIPTION ... WITH (conflict_log_table = '')
to unset it. What do others think?

> 3) Any reason why this table should not be allowed to add to a publication:
> +       /* Can't be conflict log table */
> +       if (IsConflictLogTable(RelationGetRelid(targetrel)))
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                errmsg("cannot add relation \"%s.%s\"
> to publication",
> +
> get_namespace_name(RelationGetNamespace(targetrel)),
> +
> RelationGetRelationName(targetrel)),
> +                                errdetail("This operation is not
> supported for conflict log tables.")));
>
> Is the reason like the same table can be a conflict table in the
> subscriber and prevent corruption in the subscriber

The main reason was that these tables are created internally to
maintain conflict information, which is node-specific internal
detail, so there is no reason anyone would want to replicate them. We
therefore excluded them from the ALL TABLES option and then, based on
a suggestion from Shveta, blocked them from being added to a
publication as well. So there is no strong reason to disallow adding
them to a publication; OTOH, there is no reason anyone would want to
do that, considering these are internally managed tables.

> 4) I did not find any documentation for this feature, can we include
> documentation in create_subscription.sgml, alter_subscription.sgml and
> logical_replication.sgml

Yeah, in the initial version I posted a doc patch, but since we are
still making changes in the first patch and some behavior might
change, I will postpone it to a later stage, after we have consensus
on most of the behaviour.

--
Regards,
Dilip Kumar
Google



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Fri, Dec 5, 2025 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 5, 2025 at 9:24 AM vignesh C <vignesh21@gmail.com> wrote:
> >
>
> > 2) Is there a way to unset the conflict log table after we create the
> > subscription with conflict_log_table option
>
> IMHO we can use ALTER SUBSCRIPTION...WITH(conflict_log_table='') so
> unset? What do others think about it?
>

We already have a syntax: ALTER SUBSCRIPTION name SET (
subscription_parameter [= value] [, ... ] ) which can be used to
set/unset this new subscription option.
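
For example (hypothetical, since the conflict_log_table option is
still under discussion):

```sql
-- Set the conflict log table for an existing subscription.
ALTER SUBSCRIPTION sub1 SET (conflict_log_table = 'my_conflict_log');

-- Unset it again.
ALTER SUBSCRIPTION sub1 SET (conflict_log_table = '');
```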

> > 3) Any reason why this table should not be allowed to add to a publication:
> > +       /* Can't be conflict log table */
> > +       if (IsConflictLogTable(RelationGetRelid(targetrel)))
> > +               ereport(ERROR,
> > +                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +                                errmsg("cannot add relation \"%s.%s\" to publication",
> > +                                               get_namespace_name(RelationGetNamespace(targetrel)),
> > +                                               RelationGetRelationName(targetrel)),
> > +                                errdetail("This operation is not supported for conflict log tables.")));
> >
> > Is the reason like the same table can be a conflict table in the
> > subscriber and prevent corruption in the subscriber
>
> The main reason was that these tables are created internally to
> maintain conflict information, which is node-specific internal
> detail, so there is no reason anyone would want to replicate them. We
> therefore excluded them from the ALL TABLES option and then, based on
> a suggestion from Shveta, blocked them from being added to a
> publication as well. So there is no strong reason to disallow adding
> them to a publication; OTOH, there is no reason anyone would want to
> do that, considering these are internally managed tables.
>

I also don't see any reason to allow such internal tables to be
replicated. So, it is okay to prohibit them for now. If we see any use
case, we can allow it.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
Amit Kapila
Дата:
On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Also, shall we give the option to the user where she wants to see
> conflict/resolution information? One idea to achieve the same is to
> provide subscription options like (a) conflict_resolution_format, the
> values could be log and table for now, in future, one could extend it
> to other options like xml, json, etc. (b) conflict_log_table: in this
> user can specify the conflict table name, this can be optional such
> that if user omits this and conflict_resolution_format is table, then
> we will use internally generated table name like
> pg_conflicts_<subscription_id>.
>

In this idea, we can keep the name of the second option as
conflict_log_name instead of conflict_log_table. This can help us LOG
the conflicts in a totally separate conflict file instead of the
server log. Say the user provides conflict_resolution_format as 'log'
and conflict_log_name as 'conflict_report'; then we can report
conflicts in this separate file, appending the subid to distinguish
it. And if the user gives only the first option,
conflict_resolution_format as 'log', then we keep reporting the
information in the server log files.

--
With Regards,
Amit Kapila.



Re: Proposal: Conflict log history table for Logical Replication

От
shveta malik
Дата:
On Fri, Dec 5, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Also, shall we give the option to the user where she wants to see
> > conflict/resolution information? One idea to achieve the same is to
> > provide subscription options like (a) conflict_resolution_format, the
> > values could be log and table for now, in future, one could extend it
> > to other options like xml, json, etc. (b) conflict_log_table: in this
> > user can specify the conflict table name, this can be optional such
> > that if user omits this and conflict_resolution_format is table, then
> > we will use internally generated table name like
> > pg_conflicts_<subscription_id>.
> >
>
> In this idea, we can keep the name of the second option as
> conflict_log_name instead of conflict_log_table. This can help us LOG
> the conflicts in a totally separate conflict file instead of in server
> log. Say, the user provides conflict_resolution_format as 'log' and
> conflict_log_name as 'conflict_report' then we can report conflicts in
> this separate file by appending subid to distinguish it. And, if the
> user gives only the first option conflict_resolution_format as 'log'
> then we can keep reporting the information in server log files.
>

+1 on the idea.
Instead of using conflict_resolution_format, I feel it should be
conflict_log_format as we are referring to LOGs and not resolutions.

thanks
Shveta



Re: Proposal: Conflict log history table for Logical Replication

От
Dilip Kumar
Дата:
On Fri, Dec 5, 2025 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 4, 2025 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Also, shall we give the option to the user where she wants to see
> > conflict/resolution information? One idea to achieve the same is to
> > provide subscription options like (a) conflict_resolution_format, the
> > values could be log and table for now, in future, one could extend it
> > to other options like xml, json, etc. (b) conflict_log_table: in this
> > user can specify the conflict table name, this can be optional such
> > that if user omits this and conflict_resolution_format is table, then
> > we will use internally generated table name like
> > pg_conflicts_<subscription_id>.
> >
>
> In this idea, we can keep the name of the second option as
> conflict_log_name instead of conflict_log_table. This can help us LOG
> the conflicts in a totally separate conflict file instead of in server
> log. Say, the user provides conflict_resolution_format as 'log' and
> conflict_log_name as 'conflict_report' then we can report conflicts in
> this separate file by appending subid to distinguish it. And, if the
> user gives only the first option conflict_resolution_format as 'log'
> then we can keep reporting the information in server log files.

Yeah, that looks good. Considering extensibility, I think we can keep
the option name as 'conflict_log_name' from the first version itself,
even if we don't provide all the options in the first version.

--
Regards,
Dilip Kumar
Google