Discussion: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 to PG18

From: PG Bug reporting form
Date: 22 Dec 2025
The following bug has been logged on the website:

Bug reference:      19360
Logged by:          Mostafa Hassanzadeh
Email address:      mostafaa.hasanzadeh@gmail.com
PostgreSQL version: 18.1
Operating system:   Ubuntu 24.04
Description:

I am encountering a persistent issue during the initial
synchronization (Logical Replication) migrating from PostgreSQL 12 (Source)
to PostgreSQL 18 (Target/Devel).

Despite ensuring a clean state (truncated tables, disabled triggers, dropped
indexes), the replication fails immediately after the initial COPY phase
when it tries to apply concurrent updates from WAL. The error indicates an
origin conflict, even though origin is set to none.

It appears that the rows inserted during the initial COPY process in PG18
are not being treated correctly regarding their origin status, causing a
conflict when the Apply Worker tries to update these rows with incoming WAL
entries.

Environment:

    Publisher: PostgreSQL 12

    Subscriber: PostgreSQL 18 (Development/Beta version)

    OS: Linux (Kernel > 5.10)

    Setup: High-volume data migration (~100GB tables)

Steps to Reproduce:

    Publisher (PG12): Create a publication for tables with moderate write
traffic.

    Subscriber (PG18):

        DISABLE TRIGGER ALL on target tables.

        TRUNCATE target tables.

        Create a subscription with:

        CREATE SUBSCRIPTION sub_name
        CONNECTION '...'
        PUBLICATION pub_name
        WITH (copy_data = true, origin = 'none', binary = false);

    Observation:

        The COPY phase starts and writes data to the disk.

        As soon as COPY finishes and the worker switches to streaming to
catch up, it crashes with the following error.

Error Log:

LOG:  conflict detected on relation "public.player":
conflict=update_origin_differs
DETAIL:  Updating the row that was modified by a non-existent origin in
transaction [TXID] at [TIMESTAMP].
Existing local row (...); remote row (...); replica identity (id)=(...).
CONTEXT:  processing remote data for replication origin "pg_..." during
message type "UPDATE" ...

Analysis: I have verified that:

    There are no other active subscriptions writing to the target database.

    All triggers and foreign keys are disabled on the subscriber.

    The issue persists even after multiple cleanups (DROP SUBSCRIPTION /
TRUNCATE).

Suspected Cause: It seems there is an incompatibility or regression in
PostgreSQL 18's logical replication handling. Specifically, tuples inserted
via the initial COPY protocol (from a PG12 source) might be tagged with a
local or null origin in a way that conflicts with the conflict_resolver or
origin checking logic in PG18, even when origin = 'none' is explicitly
configured.

I suspect the COPY process does not correctly set the tuple origin state
that the WAL apply worker expects, leading it to believe the row was
modified locally by a third party.


Hi,

>         WITH (copy_data = true, origin = 'none', binary = false);


There is the following note in the documentation about the setting of copy_data = true
and origin = 'none'.
```
When using a subscription parameter combination of copy_data = true and origin = NONE,
the initial sync table data is copied directly from the publisher, meaning that knowledge of the
true origin of that data is not possible. If the publisher also has subscriptions then the copied
table data might have originated from further upstream. This scenario is detected and a
WARNING is logged to the user, but the warning is only an indication of a potential problem;
it is the user's responsibility to make the necessary checks to ensure the copied data origins
are really as wanted or not.
```
Kindly check if this is happening in your case.


>     Observation:
>
>         The COPY phase starts and writes data to the disk.
>
>         As soon as COPY finishes and the worker switches to streaming to
> catch up, it crashes with the following error.
>
> Error Log:
>
> LOG:  conflict detected on relation "public.player":
> conflict=update_origin_differs
> DETAIL:  Updating the row that was modified by a non-existent origin in
> transaction [TXID] at [TIMESTAMP].
> Existing local row (...); remote row (...); replica identity (id)=(...).
> CONTEXT:  processing remote data for replication origin "pg_..." during
> message type "UPDATE" ...


This does not seem like an error and the apply operation can proceed
successfully even after logging this. Can you please check if there is
another message with ERROR/FATAL/PANIC log level in the logs?

Thank you,
Rahila Syed
On Tue, Dec 23, 2025 at 1:20 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
>
>>
>>
>>         WITH (copy_data = true, origin = 'none', binary = false);
>>
>
> There is the following note in the documentation about the setting of copy_data = true
> and origin = 'none'.
> ```
> When using a subscription parameter combination of copy_data = true and origin = NONE,
> the initial sync table data is copied directly from the publisher, meaning that knowledge of the
> true origin of that data is not possible. If the publisher also has subscriptions then the copied
>  table data might have originated from further upstream. This scenario is detected and a
> WARNING is logged to the user, but the warning is only an indication of a potential problem;
> it is the user's responsibility to make the necessary checks to ensure the copied data origins
> are really as wanted or not.
> ```
> Kindly check if this is happening in your case.
>

If that were the case, the OP should have seen a WARNING at the time
of CREATE SUBSCRIPTION. The WARNING mentioned in the section quoted
above is the one emitted by check_publications_origin_tables(), which
is called during the subscription-specific DDL commands.

>
>>     Observation:
>>
>>         The COPY phase starts and writes data to the disk.
>>
>>         As soon as COPY finishes and the worker switches to streaming to
>> catch up, it crashes with the following error.
>>
>> Error Log:
>>
>> LOG:  conflict detected on relation "public.player":
>> conflict=update_origin_differs
>> DETAIL:  Updating the row that was modified by a non-existent origin in
>> transaction [TXID] at [TIMESTAMP].
>> Existing local row (...); remote row (...); replica identity (id)=(...).
>> CONTEXT:  processing remote data for replication origin "pg_..." during
>> message type "UPDATE" ...
>>
>
> This does not seem like an error and the apply operation can proceed
> successfully even after logging this. Can you please check if there is
> another message with ERROR/FATAL/PANIC log level in the logs?
>

Yeah, this part is also not clear to me. The messages shown should be
LOG messages and shouldn't lead to a crash or an error, which is what
the OP seems to be describing.

One thing we can check is whether track_commit_timestamp is enabled on
the PG18 subscriber node. If it is not, you can try again after
enabling it.

--
With Regards,
Amit Kapila.



On Mon, 22 Dec 2025 at 19:00, PG Bug reporting form
<noreply@postgresql.org> wrote:
>

This can occur in the following scenario: commit timestamp tracking is
enabled on the subscriber; the same table exists on both publisher and
subscriber; a publication is created on the publisher with initial
data; and a subscription is created on the subscriber with origin =
none. During the initial table synchronization, the row is inserted
using a tablesync replication origin, which is dropped once
synchronization completes. If the row is updated on the publisher
after the initial sync, the apply worker attempts to update a row that
was inserted using a different replication origin (the tablesync origin),
resulting in an origin mismatch.

The conflict is logged and logical replication continues normally. No
crash occurs, and the log entry is informational rather than
indicative of a failure. These messages can be safely ignored for now.
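The scenario above can be walked through in a toy Python model. It is an illustration under stated assumptions, not PostgreSQL code: the class, its methods, and the origin names and IDs are hypothetical (the names only imitate the pg_<subid> and pg_<subid>_<relid> style of logical replication origins). It shows why the DETAIL line mentions a "non-existent origin": the row still carries the tablesync origin ID, but that origin has already been dropped, so its name cannot be looked up, and the update is applied anyway.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Row:
    value: str
    origin: int  # replication origin ID recorded for the last change


class Subscriber:
    """Toy model of the tablesync/apply origin mismatch (hypothetical)."""

    def __init__(self) -> None:
        self.origins: Dict[int, str] = {}  # origin ID -> origin name
        self.rows: Dict[int, Row] = {}     # replica identity -> row
        self.log: List[str] = []
        self._next_id = 1

    def create_origin(self, name: str) -> int:
        oid = self._next_id
        self._next_id += 1
        self.origins[oid] = name
        return oid

    def drop_origin(self, oid: int) -> None:
        del self.origins[oid]

    def apply_update(self, key: int, value: str, apply_origin: int) -> None:
        local = self.rows[key]
        if local.origin != apply_origin:
            # The name lookup fails once the tablesync origin is dropped.
            name = self.origins.get(local.origin)
            who = f'origin "{name}"' if name else "a non-existent origin"
            self.log.append("LOG: conflict=update_origin_differs; "
                            f"row was modified by {who}")
        # The update is applied regardless of the logged conflict.
        self.rows[key] = Row(value, apply_origin)


sub = Subscriber()
apply_origin = sub.create_origin("pg_16398")        # apply worker (made-up IDs)
sync_origin = sub.create_origin("pg_16398_16384")  # tablesync worker
sub.rows[1] = Row("initial", sync_origin)           # row copied during initial sync
sub.drop_origin(sync_origin)                        # dropped once sync completes
sub.apply_update(1, "updated", apply_origin)        # later UPDATE from the publisher
print(sub.log[0])
print(sub.rows[1].value)
```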

We are currently evaluating possible improvements to handle this
scenario more gracefully and to avoid reporting these conflicts in the
future.

Regards,
Vignesh



On Mon, Dec 29, 2025 at 4:26 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, 22 Dec 2025 at 19:00, PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
>
> This can occur in the following scenario: commit timestamp tracking is
> enabled on the subscriber; the same table exists on both publisher and
> subscriber; a publication is created on the publisher with initial
> data; and a subscription is created on the subscriber with origin =
> none. During the initial table synchronization, the row is inserted
> using a tablesync replication origin, which is dropped once
> synchronization completes. If the row is updated on the publisher
> after the initial sync, the apply worker attempts to update a row that
> was inserted using a different replication origin (the tablesync origin),
> resulting in an origin mismatch.
>
> The conflict is logged and logical replication continues normally. No
> crash occurs, and the log entry is informational rather than
> indicative of a failure.
>

I agree with this analysis.

> These messages can be safely ignored for now.
>
> We are currently evaluating possible improvements to handle this
> scenario more gracefully and to avoid reporting these conflicts in the
> future.
>

One idea for safely suppressing these LOGs is to modify the state
management in the pg_subscription_rel catalog to store the origin ID.
When a tablesync worker completes, instead of just dropping the origin
and setting the relation state to ready, it could record the origin ID
it used in pg_subscription_rel. When the apply worker encounters an
origin mismatch, it checks pg_subscription_rel for that specific
table. If the "old" origin ID matches the one recorded during the sync
phase, the worker knows the row is "ours" and suppresses the log. Now,
since origin IDs can be reused, we could additionally store the local
timestamp along with the origin ID in pg_subscription_rel. Then, we can
suppress the log if: row_origin_id == srsuboriginid AND
row_commit_time <= srsubsynctime.
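The proposed rule can be sketched in Python as follows. srsuboriginid and srsubsynctime are the hypothetical pg_subscription_rel columns named in the proposal (they do not exist today), and the class and function names are made up for illustration. The timestamp comparison is what guards against a later row whose origin ID happens to reuse the dropped tablesync origin's ID.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SyncInfo:
    """Hypothetical extra pg_subscription_rel columns from the proposal."""
    srsuboriginid: int       # origin ID used by the tablesync worker
    srsubsynctime: datetime  # local time at which the table sync finished


def suppress_origin_conflict_log(row_origin_id: int,
                                 row_commit_time: datetime,
                                 sync: SyncInfo) -> bool:
    # The row is "ours" only if it was written by this table's tablesync
    # origin no later than the recorded sync time; the time check guards
    # against the origin ID being reused after the origin was dropped.
    return (row_origin_id == sync.srsuboriginid
            and row_commit_time <= sync.srsubsynctime)


sync = SyncInfo(srsuboriginid=2, srsubsynctime=datetime(2025, 12, 22, 19, 0))
print(suppress_origin_conflict_log(2, datetime(2025, 12, 22, 18, 0), sync))  # True
print(suppress_origin_conflict_log(2, datetime(2025, 12, 23, 9, 0), sync))   # False
print(suppress_origin_conflict_log(3, datetime(2025, 12, 22, 18, 0), sync))  # False
```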

I think addressing this is much more important for conflict resolution,
to avoid applying a wrong resolution to such conflicts.

--
With Regards,
Amit Kapila.



On Mon, Dec 29, 2025 at 10:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> One idea to safely ignore these LOGs is we could modify the state
> management in the catalog pg_subscription_rel to store originID. When
> a tablesync worker completes, instead of just deleting the origin and
> setting the relation state to ready, it could record the origin_id it
> used into pg_subscription_rel.  When the apply worker encounters an
> origin mismatch, it checks pg_subscription_rel for that specific
> table. If the "old" origin ID matches the one recorded during the sync
> phase, the worker knows the row is "ours" and suppresses the log. Now,
> as the origin ID could be reused, we could additionally store local
> timestamp along with originId in pg_subscription_rel. Then, we can
> suppress the log if: row_origin_id == srsuboriginid AND
> row_commit_time <= srsubsynctime.

It sounds very costly. IIUC we would need these checks for every first
update to tuples loaded via initial table sync. Can we somehow share
the apply worker's origin with tablesync workers so that they can
refer to the same origin ID? Or can we invent special origin IDs
(e.g., > 0x00FF) that are the same as the normal origin ID except for
being ignored by the conflict detection system?
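Both alternatives reduce to a cheap comparison on the row's origin ID, with no per-tuple catalog lookup. A minimal Python sketch, assuming a hypothetical reserved range of origin IDs above 0x00FF (taken from the example in the message) that ordinary origins never use; the constant and function names are invented for illustration.

```python
# Hypothetical threshold taken from the "e.g., > 0x00FF" example above;
# assumes ordinary replication origins never receive IDs in this range.
IGNORED_ORIGIN_THRESHOLD = 0x00FF


def is_ignored_origin(origin_id: int) -> bool:
    """Special origin IDs: normal for progress tracking, skipped by conflict detection."""
    return origin_id > IGNORED_ORIGIN_THRESHOLD


def reports_origin_conflict(local_row_origin: int, apply_origin: int) -> bool:
    if local_row_origin == apply_origin:
        # Idea 1: tablesync shares the apply worker's origin, so no mismatch.
        return False
    if is_ignored_origin(local_row_origin):
        # Idea 2: the row was written under a special (ignored) origin ID.
        return False
    return True


print(reports_origin_conflict(1, 1))       # False: shared origin
print(reports_origin_conflict(0x0101, 1))  # False: special origin ID
print(reports_origin_conflict(2, 1))       # True: genuine origin mismatch
```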

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> It sounds very costly. IIUC we would need these checks for every first
> update to tuples loaded via initial table sync. Can we somehow share
> the apply worker's origin with tablesync workers so that they can
> refer to the same origin ID? Or can we invent special origin IDs
> (e.g., > 0x00FF) that are the same as the normal origin ID except for
> being ignored by the conflict detection system?

How will this distinguish between the case where the initial sync was
done from the same publisher node we are receiving the update from and
the case where the initial sync was done from some other node? Can we
always ignore conflict checking for initially synced data, or do we
only want to ignore it if the initial sync was done from the same node?

--
Regards,
Dilip Kumar
Google



On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> How will this distinguish between the case where the initial sync was
> done from the same publisher node we are receiving the update from and
> the case where the initial sync was done from some other node? Can we
> always ignore conflict checking for initially synced data, or do we
> only want to ignore it if the initial sync was done from the same node?

I had the former in mind: always ignore conflict checking, so we
don't need to distinguish the two cases. IOW, we would treat changes
made by the initial tablesync as if they were made by a normal backend
process (which doesn't use a replication origin), while still using the
progress-tracking ability of the replication origin.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com