Discussion: [PATCH] Preserve replication origin OIDs in pg_upgrade

[PATCH] Preserve replication origin OIDs in pg_upgrade

From
Ajin Cherian
Date:
Hello hackers,

The idea for this patch came up during discussions in the thread [1]
on migration of the pg_commit_ts directory as part of pg_upgrade.
There was a problem raised by Sawada-san in that thread which this
patch addresses. [2]

The problem:
The pg_commit_ts directory stores commit-timestamp records for each
transaction, and each record embeds the replication origin ID
(roident) that identifies which subscription wrote that transaction.
When pg_upgrade migrates a subscriber, the pg_commit_ts directory is
copied directly from the old cluster to the new cluster, so those
embedded roidents must remain valid in the new cluster. However,
CREATE SUBSCRIPTION on the new cluster calls replorigin_create(),
which assigns a fresh roident to each subscription's replication
origin. Because subscription OIDs are not stable across upgrades, the
origin names change (e.g. pg_16392 becomes pg_16403), and consequently
the roidents can be assigned differently or, in the worst case,
swapped between subscriptions.

Consider two subscriptions subA and subB with roidents 1 and 2
respectively before upgrade. After upgrade, due to OID reassignment,
subA might get roident 2 and subB might get roident 1. The
commit-timestamp records copied from the old cluster still say roident
1 for rows written by subA, but the new cluster now thinks roident 1
belongs to subB. This causes spurious update_origin_differs conflicts
— the new cluster incorrectly thinks a row was last modified by a
different subscription than it actually was.
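
The swap scenario above can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL code: the dictionaries and the helper last_writer() are hypothetical stand-ins for the copied commit-timestamp records and the origin catalog, and the transaction IDs are made up.

```python
# Toy model of the roident swap; not PostgreSQL code.
# Every name below is hypothetical and exists only for illustration.

# Commit-timestamp records copied verbatim from the old cluster:
# transaction id -> roident embedded in the record.
commit_ts = {1001: 1, 1002: 2}

# roident -> subscription, before and after the upgrade.
old_origins = {1: "subA", 2: "subB"}
new_origins = {1: "subB", 2: "subA"}  # swapped by fresh assignment


def last_writer(xid, origins):
    """Resolve which subscription a copied record is attributed to."""
    return origins[commit_ts[xid]]


before = last_writer(1001, old_origins)
after = last_writer(1001, new_origins)
print(before, after)
```

The record for transaction 1001 never changes, yet it resolves to a different subscription once the mapping is swapped, which is how the spurious update_origin_differs conflict arises.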

This patch attempts to fix this by preserving the roident of each
subscription's replication origin across the upgrade. It also
migrates all replication origins as part of pg_upgrade.

Sequence of Events During Upgrade

1. pg_dumpall dumps all non-subscription replication origins from the
old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident
alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription
origins on the new cluster with their original roidents via
binary_upgrade_create_replication_origin().
4. During per-database restore, CREATE SUBSCRIPTION runs but skips
origin creation.
5. binary_upgrade_set_next_replorigin_oid() creates the origin for
each subscription with the preserved roident.
6. binary_upgrade_replorigin_advance() restores the LSN position for
each subscription.
7. Subscriptions that were running before upgrade are re-enabled.

Please let me know your feedback on this patch.

[1] - https://www.postgresql.org/message-id/flat/182311743703924%40mail.yandex.ru
[2] - https://www.postgresql.org/message-id/CAD21AoDG8zQpHHfw7OvaEy7W0ZSyP%3D_dS-hrcquJ3C_ctMDmMQ%40mail.gmail.com

regards,
Ajin Cherian
Fujitsu Australia

Attachments

RE: [PATCH] Preserve replication origin OIDs in pg_upgrade

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Ajin,

> Sequence of Events During Upgrade
> 
> 1. pg_dumpall dumps all non-subscription replication origins from the
> old cluster with their roidents and LSN positions.
> 2. pg_dump dumps each subscription, but now records the old roident
> alongside the subscription info.
> 3. During restore, pg_dumpall's output recreates non-subscription
> origins on the new cluster with their original roidents via
> binary_upgrade_create_replication_origin().

To confirm, why do we have to handle subscription-associated origins
separately? I think that would not be needed if the subscription's OID were
preserved during the upgrade.

I checked the old thread that proposed preserving it [1]; it was not accepted
because there was no strong motivation at the time. But I feel this is a good
reason to do so now.

How do you feel?

[1]: https://www.postgresql.org/message-id/CALDaNm2Wj63VcbB0SY2NECHr1mKM1YSaV1ZydrdQVxyox2O2hg%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [PATCH] Preserve replication origin OIDs in pg_upgrade

From
vignesh C
Date:
On Wed, 29 Apr 2026 at 14:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Ajin,
>
> > Sequence of Events During Upgrade
> >
> > 1. pg_dumpall dumps all non-subscription replication origins from the
> > old cluster with their roidents and LSN positions.
> > 2. pg_dump dumps each subscription, but now records the old roident
> > alongside the subscription info.
> > 3. During restore, pg_dumpall's output recreates non-subscription
> > origins on the new cluster with their original roidents via
> > binary_upgrade_create_replication_origin().
>
> To confirm, why do we have to handle separately for subscription-associated
> origins? I'm thinking it's not needed if the subscription's OID is preserved
> during the upgrade.

+1 to preserve the subscription OID. This should make preserving
replication origin easier.

> I checked the old thread to preserve it [1], but it could not be accepted because
> there are no strong motivations. But I feel this is the good reason to do so now.

Here is a rebased version of the patch.

Regards,
Vignesh

Attachments

Re: [PATCH] Preserve replication origin OIDs in pg_upgrade

From
shveta malik
Date:
On Wed, Apr 29, 2026 at 2:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Ajin,
>
> > Sequence of Events During Upgrade
> >
> > 1. pg_dumpall dumps all non-subscription replication origins from the
> > old cluster with their roidents and LSN positions.
> > 2. pg_dump dumps each subscription, but now records the old roident
> > alongside the subscription info.
> > 3. During restore, pg_dumpall's output recreates non-subscription
> > origins on the new cluster with their original roidents via
> > binary_upgrade_create_replication_origin().
>
> To confirm, why do we have to handle separately for subscription-associated
> origins? I'm thinking it's not needed if the subscription's OID is preserved
> during the upgrade.
>

I’m not sure how preserving the subscription OID would ensure that the
origin ID is also preserved for sub-associated origins. Could you
please elaborate?

As I understand it, roident values are assigned independently during
origin creation. Even if subscription OIDs are preserved, the origin
IDs could still be reassigned differently on the new cluster. For
example, suppose we have two subscriptions, sub1 and sub2, with
roident values 2 and 3, assuming 1 was previously used and dropped.
After upgrade, origin creation may start allocating from 1 again,
resulting in roident values 1 and 2 instead. Since pg_commit_ts stores
the numeric roident, not the origin name, this mismatch could still
lead to incorrect conflict detection. Wouldn’t that result in the same
wrong conflict detection issue we are trying to avoid?
Please let me know if my understanding is wrong.
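
The example above can be reproduced with a small simulation. This is a sketch only: it assumes, as described above, that origin creation hands out the lowest unused ID starting from 1. The function allocate_roident() is a hypothetical stand-in, not the server's allocator.

```python
# Toy simulation of roident allocation; not PostgreSQL code.
# allocate_roident() is a hypothetical stand-in that assumes the
# "lowest unused ID, starting from 1" policy described in the message.

def allocate_roident(used):
    """Return the lowest positive ID that is not in `used`."""
    roident = 1
    while roident in used:
        roident += 1
    return roident


# Old cluster: origin 1 was created and later dropped,
# so sub1 and sub2 ended up with roidents 2 and 3.
old = {"sub1": 2, "sub2": 3}

# New cluster: starts with no origins and recreates them in order.
new = {}
for name in ("sub1", "sub2"):
    new[name] = allocate_roident(set(new.values()))

# Subscriptions whose roident no longer matches the copied pg_commit_ts data.
mismatched = [name for name in old if old[name] != new[name]]
print(new, mismatched)
```

Under that assumption both subscriptions end up with a different roident than before, even though their OIDs (and therefore their origin names) could be identical on both clusters.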

thanks
Shveta



Re: [PATCH] Preserve replication origin OIDs in pg_upgrade

From
Ajin Cherian
Date:
On Thu, Apr 30, 2026 at 4:52 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 29 Apr 2026 at 14:11, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Ajin,
> >
> > > Sequence of Events During Upgrade
> > >
> > > 1. pg_dumpall dumps all non-subscription replication origins from the
> > > old cluster with their roidents and LSN positions.
> > > 2. pg_dump dumps each subscription, but now records the old roident
> > > alongside the subscription info.
> > > 3. During restore, pg_dumpall's output recreates non-subscription
> > > origins on the new cluster with their original roidents via
> > > binary_upgrade_create_replication_origin().
> >
> > To confirm, why do we have to handle separately for subscription-associated
> > origins? I'm thinking it's not needed if the subscription's OID is preserved
> > during the upgrade.
>
> +1 to preserve the subscription OID. This should make preserving
> replication origin easier.
>
> > I checked the old thread to preserve it [1], but it could not be accepted because
> > there are no strong motivations. But I feel this is the good reason to do so now.
>
> Here is a rebased version of the patch.


Thanks Vignesh for the patch. I have used your patch as 0001 and
created mine on top of it as 0002. As Kuroda-san said, with your
patch I no longer need special handling for subscription replication
origins when pg_dumpall recreates all replication origins on the new
cluster. The origin name is derived from the subscription OID, which
patch 0001 now preserves, so the name is guaranteed to be the same as
well.
Here's v3 with the updated changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: [PATCH] Preserve replication origin OIDs in pg_upgrade

From
Ajin Cherian
Date:
On Thu, Apr 30, 2026 at 7:37 PM shveta malik <shveta.malik@gmail.com> wrote:
>
>
> I’m not sure how preserving the subscription OID would ensure that the
> origin ID is also preserved for sub-associated origins. Could you
> please elaborate?
>
> As I understand it, roident values are assigned independently during
> origin creation. Even if subscription OIDs are preserved, the origin
> IDs could still be reassigned differently on the new cluster. For
> example, suppose we have two subscriptions, sub1 and sub2, with
> roident values 2 and 3, assuming 1 was previously used and dropped.
> After upgrade, origin creation may start allocating from 1 again,
> resulting in roident values 1 and 2 instead. Since pg_commit_ts stores
> the numeric roident, not the origin name, this mismatch could still
> lead to incorrect conflict detection. Wouldn’t that result in the same
> wrong conflict detection issue we are trying to avoid?
> Please let me know if my understanding is wrong.

In the first patch, replication origins were duplicated from the old
cluster to the new one with matching roidents and ronames. This could
not be done for subscription replication origins, because
subscription OIDs were not preserved on the new cluster and the
roname, which is derived from the subscription OID, therefore
differed. Now that the ronames match as well, all replication origins
from the old cluster can be copied over to the new cluster, roident
and roname together, in one shot.
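
As a rough illustration of why the OID matters here, the following Python sketch models only the naming relationship. It assumes the pg_<subscription OID> naming seen in the example names earlier in the thread (pg_16392, pg_16403); origin_name() and can_copy_verbatim() are hypothetical helpers, and OIDs other than those two are made up.

```python
# Toy model of origin naming; not PostgreSQL code.
# origin_name() and can_copy_verbatim() are hypothetical helpers.

def origin_name(sub_oid):
    """Origin name derived from a subscription OID, e.g. pg_16392."""
    return f"pg_{sub_oid}"


def can_copy_verbatim(old_origins, new_sub_oids):
    """True if every old subscription origin name is still valid
    for some subscription on the new cluster."""
    expected = {origin_name(oid) for oid in new_sub_oids}
    return set(old_origins.values()) <= expected


# Old cluster: roident -> roname for two subscription origins.
old_origins = {1: origin_name(16392), 2: origin_name(16393)}

# Without OID preservation the subscriptions get new OIDs.
without_0001 = can_copy_verbatim(old_origins, [16403, 16404])

# With OID preservation the same OIDs, and so the same names, come back.
with_0001 = can_copy_verbatim(old_origins, [16392, 16393])
print(without_0001, with_0001)
```

When the names still match, each (roident, roname) pair from the old cluster can be recreated as is, which keeps the roidents embedded in pg_commit_ts valid.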

regards,
Ajin Cherian
Fujitsu Australia