Discussion: [PATCH] Preserve replication origin OIDs in pg_upgrade
Hello hackers,

The idea for this patch came up during discussions in the thread [1] on migration of the pg_commit_ts directory as part of pg_upgrade. There was a problem raised by Sawada-san in that thread which this patch addresses [2].

The problem:

The pg_commit_ts directory stores commit-timestamp records for each transaction, and each record embeds the replication origin ID (roident) that identifies which subscription wrote that transaction. When pg_upgrade migrates a subscriber, the pg_commit_ts directory is copied directly from the old cluster to the new cluster. This means those embedded roidents must remain valid in the new cluster.

When pg_upgrade migrates a subscriber, CREATE SUBSCRIPTION on the new cluster calls replorigin_create(), which assigns fresh roidents to each subscription's replication origin. Because subscription OIDs are not stable across upgrades, the origin names change (e.g. pg_16392 becomes pg_16403), and consequently the roidents can be assigned differently or, in the worst case, swapped between subscriptions.

Consider two subscriptions subA and subB with roidents 1 and 2 respectively before upgrade. After upgrade, due to OID reassignment, subA might get roident 2 and subB might get roident 1. The commit-timestamp records copied from the old cluster still say roident 1 for rows written by subA, but the new cluster now thinks roident 1 belongs to subB. This causes spurious update_origin_differs conflicts: the new cluster incorrectly thinks a row was last modified by a different subscription than it actually was.

This patch attempts to fix this by replicating the roident of the replication origins of each subscription on migration. This patch also migrates all replication origins as part of pg_upgrade.

Sequence of Events During Upgrade

1. pg_dumpall dumps all non-subscription replication origins from the old cluster with their roidents and LSN positions.
2. pg_dump dumps each subscription, but now records the old roident alongside the subscription info.
3. During restore, pg_dumpall's output recreates non-subscription origins on the new cluster with their original roidents via binary_upgrade_create_replication_origin().
4. During per-database restore, CREATE SUBSCRIPTION runs but skips origin creation.
5. binary_upgrade_set_next_replorigin_oid() creates the origin for each subscription with the preserved roident.
6. binary_upgrade_replorigin_advance() restores the LSN position for each subscription.
7. Subscriptions that were running before upgrade are re-enabled.

Please let me know your feedback regarding this patch.

[1] - https://www.postgresql.org/message-id/flat/182311743703924%40mail.yandex.ru
[2] - https://www.postgresql.org/message-id/CAD21AoDG8zQpHHfw7OvaEy7W0ZSyP%3D_dS-hrcquJ3C_ctMDmMQ%40mail.gmail.com

regards,
Ajin Cherian
Fujitsu Australia
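The subA/subB swap described in the message above can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the patch: the dictionaries stand in for pg_commit_ts and the origin catalog, and the function names and transaction IDs are hypothetical.

```python
# Illustrative model only: these dictionaries are hypothetical stand-ins
# for pg_commit_ts and the replication origin catalog.

def attribution(commit_ts, origins):
    """Map each xid to the subscription the cluster believes wrote it."""
    return {xid: origins[roident] for xid, roident in commit_ts.items()}


def misattributed_xids(commit_ts, old_origins, new_origins):
    """Return xids whose apparent writer changes after the upgrade."""
    before = attribution(commit_ts, old_origins)
    after = attribution(commit_ts, new_origins)
    return sorted(xid for xid in commit_ts if before[xid] != after[xid])


# xid -> roident, copied verbatim from the old cluster to the new one.
commit_ts = {100: 1, 101: 2, 102: 1}

old_origins = {1: "subA", 2: "subB"}        # roident -> subscription
swapped_origins = {1: "subB", 2: "subA"}    # roidents reassigned on upgrade
preserved_origins = dict(old_origins)       # roidents carried over unchanged

print(misattributed_xids(commit_ts, old_origins, swapped_origins))
print(misattributed_xids(commit_ts, old_origins, preserved_origins))
```

With the swapped mapping every copied record is attributed to the wrong subscription, which is the condition that would surface as a spurious update_origin_differs conflict; with the preserved mapping nothing changes.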
Dear Ajin,

> Sequence of Events During Upgrade
>
> 1. pg_dumpall dumps all non-subscription replication origins from the
> old cluster with their roidents and LSN positions.
> 2. pg_dump dumps each subscription, but now records the old roident
> alongside the subscription info.
> 3. During restore, pg_dumpall's output recreates non-subscription
> origins on the new cluster with their original roidents via
> binary_upgrade_create_replication_origin().

To confirm, why do we have to handle subscription-associated origins separately? I think this is not needed if the subscription's OID is preserved during the upgrade.

I checked the old thread on preserving it [1], but it was not accepted because there was no strong motivation. I feel this is a good reason to do so now. How do you feel?

[1]: https://www.postgresql.org/message-id/CALDaNm2Wj63VcbB0SY2NECHr1mKM1YSaV1ZydrdQVxyox2O2hg%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, 29 Apr 2026 at 14:11, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Ajin,
>
> > Sequence of Events During Upgrade
> >
> > 1. pg_dumpall dumps all non-subscription replication origins from the
> > old cluster with their roidents and LSN positions.
> > 2. pg_dump dumps each subscription, but now records the old roident
> > alongside the subscription info.
> > 3. During restore, pg_dumpall's output recreates non-subscription
> > origins on the new cluster with their original roidents via
> > binary_upgrade_create_replication_origin().
>
> To confirm, why do we have to handle subscription-associated origins
> separately? I think this is not needed if the subscription's OID is
> preserved during the upgrade.

+1 to preserving the subscription OID. This should make preserving replication origins easier.

> I checked the old thread on preserving it [1], but it was not accepted
> because there was no strong motivation. I feel this is a good reason to
> do so now.

Here is a rebased version of the patch.

Regards,
Vignesh
On Wed, Apr 29, 2026 at 2:11 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Ajin,
>
> > Sequence of Events During Upgrade
> >
> > 1. pg_dumpall dumps all non-subscription replication origins from the
> > old cluster with their roidents and LSN positions.
> > 2. pg_dump dumps each subscription, but now records the old roident
> > alongside the subscription info.
> > 3. During restore, pg_dumpall's output recreates non-subscription
> > origins on the new cluster with their original roidents via
> > binary_upgrade_create_replication_origin().
>
> To confirm, why do we have to handle subscription-associated origins
> separately? I think this is not needed if the subscription's OID is
> preserved during the upgrade.
>

I’m not sure how preserving the subscription OID would ensure that the origin ID is also preserved for sub-associated origins. Could you please elaborate?

As I understand it, roident values are assigned independently during origin creation. Even if subscription OIDs are preserved, the origin IDs could still be reassigned differently on the new cluster. For example, suppose we have two subscriptions, sub1 and sub2, with roident values 2 and 3, assuming 1 was previously used and dropped. After upgrade, origin creation may start allocating from 1 again, resulting in roident values 1 and 2 instead. Since pg_commit_ts stores the numeric roident, not the origin name, this mismatch could still lead to incorrect conflict detection. Wouldn’t that result in the same wrong conflict detection issue we are trying to avoid?

Please let me know if my understanding is wrong.

thanks
Shveta
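The sub1/sub2 example in the message above can be worked through with a small Python model. It is a sketch under one stated assumption: that origin creation hands out the lowest unused roident, which is what the example implies. The Cluster class, the origin names, and the explicit-roident argument are all hypothetical and are not the patch's API.

```python
# Illustrative model only. Assumption: origin creation picks the lowest
# unused roident, as the sub1/sub2 example implies.

MAX_ROIDENT = 65535  # roident is a 16-bit value; 0 is treated as invalid here


def lowest_free_roident(used):
    """Return the smallest positive ID not present in `used`."""
    for candidate in range(1, MAX_ROIDENT):
        if candidate not in used:
            return candidate
    raise RuntimeError("no free replication origin ID")


class Cluster:
    """Hypothetical stand-in for a cluster's origin catalog."""

    def __init__(self):
        self.origins = {}  # roident -> roname

    def create_origin(self, roname, roident=None):
        # roident=None models normal allocation; an explicit value models
        # a binary-upgrade style call that preserves the old ID.
        if roident is None:
            roident = lowest_free_roident(self.origins)
        elif roident in self.origins:
            raise ValueError(f"roident {roident} already in use")
        self.origins[roident] = roname
        return roident

    def drop_origin(self, roname):
        self.origins = {i: n for i, n in self.origins.items() if n != roname}


# Old cluster: ID 1 was used and dropped, so the subscriptions hold 2 and 3.
old = Cluster()
old.create_origin("tmp_origin")
old.create_origin("pg_16392")   # sub1 (hypothetical OID)
old.create_origin("pg_16393")   # sub2 (hypothetical OID)
old.drop_origin("tmp_origin")

# New cluster, names preserved but IDs allocated afresh: they shift to 1 and 2.
fresh = Cluster()
for roname in old.origins.values():
    fresh.create_origin(roname)

# New cluster with both roname and roident carried over explicitly.
preserved = Cluster()
for roident, roname in old.origins.items():
    preserved.create_origin(roname, roident=roident)

print(old.origins)
print(fresh.origins)
print(preserved.origins)
```

In this model, preserving the subscription OID keeps the names identical in all three catalogs, but only the explicit-roident path keeps the numeric IDs that pg_commit_ts refers to.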
On Thu, Apr 30, 2026 at 4:52 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 29 Apr 2026 at 14:11, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Ajin,
> >
> > > Sequence of Events During Upgrade
> > >
> > > 1. pg_dumpall dumps all non-subscription replication origins from the
> > > old cluster with their roidents and LSN positions.
> > > 2. pg_dump dumps each subscription, but now records the old roident
> > > alongside the subscription info.
> > > 3. During restore, pg_dumpall's output recreates non-subscription
> > > origins on the new cluster with their original roidents via
> > > binary_upgrade_create_replication_origin().
> >
> > To confirm, why do we have to handle subscription-associated origins
> > separately? I think this is not needed if the subscription's OID is
> > preserved during the upgrade.
>
> +1 to preserving the subscription OID. This should make preserving
> replication origins easier.
>
> > I checked the old thread on preserving it [1], but it was not accepted
> > because there was no strong motivation. I feel this is a good reason to
> > do so now.
>
> Here is a rebased version of the patch.

Thanks, Vignesh, for the patch. I have used your patch as 0001 and created mine on top of it as 0002. As Kuroda-san said, with your patch I no longer need special handling of subscription replication origins when pg_dumpall creates all replication origins on the new cluster. The origin name is now guaranteed to be the same, because it is built from the subscription's OID, which stays the same thanks to the changes in patch 0001.

Here's v3 with the updated changes.

regards,
Ajin Cherian
Fujitsu Australia
On Thu, Apr 30, 2026 at 7:37 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> I’m not sure how preserving the subscription OID would ensure that the
> origin ID is also preserved for sub-associated origins. Could you
> please elaborate?
>
> As I understand it, roident values are assigned independently during
> origin creation. Even if subscription OIDs are preserved, the origin
> IDs could still be reassigned differently on the new cluster. For
> example, suppose we have two subscriptions, sub1 and sub2, with
> roident values 2 and 3, assuming 1 was previously used and dropped.
> After upgrade, origin creation may start allocating from 1 again,
> resulting in roident values 1 and 2 instead. Since pg_commit_ts stores
> the numeric roident, not the origin name, this mismatch could still
> lead to incorrect conflict detection. Wouldn’t that result in the same
> wrong conflict detection issue we are trying to avoid?
> Please let me know if my understanding is wrong.

In the first patch, the non-subscription replication origins were duplicated from the old cluster to the new one with matching roidents and ronames. This couldn't be done for subscription replication origins, because subscriptions did not keep their OIDs on the new cluster, and so the corresponding roname, which is derived from the subscription OID, also differed. Now that both roname and roident match, all the replication origins from the old cluster can be copied over to the new cluster in one shot.

regards,
Ajin Cherian
Fujitsu Australia
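The reasoning in the reply above, that a subscription origin can only be copied as-is when its name still matches a subscription on the new cluster, can be sketched in Python. The "pg_<subscription OID>" naming follows the pg_16392 example earlier in the thread; the regular expression used to recognise subscription origins, the function names, and the custom origin name are assumptions made for illustration.

```python
import re

# Illustrative model only. The "pg_<suboid>" pattern follows the thread's
# pg_16392 example; using a regex to spot subscription origins is an
# assumption of this sketch.
SUB_ORIGIN = re.compile(r"^pg_(\d+)$")


def subscription_origin_name(suboid):
    """Build the origin name for a subscription's OID."""
    return f"pg_{suboid}"


def unmatched_subscription_origins(old_origins, new_suboids):
    """Return old subscription origin names that no new subscription expects."""
    expected = {subscription_origin_name(oid) for oid in new_suboids}
    return sorted(
        roname
        for roname in old_origins.values()
        if SUB_ORIGIN.match(roname) and roname not in expected
    )


# roident -> roname on the old cluster: one user-created origin, one
# belonging to a subscription with OID 16392.
old_origins = {1: "my_custom_origin", 2: "pg_16392"}

# Without OID preservation the subscription gets a new OID (16403), so the
# copied origin name no longer belongs to any subscription.
print(unmatched_subscription_origins(old_origins, new_suboids=[16403]))

# With OID preservation every old origin can be copied unchanged.
print(unmatched_subscription_origins(old_origins, new_suboids=[16392]))
```

An empty result in the second case corresponds to the "one shot" copy: every (roident, roname) pair from the old cluster is still meaningful on the new one.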