Greetings,
* Rob Emery (re-pgsql@codeweavers.net) wrote:
> We're pretty sure that we've just hit a scenario where the instructions for
> pg_upgrade with standby servers aren't quite safe.
I've pretty much always felt those instructions should come with a
"expert use only" note.
> We were testing our upgrade process with a copy of our live cluster in a
> lab environment and we found that using
>
> rsync --verbose --archive --delete --hard-links --size-only
> --no-inc-recursive /media/postgresql/data/main /media/postgresql/data/9.6
> PGRETESTA02:/media/postgresql/data
>
> Resulted in:
>
> 2020-12-02 14:49:11.513 GMT [20884-1] LOG: database system was shut
> down in recovery at 2020-12-02 13:57:56 GMT
> 2020-12-02 14:49:11.513 GMT [20884-2] LOG: entering standby mode
> 2020-12-02 14:49:11.557 GMT [20884-3] LOG: consistent recovery state
> reached at 19E/25000098
> 2020-12-02 14:49:11.557 GMT [20884-4] LOG: invalid record length at
> 19E/25000098: wanted 24, got 0
> 2020-12-02 14:49:11.559 GMT [20883-1] LOG: database system is ready
> to accept read only connections
> 2020-12-02 14:49:11.593 GMT [20888-1] FATAL: database system
> identifier differs between the primary and standby
> 2020-12-02 14:49:11.593 GMT [20888-2] DETAIL: The primary's
> identifier is 6901669428825624285, the standby's identifier is
> 690161835164
> 1138930.
>
> however if we don't use --size-only, then the process worked fine and we a
> smooth upgrade.
This doesn't look like a --size-only issue though- how did you end up
with this when the pg_controldata file (where the system identifier is
pulled from) shouldn't even exist and won't be a hardlink to the
existing one on the old system since it's a new cluster, and therefore
should definitely be copied?
When you go to run the rsync to get the replica up to date you should
have a system that looks like this:
PRIMARY:
/srv/old_cluster
... lots of files
/srv/new_cluster
... lots of *new* files, including pg_controldata
... hardlinks to PG table/index files that aren't part of the catalog
REPLICA:
/srv/old_cluster
... lots of files
/srv/new_cluster
... entirely empty
Then you run the rsync, at the /srv level, and all those 'new files' in
/srv/new_cluster on the primary should get copied over to
/srv/new_cluster on the replica, while any files which are hard-linked
between old_cluster and new_cluster should end up as hard links on the
replica.
Thanks,
Stephen