Discussion: pg_rewind problem: cannot find WAL

Hi all,
running 17.4 on ubuntu 24.04 machines. I've three hosts, pg-1
(primary) and two physical replicas.
I then promote host pg-3 as a master (pg_promote()) and want to rewind
the pg-1 to follow the new master, so:

ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main --source-server="user=replica_fluca host=pg-3 dbname=replica_fluca"'
pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
pg_rewind: error: could not open file "/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/AFFF4E8

But the file 0x010000A is not there:

% ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
00000001000000000000000B.partial
00000002.history
00000002000000000000000B
00000002000000000000000C
00000002000000000000000D
00000002000000000000000E
archive_status
summaries

% ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
000000010000000000000005.00000028.backup
00000001000000000000000B
00000001000000000000000C
00000001000000000000000D
00000001000000000000000E
archive_status
summaries

Do I have to ensure the old primary pg-1 does a WAL switch before
promoting the other one and trying to rewind?

Thanks,
Luca

On Wed, 2025-05-07 at 12:51 +0200, Luca Ferrari wrote:
> running 17.4 on ubuntu 24.04 machines. I've three hosts, pg-1
> (primary) and two physical replicas.
> I then promote host pg-3 as a master (pg_promote()) and want to rewind
> the pg-1 to follow the new master, so:
>
> ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D
> /var/lib/postgresql/17/main --source-server="user=replica_fluca
> host=pg-3 dbname=replica_fluca"'
> pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
> pg_rewind: error: could not open file
> "/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such
> file or directory
> pg_rewind: error: could not find previous WAL record at 0/AFFF4E8
>
> But the file 0x010000A is not there:
>
>
> % ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 00000001000000000000000B.partial
> 00000002.history
> 00000002000000000000000B
> 00000002000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E
> archive_status
> summaries
>
> % ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 000000010000000000000005.00000028.backup
> 00000001000000000000000B
> 00000001000000000000000C
> 00000001000000000000000D
> 00000001000000000000000E
> archive_status
> summaries
>
> Do i have to ensure the old primary pg-1 does a wal switch before
> promoting the other one and try to rewind?

I don't think it is connected to a WAL switch.

I'd say that you should set "wal_keep_size" high enough that all the WAL
needed for pg_rewind is still present.

If you have a WAL archive, you could define a restore_command on the server
you want to rewind.

Yours,
Laurenz Albe

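As an aside for readers following the error messages: the LSN that pg_rewind reports can be mapped to a WAL file name, which shows why segment 00000001000000000000000A is the one it asks for even though the divergence point lies in segment 0B. The following is a minimal sketch and not part of the original thread. `wal_segment_name` is a hypothetical helper, and it assumes the default 16 MB WAL segment size (a different size can be passed as the third argument).

```shell
# Map a timeline ID and an LSN to the WAL segment file name that contains it.
# Hypothetical helper; assumes the default 16 MB segment size unless $3 is given.
wal_segment_name() (
    tli="$1"                      # timeline ID (decimal), e.g. 1
    lsn_hi="${2%%/*}"             # part of the LSN before the slash (hex)
    lsn_lo="${2##*/}"             # part of the LSN after the slash (hex)
    seg_size="${3:-16777216}"     # WAL segment size in bytes
    segs_per_id=$(( 4294967296 / seg_size ))
    segno=$(( (0x$lsn_hi * 4294967296 + 0x$lsn_lo) / seg_size ))
    # File name layout: 8 hex digits each for timeline, "log" number, segment.
    printf '%08X%08X%08X\n' "$tli" $(( segno / segs_per_id )) $(( segno % segs_per_id ))
)

wal_segment_name 1 0/AFFF4E8   # prints 00000001000000000000000A
wal_segment_name 1 0/B8550F8   # prints 00000001000000000000000B
```

On a running primary, the built-in SQL function pg_walfile_name() gives the same answer for a given LSN.
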
On Wed, May 7, 2025 at 3:55 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> I don't think it is connected to a WAL switch.
>

Thanks.

> I'd say that you should set "wal_keep_size" high enough that all the WAL
> needed for pg_rewind is still present.
>
> If you have a WAL archive, you could define a restore_command on the server
> you want to rewind.

I've pgbackrest making backups, so I have an archive_command. I'm
going to see if putting a restore_command can fix the problem.
Thanks for the suggestion.

Luca

On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
>
> I've pgbackrest making backups, so I have an archive_command. I'm
> going to see if putting a restore_command can fix the problem.
>

But I'm facing a quite trivial problem: in ubuntu installation the
configuration files are separated from the PGDATA.
Apparently pg_rewind is trying to read postgresql.conf to get the
restore_command, and I don't know how to specify the different
location of the postgresql.conf (cannot specify -c as in postgres):

$ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main --source-server="user=replica_fluca host=dev-psqlha3 dbname=replica_fluca" -R -P --debug -c
postgres: could not access the server configuration file "/var/lib/postgresql/17/main/postgresql.conf": No such file or directory
no data was returned by command "/usr/lib/postgresql/17/bin/postgres -D /var/lib/postgresql/17/main -C restore_command"
child process exited with exit code 2
pg_rewind: error: could not read restore_command from target cluster

Any idea?
Clearly, postgresql.auto.conf is within PGDATA, and since my
restore_command is there, one trick could be to touch an empty
PGDATA/postgresql.conf, run pg_rewind, and remove the fake configuration file.
But I'm sure there is a smarter solution.

Thanks,
Luca

>
> Any idea?
> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, and remove the fake configuration file.
> But I'm sure there is a smarter solution.
>
> Thanks,
> Luca
>
>

A symlink from $PGDATA to where the actual file is?

On Thu, May 8, 2025 at 4:04 PM Rob Sargent <robjsargent@gmail.com> wrote:
>
>
> A symlink from $PGDATA to where the actual file is?
>

Could be, but I need to experiment with pg_basebackup to ensure it does not
conflict with the /etc/ configuration file when creating a clone.

Luca

On 5/8/25 04:26, Luca Ferrari wrote:
> On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
>>
>> I've pgbackrest making backups, so I have an archive_command. I'm
>> going to see if putting a restore_command can fix the problem.
>>
>
> But I'm facing a quite trivial problem: in ubuntu installation the
> configuration files are separated from the PGDATA.
> Apparently pg_rewind is trying to read postgresql.conf to get the
> restore_command, and I don't know how to specify the different
> location of the postgresql.conf (cannot specify -c as in postgres):
>
> $ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main
> --source-server="user=replica_fluca host=dev-psqlha3
> dbname=replica_fluca" -R -P --debug -c
> postgres: could not access the server configuration file
> "/var/lib/postgresql/17/main/postgresql.conf": No such file or
> directory
> no data was returned by command "/usr/lib/postgresql/17/bin/postgres
> -D /var/lib/postgresql/17/main -C restore_command"
> child process exited with exit code 2
> pg_rewind: error: could not read restore_command from target cluster
>
> Any idea?

/usr/lib/postgresql/17/bin/pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...

Options:
  -c, --restore-target-wal       use "restore_command" in target configuration to
                                 retrieve WAL files from archives
  -D, --target-pgdata=DIRECTORY  existing data directory to modify
      --source-pgdata=DIRECTORY  source data directory to synchronize with
      --source-server=CONNSTR    source server to synchronize with
  -n, --dry-run                  stop before modifying anything
  -N, --no-sync                  do not wait for changes to be written
                                 safely to disk
  -P, --progress                 write progress messages
  -R, --write-recovery-conf      write configuration for replication
                                 (requires --source-server)
      --config-file=FILENAME     use specified main server configuration
                                 file when running target cluster
      --debug                    write a lot of debug messages
      --no-ensure-shutdown       do not automatically fix unclean shutdown
      --sync-method=METHOD       set method for syncing files to disk
  -V, --version                  output version information, then exit
  -?, --help                     show this help, then exit

So use --config-file=FILENAME?

> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, and remove the fake configuration file.
> But I'm sure there is a smarter solution.
>
> Thanks,
> Luca
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com

On Thu, May 8, 2025 at 5:11 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> /usr/lib/postgresql/17/bin/pg_rewind --help
> pg_rewind resynchronizes a PostgreSQL cluster with another copy of the
> cluster.
> --config-file=FILENAME use specified main server configuration

shame on me! I was grepping config_file as in pg_ctl...
Thanks!

Luca
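
Putting the thread's conclusion together, the rewind on a Debian/Ubuntu layout could be wrapped roughly as follows. This is a sketch under stated assumptions, not a command taken from the thread: the configuration path /etc/postgresql/17/main/postgresql.conf is the usual location for a packaged cluster named "main" but is never spelled out above, `rewind_from` is a hypothetical wrapper, and a working restore_command is assumed to be already set for the target cluster. Only options listed in the --help output above are used. The target server must be shut down before pg_rewind runs.

```shell
# Assumed defaults for a Debian/Ubuntu packaged PostgreSQL 17 cluster "main".
# Override any of them through the environment if the layout differs.
PG_REWIND="${PG_REWIND:-/usr/lib/postgresql/17/bin/pg_rewind}"
PGDATA="${PGDATA:-/var/lib/postgresql/17/main}"
PGCONF="${PGCONF:-/etc/postgresql/17/main/postgresql.conf}"   # assumption

# Hypothetical wrapper: rewind the local (stopped) cluster against a source.
# $1 is the libpq connection string of the new primary.
rewind_from() {
    "$PG_REWIND" \
        --target-pgdata="$PGDATA" \
        --source-server="$1" \
        --config-file="$PGCONF" \
        --restore-target-wal \
        --write-recovery-conf \
        --progress
}
```

It would be run as the postgres user on the host being rewound, for example `rewind_from "user=replica_fluca host=pg-3 dbname=replica_fluca"`. Running pg_rewind by hand with --dry-run first is a cheap way to check that the configuration file and restore_command are found before anything is modified.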