Обсуждение: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir
Hi team,
With wal_level = ‘logical’, backup was taken using non-exclusive backup method.
Following procedure here for restore and recovery - PostgreSQL: Documentation: 16: 26.3. Continuous Archiving and Point-in-Time Recovery (PITR)
While starting the PostgreSQL server, below issue is seen:
2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down
Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
Before attempting start, size of 00000005000000010000006B file was 16 MB.
After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
-rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B
Why is it changing the size of WAL segment in archive log directory?
All necessary WAL segments are present and 00000005000000010000006C was never archived.
bash-4.2$ cat /home/pgsql/dmaster/backup_label.old
START WAL LOCATION: 1/69000028 (file 000000050000000100000069)
CHECKPOINT LOCATION: 1/69000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2024-05-31 17:39:43 IST
LABEL: pgida_backup_4321_315606_1717157383
START TIMELINE: 5
bash-4.2$ cat /home/pgsql/wmaster/00000005000000010000006B.00000028.backup
START WAL LOCATION: 1/6B000028 (file 00000005000000010000006B)
STOP WAL LOCATION: 1/6B000100 (file 00000005000000010000006B)
CHECKPOINT LOCATION: 1/6B000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2024-05-31 17:40:28 IST
LABEL: pgida_backup_4321_315606_1717157427
START TIMELINE: 5
STOP TIME: 2024-05-31 17:40:28 IST
STOP TIMELINE: 5
bash-4.2$ cat /home/pgsql/wmaster/00000005.history
1 0/3E000000 before 2000-01-01 05:30:00+05:30
2 0/63000000 before 2000-01-01 05:30:00+05:30
3 0/E8000000 no recovery target specified
4 1/68000000 before 2000-01-01 05:30:00+05:30
Despite our efforts to troubleshoot, the problem persists. Please help to find a solution.
Regards,
Meera
On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote: > 2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive > 2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive > cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory > 2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100 > 2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216 > 2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1 > 2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes > 2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit > 2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B > 2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1 > 2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down > > Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from) > > Before attempting start, size of > 00000005000000010000006B file was 16 MB. > After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B > Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB. > > -rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B That looks like you have "archive_mode = always", and "archive_command" writes back to the archive. Don't do that. Yours, Laurenz Albe
Hi Laurenz, Thanks a lot! Regards, Meera -----Original Message----- From: Laurenz Albe <laurenz.albe@cybertec.at> Sent: Wednesday, June 5, 2024 3:56 PM To: Meera Nair <mnair@commvault.com>; pgsql-general@lists.postgresql.org Cc: Punit Pranesh Koujalgi <pkoujalgi@commvault.com>; B Ganesh Kishan <bkishan@commvault.com> Subject: Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir [You don't often get email from laurenz.albe@cybertec.at. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification] External email. Inspect before opening. On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote: > 2024-06-05 11:41:32.369 IST [54369] LOG: restored log file > "00000005000000010000006A" from archive > 2024-06-05 11:41:33.112 IST [54369] LOG: restored log file > "00000005000000010000006B" from archive > cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No > such file or directory > 2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100 > 2024-06-05 11:41:33.172 IST [54369] FATAL: archive file > "00000005000000010000006B" has wrong size: 0 instead of 16777216 > 2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) > exited with exit code 1 > 2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active > server processes > 2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was > terminated by signal 3: Quit > 2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive > command was: cp pg_wal/00000005000000010000006B > /home/pgsql/wmaster/00000005000000010000006B > 2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) > exited with exit code 1 > 2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down > > Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL > segments are restored from) > > Before attempting start, size of > 00000005000000010000006B file was 16 MB. > After failing to detect 00000005000000010000006C, there is a FATAL > error saying wrong size for 00000005000000010000006B Now the size of 00000005000000010000006B is observed as 2 MB. Sizeof all other WAL segments remain 16 MB. > > -rw------- 1 postgres postgres 2359296 Jun 5 11:34 > 00000005000000010000006B That looks like you have "archive_mode = always", and "archive_command" writes back to the archive. Don't do that. Yours, Laurenz Albe
On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
> 2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down
>
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>
> -rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B
That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive. Don't do that.
Hi Ron,
We do have our own solution. We work for Commvault, a data protection company.
We support backup of PostgreSQL in different ways - streaming, snapshot, block level, backup from standby server.
Regards,
Meera
From: Ron Johnson <ronljohnsonjr@gmail.com>
Sent: Thursday, June 6, 2024 9:56 PM
To: Laurenz Albe <laurenz.albe@cybertec.at>
Cc: Meera Nair <mnair@commvault.com>; pgsql-general@lists.postgresql.org; Punit Pranesh Koujalgi <pkoujalgi@commvault.com>; B Ganesh Kishan <bkishan@commvault.com>
Subject: Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir
You don't often get email from ronljohnsonjr@gmail.com. Learn why this is important |
External email. Inspect before opening. |
On Wed, Jun 5, 2024 at 6:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
> 2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down
>
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>
> -rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B
That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive. Don't do that.
In fact, don't write your own PITR backup process. Use something like PgBackRest or BarMan.