Обсуждение: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

Поиск
Список
Период
Сортировка

Hi team,

 

With wal_level = ‘logical’, backup was taken using non-exclusive backup method.

Following procedure here for restore and recovery - PostgreSQL: Documentation: 16: 26.3. Continuous Archiving and Point-in-Time Recovery (PITR)

 

While starting the PostgreSQL server, below issue is seen:

 

2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file "00000005000000010000006A" from archive
2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file "00000005000000010000006B" from archive
cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100   
2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) exited with exit code 1
2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active server processes
2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was terminated by signal 3: Quit
2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) exited with exit code 1
2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down

 

Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)

 

Before attempting start, size of 00000005000000010000006B file was 16 MB.

After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B

Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.

 

-rw------- 1 postgres postgres  2359296 Jun  5 11:34 00000005000000010000006B

 

Why is it changing the size of WAL segment in archive log directory?

All necessary WAL segments are present and 00000005000000010000006C was never archived.

 

bash-4.2$ cat /home/pgsql/dmaster/backup_label.old

START WAL LOCATION: 1/69000028 (file 000000050000000100000069)

CHECKPOINT LOCATION: 1/69000060

BACKUP METHOD: streamed

BACKUP FROM: master

START TIME: 2024-05-31 17:39:43 IST

LABEL: pgida_backup_4321_315606_1717157383

START TIMELINE: 5

 

bash-4.2$ cat /home/pgsql/wmaster/00000005000000010000006B.00000028.backup

START WAL LOCATION: 1/6B000028 (file 00000005000000010000006B)

STOP WAL LOCATION: 1/6B000100 (file 00000005000000010000006B)

CHECKPOINT LOCATION: 1/6B000060

BACKUP METHOD: streamed

BACKUP FROM: master

START TIME: 2024-05-31 17:40:28 IST

LABEL: pgida_backup_4321_315606_1717157427

START TIMELINE: 5

STOP TIME: 2024-05-31 17:40:28 IST

STOP TIMELINE: 5

 

bash-4.2$ cat /home/pgsql/wmaster/00000005.history

1       0/3E000000      before 2000-01-01 05:30:00+05:30

2       0/63000000      before 2000-01-01 05:30:00+05:30

3       0/E8000000      no recovery target specified

4       1/68000000      before 2000-01-01 05:30:00+05:30

 

Despite our efforts to troubleshoot, the problem persists. Please help to find a solution.

 

Regards,

Meera

 

On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100   
> 2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file "00000005000000010000006B" has wrong size: 0 instead of
16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive command was: cp pg_wal/00000005000000010000006B
/home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down
>  
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>  
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for
00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>  
> -rw------- 1 postgres postgres  2359296 Jun  5 11:34 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive.  Don't do that.

Yours,
Laurenz Albe



Hi Laurenz,

Thanks a lot! 

Regards,
Meera

-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at> 
Sent: Wednesday, June 5, 2024 3:56 PM
To: Meera Nair <mnair@commvault.com>; pgsql-general@lists.postgresql.org
Cc: Punit Pranesh Koujalgi <pkoujalgi@commvault.com>; B Ganesh Kishan <bkishan@commvault.com>
Subject: Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

[You don't often get email from laurenz.albe@cybertec.at. Learn why this is important at
https://aka.ms/LearnAboutSenderIdentification]
 

External email. Inspect before opening.



On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file 
> "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file 
> "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No 
> such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100
> 2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file 
> "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) 
> exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active 
> server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was 
> terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive 
> command was: cp pg_wal/00000005000000010000006B 
> /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) 
> exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down
>
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL 
> segments are restored from)
>
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL 
> error saying wrong size for 00000005000000010000006B Now the size of 00000005000000010000006B is observed as 2 MB.
Sizeof all other WAL segments remain 16 MB.
 
>
> -rw------- 1 postgres postgres  2359296 Jun  5 11:34 
> 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes back to the archive.  Don't do that.

Yours,
Laurenz Albe

On Wed, Jun 5, 2024 at 6:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100   
> 2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down
>  
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>  
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>  
> -rw------- 1 postgres postgres  2359296 Jun  5 11:34 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive.  Don't do that.
 
In fact, don't write your own PITR backup process.  Use something like PgBackRest or BarMan.

Hi Ron,

We do have our own solution. We work for Commvault, a data protection company.

We support backup of PostgreSQL in different ways - streaming, snapshot, block level, backup from standby server.

PostgreSQL (commvault.com)

 

Regards,

Meera

 

From: Ron Johnson <ronljohnsonjr@gmail.com>
Sent: Thursday, June 6, 2024 9:56 PM
To: Laurenz Albe <laurenz.albe@cybertec.at>
Cc: Meera Nair <mnair@commvault.com>; pgsql-general@lists.postgresql.org; Punit Pranesh Koujalgi <pkoujalgi@commvault.com>; B Ganesh Kishan <bkishan@commvault.com>
Subject: Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

 

External email. Inspect before opening.

 

On Wed, Jun 5, 2024 at 6:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100   
> 2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down
>  
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>  
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>  
> -rw------- 1 postgres postgres  2359296 Jun  5 11:34 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive.  Don't do that.

 

In fact, don't write your own PITR backup process.  Use something like PgBackRest or BarMan.