Обсуждение: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

Поиск

Список

Период

Сортировка

Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

От

Meera Nair

Дата:

05 июня, 09:36:35

Hi team,

With wal_level = ‘logical’, backup was taken using non-exclusive backup method.

Following procedure here for restore and recovery - PostgreSQL: Documentation: 16: 26.3. Continuous Archiving and Point-in-Time Recovery (PITR)

While starting the PostgreSQL server, below issue is seen:

2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down

Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)

Before attempting start, size of 00000005000000010000006B file was 16 MB.

After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B

Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.

-rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B

Why is it changing the size of WAL segment in archive log directory?

All necessary WAL segments are present and 00000005000000010000006C was never archived.

bash-4.2$ cat /home/pgsql/dmaster/backup_label.old

START WAL LOCATION: 1/69000028 (file 000000050000000100000069)

CHECKPOINT LOCATION: 1/69000060

BACKUP METHOD: streamed

BACKUP FROM: master

START TIME: 2024-05-31 17:39:43 IST

LABEL: pgida_backup_4321_315606_1717157383

START TIMELINE: 5

bash-4.2$ cat /home/pgsql/wmaster/00000005000000010000006B.00000028.backup

START WAL LOCATION: 1/6B000028 (file 00000005000000010000006B)

STOP WAL LOCATION: 1/6B000100 (file 00000005000000010000006B)

CHECKPOINT LOCATION: 1/6B000060

BACKUP METHOD: streamed

BACKUP FROM: master

START TIME: 2024-05-31 17:40:28 IST

LABEL: pgida_backup_4321_315606_1717157427

START TIMELINE: 5

STOP TIME: 2024-05-31 17:40:28 IST

STOP TIMELINE: 5

bash-4.2$ cat /home/pgsql/wmaster/00000005.history

1 0/3E000000 before 2000-01-01 05:30:00+05:30

2 0/63000000 before 2000-01-01 05:30:00+05:30

3 0/E8000000 no recovery target specified

4 1/68000000 before 2000-01-01 05:30:00+05:30

Despite our efforts to troubleshoot, the problem persists. Please help to find a solution.

Regards,

Meera

Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

От

Laurenz Albe

Дата:

05 июня, 13:26:03

On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100   
> 2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file "00000005000000010000006B" has wrong size: 0 instead of
16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive command was: cp pg_wal/00000005000000010000006B
/home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down
>  
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>  
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for
00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>  
> -rw------- 1 postgres postgres  2359296 Jun  5 11:34 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive.  Don't do that.

Yours,
Laurenz Albe

RE: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

От

Meera Nair

Дата:

06 июня, 04:52:07

Hi Laurenz,

Thanks a lot! 

Regards,
Meera

-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at> 
Sent: Wednesday, June 5, 2024 3:56 PM
To: Meera Nair <mnair@commvault.com>; pgsql-general@lists.postgresql.org
Cc: Punit Pranesh Koujalgi <pkoujalgi@commvault.com>; B Ganesh Kishan <bkishan@commvault.com>
Subject: Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

[You don't often get email from laurenz.albe@cybertec.at. Learn why this is important at
https://aka.ms/LearnAboutSenderIdentification]
 

External email. Inspect before opening.



On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG:  restored log file 
> "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG:  restored log file 
> "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No 
> such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG:  redo done at 1/6B000100
> 2024-06-05 11:41:33.172 IST [54369] FATAL:  archive file 
> "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG:  startup process (PID 54369) 
> exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG:  terminating any other active 
> server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL:  archive command was 
> terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL:  The failed archive 
> command was: cp pg_wal/00000005000000010000006B 
> /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG:  archiver process (PID 54375) 
> exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG:  database system is shut down
>
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL 
> segments are restored from)
>
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL 
> error saying wrong size for 00000005000000010000006B Now the size of 00000005000000010000006B is observed as 2 MB.
Sizeof all other WAL segments remain 16 MB.
 
>
> -rw------- 1 postgres postgres  2359296 Jun  5 11:34 
> 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes back to the archive.  Don't do that.

Yours,
Laurenz Albe

Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

От

Ron Johnson

Дата:

06 июня, 19:25:38

On Wed, Jun 5, 2024 at 6:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
> 2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down
>
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>
> -rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive. Don't do that.

In fact, don't write your own PITR backup process. Use something like PgBackRest or BarMan.

RE: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

От

Meera Nair

Дата:

07 июня, 04:45:39

Hi Ron,

We do have our own solution. We work for Commvault, a data protection company.

We support backup of PostgreSQL in different ways - streaming, snapshot, block level, backup from standby server.

PostgreSQL (commvault.com)

Regards,

Meera

From: Ron Johnson <ronljohnsonjr@gmail.com>
Sent: Thursday, June 6, 2024 9:56 PM
To: Laurenz Albe <laurenz.albe@cybertec.at>
Cc: Meera Nair <mnair@commvault.com>; pgsql-general@lists.postgresql.org; Punit Pranesh Koujalgi <pkoujalgi@commvault.com>; B Ganesh Kishan <bkishan@commvault.com>
Subject: Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

You don't often get email from ronljohnsonjr@gmail.com. Learn why this is important

External email. Inspect before opening.

On Wed, Jun 5, 2024 at 6:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-06-05 at 06:36 +0000, Meera Nair wrote:
> 2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
> 2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
> cp: cannot stat ‘/home/pgsql/wmaster/00000005000000010000006C’: No such file or directory
> 2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
> 2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
> 2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
> 2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
> 2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
> 2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
> 2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
> 2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down
>
> Here ‘/home/pgsql/wmaster’ is my archivedir (the folder where WAL segments are restored from)
>
> Before attempting start, size of
> 00000005000000010000006B file was 16 MB.
> After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
> Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.
>
> -rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B

That looks like you have "archive_mode = always", and "archive_command" writes
back to the archive. Don't do that.

In fact, don't write your own PITR backup process. Use something like PgBackRest or BarMan.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

RE: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

Re: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

RE: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir