Обсуждение: Streaming wal from primiry terminating

Поиск
Список
Период
Сортировка

Streaming wal from primiry terminating

От
"Lahnov, Igor"
Дата:

Hi,

After failover all stand by nodes could not start streaming wal recovery.

Streaming recovery start from 1473/A5000000, but standby start at 1473/A5FFEE08, this seems to be the problem.

What can we do in this case to restore?

Is it possible to shift wal streaming recovery point on primary?

Can checkpoint on primary help in this situation?

 

 

2022-09-26 14:08:23.289  [3747796]     LOG:  consistent recovery state reached at 1473/A5FFEE08

2022-09-26 14:08:23.289  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:08:23.290  [3747793]     LOG:  database system is ready to accept read-only connections

2022-09-26 14:08:23.292  [3747867]     LOG:  starting bgworker pgsentinel

2022-09-26 14:08:23.672  [3747868]     LOG:  started streaming WAL from primary at 1473/A5000000 on timeline 18

2022-09-26 14:08:24.363  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:08:24.366  [3747868]     FATAL:  terminating walreceiver process due to administrator command

2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

...

...

...

2022-09-26 14:12:24.055  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:12:29.060  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:12:34.064  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:12:39.068  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

2022-09-26 14:12:39.910  [3747793]     LOG:  received fast shutdown request

2022-09-26 14:12:39.911  [3747793]     LOG:  aborting any active transactions

2022-09-26 14:12:39.911  [3747867]     LOG:  bgworker pgsentinel signal: processed SIGTERM

2022-09-26 14:12:39.914  [3747797]     LOG:  shutting down

2022-09-26 14:12:39.922  [3747793]     LOG:  database system is shut down

Re: Streaming wal from primiry terminating

От
Kyotaro Horiguchi
Дата:
At Wed, 28 Sep 2022 08:50:12 +0000, "Lahnov, Igor" <Igor.Lahnov@nexign.com> wrote in 
> Hi,
> After failover all stand by nodes could not start streaming wal recovery.
> Streaming recovery start from 1473/A5000000, but standby start at 1473/A5FFEE08, this seems to be the problem.

It's not a problem at all. It is quite normal for standby to start
streaming from the beginning of a WAL segment.

> What can we do in this case to restore?
> Is it possible to shift wal streaming recovery point on primary?
> Can checkpoint on primary help in this situation?


> 2022-09-26 14:08:23.672  [3747868]     LOG:  started streaming WAL from primary at 1473/A5000000 on timeline 18
> 2022-09-26 14:08:24.363  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366  [3747868]     FATAL:  terminating walreceiver process due to administrator command

This seems to mean someone emtpied primary_conninfo.

> 2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 1473/A5FFEE08: wanted 24, got 0

I don't fully understand the situation. A situation that leads the
this state I can come up with is that somehow the standby restored an
incomplete WAL segment from the primary. For example, in a case
wheresomeone copied the current active WAL file from pg_wal to archive
on the primary, or a case where restore_command on the standby fetches
WAL files from pg_wal on the primary instead of its archive. Both are
not normal operations.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Streaming wal from primiry terminating

От
"Lahnov, Igor"
Дата:
Primary_conninfo is set in postgresql.conf
The reason is definitely something else, we do not know what. This is fatal in this case.


> 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0

LSN 1473/A5FFEE08 does not need to be read, according data from the following log file

rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn: 1473/A6000028, prev 1473/A5FFD070, desc: OVERWRITE_CONTRECORD lsn
1473/A5FFEE08;time 2022-09-26 11:36:58.104667 MSK 





Re: Streaming wal from primiry terminating

От
Michael Paquier
Дата:
On Thu, Sep 29, 2022 at 01:04:08PM +0000, Lahnov, Igor wrote:
> Primary_conninfo is set in postgresql.conf
> The reason is definitely something else, we do not know what. This is fatal in this case.

Which version of PostgreSQL are you using?  There has been a few
commits around continuation records lately, so it is hard to say if
you are pointing at a new issue or if this is a past one already
addressed.
--
Michael

Вложения

RE: Streaming wal from primiry terminating

От
"Lahnov, Igor"
Дата:
>> Which version of PostgreSQL are you using?  There has been a few commits around continuation records lately, so it
ishard to say if you are pointing at a new issue or if this is a past one already addressed. 

Hi Michael,
We are using PostgreSQL 14.4.





RE: Streaming wal from primiry terminating

От
"Lahnov, Igor"
Дата:
Do you need any additional information?

-----Original Message-----
From: Lahnov, Igor
Sent: Friday, September 30, 2022 10:18 AM
To: 'Michael Paquier' <michael@paquier.xyz>
Cc: Kyotaro Horiguchi <horikyota.ntt@gmail.com>; pgsql-general@lists.postgresql.org; Timonin, Alexey
<Alexey.Timonin@nexign.com>;Dmitriyev, Andrey <Andrey.Dmitriyev@nexign.com> 
Subject: RE: Streaming wal from primiry terminating

>> Which version of PostgreSQL are you using?  There has been a few commits around continuation records lately, so it
ishard to say if you are pointing at a new issue or if this is a past one already addressed. 

Hi Michael,
We are using PostgreSQL 14.4.