Обсуждение: Streaming wal from primiry terminating
Hi,
After failover all stand by nodes could not start streaming wal recovery.
Streaming recovery start from 1473/A5000000, but standby start at 1473/A5FFEE08, this seems to be the problem.
What can we do in this case to restore?
Is it possible to shift wal streaming recovery point on primary?
Can checkpoint on primary help in this situation?
2022-09-26 14:08:23.289 [3747796] LOG: consistent recovery state reached at 1473/A5FFEE08
2022-09-26 14:08:23.289 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:08:23.290 [3747793] LOG: database system is ready to accept read-only connections
2022-09-26 14:08:23.292 [3747867] LOG: starting bgworker pgsentinel
2022-09-26 14:08:23.672 [3747868] LOG: started streaming WAL from primary at 1473/A5000000 on timeline 18
2022-09-26 14:08:24.363 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:08:24.366 [3747868] FATAL: terminating walreceiver process due to administrator command
2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
...
...
...
2022-09-26 14:12:24.055 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:12:29.060 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:12:34.064 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:12:39.068 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
2022-09-26 14:12:39.910 [3747793] LOG: received fast shutdown request
2022-09-26 14:12:39.911 [3747793] LOG: aborting any active transactions
2022-09-26 14:12:39.911 [3747867] LOG: bgworker pgsentinel signal: processed SIGTERM
2022-09-26 14:12:39.914 [3747797] LOG: shutting down
2022-09-26 14:12:39.922 [3747793] LOG: database system is shut down
At Wed, 28 Sep 2022 08:50:12 +0000, "Lahnov, Igor" <Igor.Lahnov@nexign.com> wrote in > Hi, > After failover all stand by nodes could not start streaming wal recovery. > Streaming recovery start from 1473/A5000000, but standby start at 1473/A5FFEE08, this seems to be the problem. It's not a problem at all. It is quite normal for standby to start streaming from the beginning of a WAL segment. > What can we do in this case to restore? > Is it possible to shift wal streaming recovery point on primary? > Can checkpoint on primary help in this situation? > 2022-09-26 14:08:23.672 [3747868] LOG: started streaming WAL from primary at 1473/A5000000 on timeline 18 > 2022-09-26 14:08:24.363 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0 > 2022-09-26 14:08:24.366 [3747868] FATAL: terminating walreceiver process due to administrator command This seems to mean someone emtpied primary_conninfo. > 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0 > 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0 I don't fully understand the situation. A situation that leads the this state I can come up with is that somehow the standby restored an incomplete WAL segment from the primary. For example, in a case wheresomeone copied the current active WAL file from pg_wal to archive on the primary, or a case where restore_command on the standby fetches WAL files from pg_wal on the primary instead of its archive. Both are not normal operations. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Primary_conninfo is set in postgresql.conf The reason is definitely something else, we do not know what. This is fatal in this case. > 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0 > 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0 LSN 1473/A5FFEE08 does not need to be read, according data from the following log file rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn: 1473/A6000028, prev 1473/A5FFD070, desc: OVERWRITE_CONTRECORD lsn 1473/A5FFEE08;time 2022-09-26 11:36:58.104667 MSK
On Thu, Sep 29, 2022 at 01:04:08PM +0000, Lahnov, Igor wrote: > Primary_conninfo is set in postgresql.conf > The reason is definitely something else, we do not know what. This is fatal in this case. Which version of PostgreSQL are you using? There has been a few commits around continuation records lately, so it is hard to say if you are pointing at a new issue or if this is a past one already addressed. -- Michael
Вложения
>> Which version of PostgreSQL are you using? There has been a few commits around continuation records lately, so it ishard to say if you are pointing at a new issue or if this is a past one already addressed. Hi Michael, We are using PostgreSQL 14.4.
Do you need any additional information? -----Original Message----- From: Lahnov, Igor Sent: Friday, September 30, 2022 10:18 AM To: 'Michael Paquier' <michael@paquier.xyz> Cc: Kyotaro Horiguchi <horikyota.ntt@gmail.com>; pgsql-general@lists.postgresql.org; Timonin, Alexey <Alexey.Timonin@nexign.com>;Dmitriyev, Andrey <Andrey.Dmitriyev@nexign.com> Subject: RE: Streaming wal from primiry terminating >> Which version of PostgreSQL are you using? There has been a few commits around continuation records lately, so it ishard to say if you are pointing at a new issue or if this is a past one already addressed. Hi Michael, We are using PostgreSQL 14.4.