Обсуждение: BUG #18006: recovery_target_action=shutdown triggers automatic recovery on next startup (beyond point in time)

Поиск
Список
Период
Сортировка
The following bug has been logged on the website:

Bug reference:      18006
Logged by:          Philipp Hebing
Email address:      p.hebing@slashwhy.de
PostgreSQL version: 15.2
Operating system:   Windows + Docker
Description:

I am using continuous archiving of wal files to be able to recover to a
point in time.
The point in time is named via `select
pg_create_restore_point('recovery_target')`.

When I run the recovery steps mentioned in
https://www.postgresql.org/docs/current/continuous-archiving.html 26.3.4 and
set `recovery_target_action=pause` then I am able to inspect the Database
and everything is recovered to the correct `recovery_target`.
I can run `pg_wal_replay_resume()`, the recovery finishes, the
`recovery.signal` file is automatically removed, I shutdown the server and
when I start it up all is fine and as expected.

When I run everything with `recovery_target_action=shutdown` the server
while in recovery mode reaches the target and successfully shuts down.
I then manually remove the `recovery.signal` file as mentioned in the docs
of `recovery_target_action`
https://www.postgresql.org/docs/current/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY
, so that the server does a normal startup next run.
But as you can see in the logs below the server logs `database system was
not properly shut down; automatic recovery in progress` and it continues to
recover everything.
I would expect it to stay at the recovery target and not replay any further
wal files.

To me that feels like a bug, or undesired behavior, or is there anything
else needed to be done manually that `pg_wal_replay_resume()` does when
everything halts on pause (maybe docs are not comprehensive)?

The logs from the recovery using `shutdown`
```
listening on IPv4 address "0.0.0.0", port 5432
listening on IPv6 address "::", port 5432
listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database system was interrupted; last known up at 2023-06-28 12:55:54 UTC
creating missing WAL directory "pg_wal/archive_status"
cp: can't stat '/var/lib/postgresql/pg_wal_archive/00000002.history': No
such file or directory
starting point-in-time recovery to "recovery_target"
restored log file "000000010000000000000007" from archive
redo starts at 0/7000110
restored log file "000000010000000000000008" from archive
consistent recovery state reached at 0/7332490
database system is ready to accept read-only connections
recovery stopping at restore point "recovery_target", time 2023-06-28
12:56:26.858791+00
shutdown at recovery target
shutting down
database system is shut down
```

The logs when starting up after recovery (`recovery.signal` file was
manually removed)
```
listening on IPv4 address "0.0.0.0", port 5432
listening on IPv6 address "::", port 5432
listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database system was shut down in recovery at 2023-06-29 07:45:41 UTC
database system was not properly shut down; automatic recovery in progress
redo starts at 0/7000110
redo done at 0/8169188 system usage: CPU: user: 0.02 s, system: 0.00 s,
elapsed: 0.02 s
checkpoint starting: end-of-recovery immediate wait
checkpoint complete: wrote 1192 buffers (7.3%); 0 WAL file(s) added, 0
removed, 2 recycled; write=0.023 s, sync=0.093 s, total=0.128 s; sync
files=505, longest=0.008 s, average=0.001 s; distance=32767 kB,
estimate=32767 kB
database system is ready to accept connections
```


This is not a bug.

At Thu, 29 Jun 2023 08:04:21 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in 
> When I run everything with `recovery_target_action=shutdown` the server
> while in recovery mode reaches the target and successfully shuts down.
> I then manually remove the `recovery.signal` file as mentioned in the docs
> of `recovery_target_action`
> https://www.postgresql.org/docs/current/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY
> , so that the server does a normal startup next run.
> But as you can see in the logs below the server logs `database system was
> not properly shut down; automatic recovery in progress` and it continues to
> recover everything.

This is pretty standard when a stopped standby is launched as a
primary.

In that sense, a shutdown during recovery mode is always an "improper"
shutdown, because it doesn't run a shutdown-checkpoint. However, that
behavior is quite normal for a standby, so you don't see the message
when starting as a standby.

> I would expect it to stay at the recovery target and not replay any further
> wal files.

That's not really the case. Recovery always starts from the last
retartpoint LSN. That LSN is recorded in the control file and you can
check it using pg_controldata.

> To me that feels like a bug, or undesired behavior, or is there anything
> else needed to be done manually that `pg_wal_replay_resume()` does when
> everything halts on pause (maybe docs are not comprehensive)?

pg_wal_replay_resume() just continues a paused reocvery while the
server is running. Once the whole server goes down, it requires a
recovery unless it is known to have been shut down cleanly as a
non-standby.

The idea behind the "shutdown" mode of recoery_target_action is to
prepare a standby that needs the minimum WALs at the next startup. In
reality, recovery always starts from the last restart point. It is
normal behavior of the recovery process, too. And it is written in the
documentation.

https://www.postgresql.org/docs/15/runtime-config-wal.html

> The shutdown setting is useful to have the instance ready at the exact
> replay point desired. The instance will still be able to replay more
> WAL records (and in fact will have to replay WAL records since the
> last checkpoint next time it is started).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center