Issue with recovery_target = 'immediate'

Поиск
Список
Период
Сортировка
От David Steele
Тема Issue with recovery_target = 'immediate'
Дата
Msg-id 5a6ce6bf-2a27-2b90-5cdb-14f6c89ca238@pgmasters.net
обсуждение исходный текст
Ответы Re: Issue with recovery_target = 'immediate'  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
Hackers,

We have noticed an issue when performing recovery with recovery_target = 
'immediate' when the latest timeline cannot be replayed to from the 
backup (current) timeline.

First, create two backups:

$ pgbackrest --stanza=demo --type=full --start-fast backup
$ pgbackrest --stanza=demo --type=full --start-fast backup
$ pgbackrest info

<snip>
         full backup: 20220713-175710F
             timestamp start/stop: 2022-07-13 17:57:10 / 2022-07-13 17:57:14
             wal start/stop: 000000010000000000000003 / 
000000010000000000000003
             database size: 23.2MB, database backup size: 23.2MB
             repo1: backup set size: 2.8MB, backup size: 2.8MB

         full backup: 20220713-175748F
             timestamp start/stop: 2022-07-13 17:57:48 / 2022-07-13 17:57:52
             wal start/stop: 000000010000000000000005 / 
000000010000000000000005
             database size: 23.2MB, database backup size: 23.2MB
             repo1: backup set size: 2.8MB, backup size: 2.8MB

Restore the first backup:

$ pg_ctlcluster 13 demo stop
$ pgbackrest --stanza=demo --delta --set=20220713-175710F 
--type=immediate --target-action=promote restore

Recovery settings:

$ cat /var/lib/postgresql/13/demo/postgresql.auto.conf

<snip>
restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'
recovery_target = 'immediate'
recovery_target_action = 'promote'

Starting PostgreSQL performs recovery as expected:

$ pg_ctlcluster 13 demo start
$ cat /var/log/postgresql/postgresql-13-demo.log

LOG:  database system was interrupted; last known up at 2022-07-13 
17:57:10 UTC
LOG:  starting point-in-time recovery to earliest consistent point
LOG:  restored log file "000000010000000000000003" from archive
LOG:  redo starts at 0/3000028
LOG:  consistent recovery state reached at 0/3000138
LOG:  recovery stopping after reaching consistency
LOG:  redo done at 0/3000138
LOG:  database system is ready to accept read only connections
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
LOG:  database system is ready to accept connections

Now restore the second backup (recovery settings are identical):

$ pg_ctlcluster 13 demo stop
$ pgbackrest --stanza=demo --delta --set=20220713-175748F 
--type=immediate --target-action=promote restore

Recovery now fails:

$ pg_ctlcluster 13 demo start
$ cat /var/log/postgresql/postgresql-13-demo.log

LOG:  database system was interrupted; last known up at 2022-07-13 
17:57:48 UTC
LOG:  restored log file "00000002.history" from archive
LOG:  starting point-in-time recovery to earliest consistent point
LOG:  restored log file "00000002.history" from archive
LOG:  restored log file "000000010000000000000005" from archive
FATAL:  requested timeline 2 is not a child of this server's history
DETAIL:  Latest checkpoint is at 0/5000060 on timeline 1, but in the 
history of the requested timeline, the server forked off from that 
timeline at 0/3000138.
LOG:  startup process (PID 511) exited with exit code 1
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down

While it is certainly true that timeline 2 cannot be replayed to from 
timeline 1, it should not matter for an immediate recovery that stops at 
consistency. No timeline switch will occur until promotion. Of course 
the cluster could be shut down before promotion and the target changed, 
but in that case couldn't timeline be adjusted at that point?

This works by default for PostgreSQL < 12 because the default timeline 
is current. Since PostgreSQL 12 the default has been latest, which does 
not work by default.

When a user does a number of recoveries it is pretty common for the 
timelines to get in a state that will make most subsequent recoveries 
fail. We think it makes sense for recovery_target = 'immediate' to be a 
fail safe that always works no matter the state of the latest timeline.

Our solution has been to force recovery_target_timeline = 'current' when 
recovery_target = 'immediate', but it seems like this is something that 
should be done in PostgreSQL instead.

Thoughts?

Regards,
-David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: Add --{no-,}bypassrls flags to createuser
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: allow building trusted languages without the untrusted versions