Thread: BUG #19432: recovery fails at invalid checkpoint record

BUG #19432: recovery fails at invalid checkpoint record

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      19432
Logged by:          Felix Hamme
Email address:      felix.hamme@ionos.com
PostgreSQL version: 17.9
Operating system:   Debian 13
Description:

Hi, I'm trying to restore from a pg_basebackup at timeline 1 to a
restore_target_time in timeline 2.
It fails at "invalid checkpoint record", "could not locate required
checkpoint record at 0/3000080".
All relevant wal files are in the archive, the restore_command works and
backup_label, pg_controldata, pg_waldump and 00000002.history look like
everything should work.
Recovering only timeline 1 works, but it fails as soon as it should proceed
in timeline 2.
A 6.7MB tar of the basebackup and the wal archive is available at
https://get.hidrive.com/i/PwMejRQG . This link expires on 2026-03-13, I can
provide a new link if needed.
Why does this recovery fail?





Re: BUG #19432: recovery fails at invalid checkpoint record

From
Laurenz Albe
Date:
On Thu, 2026-03-12 at 16:20 +0000, PG Bug reporting form wrote:
> Hi, I'm trying to restore from a pg_basebackup at timeline 1 to a
> restore_target_time in timeline 2.
> It fails at "invalid checkpoint record", "could not locate required
> checkpoint record at 0/3000080".
> All relevant wal files are in the archive, the restore_command works and
> backup_label, pg_controldata, pg_waldump and 00000002.history look like
> everything should work.
> Recovering only timeline 1 works, but it fails as soon as it should proceed
> in timeline 2.
> A 6.7MB tar of the basebackup and the wal archive is available at
> https://get.hidrive.com/i/PwMejRQG . This link expires on 2026-03-13, I can
> provide a new link if needed.
> Why does this recovery fail?

Funny.  I unpacked your data directory and reduced your postgresql.auto.conf
to something that fits my system:

log_min_messages = 'DEBUG5'
restore_command = 'cp /home/laurenz/hamme/fakearchive/%f %p'
recovery_target_time = '2026-03-11 14:51:28 UTC'
recovery_target_action = 'promote'
hot_standby_feedback = 'on'
log_destination = 'csvlog'
log_directory = '/home/laurenz/hamme/log'
logging_collector = 'on'
wal_level = 'logical'
port = 5433
unix_socket_directories = '/home/laurenz/hamme'
max_connections = 300

Recovery worked like a charm.  pg_waldump shows the checkpoint record in
000000010000000000000003 at the correct position.

Not sure what you did wrong.

Yours,
Laurenz Albe



Re: BUG #19432: recovery fails at invalid checkpoint record

From
Felix Hamme
Date:
Thank you for checking, now I found what I did wrong: "mv" doesn't
work as a restore_command because the same .history file is restored
multiple times during recovery.
I successfully recovered using a restore_command which does a cp for
history files and mv for wal files. It logged this:

cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
cp /DBDATA/test/fakearchive/00000003.history pg_wal/RECOVERYHISTORY not found
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success
mv /DBDATA/test/fakearchive/000000010000000000000003 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000010000000000000004 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000005 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000006 pg_wal/RECOVERYXLOG success
mv /DBDATA/test/fakearchive/000000020000000000000007 pg_wal/RECOVERYXLOG success
cp /DBDATA/test/fakearchive/00000003.history pg_wal/RECOVERYHISTORY not found
cp /DBDATA/test/fakearchive/00000002.history pg_wal/RECOVERYHISTORY success

Is it safe in general to use mv for wal files? In other words, do the
currently supported postgres versions run restore_command only once
per wal file?

Best regards
Felix Hamme
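
The failure mode described above can be reproduced outside PostgreSQL. The following is only a toy sketch, not the reporter's actual setup: it requests the same file twice from a throwaway archive, first with cp and then with mv. The temporary directories and the dummy file content are assumptions made for illustration.

```shell
#!/bin/sh
# Toy demonstration, not PostgreSQL itself: ask a throwaway archive
# for the same file twice, once with cp and once with mv.
archive=$(mktemp -d)   # stand-in for the WAL archive (assumed location)
target=$(mktemp -d)    # stand-in for pg_wal (assumed location)
echo "dummy content" > "$archive/00000002.history"

# Non-destructive restore: both requests succeed.
cp_first=0;  cp "$archive/00000002.history" "$target/RECOVERYHISTORY" || cp_first=$?
cp_second=0; cp "$archive/00000002.history" "$target/RECOVERYHISTORY" || cp_second=$?

# Destructive restore: the first request removes the file from the
# archive, so the second request exits with a non-zero status.
mv_first=0;  mv "$archive/00000002.history" "$target/RECOVERYHISTORY" || mv_first=$?
mv_second=0; mv "$archive/00000002.history" "$target/RECOVERYHISTORY" 2>/dev/null || mv_second=$?

echo "cp: $cp_first $cp_second   mv: $mv_first $mv_second"
rm -rf "$archive" "$target"
```

From the point of view of recovery, the second mv looks exactly like a file that is missing from the archive.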


On Thu, Mar 12, 2026 at 8:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Recovery worked like a charm.  pg_waldump shows the checkpoint record in
> 000000010000000000000003 at the correct position.
>
> Not sure what you did wrong.



Re: BUG #19432: recovery fails at invalid checkpoint record

From
Laurenz Albe
Date:
On Fri, 2026-03-13 at 09:35 +0100, Felix Hamme wrote:
> Is it safe in general to use mv for wal files? In other words, do the
> currently supported postgres versions run restore_command only once
> per wal file?

As you found out, no...

Yours,
Laurenz Albe



Re: BUG #19432: recovery fails at invalid checkpoint record

From
Felix Hamme
Date:
Timeline history files can be needed multiple times, ok. My question
was about WAL files only.
I'm tempted to use a restore_command which does cp for history files
and mv for WAL files, to optimize performance and disk usage.
An AI told me that a second restore attempt for the same WAL file
could only happen if recovery is resumed after a crash.

Kind regards
Felix Hamme


On Fri, Mar 13, 2026 at 5:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> As you found out, no...



Re: BUG #19432: recovery fails at invalid checkpoint record

From
Laurenz Albe
Date:
On Mon, 2026-03-16 at 14:56 +0100, Felix Hamme wrote:
> I'm tempted to use a restore_command which does cp for history files
> and mv for WAL files, to optimize performance and disk usage.
> An AI told me that a second restore attempt for the same WAL file
> could only happen if recovery is resumed after a crash.

Don't do that.  Make the restore_command idempotent.
Trying to optimize for storage space often causes problems elsewhere.

Yours,
Laurenz Albe
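
For illustration only, and not taken from the thread: one way to read "idempotent" here is a restore_command that never modifies the archive, so that asking for the same file any number of times gives the same result. The sketch below assumes a flat archive directory; the script name restore_wal.sh, the function name restore_wal and the ARCHIVE_DIR variable are hypothetical, and the default path is the one from the log earlier in the thread.

```shell
#!/bin/sh
# restore_wal.sh: hypothetical helper for restore_command.
# ARCHIVE_DIR is an assumed setting; the default is the archive path
# that appears in the log earlier in this thread.
ARCHIVE_DIR="${ARCHIVE_DIR:-/DBDATA/test/fakearchive}"

restore_wal() {
    file=$1   # %f: name of the requested WAL or history file
    dest=$2   # %p: path the file should be copied to

    # Recovery also asks for files that do not exist (for example the
    # next timeline's history file); answer with a non-zero status.
    [ -f "$ARCHIVE_DIR/$file" ] || return 1

    # Copy, never move: the archive stays untouched, so a repeated
    # request for the same file behaves like the first one.
    cp "$ARCHIVE_DIR/$file" "$dest"
}

# Run only when called with the two arguments from restore_command.
if [ "$#" -eq 2 ]; then
    restore_wal "$1" "$2"
fi
```

With the script installed at a path of your choosing (the path below is a placeholder), the setting would look like restore_command = '/path/to/restore_wal.sh %f %p'. Disk usage can then be handled separately from recovery, for example by cleaning up the archive once the restore has finished.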