Обсуждение: Fix crash during recovery when redo segment is missing

Поиск

Список

Период

Сортировка

Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

21 февраля 2025 г., 13:59:27

Hi,

In [1], Andres reported a bug where PostgreSQL crashes during recovery
if the segment containing the redo pointer does not exist. I have
attempted to address this issue and I am sharing a patch for the same.

The problem was that PostgreSQL did not PANIC when the redo LSN and
checkpoint LSN were in separate segments, and the file containing the
redo LSN was missing, leading to a crash. Andres has provided a
detailed analysis of the behavior across different settings and
versions. Please refer to [1] for more information. This issue arises
because PostgreSQL does not PANIC initially.

The issue was resolved by ensuring that the REDO location exists once
we successfully read the checkpoint record in InitWalRecovery(). This
prevents control from reaching PerformWalRecovery() unless the WAL
file containing the redo record exists. A new test script,
044_redo_segment_missing.pl, has been added to validate this. To
populate the WAL file with a redo record different from the WAL file
with the checkpoint record, I wait for the checkpoint start message
and then issue a pg_switch_wal(), which should occur before the
completion of the checkpoint. Then, I crash the server, and during the
restart, it should log an appropriate error indicating that it could
not find the redo location. Please let me know if there is a better
way to reproduce this behavior. I have tested and verified this with
the various scenarios Andres pointed out in [1]. Please note that this
patch does not address error checking in StartupXLOG(),
CreateCheckPoint(), etc., nor does it focus on cleaning up existing
code.

Attaching the patch. Please review and share your feedback. Thanks to
Andres for spotting the bug and providing the detailed report [1].

[1]: https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Вложения

0001-Fix-crash-during-recovery-when-redo-segment-is-missi.patch

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

04 декабря 2025 г., 09:31:24

The patch wasn’t applying cleanly on master, so I’ve rebased it and
also added it to the PG19‑4 CommitFest:
https://commitfest.postgresql.org/patch/6279/
Please review and share your feedback.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft


On Fri, Feb 21, 2025 at 4:29 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> Hi,
>
> In [1], Andres reported a bug where PostgreSQL crashes during recovery
> if the segment containing the redo pointer does not exist. I have
> attempted to address this issue and I am sharing a patch for the same.
>
> The problem was that PostgreSQL did not PANIC when the redo LSN and
> checkpoint LSN were in separate segments, and the file containing the
> redo LSN was missing, leading to a crash. Andres has provided a
> detailed analysis of the behavior across different settings and
> versions. Please refer to [1] for more information. This issue arises
> because PostgreSQL does not PANIC initially.
>
> The issue was resolved by ensuring that the REDO location exists once
> we successfully read the checkpoint record in InitWalRecovery(). This
> prevents control from reaching PerformWalRecovery() unless the WAL
> file containing the redo record exists. A new test script,
> 044_redo_segment_missing.pl, has been added to validate this. To
> populate the WAL file with a redo record different from the WAL file
> with the checkpoint record, I wait for the checkpoint start message
> and then issue a pg_switch_wal(), which should occur before the
> completion of the checkpoint. Then, I crash the server, and during the
> restart, it should log an appropriate error indicating that it could
> not find the redo location. Please let me know if there is a better
> way to reproduce this behavior. I have tested and verified this with
> the various scenarios Andres pointed out in [1]. Please note that this
> patch does not address error checking in StartupXLOG(),
> CreateCheckPoint(), etc., nor does it focus on cleaning up existing
> code.
>
> Attaching the patch. Please review and share your feedback. Thanks to
> Andres for spotting the bug and providing the detailed report [1].
>
> [1]: https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
>
> Best Regards,
> Nitin Jadhav
> Azure Database for PostgreSQL
> Microsoft

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

04 декабря 2025 г., 09:36:30

Apologies, I missed attaching the patch earlier. Please find the v2
version attached.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Thu, Dec 4, 2025 at 12:01 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> The patch wasn’t applying cleanly on master, so I’ve rebased it and
> also added it to the PG19‑4 CommitFest:
> https://commitfest.postgresql.org/patch/6279/
> Please review and share your feedback.
>
> Best Regards,
> Nitin Jadhav
> Azure Database for PostgreSQL
> Microsoft
>
> Best Regards,
> Nitin Jadhav
> Azure Database for PostgreSQL
> Microsoft
>
>
> On Fri, Feb 21, 2025 at 4:29 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > In [1], Andres reported a bug where PostgreSQL crashes during recovery
> > if the segment containing the redo pointer does not exist. I have
> > attempted to address this issue and I am sharing a patch for the same.
> >
> > The problem was that PostgreSQL did not PANIC when the redo LSN and
> > checkpoint LSN were in separate segments, and the file containing the
> > redo LSN was missing, leading to a crash. Andres has provided a
> > detailed analysis of the behavior across different settings and
> > versions. Please refer to [1] for more information. This issue arises
> > because PostgreSQL does not PANIC initially.
> >
> > The issue was resolved by ensuring that the REDO location exists once
> > we successfully read the checkpoint record in InitWalRecovery(). This
> > prevents control from reaching PerformWalRecovery() unless the WAL
> > file containing the redo record exists. A new test script,
> > 044_redo_segment_missing.pl, has been added to validate this. To
> > populate the WAL file with a redo record different from the WAL file
> > with the checkpoint record, I wait for the checkpoint start message
> > and then issue a pg_switch_wal(), which should occur before the
> > completion of the checkpoint. Then, I crash the server, and during the
> > restart, it should log an appropriate error indicating that it could
> > not find the redo location. Please let me know if there is a better
> > way to reproduce this behavior. I have tested and verified this with
> > the various scenarios Andres pointed out in [1]. Please note that this
> > patch does not address error checking in StartupXLOG(),
> > CreateCheckPoint(), etc., nor does it focus on cleaning up existing
> > code.
> >
> > Attaching the patch. Please review and share your feedback. Thanks to
> > Andres for spotting the bug and providing the detailed report [1].
> >
> > [1]: https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
> >
> > Best Regards,
> > Nitin Jadhav
> > Azure Database for PostgreSQL
> > Microsoft

Вложения

v2-0001-Fix-crash-during-recovery-when-redo-segment-is-missi.patch

Re: Fix crash during recovery when redo segment is missing

От

Kirill Reshke

Дата:

04 декабря 2025 г., 10:00:23

On Thu, 4 Dec 2025 at 11:37, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
>
> Apologies, I missed attaching the patch earlier. Please find the v2
> version attached.
>
> Best Regards,
> Nitin Jadhav
> Azure Database for PostgreSQL
> Microsoft
>
> On Thu, Dec 4, 2025 at 12:01 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > The patch wasn’t applying cleanly on master, so I’ve rebased it and
> > also added it to the PG19‑4 CommitFest:
> > https://commitfest.postgresql.org/patch/6279/
> > Please review and share your feedback.
> >
> > Best Regards,
> > Nitin Jadhav
> > Azure Database for PostgreSQL
> > Microsoft
> >
> > Best Regards,
> > Nitin Jadhav
> > Azure Database for PostgreSQL
> > Microsoft
> >
> >
> > On Fri, Feb 21, 2025 at 4:29 PM Nitin Jadhav
> > <nitinjadhavpostgres@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > In [1], Andres reported a bug where PostgreSQL crashes during recovery
> > > if the segment containing the redo pointer does not exist. I have
> > > attempted to address this issue and I am sharing a patch for the same.
> > >
> > > The problem was that PostgreSQL did not PANIC when the redo LSN and
> > > checkpoint LSN were in separate segments, and the file containing the
> > > redo LSN was missing, leading to a crash. Andres has provided a
> > > detailed analysis of the behavior across different settings and
> > > versions. Please refer to [1] for more information. This issue arises
> > > because PostgreSQL does not PANIC initially.
> > >
> > > The issue was resolved by ensuring that the REDO location exists once
> > > we successfully read the checkpoint record in InitWalRecovery(). This
> > > prevents control from reaching PerformWalRecovery() unless the WAL
> > > file containing the redo record exists. A new test script,
> > > 044_redo_segment_missing.pl, has been added to validate this. To
> > > populate the WAL file with a redo record different from the WAL file
> > > with the checkpoint record, I wait for the checkpoint start message
> > > and then issue a pg_switch_wal(), which should occur before the
> > > completion of the checkpoint. Then, I crash the server, and during the
> > > restart, it should log an appropriate error indicating that it could
> > > not find the redo location. Please let me know if there is a better
> > > way to reproduce this behavior. I have tested and verified this with
> > > the various scenarios Andres pointed out in [1]. Please note that this
> > > patch does not address error checking in StartupXLOG(),
> > > CreateCheckPoint(), etc., nor does it focus on cleaning up existing
> > > code.
> > >
> > > Attaching the patch. Please review and share your feedback. Thanks to
> > > Andres for spotting the bug and providing the detailed report [1].
> > >
> > > [1]: https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
> > >
> > > Best Regards,
> > > Nitin Jadhav
> > > Azure Database for PostgreSQL
> > > Microsoft

Hi!

1) Please do not top-post on these lists

2) I did not get the exact reason this thread is different from [0]

3)

Your tests are using a busy loop with usleep which nowadays is
considered as bad practice.  There 3 of such places, and I think the
first two of them
can be replaced with injection point wait.

> +# Wait for the checkpoint to complete
> +my $checkpoint_complete = 0;
> +foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default) {
> + if ($node->log_contains("checkpoint complete", $logstart)) {
> + $checkpoint_complete = 1;
> + last;
> + }
> + usleep(100_000);
> +}

4) There are comments from Robert, which are not covered [1].

[0] https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
[1] https://www.postgresql.org/message-id/CA%2BTgmob1x7HMcWAb%3D6ep2cBuWuwpT-p9E7EmQegWFu6E8nKHeg%40mail.gmail.com


--
Best regards,
Kirill Reshke

Re: Fix crash during recovery when redo segment is missing

От

Michael Paquier

Дата:

04 декабря 2025 г., 11:48:59

On Thu, Dec 04, 2025 at 12:00:23PM +0500, Kirill Reshke wrote:
> Your tests are using a busy loop with usleep which nowadays is
> considered as bad practice.  There 3 of such places, and I think the
> first two of them
> can be replaced with injection point wait.

(This fell off my radar, apologies about that.)

The problem with this test is not related to its use of sleeps, which
is perfectly fine to check the start or end timings of a checkpoint
depending on the contents of the logs.  It has two issues.

One first issue is that the test is unnecessary long, taking more than
30s to finish because it relies on checkpoint_timeout to kick a
checkpoint.  This could use a background psql object to kick a
checkpoint to accelerate the whole.  So the test is sitting idle for a
long time, doing nothing.

Your intuition about injection points is right, though, but it points
to a second problem: the test is not reliable because we could finish
the checkpoint *before* the WAL segment is switched, and we expect the
WAL segment switch to happen while the checkpoint is processing.  If
you want to make that deterministic, having a wait in the middle of
checkpointing would make the test actually test what it should.  In
this case the test would randomly die on its "redo record wal file is
the same" message.  That's OK to prove the point of the initial patch,
but it's not acceptable for a test that could be added to the tree.

An update of src/test/recovery/meson.build to add the new test is
also required.

Anyway, I think that the patch is overdoing it in issuing a PANIC in
this case, going backwards with the other thread from [0]: a FATAL
would be perfectly OK, like in the backup_label path because your
manipulations of WAL segment missing are indeed possible, as the test
posted is proving.  And there have been many arguments in the past
about performing recovery without a backup_label, as well.  And that
would make the test something that we could use, because no backtrace
on buildfarm hosts or disk bloat.

If we do not re-check in InitWalRecovery() that the redo record is
around, the startup process happily goes down to PerformWalRecovery()
in an everything-is-fine mode, initializing a bunch of states, to then
crash due to a pointer dereference while attempting to read the record
from the redo LSN, which does not exist.  This is not right.  So,
oops.  I would see an argument good enough for a backpatch when it
comes to crash recovery, because we actually don't know what has
happened in this case.  Or not based on the lack of complaints on the
matter over the years.

So I'd argue about updating the patch among these lines, instead:
- Switch the PANIC to a FATAL, to inform about the pilot error.  This
addresses the pointer dereference with WAL replay going crazy for the
redo LSN.
- Add a test, with an injection point after checkpoint startup, based
on a manual checkpoint done in a backgroud psql, without a
checkpoint_timeout to make the test faster.
- We could be careful first based on the lack of complaints, doing
that extra check only on HEAD, for v19.

> 4) There are comments from Robert, which are not covered [1].
>
> [0] https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
> [1] https://www.postgresql.org/message-id/CA%2BTgmob1x7HMcWAb%3D6ep2cBuWuwpT-p9E7EmQegWFu6E8nKHeg%40mail.gmail.com

I get the argument about the spaghetti code in general, however the
code in question here deals with the WAL replay initialization, for a
backup_label vs a non-backup_label path.  Perhaps I'm more used to
this area than others, but it's not that much pasta to me.

I don't see a point in moving the memcpy() and the wasShutdown parts
as proposed in the patch, by the way.  The PANIC would block things
in its else block.  Let's limit outselves to the extra ReadRecord()
check for the redo record when we find a checkpoint record.

I'd actually wonder if we should not lower the existing PANIC to a
FATAL if ReadCheckpointRecord() fails, as well.  The result would be
the same for operators, without the need to deal with backtrace
cleanups after a crash.  And we still have the error message that
tells us what's going on.

Any thoughts or opinions from others?
--
Michael

Вложения

signature.asc

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

08 декабря 2025 г., 07:38:00

Hi,

Thanks for reviewing and sharing your feedback.

> 1) Please do not top-post on these lists
Apologies for that. I will make sure to follow this going forward.

> 2) I did not get the exact reason this thread is different from [0]
The purpose of this thread was to share the actual patch addressing
the crash scenario discussed in [0]. That said, I agree I should have
posted the patch in [0] itself. I will make sure to use existing
threads wherever possible going forward.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

08 декабря 2025 г., 10:18:52

Thank you for the detailed feedback.

> > Your tests are using a busy loop with usleep which nowadays is
> > considered as bad practice.  There 3 of such places, and I think the
> > first two of them
> > can be replaced with injection point wait.
>
> (This fell off my radar, apologies about that.)
>
> The problem with this test is not related to its use of sleeps, which
> is perfectly fine to check the start or end timings of a checkpoint
> depending on the contents of the logs.  It has two issues.
>
> One first issue is that the test is unnecessary long, taking more than
> 30s to finish because it relies on checkpoint_timeout to kick a
> checkpoint.  This could use a background psql object to kick a
> checkpoint to accelerate the whole.  So the test is sitting idle for a
> long time, doing nothing.
>
> Your intuition about injection points is right, though, but it points
> to a second problem: the test is not reliable because we could finish
> the checkpoint *before* the WAL segment is switched, and we expect the
> WAL segment switch to happen while the checkpoint is processing.  If
> you want to make that deterministic, having a wait in the middle of
> checkpointing would make the test actually test what it should.  In
> this case the test would randomly die on its "redo record wal file is
> the same" message.  That's OK to prove the point of the initial patch,
> but it's not acceptable for a test that could be added to the tree.
>
> An update of src/test/recovery/meson.build to add the new test is
> also required.

I will work on improving the test accordingly and include the changes
in the next version.

> Anyway, I think that the patch is overdoing it in issuing a PANIC in
> this case, going backwards with the other thread from [0]: a FATAL
> would be perfectly OK, like in the backup_label path because your
> manipulations of WAL segment missing are indeed possible, as the test
> posted is proving.  And there have been many arguments in the past
> about performing recovery without a backup_label, as well.  And that
> would make the test something that we could use, because no backtrace
> on buildfarm hosts or disk bloat.

The main reason I chose PANIC is that when the ReadCheckpointRecord()
fails, we already use PANIC for the error message ‘could not locate a
valid checkpoint record…’ in the no_backup_label case, whereas the
similar flow with backup_label uses FATAL. I am not entirely sure why
this difference exists. I looked into it but couldn’t find much. If we
decide to change this patch to use FATAL for ‘could not find redo
location…’, should we also consider changing the existing PANIC to
FATAL for consistency?

> > 4) There are comments from Robert, which are not covered [1].
> >
> > [0] https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
> > [1] https://www.postgresql.org/message-id/CA%2BTgmob1x7HMcWAb%3D6ep2cBuWuwpT-p9E7EmQegWFu6E8nKHeg%40mail.gmail.com
>
> I get the argument about the spaghetti code in general, however the
> code in question here deals with the WAL replay initialization, for a
> backup_label vs a non-backup_label path.  Perhaps I'm more used to
> this area than others, but it's not that much pasta to me.

Yes. As noted in my initial email, this patch is focused solely on
fixing the crash issue. It does not address error handling in
StartupXLOG(), CreateCheckPoint(), or involve any broader code
cleanup.

> I don't see a point in moving the memcpy() and the wasShutdown parts
> as proposed in the patch, by the way.  The PANIC would block things
> in its else block.  Let's limit outselves to the extra ReadRecord()
> check for the redo record when we find a checkpoint record.

I agree and will take care in the next patch.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Re: Fix crash during recovery when redo segment is missing

От

Michael Paquier

Дата:

08 декабря 2025 г., 11:23:08

On Mon, Dec 08, 2025 at 12:48:52PM +0530, Nitin Jadhav wrote:
>> An update of src/test/recovery/meson.build to add the new test is
>> also required.
>
> I will work on improving the test accordingly and include the changes
> in the next version.

Cool, thanks.

> The main reason I chose PANIC is that when the ReadCheckpointRecord()
> fails, we already use PANIC for the error message ‘could not locate a
> valid checkpoint record…’ in the no_backup_label case, whereas the
> similar flow with backup_label uses FATAL. I am not entirely sure why
> this difference exists. I looked into it but couldn’t find much. If we
> decide to change this patch to use FATAL for ‘could not find redo
> location…’, should we also consider changing the existing PANIC to
> FATAL for consistency?

Using PANIC is an inherited historical artifact that has been
introduced around 4d14fe0048cf with the introduction of WAL.  There
was nothing like archiving or even base backup back then.  Switching
the existing surrounding one to also use a FATAL is something that
seems worth considering to me for the checkpoint record, at least
based on the pattern that there could be a driver error even if there
is no backup_label file (aka for example the case of FS-levelsnapshots
with one partition used for the data folder, no?).

This offers bonus points in the shape of more tests like the one you
have sent upthread.  It's not something that I would backpatch as it
is a behavior change, but I'm open to seeing that as an improvement in
usability for future releases: PANIC is for cases that should never
happen for internal states, due to an internal logic error, or an OS
going crazy.  Here we have a should-no-happen case triggered by a
user, and a FATAL still provides the same information about what's
wrong.  Let's make such changes separate patches, of course, depending
on what we find on the way.

> Yes. As noted in my initial email, this patch is focused solely on
> fixing the crash issue. It does not address error handling in
> StartupXLOG(), CreateCheckPoint(), or involve any broader code
> cleanup.

That sounds fine to me.
--
Michael

Вложения

signature.asc

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

09 декабря 2025 г., 07:00:41

> Using PANIC is an inherited historical artifact that has been
> introduced around 4d14fe0048cf with the introduction of WAL.  There
> was nothing like archiving or even base backup back then.  Switching
> the existing surrounding one to also use a FATAL is something that
> seems worth considering to me for the checkpoint record, at least
> based on the pattern that there could be a driver error even if there
> is no backup_label file (aka for example the case of FS-levelsnapshots
> with one partition used for the data folder, no?).

Thanks for explaining the historical context. I agree that switching
the existing PANIC to FATAL for the checkpoint record case makes
sense. I will include this change in the next patch if there are no
objections from others.

> This offers bonus points in the shape of more tests like the one you
> have sent upthread.  It's not something that I would backpatch as it
> is a behavior change, but I'm open to seeing that as an improvement in
> usability for future releases: PANIC is for cases that should never
> happen for internal states, due to an internal logic error, or an OS
> going crazy.  Here we have a should-no-happen case triggered by a
> user, and a FATAL still provides the same information about what's
> wrong.  Let's make such changes separate patches, of course, depending
> on what we find on the way.

Thanks for the suggestion. I will keep that in mind and look to add
more such tests in future.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

10 декабря 2025 г., 13:31:28

I have incorporated all the feedback discussed above and attached the v3 patch.
Please take a look and let me know if you have any additional feedback.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Вложения

v3-0001-Fix-crash-during-recovery-when-redo-segment-is-missi.patch

Re: Fix crash during recovery when redo segment is missing

От

Michael Paquier

Дата:

15 декабря 2025 г., 11:48:29

On Wed, Dec 10, 2025 at 04:01:28PM +0530, Nitin Jadhav wrote:
> I have incorporated all the feedback discussed above and attached the v3 patch.
> Please take a look and let me know if you have any additional feedback.

I have been looking at this patch, and as far as I can see this is an
old problem, with consequences that vary depending on the branches
dealt with (looked at the code and tested down to 9.2, did not go
further down).

HEAD is actually not a bad student: if the REDO record is missing, we
crash on a pointer dereference.  Now, older branches have a different
story to offer, leading to a worse behavior.  For example on v14, our
oldest stable branch still supported: if the REDO record is missing
the startup process thinks that no REDO is required at all because
there is no record, and skips everything until the end of recovery.  I
have been lacking time today to look at the state of the branches
between v15 and v17, but I am pretty sure that we are fighting between
the pointer dereference and the v14 behavior for these as well.  v18
should be the pointer issue.

FWIW, this new ReadRecord() call has given me a pause for a few hours,
where I was wondering what is the effect of fetching_ckpt = false when
recovery or standby mode is requested.  At the end I think that we are
OK: if there is a restore_command and we don't have a backup_label, we
would be able to fetch the redo record from an archive, finishing
recovery even in this case where the backup_label is missing.  For
example, allowing recovery for the node in the test (recovery.signal +
restore_command) while copying the missing segment to archive
completes recovery, with consistency reached at the end of the
checkpoint and all the records between the redo point and the end of
checkpoint record, then a TLI jump happens.

Another set of things that were wrong in the patch, found during
review:
- The TAP test with injection points was failing for two memory
allocations attempted as the point is added in a critical section,
after the REDO record is generated.  There is one allocation for the
point loaded, due to the library path.  There was a second one due to
the wait machinery in the library injection_points for the DSM
registry and the shmem state we need.  Both of these problems can be
avoided with two techniques by using two points based on a wait: one
wait happens before the critical section, and is used to initialize
the shmem state.  The second is loaded outside the critical section,
run inside the critical section inside from the cache.
- The patch was actually wrong: we need to rely on the redo LSN as
found in the checkpoint record obtained a bit earlier.  There was at
least one regression due to that: recovery test 030 for the standby
node.

As a whole, this has to be backpatched, because we are just ignoring
recovery if the REDO record is simply missing while the checkpoint
record is found.  For the back branches, the PANIC is actually what
I'm planning to go with, to match with what is happening when the
checkpoint record is missing.  On HEAD, let's use a softer FATAL to
give a way to test this driver error moving forward.  There is a
secondary argument for softening the PANIC when the checkpoint record
is missing to a FATAL, but let's discuss that separately.  Hence that
would make two patches:
- Something simpler than the attached, without the test with a PANIC
for the redo record missing, for all the branches.
- A second patch lowering this PANIC to a FATAL, with the test
included, only for HEAD.

I have done a couple of tests in the CI and locally, and that was
looking stable.  Attached is the result of what would happen on HEAD,
where the change in xlogrecovery.c would include the back-branch
versions.

Thoughts or comments are welcome.
--
Michael

Вложения

Re: Fix crash during recovery when redo segment is missing

От

Michael Paquier

Дата:

16 декабря 2025 г., 09:04:56

On Mon, Dec 15, 2025 at 05:48:29PM +0900, Michael Paquier wrote:
> I have done a couple of tests in the CI and locally, and that was
> looking stable.  Attached is the result of what would happen on HEAD,
> where the change in xlogrecovery.c would include the back-branch
> versions.
>
> Thoughts or comments are welcome.

And done all of that, with the test added to HEAD and a backpatch down
to v14 for the main fix.
--
Michael

Вложения

signature.asc

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

16 декабря 2025 г., 10:10:25

> As a whole, this has to be backpatched, because we are just ignoring
> recovery if the REDO record is simply missing while the checkpoint
> record is found.  For the back branches, the PANIC is actually what
> I'm planning to go with, to match with what is happening when the
> checkpoint record is missing.  On HEAD, let's use a softer FATAL to
> give a way to test this driver error moving forward.  There is a
> secondary argument for softening the PANIC when the checkpoint record
> is missing to a FATAL, but let's discuss that separately.  Hence that
> would make two patches:
> - Something simpler than the attached, without the test with a PANIC
> for the redo record missing, for all the branches.
> - A second patch lowering this PANIC to a FATAL, with the test
> included, only for HEAD.
>
> And done all of that, with the test added to HEAD and a backpatch down
> to v14 for the main fix.

Thanks for fixing the test code. The plan for what goes into the back
branches and what goes into HEAD makes sense. Thanks for committing
it.

> There is a
> secondary argument for softening the PANIC when the checkpoint record
> is missing to a FATAL, but let's discuss that separately.

I was planning to start a separate thread for this point, but since it
was a small change I had included it here earlier. I understand the
considerations involved even for these minor adjustments. I will start
a separate thread for this.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Tue, Dec 16, 2025 at 11:35 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Dec 15, 2025 at 05:48:29PM +0900, Michael Paquier wrote:
> > I have done a couple of tests in the CI and locally, and that was
> > looking stable.  Attached is the result of what would happen on HEAD,
> > where the change in xlogrecovery.c would include the back-branch
> > versions.
> >
> > Thoughts or comments are welcome.
>
> And done all of that, with the test added to HEAD and a backpatch down
> to v14 for the main fix.
> --
> Michael

Re: Fix crash during recovery when redo segment is missing

От

Michael Paquier

Дата:

16 декабря 2025 г., 10:46:56

On Tue, Dec 16, 2025 at 12:40:25PM +0530, Nitin Jadhav wrote:
> I was planning to start a separate thread for this point, but since it
> was a small change I had included it here earlier. I understand the
> considerations involved even for these minor adjustments. I will start
> a separate thread for this.

An argument that would sound in favor of a switch from PANIC to FATAL
is the testing side: if one removes the segment where the checkpoint
record resides, we crash.  Of course, one should not do that, but I
have been wondering for years if it would not be a good thing idea to
lift that a bit and expand the in-core tests and how we expect the
startup process to deal with things.  One of my line of thoughts is
that the PANIC behavior is inherited from a time where we did not have
online backups and archive recovery, where such manipulations have
never been possible to start with because WAL segments had a full life
only linked to the backend in pg_wal.  Perhaps others don't agree with
that, that's fine.

It would be easy enough to expand the test added by 15f68cebdcec to
check the no-checkpoint case, of course.  I just did that this morning
while quickly testing various recovery patterns, which was easier than
rewriting a new script for the job.  :)
--
Michael

Вложения

signature.asc

Re: Fix crash during recovery when redo segment is missing

От

Nitin Jadhav

Дата:

16 декабря 2025 г., 13:01:58

> An argument that would sound in favor of a switch from PANIC to FATAL
> is the testing side: if one removes the segment where the checkpoint
> record resides, we crash.  Of course, one should not do that, but I
> have been wondering for years if it would not be a good thing idea to
> lift that a bit and expand the in-core tests and how we expect the
> startup process to deal with things.  One of my line of thoughts is
> that the PANIC behavior is inherited from a time where we did not have
> online backups and archive recovery, where such manipulations have
> never been possible to start with because WAL segments had a full life
> only linked to the backend in pg_wal.  Perhaps others don't agree with
> that, that's fine.

Makes sense. I agree that with modern features like backups, archiving
and external WAL handling, it’s common for WAL segments to go missing
due to operational scenarios, and these cases are often recoverable.
So switching to FATAL seems appropriate.

I would prefer to discuss this in a separate thread with a more
accurate subject line so we can get more opinions. Please let me know
if it is ok or you would rather continue the discussion here.

> It would be easy enough to expand the test added by 15f68cebdcec to
> check the no-checkpoint case, of course.  I just did that this morning
> while quickly testing various recovery patterns, which was easier than
> rewriting a new script for the job.  :)

Thanks for this. I also tested it by adding a TAP test. I initially
planned to share it, but decided instead to start a new thread first,
gather more opinions, and then share a patch if we agree on making the
change.


Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

Re: Fix crash during recovery when redo segment is missing

От

Michael Paquier

Дата:

16 декабря 2025 г., 13:35:36

On Tue, Dec 16, 2025 at 03:31:58PM +0530, Nitin Jadhav wrote:
> I would prefer to discuss this in a separate thread with a more
> accurate subject line so we can get more opinions. Please let me know
> if it is ok or you would rather continue the discussion here.

Yes, let's do that.
--
Michael

Вложения

signature.asc

Re: Fix crash during recovery when redo segment is missing

От

Kirill Reshke

Дата:

22 января, 00:07:51

On Tue, 16 Dec 2025 at 15:35, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Dec 16, 2025 at 03:31:58PM +0530, Nitin Jadhav wrote:
> > I would prefer to discuss this in a separate thread with a more
> > accurate subject line so we can get more opinions. Please let me know
> > if it is ok or you would rather continue the discussion here.
>
> Yes, let's do that.
> --
> Michael


Hi all

I changed [0] to committed

thx

[0] https://commitfest.postgresql.org/patch/6279/


-- 
Best regards,
Kirill Reshke

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Fix crash during recovery when redo segment is missing

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения