Обсуждение: Should the archiver process always make sure that the timeline history files exist in the archive?

Поиск
Список
Период
Сортировка

Should the archiver process always make sure that the timeline history files exist in the archive?

От
Jimmy Yih
Дата:
Hello pgsql-hackers,

While testing out some WAL archiving and PITR scenarios, it was observed that
enabling WAL archiving for the first time on a primary that was on a timeline
higher than 1 would not initially archive the timeline history file for the
timeline it was currently on. While this might be okay for most use cases, there
are scenarios where this leads to unexpected failures that seem to expose some
flaws in the logic.

Scenario 1:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Create a
standby with that backup that will be continuously restoring from the WAL
archives, the standby will not contain the timeline 2 history file. The standby
will operate normally but if you try to create a cascading standby off it using
streaming replication, the cascade standby's WAL receiver will continuously
FATAL trying to request the timeline 2 history file that the main standby does
not have.

Scenario 2:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Then try
to create a new node by doing PITR with recovery_target_timeline set to
'current' or 'latest' which will succeed. However, doing PITR with
recovery_target_timeline = '2' will fail since it is unable to find the timeline
2 history file in the WAL archives. This may be a bit contradicting since we
allow 'current' and 'latest' to recover but explicitly setting the
recovery_target_timeline to the control file's timeline id ends up with failure.

Attached is a patch containing two TAP tests that demonstrate the scenarios.

My questions are:
1. Why doesn't the archiver process try to archive timeline history files when
   WAL archiving is first configured and/or continually check (maybe when the
   archiver process gets started before the main loop)?
2. Why does explicitly setting the recovery_target_timeline to the control
   file's timeline id not follow the same logic as recovery_target_timeline set
   to 'current'?
3. Why does a cascaded standby require the timeline history file of its control
   file's timeline id (startTLI) when the main replica is able to operate fine
   without the timeline history file?

Note that my initial observations came from testing with pgBackRest (copying
pg_wal/ during backup is disabled by default) but using `pg_basebackup -Xnone`
reproduced the issues similarly and is what I present in the TAP tests. At the
moment, the only workaround I can think of is to manually run the
archive_command on the missing timeline history file(s).

Are these valid issues that should be looked into or are they expected? Scenario
2 seems like it could be easily fixed if we determine that the
recovery_target_timeline numeric value is equal to the control file's timeline
id (compare rtli and recoveryTargetTLI in validateRecoveryParameters()?) but I
wasn't sure if maybe the opposite was true where we should make 'current' and
'latest' require retrieving the timeline history files instead to help prevent
Scenario 1.

Regards,
Jimmy Yih
Вложения
Hello pgsql-hackers,

After doing some more debugging on the matter, I believe this issue might be a
minor regression from commit 5332b8cec541. Prior to that commit, the archiver
process when first started on a previously promoted primary would have all the
timeline history files marked as ready for immediate archiving. If that had
happened, none of my mentioned failure scenarios would be theoretically possible
(barring someone manually deleting the timeline history files). With that in
mind, I decided to look more into my Question 1 and created a patch proposal.
The attached patch will try to archive the current timeline history file if it
has not been archived yet when the archiver process starts up.

Regards,
Jimmy Yih

________________________________________
From: Jimmy Yih <jyih@vmware.com>
Sent: Wednesday, August 9, 2023 5:00 PM
To: pgsql-hackers@postgresql.org
Subject: Should the archiver process always make sure that the timeline history files exist in the archive?

Hello pgsql-hackers,

While testing out some WAL archiving and PITR scenarios, it was observed that
enabling WAL archiving for the first time on a primary that was on a timeline
higher than 1 would not initially archive the timeline history file for the
timeline it was currently on. While this might be okay for most use cases, there
are scenarios where this leads to unexpected failures that seem to expose some
flaws in the logic.

Scenario 1:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Create a
standby with that backup that will be continuously restoring from the WAL
archives, the standby will not contain the timeline 2 history file. The standby
will operate normally but if you try to create a cascading standby off it using
streaming replication, the cascade standby's WAL receiver will continuously
FATAL trying to request the timeline 2 history file that the main standby does
not have.

Scenario 2:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Then try
to create a new node by doing PITR with recovery_target_timeline set to
'current' or 'latest' which will succeed. However, doing PITR with
recovery_target_timeline = '2' will fail since it is unable to find the timeline
2 history file in the WAL archives. This may be a bit contradicting since we
allow 'current' and 'latest' to recover but explicitly setting the
recovery_target_timeline to the control file's timeline id ends up with failure.

Attached is a patch containing two TAP tests that demonstrate the scenarios.

My questions are:
1. Why doesn't the archiver process try to archive timeline history files when
   WAL archiving is first configured and/or continually check (maybe when the
   archiver process gets started before the main loop)?
2. Why does explicitly setting the recovery_target_timeline to the control
   file's timeline id not follow the same logic as recovery_target_timeline set
   to 'current'?
3. Why does a cascaded standby require the timeline history file of its control
   file's timeline id (startTLI) when the main replica is able to operate fine
   without the timeline history file?

Note that my initial observations came from testing with pgBackRest (copying
pg_wal/ during backup is disabled by default) but using `pg_basebackup -Xnone`
reproduced the issues similarly and is what I present in the TAP tests. At the
moment, the only workaround I can think of is to manually run the
archive_command on the missing timeline history file(s).

Are these valid issues that should be looked into or are they expected? Scenario
2 seems like it could be easily fixed if we determine that the
recovery_target_timeline numeric value is equal to the control file's timeline
id (compare rtli and recoveryTargetTLI in validateRecoveryParameters()?) but I
wasn't sure if maybe the opposite was true where we should make 'current' and
'latest' require retrieving the timeline history files instead to help prevent
Scenario 1.

Regards,
Jimmy Yih

Вложения

Re: Should the archiver process always make sure that the timeline history files exist in the archive?

От
Kyotaro Horiguchi
Дата:
At Wed, 16 Aug 2023 07:33:29 +0000, Jimmy Yih <jyih@vmware.com> wrote in 
> Hello pgsql-hackers,
> 
> After doing some more debugging on the matter, I believe this issue might be a
> minor regression from commit 5332b8cec541. Prior to that commit, the archiver
> process when first started on a previously promoted primary would have all the
> timeline history files marked as ready for immediate archiving. If that had
> happened, none of my mentioned failure scenarios would be theoretically possible
> (barring someone manually deleting the timeline history files). With that in
> mind, I decided to look more into my Question 1 and created a patch proposal.
> The attached patch will try to archive the current timeline history file if it
> has not been archived yet when the archiver process starts up.

In essence, after taking a subtle but not necessarily wrong steps,
there's a case where a primary server lacks the timeline history file
for the current timeline in both pg_wal and archive, even if that
timeline is larger than 1. This primary can start, but a new standby
created form the primary cannot start streaming, as it can't fetch the
timeline history file for the initial TLI.

A. The OP suggests archiving the timeline history file for the current
 timeline every time the archiver starts. However, I don't think we
 want to keep archiving the same file over and over. (Granted, we're
 not always perfect at avoiding that..)

B. Given that the steps valid, I concur to what is described in the
 test script provided: standbys don't really need that history file
 for the initial TLI (though I have yet to fully verify this).  If the
 walreceiver just overlooks a fetch error for this file, the standby
 can successfully start. (Just skipping the first history file seems
 to work, but it feels a tad aggressive to me.)

C. If those steps aren't valid, we might want to add a note stating
 that -X none basebackups do need the timeline history file for the
 initial TLI. And don't forget to enable archive mode before the
 latest timeline switch if any.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG where it seems
more correct to do this. It also helps with backporting (if it's
needed) since the archiver process only has access to shared memory
starting from Postgres 14.

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> A. The OP suggests archiving the timeline history file for the current
>  timeline every time the archiver starts. However, I don't think we
>  want to keep archiving the same file over and over. (Granted, we're
>  not always perfect at avoiding that..)

With the updated proposed patch, we'll be checking if the current
timeline history file needs to be archived at the end of StartupXLOG
if archiving is enabled. If it detects that a .ready or .done file
already exists, then it won't do anything (which will be the common
case). I agree though that this may be an excessive check since it'll
be a no-op the majority of the time. However, it shouldn't execute
often and seems like a quick safe preventive measure. Could you give
more details on why this would be too cumbersome?

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> B. Given that the steps valid, I concur to what is described in the
>  test script provided: standbys don't really need that history file
>  for the initial TLI (though I have yet to fully verify this).  If the
>  walreceiver just overlooks a fetch error for this file, the standby
>  can successfully start. (Just skipping the first history file seems
>  to work, but it feels a tad aggressive to me.)

This was my initial thought as well but I wasn't sure if it was okay
to overlook the fetch error. Initial testing and brainstorming seems
to show that it's okay. I think the main bad thing is that these new
standbys will not have their initial timeline history files which can
be useful for administration. I've attached a patch that attempts this
approach if we want to switch to this approach as the solution. The
patch contains an updated TAP test as well to better showcase the
issue and fix.

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> C. If those steps aren't valid, we might want to add a note stating
>  that -X none basebackups do need the timeline history file for the
>  initial TLI.

The difficult thing about only documenting this is that it forces the
user to manually store and track the timeline history files. It can be
a bit cumbersome for WAL archiving users to recognize this scenario
when they're just trying to optimize their basebackups by using
-Xnone. But then again -Xnone does seem like it's designed for
advanced users so this might be okay.

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> And don't forget to enable archive mode before the latest timeline
> switch if any.

This might not be reasonable since a user could've been using
streaming replication and doing failover/failbacks as part of general
high availability to manage their Postgres without knowing they were
going to enable WAL archiving later on. The user would need to
configure archiving and force a failover which may not be
straightforward.

Regards,
Jimmy Yih
Вложения
On Mon, Aug 28, 2023 at 8:59 PM Jimmy Yih <jyih@vmware.com> wrote:
> Thanks for the insightful response! I have attached an updated patch
> that moves the proposed logic to the end of StartupXLOG where it seems
> more correct to do this. It also helps with backporting (if it's
> needed) since the archiver process only has access to shared memory
> starting from Postgres 14.

Hmm. Do I understand correctly that the two patches you attached are
alternatives to each other, i.e. we need one or the other to fix the
issue, but not both?

It seems to me that trying to fetch a timeline history file and then
ignoring any error has got to be wrong. Either the file is needed or
it isn't. If it's needed, then failing to fetch it is a problem. If
it's not needed, there's no reason to try fetching it in the first
place. So I feel like we could either try to archive the file at the
end of recovery, as you propose in
v2-0001-Archive-current-timeline-history-file-after-recovery.patch.
Alternatively, we could try to find a way not to request the file in
the first place, if it's not required. But
v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch
doesn't seem good to me.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Tue, 29 Aug 2023 at 06:29, Jimmy Yih <jyih@vmware.com> wrote:
>
> Thanks for the insightful response! I have attached an updated patch
> that moves the proposed logic to the end of StartupXLOG where it seems
> more correct to do this. It also helps with backporting (if it's
> needed) since the archiver process only has access to shared memory
> starting from Postgres 14.
>
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > A. The OP suggests archiving the timeline history file for the current
> >  timeline every time the archiver starts. However, I don't think we
> >  want to keep archiving the same file over and over. (Granted, we're
> >  not always perfect at avoiding that..)
>
> With the updated proposed patch, we'll be checking if the current
> timeline history file needs to be archived at the end of StartupXLOG
> if archiving is enabled. If it detects that a .ready or .done file
> already exists, then it won't do anything (which will be the common
> case). I agree though that this may be an excessive check since it'll
> be a no-op the majority of the time. However, it shouldn't execute
> often and seems like a quick safe preventive measure. Could you give
> more details on why this would be too cumbersome?
>
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > B. Given that the steps valid, I concur to what is described in the
> >  test script provided: standbys don't really need that history file
> >  for the initial TLI (though I have yet to fully verify this).  If the
> >  walreceiver just overlooks a fetch error for this file, the standby
> >  can successfully start. (Just skipping the first history file seems
> >  to work, but it feels a tad aggressive to me.)
>
> This was my initial thought as well but I wasn't sure if it was okay
> to overlook the fetch error. Initial testing and brainstorming seems
> to show that it's okay. I think the main bad thing is that these new
> standbys will not have their initial timeline history files which can
> be useful for administration. I've attached a patch that attempts this
> approach if we want to switch to this approach as the solution. The
> patch contains an updated TAP test as well to better showcase the
> issue and fix.
>
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > C. If those steps aren't valid, we might want to add a note stating
> >  that -X none basebackups do need the timeline history file for the
> >  initial TLI.
>
> The difficult thing about only documenting this is that it forces the
> user to manually store and track the timeline history files. It can be
> a bit cumbersome for WAL archiving users to recognize this scenario
> when they're just trying to optimize their basebackups by using
> -Xnone. But then again -Xnone does seem like it's designed for
> advanced users so this might be okay.
>
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > And don't forget to enable archive mode before the latest timeline
> > switch if any.
>
> This might not be reasonable since a user could've been using
> streaming replication and doing failover/failbacks as part of general
> high availability to manage their Postgres without knowing they were
> going to enable WAL archiving later on. The user would need to
> configure archiving and force a failover which may not be
> straightforward.

I have changed the status of the patch to "Waiting on Author" as
Robert's suggestions have not yet been addressed. Feel free to address
the suggestions and update the status accordingly.

Regards,
Vignesh



On Thu, 11 Jan 2024 at 20:38, vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 29 Aug 2023 at 06:29, Jimmy Yih <jyih@vmware.com> wrote:
> >
> > Thanks for the insightful response! I have attached an updated patch
> > that moves the proposed logic to the end of StartupXLOG where it seems
> > more correct to do this. It also helps with backporting (if it's
> > needed) since the archiver process only has access to shared memory
> > starting from Postgres 14.
> >
> > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > A. The OP suggests archiving the timeline history file for the current
> > >  timeline every time the archiver starts. However, I don't think we
> > >  want to keep archiving the same file over and over. (Granted, we're
> > >  not always perfect at avoiding that..)
> >
> > With the updated proposed patch, we'll be checking if the current
> > timeline history file needs to be archived at the end of StartupXLOG
> > if archiving is enabled. If it detects that a .ready or .done file
> > already exists, then it won't do anything (which will be the common
> > case). I agree though that this may be an excessive check since it'll
> > be a no-op the majority of the time. However, it shouldn't execute
> > often and seems like a quick safe preventive measure. Could you give
> > more details on why this would be too cumbersome?
> >
> > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > B. Given that the steps valid, I concur to what is described in the
> > >  test script provided: standbys don't really need that history file
> > >  for the initial TLI (though I have yet to fully verify this).  If the
> > >  walreceiver just overlooks a fetch error for this file, the standby
> > >  can successfully start. (Just skipping the first history file seems
> > >  to work, but it feels a tad aggressive to me.)
> >
> > This was my initial thought as well but I wasn't sure if it was okay
> > to overlook the fetch error. Initial testing and brainstorming seems
> > to show that it's okay. I think the main bad thing is that these new
> > standbys will not have their initial timeline history files which can
> > be useful for administration. I've attached a patch that attempts this
> > approach if we want to switch to this approach as the solution. The
> > patch contains an updated TAP test as well to better showcase the
> > issue and fix.
> >
> > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > C. If those steps aren't valid, we might want to add a note stating
> > >  that -X none basebackups do need the timeline history file for the
> > >  initial TLI.
> >
> > The difficult thing about only documenting this is that it forces the
> > user to manually store and track the timeline history files. It can be
> > a bit cumbersome for WAL archiving users to recognize this scenario
> > when they're just trying to optimize their basebackups by using
> > -Xnone. But then again -Xnone does seem like it's designed for
> > advanced users so this might be okay.
> >
> > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > And don't forget to enable archive mode before the latest timeline
> > > switch if any.
> >
> > This might not be reasonable since a user could've been using
> > streaming replication and doing failover/failbacks as part of general
> > high availability to manage their Postgres without knowing they were
> > going to enable WAL archiving later on. The user would need to
> > configure archiving and force a failover which may not be
> > straightforward.
>
> I have changed the status of the patch to "Waiting on Author" as
> Robert's suggestions have not yet been addressed. Feel free to address
> the suggestions and update the status accordingly.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh