Обсуждение: pg_wal/RECOVERYHISTORY file remains after archive recovery

Поиск
Список
Период
Сортировка

pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Masahiko Sawada
Дата:
Hi,

When we do archive recovery from the database cluster of which
timeline ID is more than 2 pg_wal/RECOVERYHISTORY is remained even
after archive recovery completed.

The cause of this seems cbc55da556b that moved exitArchiveRecovery()
to before writeTimeLineHistory(). writeTimeLineHIstory() restores the
history file from archive directory and therefore creates
RECOVERYHISTORY file in pg_wal directory. We used to remove such
temporary file by exitArchiveRecovery() but with this commit the order
of calling these functions is reversed. Therefore we create
RECOVERYHISTORY file after exited from archive recovery mode and
remain it.

To fix it I think that we can remove RECOVERYHISTORY file before the
history file is archived in writeTimeLineHIstory(). The commit
cbc55da556b is intended to minimize the window between the moment the
file is written and the end-of-recovery record is generated. So I
think it's not good to put exitArchiveRecovery() after
writeTimeLineHIstory().

This issue seems to exist in all supported version as far as I read
the code, although I don't test all of them yet.

I've attached the draft patch to fix this issue. Regression test might
be required. Feedback and suggestion are very welcome.

Regards,

--
Masahiko Sawada

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Fujii Masao
Дата:
On Thu, Sep 26, 2019 at 5:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> When we do archive recovery from the database cluster of which
> timeline ID is more than 2 pg_wal/RECOVERYHISTORY is remained even
> after archive recovery completed.
>
> The cause of this seems cbc55da556b that moved exitArchiveRecovery()
> to before writeTimeLineHistory(). writeTimeLineHIstory() restores the
> history file from archive directory and therefore creates
> RECOVERYHISTORY file in pg_wal directory. We used to remove such
> temporary file by exitArchiveRecovery() but with this commit the order
> of calling these functions is reversed. Therefore we create
> RECOVERYHISTORY file after exited from archive recovery mode and
> remain it.
>
> To fix it I think that we can remove RECOVERYHISTORY file before the
> history file is archived in writeTimeLineHIstory(). The commit
> cbc55da556b is intended to minimize the window between the moment the
> file is written and the end-of-recovery record is generated. So I
> think it's not good to put exitArchiveRecovery() after
> writeTimeLineHIstory().
>
> This issue seems to exist in all supported version as far as I read
> the code, although I don't test all of them yet.
>
> I've attached the draft patch to fix this issue. Regression test might
> be required. Feedback and suggestion are very welcome.

What about moving the logic that removes RECO VERYXLOG and
RECOVERYHISTORY from exitArchiveRecovery() and performing it
just before/after RemoveNonParentXlogFiles()? Which looks simple.

Regards,

-- 
Fujii Masao



Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Masahiko Sawada
Дата:
On Thu, Sep 26, 2019 at 6:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Thu, Sep 26, 2019 at 5:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Hi,
> >
> > When we do archive recovery from the database cluster of which
> > timeline ID is more than 2 pg_wal/RECOVERYHISTORY is remained even
> > after archive recovery completed.
> >
> > The cause of this seems cbc55da556b that moved exitArchiveRecovery()
> > to before writeTimeLineHistory(). writeTimeLineHIstory() restores the
> > history file from archive directory and therefore creates
> > RECOVERYHISTORY file in pg_wal directory. We used to remove such
> > temporary file by exitArchiveRecovery() but with this commit the order
> > of calling these functions is reversed. Therefore we create
> > RECOVERYHISTORY file after exited from archive recovery mode and
> > remain it.
> >
> > To fix it I think that we can remove RECOVERYHISTORY file before the
> > history file is archived in writeTimeLineHIstory(). The commit
> > cbc55da556b is intended to minimize the window between the moment the
> > file is written and the end-of-recovery record is generated. So I
> > think it's not good to put exitArchiveRecovery() after
> > writeTimeLineHIstory().
> >
> > This issue seems to exist in all supported version as far as I read
> > the code, although I don't test all of them yet.
> >
> > I've attached the draft patch to fix this issue. Regression test might
> > be required. Feedback and suggestion are very welcome.
>
> What about moving the logic that removes RECO VERYXLOG and
> RECOVERYHISTORY from exitArchiveRecovery() and performing it
> just before/after RemoveNonParentXlogFiles()? Which looks simple.
>

Agreed. Attached the updated patch.

Regards,

--
Masahiko Sawada

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Michael Paquier
Дата:
On Fri, Sep 27, 2019 at 01:51:25PM +0900, Masahiko Sawada wrote:
> On Thu, Sep 26, 2019 at 6:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>> What about moving the logic that removes RECO VERYXLOG and
>> RECOVERYHISTORY from exitArchiveRecovery() and performing it
>> just before/after RemoveNonParentXlogFiles()? Which looks simple.
>
> Agreed. Attached the updated patch.

Mea culpa here, I have just noticed the thread.  Fujii-san, would you
prefer if I take care of it?  And also, what's the issue with not
doing the removal of both files just after writeTimeLineHistory()?
That's actually what happened before cbc55da5.
--
Michael

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Fujii Masao
Дата:
On Fri, Sep 27, 2019 at 5:09 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 27, 2019 at 01:51:25PM +0900, Masahiko Sawada wrote:
> > On Thu, Sep 26, 2019 at 6:23 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >> What about moving the logic that removes RECO VERYXLOG and
> >> RECOVERYHISTORY from exitArchiveRecovery() and performing it
> >> just before/after RemoveNonParentXlogFiles()? Which looks simple.
> >
> > Agreed. Attached the updated patch.
>
> Mea culpa here, I have just noticed the thread.  Fujii-san, would you
> prefer if I take care of it?  And also, what's the issue with not
> doing the removal of both files just after writeTimeLineHistory()?
> That's actually what happened before cbc55da5.

So you think that it's better to remove them just after writeTimeLineHistory()?
Per the following Sawada-san's comment, I was thinking that idea is basically
not good. And, RemoveNonParentXlogFiles() also removes garbage files from
pg_wal. It's simpler if similar codes exist near. Thought?

Regards,

-- 
Fujii Masao



Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Michael Paquier
Дата:
On Fri, Sep 27, 2019 at 05:58:21PM +0900, Fujii Masao wrote:
> So you think that it's better to remove them just after writeTimeLineHistory()?
> Per the following Sawada-san's comment, I was thinking that idea is basically
> not good. And, RemoveNonParentXlogFiles() also removes garbage files from
> pg_wal. It's simpler if similar codes exist near. Thought?

Sawada-san's argument of upthread is that it is not good to put
exitArchiveRecovery() after writeTimeLineHIstory(), which is what
cbc55da has done per the reasons mentioned in the commit log, and we
should not change that.

My argument is we know that RECOVERYXLOG and RECOVERYHISTORY are not
needed anymore at this stage of recovery, hence we had better remove
them as soon as possible.  I am not convinced that it is a good idea
to move the cleanup close to RemoveNonParentXlogFiles().  First, this
is an entirely different part of the logic where the startup process
has already switched to a new timeline.  Second, we add more steps
between the moment the two files are not needed and the moment they
are removed, so any failure in-between would cause those files to
still be there (we cannot say either that we will not manipulate this
code later on) and we don't want those files to lie around.  So,
mentioning that we do the cleanup just after writeTimeLineHIstory()
because we don't need them anymore is more consistent with what has
been done for ages for the end of archive recovery, something that
cbc55da unfortunately broke.
--
Michael

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Fujii Masao
Дата:
On Fri, Sep 27, 2019 at 7:16 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 27, 2019 at 05:58:21PM +0900, Fujii Masao wrote:
> > So you think that it's better to remove them just after writeTimeLineHistory()?
> > Per the following Sawada-san's comment, I was thinking that idea is basically
> > not good. And, RemoveNonParentXlogFiles() also removes garbage files from
> > pg_wal. It's simpler if similar codes exist near. Thought?
>
> Sawada-san's argument of upthread is that it is not good to put
> exitArchiveRecovery() after writeTimeLineHIstory(), which is what
> cbc55da has done per the reasons mentioned in the commit log, and we
> should not change that.
>
> My argument is we know that RECOVERYXLOG and RECOVERYHISTORY are not
> needed anymore at this stage of recovery, hence we had better remove
> them as soon as possible.  I am not convinced that it is a good idea
> to move the cleanup close to RemoveNonParentXlogFiles().  First, this
> is an entirely different part of the logic where the startup process
> has already switched to a new timeline.  Second, we add more steps
> between the moment the two files are not needed and the moment they
> are removed, so any failure in-between would cause those files to
> still be there (we cannot say either that we will not manipulate this
> code later on) and we don't want those files to lie around.  So,
> mentioning that we do the cleanup just after writeTimeLineHIstory()
> because we don't need them anymore is more consistent with what has
> been done for ages for the end of archive recovery, something that
> cbc55da unfortunately broke.

Ok, I have no objection to remove them just after writeTimeLineHistory().

Regards,

-- 
Fujii Masao



Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Masahiko Sawada
Дата:
On Fri, Sep 27, 2019 at 8:41 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Sep 27, 2019 at 7:16 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Fri, Sep 27, 2019 at 05:58:21PM +0900, Fujii Masao wrote:
> > > So you think that it's better to remove them just after writeTimeLineHistory()?
> > > Per the following Sawada-san's comment, I was thinking that idea is basically
> > > not good. And, RemoveNonParentXlogFiles() also removes garbage files from
> > > pg_wal. It's simpler if similar codes exist near. Thought?
> >
> > Sawada-san's argument of upthread is that it is not good to put
> > exitArchiveRecovery() after writeTimeLineHIstory(), which is what
> > cbc55da has done per the reasons mentioned in the commit log, and we
> > should not change that.
> >
> > My argument is we know that RECOVERYXLOG and RECOVERYHISTORY are not
> > needed anymore at this stage of recovery, hence we had better remove
> > them as soon as possible.  I am not convinced that it is a good idea
> > to move the cleanup close to RemoveNonParentXlogFiles().  First, this
> > is an entirely different part of the logic where the startup process
> > has already switched to a new timeline.  Second, we add more steps
> > between the moment the two files are not needed and the moment they
> > are removed, so any failure in-between would cause those files to
> > still be there (we cannot say either that we will not manipulate this
> > code later on) and we don't want those files to lie around.  So,
> > mentioning that we do the cleanup just after writeTimeLineHIstory()
> > because we don't need them anymore is more consistent with what has
> > been done for ages for the end of archive recovery, something that
> > cbc55da unfortunately broke.
>
> Ok, I have no objection to remove them just after writeTimeLineHistory().
>

I abandoned once to move the removal code to between
writeTimeLineHistory() and timeline switching because of expanding the
window but since unlink itself will complete within a very short time
it would not be problamatic much.

Attached the updated patch that just moves the removal code.

Regards,

--
Masahiko Sawada

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Michael Paquier
Дата:
On Fri, Sep 27, 2019 at 10:00:16PM +0900, Masahiko Sawada wrote:
> I abandoned once to move the removal code to between
> writeTimeLineHistory() and timeline switching because of expanding the
> window but since unlink itself will complete within a very short time
> it would not be problamatic much.
>
> Attached the updated patch that just moves the removal code.

That's not quite it, as you forgot to move the declaration of
recoveryPath so the patch fails to compile.

Adding some tests would be nice, so I updated your patch to include
something.  One place where we recover files from archives is
002_archiving.pl, still the files get renamed to the segment names
when recovered so that's difficult to make that part 100%
deterministic yet.  Still as a reminder of the properties behind those
files it does not sound bad to document it in the test either, that's
cheap, and we get the future covered.
--
Michael

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Masahiko Sawada
Дата:
On Mon, Sep 30, 2019 at 10:10 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 27, 2019 at 10:00:16PM +0900, Masahiko Sawada wrote:
> > I abandoned once to move the removal code to between
> > writeTimeLineHistory() and timeline switching because of expanding the
> > window but since unlink itself will complete within a very short time
> > it would not be problamatic much.
> >
> > Attached the updated patch that just moves the removal code.
>
> That's not quite it, as you forgot to move the declaration of
> recoveryPath so the patch fails to compile.

Oops, thanks.

>
> Adding some tests would be nice, so I updated your patch to include
> something.  One place where we recover files from archives is
> 002_archiving.pl, still the files get renamed to the segment names
> when recovered so that's difficult to make that part 100%
> deterministic yet.  Still as a reminder of the properties behind those
> files it does not sound bad to document it in the test either, that's
> cheap, and we get the future covered.

Thank you for updating the patch!

+1 to add tests but even the current postgres passes this tests
because of two reasons: one is $node_standby tries to restore
00000001.history but fails and therefore RECOVERYHISTORY isn't
created. Another one is described  To reproduce this issue the new
timeline ID of recovered database needs to be more than 3.

+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYHISTORY",
+ "RECOVERYHISTORY removed after promotion");
+isnt(
+ -f "$node_standby_data/pg_wal/RECOVERYXLOG",
+ "RECOVERYXLOG removed after promotion");

I think that the above checks are always true because isnt() function
checks if the 1st argument and 2nd argument are not the same.

I've attached the updated version patch including the tests. Please review it.

Regards,

--
Masahiko Sawada

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Michael Paquier
Дата:
On Mon, Sep 30, 2019 at 12:53:58PM +0900, Masahiko Sawada wrote:
> I think that the above checks are always true because isnt() function
> checks if the 1st argument and 2nd argument are not the same.

Dammit.  I overlooked this part of the module's doc.

> I've attached the updated version patch including the tests. Please
> review it.

Thanks, your test allows to reproduce the original problem, so that's
nice.  I don't have much to say, except some improvements to the
comments of the test as per the attached.  What do you think?
--
Michael

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Masahiko Sawada
Дата:
On Mon, Sep 30, 2019 at 5:03 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 30, 2019 at 12:53:58PM +0900, Masahiko Sawada wrote:
> > I think that the above checks are always true because isnt() function
> > checks if the 1st argument and 2nd argument are not the same.
>
> Dammit.  I overlooked this part of the module's doc.
>
> > I've attached the updated version patch including the tests. Please
> > review it.
>
> Thanks, your test allows to reproduce the original problem, so that's
> nice.  I don't have much to say, except some improvements to the
> comments of the test as per the attached.  What do you think?

Thank you for updating! The comment in your patch is much better.

Regards,

--
Masahiko Sawada



Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Michael Paquier
Дата:
On Mon, Sep 30, 2019 at 05:07:08PM +0900, Masahiko Sawada wrote:
> Thank you for updating! The comment in your patch is much better.

Thanks, done and back-patched down to 9.5.
--
Michael

Вложения

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

От
Masahiko Sawada
Дата:
On Wed, Oct 2, 2019 at 3:58 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 30, 2019 at 05:07:08PM +0900, Masahiko Sawada wrote:
> > Thank you for updating! The comment in your patch is much better.
>
> Thanks, done and back-patched down to 9.5.

Thank you!

Regards,

--
Masahiko Sawada