Обсуждение: [BUG] pg_basebackup from disconnected standby fails

Поиск
Список
Период
Сортировка

[BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hello, I found that pg_basebackup from a replication standby
fails after the following steps, on 9.3 and the master.

- start a replication master
- start a replication standby
- stop the master in the mode other than immediate.

pg_basebackup to the standby will fail with the following error.

> pg_basebackup: could not get transaction log end position from 
> server: ERROR:  could not find any WAL files

The immediate cause is that do_pg_stop_backup returns an ealier
LSN to do_pg_start_backup. The backup start point is the redo
point of the last executed restart point. And the backup end
point is the minRecoveryPoint at the call time.

A restart point doesn't update the minRecoveryPoint when it is
actually executed. Even though, ControlFile->checkPointCopy is
updated to the redo point of the restart point just made. The
minRecoveryPoint is too small as the backup end point on this
situation. Thit is, end point can go behind the start point.

This can be caused by the simple steps above but it also can be
occur when pg_basebackup is connected after master's
disconnection during a restart point. (With some other
timing-dependet condition)

So, the following comment in do_pg_stop_backup says as the
following seems somewhat wrong.

> * We return the current minimum recovery point as the backup end
> * location. Note that it can be greater than the exact backup end
> * location if the minimum recovery point is updated after the backup of
> * pg_control. This is harmless for current uses.

After looking more closely, I found that the minRecoveryPoint
tends to be too small as the backup end point, and up to the
record at the lastReplayedRecPtr can affect the pages on disk and
they can go into the backup just taken.

My conclusion here is that do_pg_stop_backup should return
lastReplayedRecPtr, not minRecoveryPoint.

The attached small patch does this on the master. The first
problem is fixed by this for me.

Any thoughts?

# Sorry, but I'll be offline 'til Monday.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, I found that pg_basebackup from a replication standby
> fails after the following steps, on 9.3 and the master.
>
> - start a replication master
> - start a replication standby
> - stop the master in the mode other than immediate.
>
> pg_basebackup to the standby will fail with the following error.
>
>> pg_basebackup: could not get transaction log end position from server:
>> ERROR:  could not find any WAL files

Indeed, and you could just do the following to reproduce the failure
with the recovery test suite, so I would suggest adding this test in
the patch:
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);

+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+

> After looking more closely, I found that the minRecoveryPoint
> tends to be too small as the backup end point, and up to the
> record at the lastReplayedRecPtr can affect the pages on disk and
> they can go into the backup just taken.
>
> My conclusion here is that do_pg_stop_backup should return
> lastReplayedRecPtr, not minRecoveryPoint.

I have been thinking quite a bit about this patch, and this logic
sounds quite right to me. When stopping the backup we need to let the
user know up to which point it needs to replay WAL, and relation pages
are touched up to lastReplayedEndRecPtr. This LSN could be greater
than the minimum recovery point as there is no record to mark the end
of the backup, and pg_control has normally, well surely, being backup
up last but that's not a new problem it would as well have been backup
up before the minimum recovery point has been reached...

Still, perhaps we could improve the documentation regarding that? Say
it is recommended to enforce the minimum recovery point in pg_control
to the stop backup LSN to ensure that the backup recovers to a
consistent state when taking a backup from a standby.

I am attaching an updated patch with the test btw.
--
Michael

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hello, thank you for looking this.

At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com>
> On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Hello, I found that pg_basebackup from a replication standby
> > fails after the following steps, on 9.3 and the master.
> >
> > - start a replication master
> > - start a replication standby
> > - stop the master in the mode other than immediate.
> >
> > pg_basebackup to the standby will fail with the following error.
> >
> >> pg_basebackup: could not get transaction log end position from server:
> >> ERROR:  could not find any WAL files
> 
> Indeed, and you could just do the following to reproduce the failure
> with the recovery test suite, so I would suggest adding this test in
> the patch:
> --- a/src/test/recovery/t/001_stream_rep.pl
> +++ b/src/test/recovery/t/001_stream_rep.pl
> @@ -24,6 +24,11 @@ $node_standby_1->start;
>  # pg_basebackup works on a standby).
>  $node_standby_1->backup($backup_name);
> 
> +# Take a second backup of the standby while the master is offline.
> +$node_master->stop;
> +$node_standby_1->backup('my_backup_2');
> +$node_master->start;

I'm not sure that adding the test case for a particular bug like
this is appropriate. But it would be acceptable because it
doesn't take long time and it is separate from standard checks.

> > After looking more closely, I found that the minRecoveryPoint
> > tends to be too small as the backup end point, and up to the
> > record at the lastReplayedRecPtr can affect the pages on disk and
> > they can go into the backup just taken.
> >
> > My conclusion here is that do_pg_stop_backup should return
> > lastReplayedRecPtr, not minRecoveryPoint.
> 
> I have been thinking quite a bit about this patch, and this logic
> sounds quite right to me. When stopping the backup we need to let the
> user know up to which point it needs to replay WAL, and relation pages
> are touched up to lastReplayedEndRecPtr.

Yes, but by design, the changes in buffers don't go into disk
until buffer replacing occurs, which updates minRecoveryPoint. My
understanding is that the problem is that a restart point that is
not accompanied with buffer updates advances only the redo point
of the last checkpoint and doesn't update minRecoveryPoint, which
may be behind the redo point at the time.

It seems to me that we could more agressively advance the
minRecoveryPoint (but must not let it go too far..), but it is
right for it to aim a bit smaller than the ideal location.

So I proposed the patch as a solution instead of changing
minRecoveryPoint's movement. As the result, the explanation for
it is accurate enough, but obscuring the root cause.


> This LSN could be greater
> than the minimum recovery point as there is no record to mark the end
> of the backup, and pg_control has normally, well surely, being backup
> up last but that's not a new problem it would as well have been backup
> up before the minimum recovery point has been reached...

Yes. it would cause no new problem except that the amount of WAL
files to be copied could be larger than ideal amount.

> Still, perhaps we could improve the documentation regarding that? Say
> it is recommended to enforce the minimum recovery point in pg_control
> to the stop backup LSN to ensure that the backup recovers to a
> consistent state when taking a backup from a standby.

If I understand that correctly, I don't find a explicit mention
to minimum recovery point (and should not be, I suppose) in PITR
and pg_baseback pages in the documentaiton.

Up to where we need to reach a consistent state from a backup is
not the "Minimum recovery ending point" in pg_control. It holds
the consistency point in the previous recovery. What we need to
reach there for a backup is the WAL files up to the LSN returned
by the (do_)pg_stop_backup(). All of the files should have been
archived on master and should have been automatically transferred
by pg_basebackup with -X/x option.  (I don't know what to do when
-X/x is not specified for pg_basebackup to standby, though..)

> I am attaching an updated patch with the test btw.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Tue, Jun 14, 2016 at 8:31 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> +# Take a second backup of the standby while the master is offline.
>> +$node_master->stop;
>> +$node_standby_1->backup('my_backup_2');
>> +$node_master->start;
>
> I'm not sure that adding the test case for a particular bug like
> this is appropriate. But it would be acceptable because it
> doesn't take long time and it is separate from standard checks.

We already take a backup from a standby when master is connected, it
should not cost much in terms of time.

> It seems to me that we could more agressively advance the
> minRecoveryPoint (but must not let it go too far..), but it is
> right for it to aim a bit smaller than the ideal location.

It may be risky to propose such a change for a backpatch. Anyway, in
any case there is no guarantee that when using the low-level backup
routines pg_start/stop_backup with a custom backup method the minimum
recovery point will be correct.. pg_basebackup does that a bit more
carefully if I recall correctly (too lazy to look at the code now :)).
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Sorry, I'm confused about the minRecoveryPoint.

Reconsidered a bit.

At Tue, 14 Jun 2016 20:31:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20160614.203111.229211034.horiguchi.kyotaro@lab.ntt.co.jp>
> > > After looking more closely, I found that the minRecoveryPoint
> > > tends to be too small as the backup end point, and up to the
> > > record at the lastReplayedRecPtr can affect the pages on disk and
> > > they can go into the backup just taken.
> > >
> > > My conclusion here is that do_pg_stop_backup should return
> > > lastReplayedRecPtr, not minRecoveryPoint.
> > 
> > I have been thinking quite a bit about this patch, and this logic
> > sounds quite right to me. When stopping the backup we need to let the
> > user know up to which point it needs to replay WAL, and relation pages
> > are touched up to lastReplayedEndRecPtr.
> 
> Yes, but by design, the changes in buffers don't go into disk
> until buffer replacing occurs, which updates minRecoveryPoint. My
> understanding is that the problem is that a restart point that is
> not accompanied with buffer updates advances only the redo point
> of the last checkpoint and doesn't update minRecoveryPoint, which
> may be behind the redo point at the time.
> 
> It seems to me that we could more agressively advance the
> minRecoveryPoint (but must not let it go too far..), but it is
> right for it to aim a bit smaller than the ideal location.

It's wrong. minRecoveryPoint should be greater than or equal to
the maximum buffer-touching LSN reached in previous recoveries,
and it can be the same to replayEndRecPtr but may be behind it if
no acutual modification on page files is done hereafter. xlog.c
works that way. The value of the minRecoveryPoint smaller than
the redo point of the last checkpoint with no buffer flush is
allowable from this point of view but it is not proper for the
end point of a backup.

If we skip recording the last checkpoint position when it
eventually causes no buffer flush, minRecoveryPoint is again
usable for the purpose. However, it causes repeated restartpoint
trial on the same (skipped) checkpoint record.

As the consequence, we can solve this problemn also by explicitly
updating the minRecoveryPoint for an executed restartpoint
without no buffer flush.

The attached patch performs this way and also solves the problem.

Which one do you think is more preferable? Or any other solution?

This patch updates minRecoeryPoint only for restartpoints that
caused no buffer flush but such restriction might not be
necessary.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Tue, 14 Jun 2016 21:24:58 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQkJv-3Cbi=yDytoC9eWPfmjrj7-DLOn9C4YsB1twAKiw@mail.gmail.com>
> On Tue, Jun 14, 2016 at 8:31 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> +# Take a second backup of the standby while the master is offline.
> >> +$node_master->stop;
> >> +$node_standby_1->backup('my_backup_2');
> >> +$node_master->start;
> >
> > I'm not sure that adding the test case for a particular bug like
> > this is appropriate. But it would be acceptable because it
> > doesn't take long time and it is separate from standard checks.
> 
> We already take a backup from a standby when master is connected, it
> should not cost much in terms of time.

Agreed.

> > It seems to me that we could more agressively advance the
> > minRecoveryPoint (but must not let it go too far..), but it is
> > right for it to aim a bit smaller than the ideal location.
> 
> It may be risky to propose such a change for a backpatch. Anyway, in
> any case there is no guarantee that when using the low-level backup
> routines pg_start/stop_backup with a custom backup method the minimum
> recovery point will be correct.. pg_basebackup does that a bit more
> carefully if I recall correctly (too lazy to look at the code now :)).

Yes, as written in another mail, minRecoveryPoint seems to be
correctly upadtaed as its definition. The problem is that only
redo point of the last restartpoint is updated, even if no update
happened. But just inhibiting the update of redo point for the
case in turn causes an annoying repetition of restartpoints.

Addition to the TAP test above, the following SQL commands on the
master causes the same problem without disconnection.

=# checkpoint;
=# select pg_switch_xlog();

This is somewhat artifical but the same situation could be made
also in the nature. The exact condition for this is replaying a
checkpoint record having no buffer modification since the
preceding checkpoint in the previous WAL segments.


Returning to the discussion on which way to choose, we agreed
that using replayEndRecPtr instead of minRecoveryPoint in
do_pg_stop_backup(), as the patch attached to the following
message.

https://www.postgresql.org/message-id/CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Jun 15, 2016 at 4:52 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,

Sorry for the late reply, Horiguchi-san. I have finally been able to
put some mind power into that.

> This is somewhat artificial but the same situation could be made
> also in the nature. The exact condition for this is replaying a
> checkpoint record having no buffer modification since the
> preceding checkpoint in the previous WAL segments.

After thinking about that more, I am seeing your point.
CreateRestartPoint is clearly missing the shot for the update of
minRecoveryPoint even when a restart point can be created. Since
cdd46c7, we assume that a node in recovery will not update it, and
rely on the fact that it will be updated if some buffers are flushed,
but that assumption got broken since the introduction of the
possibility to take base backups from standbys, as in your case there
is no activity on the standby that would allow minRecoveryPoint to be
updated and make the backup finish in a clean state. By the way,
relying on CheckpointStats to decide if it should be updated or not
looks like a broken concept to me as this information is just stored
for logging and has no role in any decision-making, and I think that
it should remain this way. So, instead I think that we had better
update minRecoveryPoint unconditionally as in the attached. I reworked
the surrounding comments in consequence. That's a bit more aggressive,
and that addresses the backup problems.

> Returning to the discussion on which way to choose, we agreed
> that using replayEndRecPtr instead of minRecoveryPoint in
> do_pg_stop_backup(), as the patch attached to the following
> message.

Hm.. If there are no buffers flushed to disk that may increase the
time it takes to reach a consistent state at recovery for no real
reasons. So actually it is not that much a good idea...
--
Michael

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Jun 23, 2016 at 3:50 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jun 15, 2016 at 4:52 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hello,
>
> Sorry for the late reply, Horiguchi-san. I have finally been able to
> put some mind power into that.
>
>> This is somewhat artificial but the same situation could be made
>> also in the nature. The exact condition for this is replaying a
>> checkpoint record having no buffer modification since the
>> preceding checkpoint in the previous WAL segments.
>
> After thinking about that more, I am seeing your point.
> CreateRestartPoint is clearly missing the shot for the update of
> minRecoveryPoint even when a restart point can be created. Since
> cdd46c7, we assume that a node in recovery will not update it, and
> rely on the fact that it will be updated if some buffers are flushed,
> but that assumption got broken since the introduction of the
> possibility to take base backups from standbys, as in your case there
> is no activity on the standby that would allow minRecoveryPoint to be
> updated and make the backup finish in a clean state. By the way,
> relying on CheckpointStats to decide if it should be updated or not
> looks like a broken concept to me as this information is just stored
> for logging and has no role in any decision-making, and I think that
> it should remain this way. So, instead I think that we had better
> update minRecoveryPoint unconditionally as in the attached. I reworked
> the surrounding comments in consequence. That's a bit more aggressive,
> and that addresses the backup problems.
>
>> Returning to the discussion on which way to choose, we agreed
>> that using replayEndRecPtr instead of minRecoveryPoint in
>> do_pg_stop_backup(), as the patch attached to the following
>> message.
>
> Hm.. If there are no buffers flushed to disk that may increase the
> time it takes to reach a consistent state at recovery for no real
> reasons. So actually it is not that much a good idea...

By the way, do you mind if I add this patch to the next CF? Better not
to lose track of it...
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Jun 23, 2016 at 3:51 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> By the way, do you mind if I add this patch to the next CF? Better not
> to lose track of it...

Well, I have added an entry here at the end:
https://commitfest.postgresql.org/10/654/
Better doing it now before I forget about it...
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Sorry for the absense. Thank you for registering it.

regards.

At Fri, 24 Jun 2016 14:46:25 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTLZE77Joz3EREBjunESB4TiZjyji4s_7b+HJnSSDUpLQ@mail.gmail.com>
> On Thu, Jun 23, 2016 at 3:51 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > By the way, do you mind if I add this patch to the next CF? Better not
> > to lose track of it...
> 
> Well, I have added an entry here at the end:
> https://commitfest.postgresql.org/10/654/
> Better doing it now before I forget about it...
> -- 
> Michael

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Alvaro Herrera
Дата:
Kyotaro HORIGUCHI wrote:

> At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com>
> > On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

> > Indeed, and you could just do the following to reproduce the failure
> > with the recovery test suite, so I would suggest adding this test in
> > the patch:
> > --- a/src/test/recovery/t/001_stream_rep.pl
> > +++ b/src/test/recovery/t/001_stream_rep.pl
> > @@ -24,6 +24,11 @@ $node_standby_1->start;
> >  # pg_basebackup works on a standby).
> >  $node_standby_1->backup($backup_name);
> > 
> > +# Take a second backup of the standby while the master is offline.
> > +$node_master->stop;
> > +$node_standby_1->backup('my_backup_2');
> > +$node_master->start;
> 
> I'm not sure that adding the test case for a particular bug like
> this is appropriate. But it would be acceptable because it
> doesn't take long time and it is separate from standard checks.

The reason this test is appropiate is that it's testing a feature we
want to support, namely that taking a backup from a standby works even
when the master is stopped.  It's not a test for this particular bug,
even though the feature doesn't work because of this bug.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hello, thank you for the comment.

At Fri, 8 Jul 2016 14:42:20 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
<20160708184220.GA733807@alvherre.pgsql>
> Kyotaro HORIGUCHI wrote:
> 
> > At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com>
> > > On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
> > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> > > Indeed, and you could just do the following to reproduce the failure
> > > with the recovery test suite, so I would suggest adding this test in
> > > the patch:
> > > --- a/src/test/recovery/t/001_stream_rep.pl
> > > +++ b/src/test/recovery/t/001_stream_rep.pl
> > > @@ -24,6 +24,11 @@ $node_standby_1->start;
> > >  # pg_basebackup works on a standby).
> > >  $node_standby_1->backup($backup_name);
> > > 
> > > +# Take a second backup of the standby while the master is offline.
> > > +$node_master->stop;
> > > +$node_standby_1->backup('my_backup_2');
> > > +$node_master->start;
> > 
> > I'm not sure that adding the test case for a particular bug like
> > this is appropriate. But it would be acceptable because it
> > doesn't take long time and it is separate from standard checks.
> 
> The reason this test is appropiate is that it's testing a feature we
> want to support, namely that taking a backup from a standby works even
> when the master is stopped.  It's not a test for this particular bug,
> even though the feature doesn't work because of this bug.

That's true, but we don't always have a perfectly comprehensive
test suite, consciously or unconsciously. The sentence was
inattentive but the "bug" was just the negative comparable to
"feature" in my mind. My point was the comparison between adding
a test for a corner-case and its cost. It must be added if the
fixed feature is fragile. It can be added it doesn't take a too
long time to finish.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Mon, Jul 11, 2016 at 4:40 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> That's true, but we don't always have a perfectly comprehensive
> test suite, consciously or unconsciously. The sentence was
> inattentive but the "bug" was just the negative comparable to
> "feature" in my mind. My point was the comparison between adding
> a test for a corner-case and its cost. It must be added if the
> fixed feature is fragile. It can be added it doesn't take a too
> long time to finish.

I'd just add it to be honest. Taking backups from standbys, with or
without the master being connected is a supported feature, and we want
to follow-up to be sure that it does not in the future. Having now the
infrastructure for more complex scenarios does not mean of course that
we need to test everything and all kind of things, of course, but here
the benefits are good compared to the cost that a single call of
pg_basebackup has in the complete the test suite run.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Thu, Jun 23, 2016 at 12:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jun 15, 2016 at 4:52 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hello,
>
> Sorry for the late reply, Horiguchi-san. I have finally been able to
> put some mind power into that.
>
>> This is somewhat artificial but the same situation could be made
>> also in the nature. The exact condition for this is replaying a
>> checkpoint record having no buffer modification since the
>> preceding checkpoint in the previous WAL segments.
>
> After thinking about that more, I am seeing your point.
> CreateRestartPoint is clearly missing the shot for the update of
> minRecoveryPoint even when a restart point can be created.
>

I think updating minRecoveryPoint unconditionally can change it's
purpose in some cases.  Refer below comments in code:

* minRecoveryPoint is updated to the latest replayed LSN whenever we
* flush a data change during archive recovery. That guards against
* starting archive recovery, aborting it, and restarting with an earlier
* stop location. If we've already flushed data changes from WAL record X
* to disk, we mustn't start up until we reach X again.

Now, as per above rule, the value of minRecoveryPoint can be much
smaller than XLogCtl->replayEndRecPtr.  I think this can change the
rules when we can allow read-only connections.

Another point to note is that we are updating checkpoint location
during restart points, when the database state is
DB_IN_ARCHIVE_RECOVERY and updating minRecoveryPoint unconditionally
doesn't look in sync with that as well.

I think your and Kyotaro-san's point that minRecoveryPoint should be
updated to support back-ups on standby has merits, but I think doing
it unconditionally might lead to change in behaviour in some cases.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think updating minRecoveryPoint unconditionally can change it's
> purpose in some cases.  Refer below comments in code:
>
> * minRecoveryPoint is updated to the latest replayed LSN whenever we
> * flush a data change during archive recovery. That guards against
> * starting archive recovery, aborting it, and restarting with an earlier
> * stop location. If we've already flushed data changes from WAL record X
> * to disk, we mustn't start up until we reach X again.
>
> Now, as per above rule, the value of minRecoveryPoint can be much
> smaller than XLogCtl->replayEndRecPtr.  I think this can change the
> rules when we can allow read-only connections.

That would delay the moment read-only connections in hot standby are
allowed in the case where a server is stopped with "immediate", then
restarted with a different target if no data has been flushed when
replaying.

> I think your and Kyotaro-san's point that minRecoveryPoint should be
> updated to support back-ups on standby has merits, but I think doing
> it unconditionally might lead to change in behaviour in some cases.

If we want to tackle the case I mentioned above, one way is to just
update minRecoveryPoint when an exclusive or a non-exclusive backup is
being taken by looking at their status in shared memory. See for
example the patch (1) attached, but this does not save from the case
where a node replays WAL, does not have data flushes, and from which a
backup is taken, in the case where this node gets restarted later with
the immediate mode and has different replay targets.

Another way that just popped into my mind is to add dedicated fields
to XLogCtl that set the stop LSN of a backup the way it should be
instead of using minRecoveryPoint. In short we'd update those fields
in CreateRestartPoint and UpdateMinRecoveryPoint under
XLogCtl->info_lck. The good thing is that this lock is already taken
there. See patch (2) accomplishing that.

Thinking about that, patch (2) is far cleaner, and takes care of not
advancing minRecoveryPoint where not needed, but it does it for the
base backups as they should be. So the dependency between the minimum
recovery point and the end locations of a backup get reduced.
--
Michael

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think updating minRecoveryPoint unconditionally can change it's
>> purpose in some cases.  Refer below comments in code:
>>
>> * minRecoveryPoint is updated to the latest replayed LSN whenever we
>> * flush a data change during archive recovery. That guards against
>> * starting archive recovery, aborting it, and restarting with an earlier
>> * stop location. If we've already flushed data changes from WAL record X
>> * to disk, we mustn't start up until we reach X again.
>>
>> Now, as per above rule, the value of minRecoveryPoint can be much
>> smaller than XLogCtl->replayEndRecPtr.  I think this can change the
>> rules when we can allow read-only connections.
>
> That would delay the moment read-only connections in hot standby are
> allowed in the case where a server is stopped with "immediate", then
> restarted with a different target if no data has been flushed when
> replaying.
>
>> I think your and Kyotaro-san's point that minRecoveryPoint should be
>> updated to support back-ups on standby has merits, but I think doing
>> it unconditionally might lead to change in behaviour in some cases.
>
> If we want to tackle the case I mentioned above, one way is to just
> update minRecoveryPoint when an exclusive or a non-exclusive backup is
> being taken by looking at their status in shared memory. See for
> example the patch (1) attached, but this does not save from the case
> where a node replays WAL, does not have data flushes, and from which a
> backup is taken, in the case where this node gets restarted later with
> the immediate mode and has different replay targets.
>

This looks clumsy as it updates minrecoverypoint for a specific
condition and it doesn't solve the above mentioned inconcistency.

> Another way that just popped into my mind is to add dedicated fields
> to XLogCtl that set the stop LSN of a backup the way it should be
> instead of using minRecoveryPoint. In short we'd update those fields
> in CreateRestartPoint and UpdateMinRecoveryPoint under
> XLogCtl->info_lck. The good thing is that this lock is already taken
> there. See patch (2) accomplishing that.
>

How is it different/preferable then directly using
XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
purpose?  Do you see any problem if we go with what Kyotaro-san has
proposed in the initial patch [1] (aka using
XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
backup location)?


[1] - https://www.postgresql.org/message-id/20160609.215558.118976703.horiguchi.kyotaro%40lab.ntt.co.jp


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> If we want to tackle the case I mentioned above, one way is to just
>> update minRecoveryPoint when an exclusive or a non-exclusive backup is
>> being taken by looking at their status in shared memory. See for
>> example the patch (1) attached, but this does not save from the case
>> where a node replays WAL, does not have data flushes, and from which a
>> backup is taken, in the case where this node gets restarted later with
>> the immediate mode and has different replay targets.
>
> This looks clumsy as it updates minrecoverypoint for a specific
> condition and it doesn't solve the above mentioned inconcistency.

Yep. I am not saying the contrary. That's why (2) with its separate
fields would make more sense.

>> Another way that just popped into my mind is to add dedicated fields
>> to XLogCtl that set the stop LSN of a backup the way it should be
>> instead of using minRecoveryPoint. In short we'd update those fields
>> in CreateRestartPoint and UpdateMinRecoveryPoint under
>> XLogCtl->info_lck. The good thing is that this lock is already taken
>> there. See patch (2) accomplishing that.
>
> How is it different/preferable then directly using
> XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
> purpose?  Do you see any problem if we go with what Kyotaro-san has
> proposed in the initial patch [1] (aka using
> XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
> backup location)?

Re-reading this thread from scratch and scratching my mind, I am
actually not getting why we bumped into the topic of making
minRecoveryPoint updates more aggressive instead of the first proposal
:)

Knowing that we have no way to be sure if pg_control has been backed
up last or not, using the last replay LSN and TLI would be the most
simple solution, so let's do this for back-branches. It is an
annoyance to not be able to ensure that backups are taken while the
master is stopped or if there is no activity that updates relation
pages.

The thing that is really annoying btw is that there will be always a
gap between minRecoveryPoint and the actual moment where a backup
finishes because there is no way to rely on the XLOG_BACKUP_END
record. On top of that we can not be sure if pg_control has been
backed up last or not. Which is why it would be cool to document that
gap. Another crazy idea would be to return pg_control as an extra
return field of pg_stop_backup() and encourage users to write that
back in the backup itself. This would allow closing any hole in the
current logic for backups taken from live standbys: minRecoveryPoint
would be updated directly to the last replayed LSN/TLI in the control
file.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> Another way that just popped into my mind is to add dedicated fields
>>> to XLogCtl that set the stop LSN of a backup the way it should be
>>> instead of using minRecoveryPoint. In short we'd update those fields
>>> in CreateRestartPoint and UpdateMinRecoveryPoint under
>>> XLogCtl->info_lck. The good thing is that this lock is already taken
>>> there. See patch (2) accomplishing that.
>>
>> How is it different/preferable then directly using
>> XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
>> purpose?  Do you see any problem if we go with what Kyotaro-san has
>> proposed in the initial patch [1] (aka using
>> XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
>> backup location)?
>
> Re-reading this thread from scratch and scratching my mind, I am
> actually not getting why we bumped into the topic of making
> minRecoveryPoint updates more aggressive instead of the first proposal
> :)
>
> Knowing that we have no way to be sure if pg_control has been backed
> up last or not, using the last replay LSN and TLI would be the most
> simple solution, so let's do this for back-branches.
>

Why only for back-branches? Do you have better solution for head?

> It is an
> annoyance to not be able to ensure that backups are taken while the
> master is stopped or if there is no activity that updates relation
> pages.
>
> The thing that is really annoying btw is that there will be always a
> gap between minRecoveryPoint and the actual moment where a backup
> finishes because there is no way to rely on the XLOG_BACKUP_END
> record.
>

Sorry, but I am not able to understand what you mean by above.  What
kind of relation you are trying to show between minRecoveryPoint and
backup finish point?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Tue, Jul 19, 2016 at 9:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> Another way that just popped into my mind is to add dedicated fields
>>>> to XLogCtl that set the stop LSN of a backup the way it should be
>>>> instead of using minRecoveryPoint. In short we'd update those fields
>>>> in CreateRestartPoint and UpdateMinRecoveryPoint under
>>>> XLogCtl->info_lck. The good thing is that this lock is already taken
>>>> there. See patch (2) accomplishing that.
>>>
>>> How is it different/preferable then directly using
>>> XLogCtl->replayEndRecPtr and XLogCtl->replayEndTLI for stop backup
>>> purpose?  Do you see any problem if we go with what Kyotaro-san has
>>> proposed in the initial patch [1] (aka using
>>> XLogCtl->lastReplayedEndRecPtr and XLogCtl->lastReplayedTLI as stop
>>> backup location)?
>>
>> Re-reading this thread from scratch and scratching my mind, I am
>> actually not getting why we bumped into the topic of making
>> minRecoveryPoint updates more aggressive instead of the first proposal
>> :)
>>
>> Knowing that we have no way to be sure if pg_control has been backed
>> up last or not, using the last replay LSN and TLI would be the most
>> simple solution, so let's do this for back-branches.
>>
>
> Why only for back-branches? Do you have better solution for head?

Yes, I mentioned an idea upthread to set up the minimum recovery point
saved in the backup to the last replayed LSN. Though that's not
acceptable for 9.6 as this requires changing the output of
pg_stop_backup() with a new field containing the bytes of pg_control.
I am not sure how others feel about that, so for now and the
back-branches we could just document that updating the field after
taking a backup closes any holes, and prevents to open a hot standby
for read-only connections too early.

>> It is an
>> annoyance to not be able to ensure that backups are taken while the
>> master is stopped or if there is no activity that updates relation
>> pages.
>>
>> The thing that is really annoying btw is that there will be always a
>> gap between minRecoveryPoint and the actual moment where a backup
>> finishes because there is no way to rely on the XLOG_BACKUP_END
>> record.
>
> Sorry, but I am not able to understand what you mean by above.  What
> kind of relation you are trying to show between minRecoveryPoint and
> backup finish point?

When taking a backup from a standby, there is no XLOG_BACKUP_END that
can be used to determine when a hot standby is open for read-only
connections.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Wed, Jul 20, 2016 at 5:12 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Jul 19, 2016 at 9:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>> Why only for back-branches? Do you have better solution for head?
>
> Yes, I mentioned an idea upthread to set up the minimum recovery point
> saved in the backup to the last replayed LSN. Though that's not
> acceptable for 9.6 as this requires changing the output of
> pg_stop_backup() with a new field containing the bytes of pg_control.
> I am not sure how others feel about that,
>

Yeah, I think that is totally different angle to fix this issue, so
don't you think it is better to start a separate thread to discuss
about it for 10.0 and mark this patch as ready for committer.

>
>>> It is an
>>> annoyance to not be able to ensure that backups are taken while the
>>> master is stopped or if there is no activity that updates relation
>>> pages.
>>>
>>> The thing that is really annoying btw is that there will be always a
>>> gap between minRecoveryPoint and the actual moment where a backup
>>> finishes because there is no way to rely on the XLOG_BACKUP_END
>>> record.
>>
>> Sorry, but I am not able to understand what you mean by above.  What
>> kind of relation you are trying to show between minRecoveryPoint and
>> backup finish point?
>
> When taking a backup from a standby, there is no XLOG_BACKUP_END that
> can be used to determine when a hot standby is open for read-only
> connections.
>

Okay, may be we can add a comment in code to indicate the same, but
OTOH, it is apparent from code, so not sure if it is worth to add it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Jul 20, 2016 at 8:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 5:12 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Tue, Jul 19, 2016 at 9:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
>>>>> <michael.paquier@gmail.com> wrote:
>>> Why only for back-branches? Do you have better solution for head?
>>
>> Yes, I mentioned an idea upthread to set up the minimum recovery point
>> saved in the backup to the last replayed LSN. Though that's not
>> acceptable for 9.6 as this requires changing the output of
>> pg_stop_backup() with a new field containing the bytes of pg_control.
>> I am not sure how others feel about that,
>>
>
> Yeah, I think that is totally different angle to fix this issue, so
> don't you think it is better to start a separate thread to discuss
> about it for 10.0 and mark this patch as ready for committer.

I'd like to tackle this problem in 10.0, but that will strongly depend
on how my patches move on in CF1 and CF2.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 8:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Jul 20, 2016 at 5:12 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Tue, Jul 19, 2016 at 9:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> On Tue, Jul 19, 2016 at 10:43 AM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> On Sat, Jul 16, 2016 at 9:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>>> On Wed, Jul 13, 2016 at 8:56 AM, Michael Paquier
>>>>>> <michael.paquier@gmail.com> wrote:
>>>> Why only for back-branches? Do you have better solution for head?
>>>
>>> Yes, I mentioned an idea upthread to set up the minimum recovery point
>>> saved in the backup to the last replayed LSN. Though that's not
>>> acceptable for 9.6 as this requires changing the output of
>>> pg_stop_backup() with a new field containing the bytes of pg_control.
>>> I am not sure how others feel about that,
>>>
>>
>> Yeah, I think that is totally different angle to fix this issue, so
>> don't you think it is better to start a separate thread to discuss
>> about it for 10.0 and mark this patch as ready for committer.
>
> I'd like to tackle this problem in 10.0, but that will strongly depend
> on how my patches move on in CF1 and CF2.

By the way, thank you for taking the time to provide input. I think
we're in good shape here now.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Thu, Jul 21, 2016 at 7:28 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>>
>>> Yeah, I think that is totally different angle to fix this issue, so
>>> don't you think it is better to start a separate thread to discuss
>>> about it for 10.0 and mark this patch as ready for committer.
>>
>> I'd like to tackle this problem in 10.0, but that will strongly depend
>> on how my patches move on in CF1 and CF2.
>
> By the way, thank you for taking the time to provide input. I think
> we're in good shape here now.
>

So, if I understand correctly, then we can mark the version posted by
you upthread [1] which includes a test along with Kyotaro's fix can be
marked as Ready for committer.  If so, then please change the status
of patch accordingly.


[1] - https://www.postgresql.org/message-id/CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Jul 21, 2016 at 11:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jul 21, 2016 at 7:28 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>>>
>>>> Yeah, I think that is totally different angle to fix this issue, so
>>>> don't you think it is better to start a separate thread to discuss
>>>> about it for 10.0 and mark this patch as ready for committer.
>>>
>>> I'd like to tackle this problem in 10.0, but that will strongly depend
>>> on how my patches move on in CF1 and CF2.
>>
>> By the way, thank you for taking the time to provide input. I think
>> we're in good shape here now.
>>
>
> So, if I understand correctly, then we can mark the version posted by
> you upthread [1] which includes a test along with Kyotaro's fix can be
> marked as Ready for committer.  If so, then please change the status
> of patch accordingly.

Oops. I thought you did it already. So done.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Sorry for the absense. I've reached here.

At Thu, 21 Jul 2016 12:20:30 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqR+krKK9fqwquSJnA2oB-RcQ6Mi8ybz3WL0kCOB5SSZEg@mail.gmail.com>
> On Thu, Jul 21, 2016 at 11:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Jul 21, 2016 at 7:28 AM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
> >> <michael.paquier@gmail.com> wrote:
> >>>>
> >>>> Yeah, I think that is totally different angle to fix this issue, so
> >>>> don't you think it is better to start a separate thread to discuss
> >>>> about it for 10.0 and mark this patch as ready for committer.
> >>>
> >>> I'd like to tackle this problem in 10.0, but that will strongly depend
> >>> on how my patches move on in CF1 and CF2.
> >>
> >> By the way, thank you for taking the time to provide input. I think
> >> we're in good shape here now.
> >>
> >
> > So, if I understand correctly, then we can mark the version posted by
> > you upthread [1] which includes a test along with Kyotaro's fix can be
> > marked as Ready for committer.  If so, then please change the status
> > of patch accordingly.
> 
> Oops. I thought you did it already. So done.

Thank you very much for the intensive discussion on this.

After some additional consideration, I attached two patches about
comment and documentation. 0001- is a patch related to the
previous patch that adds description on why we use replay end
point instead of minRecoveryPoint. And 0002- is a fix and an
additional mention on what would happen when pg_control is not
copied last during backup.


==== About the first patch.

Related to the previous patch, I found the following comment just
above where it applies.

xlog.c:10412
- * We return the current minimum recovery point as the backup end
- * location. Note that it can be greater than the exact backup end
- * location if the minimum recovery point is updated after the backup of
- * pg_control. This is harmless for current uses.

I haven't gave a notice on this but this seems to be necessary to
edit so as to mention this fix and the gap like the following.

+ * minRecoveryPoint can go behind the last checkpoint's redo location when
+ * the checkpoint writes out no buffer. This does no harm to performing a
+ * recovery but such inversion seems inconsistent from the view of the
+ * callers and prevents them from knowing WAL segments needed by sane
+ * calcuation. For the reason we return the last replayed point as the
+ * backup end location. Note that it can be greater than the exact backup
+ * end location or even the minimum recovery point of pg_control at the
+ * time. This is harmless for current uses.


==== About the second patch.

By the way, related to the following discussion in the upthread,

At Tue, 19 Jul 2016 14:13:36 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqRXMkR2rq7_Wa1aZZqBFVRUwU_5QTBr_kEQqWGUEjAaAQ@mail.gmail.com>
> The thing that is really annoying btw is that there will be always a
> gap between minRecoveryPoint and the actual moment where a backup
> finishes because there is no way to rely on the XLOG_BACKUP_END
> record. On top of that we can not be sure if pg_control has been
> backed up last or not. Which is why it would be cool to document that
> gap.

Even with out my previous patch, the gap between minRecoveryPoint
*when pg_control is backed up* and that (or the replay end) at
the end of backup is crucial and should be back-patched.

Addition to that, while I examined the documentation I found the
following description about pg_stop_backup in
continuous-archiving.html, which contradicts to its definition.

> In the same connection as before, issue the command:
> 
>   SELECT * FROM pg_stop_backup(false);
...
> The pg_stop_backup will return one row with three values.

AFAICS pg_stop_backup returns a single value of LSN. I don't know
where this comes from, but the attached patch fixes this and adds
a mention on the "gap". The original description mentioned
backup_label and tablespace_map but it seems not necessary. The
following is the new content rewritten by the patch.

| The pg_stop_backup will return the LSN when it is called. This
| LSN informs upto where the backup needs WAL segment files to
| complete. For a backup taken from a master, this function leaves
| a backup history file to inform the segment files needed and a
| server starts from the backup performes a recovery up to where a
| backup-end WAL record to reach a consistent state. But things are
| different for a backup taken from a standby. For the case, backup
| history file won't be created and recvoery is perfomed up to the
| Minimum-recovery-ending-location field in the pg_control file
| included in the backup instead. The location is properly stored
| in the backup if you copy the the pg_control file after all the
| other files as pg_basebackup does. Note that a hot standby
| started from a backup not taken in that manner will consider to
| have reached a consistent state mistakenly earlier.
| 
| Once the WAL segment files active during the backup are archived,
| you are done. The file contains the LSN returned by
| pg_stop_backup is the last segment that is required to form a
| complete set of backup files.

The "LSN" in the first line is intentionally ambiguated. It may
seems too much to write here.

> Another crazy idea would be to return pg_control as an extra
> return field of pg_stop_backup() and encourage users to write that
> back in the backup itself. This would allow closing any hole in the
> current logic for backups taken from live standbys: minRecoveryPoint
> would be updated directly to the last replayed LSN/TLI in the control
> file.

It seems to be doable at do_pg_*_backup level. non-exclusive
backup code does the same thing to backup_label but no usage
seen.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Mon, 11 Jul 2016 16:47:53 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQJqEMqY_caBh0=dPBxpmSfd6_Uf4HXSG=LgB1cNsZL=g@mail.gmail.com>
> On Mon, Jul 11, 2016 at 4:40 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > That's true, but we don't always have a perfectly comprehensive
> > test suite, consciously or unconsciously. The sentence was
> > inattentive but the "bug" was just the negative comparable to
> > "feature" in my mind. My point was the comparison between adding
> > a test for a corner-case and its cost. It must be added if the
> > fixed feature is fragile. It can be added it doesn't take a too
> > long time to finish.
> 
> I'd just add it to be honest. Taking backups from standbys, with or
> without the master being connected is a supported feature, and we want
> to follow-up to be sure that it does not in the future. Having now the
> infrastructure for more complex scenarios does not mean of course that
> we need to test everything and all kind of things, of course, but here
> the benefits are good compared to the cost that a single call of
> pg_basebackup has in the complete the test suite run.

Yes, totally agreed.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Jul 21, 2016 at 5:19 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> + * minRecoveryPoint can go behind the last checkpoint's redo location when
> + * the checkpoint writes out no buffer. This does no harm to performing a
> + * recovery but such inversion seems inconsistent from the view of the
> + * callers and prevents them from knowing WAL segments needed by sane
> + * calcuation. For the reason we return the last replayed point as the
> + * backup end location. Note that it can be greater than the exact backup
> + * end location or even the minimum recovery point of pg_control at the
> + * time. This is harmless for current uses.

It does not seem an improvement to me to mention a comment regarding
minRecoveryPoint in do_pg_stop_backup, especially knowing that the
patch that we have here uses the last replayed LSN and TLI and does
not care about that anymore.

> AFAICS pg_stop_backup returns a single value of LSN. I don't know
> where this comes from, but the attached patch fixes this and adds
> a mention on the "gap". The original description mentioned
> backup_label and tablespace_map but it seems not necessary. The
> following is the new content rewritten by the patch.

No, this second patch is wrong. This part of the docs refers to
non-exclusive backups, where 3 fields are returned. So the docs are
correct.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Thu, 21 Jul 2016 22:32:08 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqSQgt280LtSW9EgX7CvU4JwVraMmOQTM42NM-qyXAkFOw@mail.gmail.com>
> On Thu, Jul 21, 2016 at 5:19 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > + * minRecoveryPoint can go behind the last checkpoint's redo location when
> > + * the checkpoint writes out no buffer. This does no harm to performing a
> > + * recovery but such inversion seems inconsistent from the view of the
> > + * callers and prevents them from knowing WAL segments needed by sane
> > + * calcuation. For the reason we return the last replayed point as the
> > + * backup end location. Note that it can be greater than the exact backup
> > + * end location or even the minimum recovery point of pg_control at the
> > + * time. This is harmless for current uses.
> 
> It does not seem an improvement to me to mention a comment regarding
> minRecoveryPoint in do_pg_stop_backup, especially knowing that the
> patch that we have here uses the last replayed LSN and TLI and does
> not care about that anymore.

Recovery still uses the minRecoveryPoint stored in pg_control
value. The patch makes pg_stop_backup return replayEndRecPtr but
the previous comment does not mention why pg_stop_backup returns
replay end but recovery. So at least we should edit it so that it
describes the correct thing.

- * We return the current minimum recovery point as the backup end
- * location.
+ * We return the current replay end point as the backup end
+ * location.

Does this make sense? Next,

- * Note that it can be greater than the exact backup end
- * location if the minimum recovery point is updated after the backup of
- * pg_control. This is harmless for current uses.
+ * Note that it can be greater than the exact backup end
+ * location if the replay end point is updated after the backup
+ * is finished. This is harmless for current uses.

I think this is still correct. But this lacks a mention about
minRecoveryPoint. It is now separated from the return value of
this function.

+ * Recovery from a backup taken from a standby regards the
+ * minimum recovery point stored in pg_control as consistency
+ * point. It also can be greater than the exact backup end
+ * location if it is updated after the backup of
+ * pg_control. This is harmless, too.

It seems not good.

> > AFAICS pg_stop_backup returns a single value of LSN. I don't know
> > where this comes from, but the attached patch fixes this and adds
> > a mention on the "gap". The original description mentioned
> > backup_label and tablespace_map but it seems not necessary. The
> > following is the new content rewritten by the patch.
> 
> No, this second patch is wrong. This part of the docs refers to
> non-exclusive backups, where 3 fields are returned. So the docs are
> correct.

Ah! I see. Thanks. pg_stop_backup has two signatures
'pg_stop_backup()' and 'pg_stop_backup(exclusive boolean)' and
this mentions the latter.

https://www.postgresql.org/docs/9.6/static/functions-admin.html

This page doesn't explain the components of the return value of
the second form. It is mentioned here

https://www.postgresql.org/docs/9.6/static/continuous-archiving.html

| The pg_stop_backup will return one row with three values. The
| second of these fields should be written to a file named
| backup_label in the root directory of the backup.

| The file identified by pg_stop_backup's first return value is the
| last segment that is required to form a complete set of backup
| files.

It returns an LSN, not a segment. But it won't be such a matter.

Don't we need a brief explanation about the second form of
pg_stop_backup, especially about non-exclusive use in the admin
function page?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
> So, if I understand correctly, then we can mark the version posted by
> you upthread [1] which includes a test along with Kyotaro's fix can be
> marked as Ready for committer.  If so, then please change the status
> of patch accordingly.

Patch moved to next CF 2016-11, still with status "ready for committer".

IMO, this thread has reached as conclusion to use this patch to
address the problem reported:
CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> So, if I understand correctly, then we can mark the version posted by
>> you upthread [1] which includes a test along with Kyotaro's fix can be
>> marked as Ready for committer.  If so, then please change the status
>> of patch accordingly.
>
> Patch moved to next CF 2016-11, still with status "ready for committer".
>
> IMO, this thread has reached as conclusion to use this patch to
> address the problem reported:
> CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com

I spent some time reading this thread today and trying to understand
exactly what's going on here.  Here's my attempt to restate the
problem:

1. If a restartpoint flushes no dirty buffers, then the redo location
advances but the minimum recovery point does not; therefore, the
minimum recovery point can fall behind the redo location.  That's
correct behavior, because if the standby is subsequently restarted, it
can correctly begin at the checkpoint record associated with the
restartpoint and need not replay any further.

2. However, it causes a problem when a base backup with the "include
WAL" option is taken from a standby because -- on standby servers only
-- the end location for a backup as returned by do_pg_stop_backup() is
the minimum recovery point.  If this precedes the start point for the
backup -- which is the restart point -- perform_base_backup() will
conclude that no WAL files are required for the backup and fail an
internal sanity check.

3. Removing the sanity check wouldn't help, because it can never be
right to end up with zero WAL files as a result of a base backup.  At
a minimum, we need the WAL file which contains the checkpoint record
from which the control file specifies that redo should start.
Actually, I think that checkpoint record could be spread across
multiple files: it might be big.  We need them all, but the current
code won't ensure that we get them.

The consensus solution on this thread seems to be that we should have
pg_do_stop_backup() return the last-replayed XLOG location as the
backup end point.  If the control file has been updated with a newer
redo location, then the associated checkpoint record has surely been
replayed, so we'll definitely have enough WAL to replay that
checkpoint record (and we don't need to replay anything else, because
we're supposing that this is the case where the minimum recovery point
precedes the redo location).  Although this will work, it might
include more WAL in the backup than is strictly necessary.  If the
additional WAL replayed beyond the minimum recovery point does NOT
include a checkpoint, we don't need any of it; if it does, we need
only the portion up to and including the last checkpoint record, and
not anything beyond that.

I can think of two solutions that would be "tighter":

1. When performing a restartpoint, update the minimum recovery point
to just beyond the checkpoint record.  I think this can't hurt anyone
who is actually restarting recovery on the same machine, because
that's exactly the point where they are going to start recovery.  A
minimum recovery point which precedes the point at which they will
start recovery is no better than one which is equal to the point at
which they will start recovery.

2. In perform_base_backup(), if the endptr returned by
do_pg_stop_backup() precedes the end of the checkpoint record returned
by do_pg_start_backup(), use the latter value instead.  Unfortunately,
that's not so easy: we can't just say if (endptr < starptr) startptr =
endptr; because startptr is the *start* of the checkpoint record, not
the end.  I suspect somebody could figure out a solution to this
problem, though.

If we decide we don't want to aim for one of these tighter solutions
and just adopt the previously-discussed patch, then at the very least
it needs better comments.  The minimum comment change the patch makes
right now doesn't give any explanation of why the new algorithm is
better than the old, and that's not cool.  Without that, the next
person who comes along to maintain this code won't have any easy way
of knowing that we grappled with this problem and what our conclusions
were.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Thank you for looking and retelling this.

At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow@mail.gmail.com>
> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> >> So, if I understand correctly, then we can mark the version posted by
> >> you upthread [1] which includes a test along with Kyotaro's fix can be
> >> marked as Ready for committer.  If so, then please change the status
> >> of patch accordingly.
> >
> > Patch moved to next CF 2016-11, still with status "ready for committer".
> >
> > IMO, this thread has reached as conclusion to use this patch to
> > address the problem reported:
> > CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
> 
> I spent some time reading this thread today and trying to understand
> exactly what's going on here.  Here's my attempt to restate the
> problem:
> 
> 1. If a restartpoint flushes no dirty buffers, then the redo location
> advances but the minimum recovery point does not; therefore, the
> minimum recovery point can fall behind the redo location.  That's
> correct behavior, because if the standby is subsequently restarted, it
> can correctly begin at the checkpoint record associated with the
> restartpoint and need not replay any further.

Yes.

> 2. However, it causes a problem when a base backup with the "include
> WAL" option is taken from a standby because -- on standby servers only
> -- the end location for a backup as returned by do_pg_stop_backup() is
> the minimum recovery point.  If this precedes the start point for the
> backup -- which is the restart point -- perform_base_backup() will
> conclude that no WAL files are required for the backup and fail an
> internal sanity check.

Yes. The option implies that the copied $PGDATA is a standalone
backup, that is, usable to start a standby instantly so at least
one WAL segment is needed, as mentioned in 3 below.

> 3. Removing the sanity check wouldn't help, because it can never be
> right to end up with zero WAL files as a result of a base backup.  At
> a minimum, we need the WAL file which contains the checkpoint record
> from which the control file specifies that redo should start.
> Actually, I think that checkpoint record could be spread across
> multiple files: it might be big.  We need them all, but the current
> code won't ensure that we get them.

Yes. The "checkpoit records" means that records during a checkpoint.

> The consensus solution on this thread seems to be that we should have
> pg_do_stop_backup() return the last-replayed XLOG location as the
> backup end point.  If the control file has been updated with a newer
> redo location, then the associated checkpoint record has surely been
> replayed, so we'll definitely have enough WAL to replay that
> checkpoint record (and we don't need to replay anything else, because
> we're supposing that this is the case where the minimum recovery point
> precedes the redo location).  Although this will work, it might
> include more WAL in the backup than is strictly necessary.  If the
> additional WAL replayed beyond the minimum recovery point does NOT
> include a checkpoint, we don't need any of it; if it does, we need
> only the portion up to and including the last checkpoint record, and
> not anything beyond that.

StartupXLOG mandates the existence of the start record of the
latest checkpoint. pg_start_backup() returns the start point
(redo location) of the restartpoint that it requested and
minRecoveryPoint is always after the replpayed checkpoint so it
works if always was going well.

But if the checkpoint contains no update, the restart point
exceeds minRecoveryPoint. Then no WAL copied.

> I can think of two solutions that would be "tighter":
> 
> 1. When performing a restartpoint, update the minimum recovery point
> to just beyond the checkpoint record.  I think this can't hurt anyone
> who is actually restarting recovery on the same machine, because
> that's exactly the point where they are going to start recovery.  A
> minimum recovery point which precedes the point at which they will
> start recovery is no better than one which is equal to the point at
> which they will start recovery.

This is a candidate that I thought of. But I avoided to change
the behavior of minRecoveryPoint that (seems to me that) it
describes only buffer state. So checkpoint with no update doesn't
advances minRecoveryPoint as my understanding.

> 2. In perform_base_backup(), if the endptr returned by
> do_pg_stop_backup() precedes the end of the checkpoint record returned
> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
> that's not so easy: we can't just say if (endptr < starptr) startptr =
> endptr; because startptr is the *start* of the checkpoint record, not
> the end.  I suspect somebody could figure out a solution to this
> problem, though.

Yes, and the reason I rejected it was that it is not logical,
maybe the same meaning to it would cause another problem later.

> If we decide we don't want to aim for one of these tighter solutions
> and just adopt the previously-discussed patch, then at the very least
> it needs better comments.  The minimum comment change the patch makes
> right now doesn't give any explanation of why the new algorithm is
> better than the old, and that's not cool.  Without that, the next
> person who comes along to maintain this code won't have any easy way
> of knowing that we grappled with this problem and what our conclusions
> were.

If I minunderstand the meaning/function of minRecoveryPoint or we
are allowed to change the behavior/definition of it, using it can
reduce the amount of copied WAL for some casees. (Though no
concrete example is coming to my mind now.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Fri, Oct 21, 2016 at 10:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> So, if I understand correctly, then we can mark the version posted by
>>> you upthread [1] which includes a test along with Kyotaro's fix can be
>>> marked as Ready for committer.  If so, then please change the status
>>> of patch accordingly.
>>
>> Patch moved to next CF 2016-11, still with status "ready for committer".
>>
>> IMO, this thread has reached as conclusion to use this patch to
>> address the problem reported:
>> CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
>
> I spent some time reading this thread today and trying to understand
> exactly what's going on here.  Here's my attempt to restate the
> problem:
>
> 1. If a restartpoint flushes no dirty buffers, then the redo location
> advances but the minimum recovery point does not; therefore, the
> minimum recovery point can fall behind the redo location.  That's
> correct behavior, because if the standby is subsequently restarted, it
> can correctly begin at the checkpoint record associated with the
> restartpoint and need not replay any further.
>
> 2. However, it causes a problem when a base backup with the "include
> WAL" option is taken from a standby because -- on standby servers only
> -- the end location for a backup as returned by do_pg_stop_backup() is
> the minimum recovery point.  If this precedes the start point for the
> backup -- which is the restart point -- perform_base_backup() will
> conclude that no WAL files are required for the backup and fail an
> internal sanity check.
>
> 3. Removing the sanity check wouldn't help, because it can never be
> right to end up with zero WAL files as a result of a base backup.  At
> a minimum, we need the WAL file which contains the checkpoint record
> from which the control file specifies that redo should start.
> Actually, I think that checkpoint record could be spread across
> multiple files: it might be big.  We need them all, but the current
> code won't ensure that we get them.
>
> The consensus solution on this thread seems to be that we should have
> pg_do_stop_backup() return the last-replayed XLOG location as the
> backup end point.  If the control file has been updated with a newer
> redo location, then the associated checkpoint record has surely been
> replayed, so we'll definitely have enough WAL to replay that
> checkpoint record (and we don't need to replay anything else, because
> we're supposing that this is the case where the minimum recovery point
> precedes the redo location).  Although this will work, it might
> include more WAL in the backup than is strictly necessary.  If the
> additional WAL replayed beyond the minimum recovery point does NOT
> include a checkpoint, we don't need any of it; if it does, we need
> only the portion up to and including the last checkpoint record, and
> not anything beyond that.
>

You are right that it will include additional WAL than strictly
necessary, but that can happen today as well because minrecoverypoint
could be updated after you have established restart point for
do_pg_start_backup().  Do you want to say that even without patch in
some cases we are including additional WAL in backup than what is
strictly necessary, so it is better to improve the situation?

> I can think of two solutions that would be "tighter":
>
> 1. When performing a restartpoint, update the minimum recovery point
> to just beyond the checkpoint record.  I think this can't hurt anyone
> who is actually restarting recovery on the same machine, because
> that's exactly the point where they are going to start recovery.  A
> minimum recovery point which precedes the point at which they will
> start recovery is no better than one which is equal to the point at
> which they will start recovery.
>

I think this will work but will cause an unnecessary control file
update for the cases when buffer flush will anyway do it.  It can work
if we try to do only when required (minrecoverypoint is lesser than
lastcheckpoint) after flush of buffers.

> 2. In perform_base_backup(), if the endptr returned by
> do_pg_stop_backup() precedes the end of the checkpoint record returned
> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
> that's not so easy: we can't just say if (endptr < starptr) startptr =
> endptr; because startptr is the *start* of the checkpoint record, not
> the end.  I suspect somebody could figure out a solution to this
> problem, though.
>

With this approach, don't we need something similar for API's
pg_stop_backup()/pg_stop_backup_v2()?


> If we decide we don't want to aim for one of these tighter solutions
> and just adopt the previously-discussed patch, then at the very least
> it needs better comments.
>

+1.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Thank you for looking and retelling this.
>
> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow@mail.gmail.com>
>> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> >> So, if I understand correctly, then we can mark the version posted by
>> >> you upthread [1] which includes a test along with Kyotaro's fix can be
>> >> marked as Ready for committer.  If so, then please change the status
>> >> of patch accordingly.
>> >
>> > Patch moved to next CF 2016-11, still with status "ready for committer".
>> >
>> > IMO, this thread has reached as conclusion to use this patch to
>> > address the problem reported:
>> > CAB7nPqTv5gmKQcNDoFGTGqoqXz2xLz4RRw247oqOJzZTVy6-7Q@mail.gmail.com
>>
>> I spent some time reading this thread today and trying to understand
>> exactly what's going on here.  Here's my attempt to restate the
>> problem:
>>
>> 1. If a restartpoint flushes no dirty buffers, then the redo location
>> advances but the minimum recovery point does not; therefore, the
>> minimum recovery point can fall behind the redo location.  That's
>> correct behavior, because if the standby is subsequently restarted, it
>> can correctly begin at the checkpoint record associated with the
>> restartpoint and need not replay any further.
>
> Yes.
>
>> 2. However, it causes a problem when a base backup with the "include
>> WAL" option is taken from a standby because -- on standby servers only
>> -- the end location for a backup as returned by do_pg_stop_backup() is
>> the minimum recovery point.  If this precedes the start point for the
>> backup -- which is the restart point -- perform_base_backup() will
>> conclude that no WAL files are required for the backup and fail an
>> internal sanity check.
>
> Yes. The option implies that the copied $PGDATA is a standalone
> backup, that is, usable to start a standby instantly so at least
> one WAL segment is needed, as mentioned in 3 below.
>
>> 3. Removing the sanity check wouldn't help, because it can never be
>> right to end up with zero WAL files as a result of a base backup.  At
>> a minimum, we need the WAL file which contains the checkpoint record
>> from which the control file specifies that redo should start.
>> Actually, I think that checkpoint record could be spread across
>> multiple files: it might be big.  We need them all, but the current
>> code won't ensure that we get them.
>
> Yes. The "checkpoit records" means that records during a checkpoint.
>
>> The consensus solution on this thread seems to be that we should have
>> pg_do_stop_backup() return the last-replayed XLOG location as the
>> backup end point.  If the control file has been updated with a newer
>> redo location, then the associated checkpoint record has surely been
>> replayed, so we'll definitely have enough WAL to replay that
>> checkpoint record (and we don't need to replay anything else, because
>> we're supposing that this is the case where the minimum recovery point
>> precedes the redo location).  Although this will work, it might
>> include more WAL in the backup than is strictly necessary.  If the
>> additional WAL replayed beyond the minimum recovery point does NOT
>> include a checkpoint, we don't need any of it; if it does, we need
>> only the portion up to and including the last checkpoint record, and
>> not anything beyond that.
>
> StartupXLOG mandates the existence of the start record of the
> latest checkpoint. pg_start_backup() returns the start point
> (redo location) of the restartpoint that it requested and
> minRecoveryPoint is always after the replpayed checkpoint so it
> works if always was going well.
>
> But if the checkpoint contains no update, the restart point
> exceeds minRecoveryPoint. Then no WAL copied.
>
>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record.  I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery.  A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> This is a candidate that I thought of. But I avoided to change
> the behavior of minRecoveryPoint that (seems to me that) it
> describes only buffer state. So checkpoint with no update doesn't
> advances minRecoveryPoint as my understanding.
>

I think what you are saying is not completely right, because we do
update minRecoveryPoint when we don't perform a new restart point.
When we perform restart point, then it assumes that flushing the
buffers will take care of updating minRecoveryPoint.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Thank you for looking and retelling this.

+1.

>> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow@mail.gmail.com>
>>> I can think of two solutions that would be "tighter":
>>>
>>> 1. When performing a restartpoint, update the minimum recovery point
>>> to just beyond the checkpoint record.  I think this can't hurt anyone
>>> who is actually restarting recovery on the same machine, because
>>> that's exactly the point where they are going to start recovery.  A
>>> minimum recovery point which precedes the point at which they will
>>> start recovery is no better than one which is equal to the point at
>>> which they will start recovery.
>>
>> This is a candidate that I thought of. But I avoided to change
>> the behavior of minRecoveryPoint that (seems to me that) it
>> describes only buffer state. So checkpoint with no update doesn't
>> advances minRecoveryPoint as my understanding.
>
> I think what you are saying is not completely right, because we do
> update minRecoveryPoint when we don't perform a new restart point.
> When we perform restart point, then it assumes that flushing the
> buffers will take care of updating minRecoveryPoint.

Yep, minRecoveryPoint still gets updated when the last checkpoint
record is the last restart point to avoid a hot standby to allow
read-only connections at a LSN-point earlier than the last shutdown.
Anyway, we can clearly reject 1. in the light of
https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
when playing with different stop locations at recovery.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Mon, Oct 24, 2016 at 1:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 10:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 2. In perform_base_backup(), if the endptr returned by
>> do_pg_stop_backup() precedes the end of the checkpoint record returned
>> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
>> that's not so easy: we can't just say if (endptr < starptr) startptr =
>> endptr; because startptr is the *start* of the checkpoint record, not
>> the end.  I suspect somebody could figure out a solution to this
>> problem, though.
>>
>
> With this approach, don't we need something similar for API's
> pg_stop_backup()/pg_stop_backup_v2()?

Yes, I think so. That would sort of map with the idea I mentioned
upthread to have pg_stop_backup() return the contents of the control
file and have the caller write it to the backup by itself.

>> If we decide we don't want to aim for one of these tighter solutions
>> and just adopt the previously-discussed patch, then at the very least
>> it needs better comments.
>
> +1.

Yeah, here is an attempt at doing that:
-    * We return the current minimum recovery point as the backup end
+    * We return the current last replayed point as the backup end
     * location. Note that it can be greater than the exact backup end
-    * location if the minimum recovery point is updated after the backup of
+    * location if the last replayed point is updated after the backup of
     * pg_control. This is harmless for current uses.
     *
+    * Using the last replayed point as the backup end location ensures that
+    * the end location will never be older than the start position, something
+    * that could happen if for example minRecoveryPoint is used as backup
+    * end location when it never gets updated because no buffer flushes
+    * occurred. By using the last replay location, note that the backup may
+    * include more WAL segments than necessary. If the additional WAL
+    * replayed since minRecoveryPoint does not include a checkpoint, there
+    * is actually no need for it. Even if it does include a checkpoint,
+    * only the portion up to the checkpoint itself is necessary and not
+    * the WAL generated beyond that. Still, in the case of a backup taken
+    * from a standby, with its master disconnected, this ensures that the
+    * backup is valid.
+    *

Thoughts welcome.
--
Michael

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Kyotaro HORIGUCHI
Дата:
Hi,

At Mon, 24 Oct 2016 15:55:58 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTDnE62F9RVXwvOw4ihCy9iVMgEYJeozjDQYV5pF7YvFg@mail.gmail.com>
> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> Thank you for looking and retelling this.
> 
> +1.
> 
> >> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaRdyfCqcg9x-E99XCxdC4LW0zygP5qLsPhFsb6BYDKow@mail.gmail.com>
> >>> I can think of two solutions that would be "tighter":
> >>>
> >>> 1. When performing a restartpoint, update the minimum recovery point
> >>> to just beyond the checkpoint record.  I think this can't hurt anyone
> >>> who is actually restarting recovery on the same machine, because
> >>> that's exactly the point where they are going to start recovery.  A
> >>> minimum recovery point which precedes the point at which they will
> >>> start recovery is no better than one which is equal to the point at
> >>> which they will start recovery.
> >>
> >> This is a candidate that I thought of. But I avoided to change
> >> the behavior of minRecoveryPoint that (seems to me that) it
> >> describes only buffer state. So checkpoint with no update doesn't
> >> advances minRecoveryPoint as my understanding.
> >
> > I think what you are saying is not completely right, because we do
> > update minRecoveryPoint when we don't perform a new restart point.
> > When we perform restart point, then it assumes that flushing the
> > buffers will take care of updating minRecoveryPoint.

Thank you pointing that. I've forgot it. But as I looked there
again, I found that I have mistaken for a long time the code as
updating minRecoveryPoint, but it actually does invalidation.  If
the "update" were in the block of shutdown checkpoint, it would
look consistent but it is before the block, so any skipped
non-shutdown checkpoint record invalidates minRecoveryPoint but
leave the state as DB_IN_ARCHIVE_RECOVERY, aren't this two in a
contradiction? (Though I don't see something bad will happen with
it..)

> Yep, minRecoveryPoint still gets updated when the last checkpoint
> record is the last restart point to avoid a hot standby to allow
> read-only connections at a LSN-point earlier than the last shutdown.

The described behavior seems corect, but the code seems to be
doing somewhat different.

> Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
> when playing with different stop locations at recovery.

| * If the last checkpoint record we've replayed is already our last
| * restartpoint, we can't perform a new restart point. We still update
| * minRecoveryPoint in that case, so that if this is a shutdown restart
| * point, we won't start up earlier than before. 
...
| * We don't explicitly advance minRecoveryPoint when we do create a
| * restartpoint. It's assumed that flushing the buffers will do that as a
| * side-effect.

The second sentense seems to me as "we *expect* minRecoveryPoint
to be updated anyway even if we don't do that here". Though a bit
different in reality..

skipped checkpoints - advance minRecvoeryPoint to the checkpoint

I'm failing to make a consistent model for the code around here
in my mind..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Tue, Oct 25, 2016 at 11:07 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Mon, 24 Oct 2016 15:55:58 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTDnE62F9RVXwvOw4ihCy9iVMgEYJeozjDQYV5pF7YvFg@mail.gmail.com>
>> Anyway, we can clearly reject 1. in the light of
>> https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
>> when playing with different stop locations at recovery.
>
> | * If the last checkpoint record we've replayed is already our last
> | * restartpoint, we can't perform a new restart point. We still update
> | * minRecoveryPoint in that case, so that if this is a shutdown restart
> | * point, we won't start up earlier than before.
> ...
> | * We don't explicitly advance minRecoveryPoint when we do create a
> | * restartpoint. It's assumed that flushing the buffers will do that as a
> | * side-effect.
>
> The second sentence seems to me as "we *expect* minRecoveryPoint
> to be updated anyway even if we don't do that here". Though a bit
> different in reality..it.
>
> skipped checkpoints - advance minRecvoeryPoint to the checkpoint
>
> I'm failing to make a consistent model for the code around here
> in my mind..

Hm? If the last checkpoint record replayed is the last restart point,
no restart point is created *but* minRecoveryPoint is updated to
prevent the case where read only queries are allowed earlier than the
next startup point. On the other hand, if a restart point is created,
this code does not update minRecoveryPoint and it is assumed that the
next buffer flush will do it.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Sun, Oct 23, 2016 at 11:48 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record.  I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery.  A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> This is a candidate that I thought of. But I avoided to change
> the behavior of minRecoveryPoint that (seems to me that) it
> describes only buffer state. So checkpoint with no update doesn't
> advances minRecoveryPoint as my understanding.

I don't really know what you mean by this.  minRecoveryPoint is the
point up to which we need to replay WAL in order to achieve
consistency - the distinction between "buffer state" and something
else is artificial.  My point is that the amount of WAL we need to
replay can never be negative, but that's what a minimum recovery point
prior to the end of the checkpoint record used for start-of-REDO
implies.

>> 2. In perform_base_backup(), if the endptr returned by
>> do_pg_stop_backup() precedes the end of the checkpoint record returned
>> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
>> that's not so easy: we can't just say if (endptr < starptr) startptr =
>> endptr; because startptr is the *start* of the checkpoint record, not
>> the end.  I suspect somebody could figure out a solution to this
>> problem, though.
>
> Yes, and the reason I rejected it was that it is not logical,
> maybe the same meaning to it would cause another problem later.

I don't know why this isn't logical.  If you want to say it's going to
cause another problem later, you should say what problem you think it
will cause, not just guess that there might be one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Mon, Oct 24, 2016 at 12:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> The consensus solution on this thread seems to be that we should have
>> pg_do_stop_backup() return the last-replayed XLOG location as the
>> backup end point.  If the control file has been updated with a newer
>> redo location, then the associated checkpoint record has surely been
>> replayed, so we'll definitely have enough WAL to replay that
>> checkpoint record (and we don't need to replay anything else, because
>> we're supposing that this is the case where the minimum recovery point
>> precedes the redo location).  Although this will work, it might
>> include more WAL in the backup than is strictly necessary.  If the
>> additional WAL replayed beyond the minimum recovery point does NOT
>> include a checkpoint, we don't need any of it; if it does, we need
>> only the portion up to and including the last checkpoint record, and
>> not anything beyond that.
>
> You are right that it will include additional WAL than strictly
> necessary, but that can happen today as well because minrecoverypoint
> could be updated after you have established restart point for
> do_pg_start_backup().  Do you want to say that even without patch in
> some cases we are including additional WAL in backup than what is
> strictly necessary, so it is better to improve the situation?

In that case, including additional WAL is *necessary*.  If the minimum
recovery point is updated after do_pg_start_backup(), pages bearing
LSNs newer than the old minimum recovery point could conceivably have
been copied as part of the backup, and therefore we need to replay up
through the new minimum recovery point to guarantee consistency.

>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record.  I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery.  A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> I think this will work but will cause an unnecessary control file
> update for the cases when buffer flush will anyway do it.  It can work
> if we try to do only when required (minrecoverypoint is lesser than
> lastcheckpoint) after flush of buffers.

Sure, that doesn't sound too hard to implement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Mon, Oct 24, 2016 at 2:55 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Yep, minRecoveryPoint still gets updated when the last checkpoint
> record is the last restart point to avoid a hot standby to allow
> read-only connections at a LSN-point earlier than the last shutdown.
> Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
> when playing with different stop locations at recovery.

I can't understand what point you're driving at here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Tue, Oct 25, 2016 at 10:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 12:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> You are right that it will include additional WAL than strictly
>> necessary, but that can happen today as well because minrecoverypoint
>> could be updated after you have established restart point for
>> do_pg_start_backup().  Do you want to say that even without patch in
>> some cases we are including additional WAL in backup than what is
>> strictly necessary, so it is better to improve the situation?
>
> In that case, including additional WAL is *necessary*.  If the minimum
> recovery point is updated after do_pg_start_backup(), pages bearing
> LSNs newer than the old minimum recovery point could conceivably have
> been copied as part of the backup,
>

I might be missing something here, but I think this theory doesn't
hold true with respect to current code because we always update
minimum recovery point to last record being replayed not till till the
LSN of the page being flushed.  This is done to avoid repeated update
of control file.  Considering that, it seems to me that the patch by
Kyotaro-San is a reasonable fix provided we update comments at all
appropriate places.  I can try to update the comments, if the proposed
fix is acceptable to you.

> and therefore we need to replay up
> through the new minimum recovery point to guarantee consistency.
>
>>> I can think of two solutions that would be "tighter":
>>>
>>> 1. When performing a restartpoint, update the minimum recovery point
>>> to just beyond the checkpoint record.  I think this can't hurt anyone
>>> who is actually restarting recovery on the same machine, because
>>> that's exactly the point where they are going to start recovery.  A
>>> minimum recovery point which precedes the point at which they will
>>> start recovery is no better than one which is equal to the point at
>>> which they will start recovery.
>>
>> I think this will work but will cause an unnecessary control file
>> update for the cases when buffer flush will anyway do it.  It can work
>> if we try to do only when required (minrecoverypoint is lesser than
>> lastcheckpoint) after flush of buffers.
>
> Sure, that doesn't sound too hard to implement.
>

If you are inclined towards this solution, then I think what we need
to do is to change the API UpdateMinRecoveryPoint() such that it's
second parameter can take three values.  0 means update
minRecoveryPoint to passed lsn if minRecoveryPoint < lsn; 1 means
update minRecoveryPoint to latest replayed point if minRecoveryPoint <
lsn, same as currently false for *force*; 2 indicates same behaviour
as current *force* as true.  Also we need to pass currentTLI parameter
(lastCheckPoint.ThisTimeLineID) to this API to update
minRecoveryPointTLI.  I have not tried this, but I think something on
these lines should work.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Mon, Oct 24, 2016 at 12:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I think what you are saying is not completely right, because we do
>> update minRecoveryPoint when we don't perform a new restart point.
>> When we perform restart point, then it assumes that flushing the
>> buffers will take care of updating minRecoveryPoint.
>
> Yep, minRecoveryPoint still gets updated when the last checkpoint
> record is the last restart point to avoid a hot standby to allow
> read-only connections at a LSN-point earlier than the last shutdown.
> Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
> when playing with different stop locations at recovery.
>

That point holds good only for cases when we try to update minimum
recovery point beyond what is required (like earlier we were thinking
to update it unconditionally), however what is being discussed here is
to update only if it is not updated by flush of buffers.  I think that
is okay, but I feel Kyotaro-San's fix is a good fix for the problem
and we don't need to add some more code (additional update of control
file) to fix the problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Oct 26, 2016 at 1:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 12:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think what you are saying is not completely right, because we do
>>> update minRecoveryPoint when we don't perform a new restart point.
>>> When we perform restart point, then it assumes that flushing the
>>> buffers will take care of updating minRecoveryPoint.
>>
>> Yep, minRecoveryPoint still gets updated when the last checkpoint
>> record is the last restart point to avoid a hot standby to allow
>> read-only connections at a LSN-point earlier than the last shutdown.
>> Anyway, we can clearly reject 1. in the light of
>> https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
>> when playing with different stop locations at recovery.
>>
>
> That point holds good only for cases when we try to update minimum
> recovery point beyond what is required (like earlier we were thinking
> to update it unconditionally), however what is being discussed here is
> to update only if it is not updated by flush of buffers.  I think that
> is okay, but I feel Kyotaro-San's fix is a good fix for the problem
> and we don't need to add some more code (additional update of control
> file) to fix the problem.

Reading ten times my last paragraph and staring at my screen for 15
long weird minutes, I cannot recall what I had in mind when I wrote
those lines in particular..

[...scratches head...]

But yes, thinking *harder*, I agree that updating minRecoveryPoint
just after the checkpoint record would be fine and removes the need to
have more WAL than necessary in for a backup taken from a standby.
That will also prevent cases where minRecoveryPoint is older than the
recovery start point. On top of that the cost of an extra call to
UpdateControlFile() looks cheap considering that CreateRestartPoint()
is called only by the checkpointer or at shutdown.

Just coding things this solution gives roughtly the attached? The TAP
test passes btw.
--
Michael

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Oct 26, 2016 at 1:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 12:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think what you are saying is not completely right, because we do
>>> update minRecoveryPoint when we don't perform a new restart point.
>>> When we perform restart point, then it assumes that flushing the
>>> buffers will take care of updating minRecoveryPoint.
>>
>> Yep, minRecoveryPoint still gets updated when the last checkpoint
>> record is the last restart point to avoid a hot standby to allow
>> read-only connections at a LSN-point earlier than the last shutdown.
>> Anyway, we can clearly reject 1. in the light of
>> https://www.postgresql.org/message-id/CAA4eK1KmjtsXqF0cav7Cs4d4vwv2H_pc8d8q1BUCqDzAF+7EzQ@mail.gmail.com
>> when playing with different stop locations at recovery.
>>
>
> That point holds good only for cases when we try to update minimum
> recovery point beyond what is required (like earlier we were thinking
> to update it unconditionally), however what is being discussed here is
> to update only if it is not updated by flush of buffers.  I think that
> is okay, but I feel Kyotaro-San's fix is a good fix for the problem
> and we don't need to add some more code (additional update of control
> file) to fix the problem.

We have the choice between a solution that requires any backup taken
from standbys to need more WAL than it needs, not sure if would be a
lot, and a solution that prevents minRecoveryPoint to get older than
the point where standby would start recovery anyway at next startup.
For correctness the latter solution looks more robust to me to be
honest.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Oct 26, 2016 at 1:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> If you are inclined towards this solution, then I think what we need
> to do is to change the API UpdateMinRecoveryPoint() such that it's
> second parameter can take three values.  0 means update
> minRecoveryPoint to passed lsn if minRecoveryPoint < lsn; 1 means
> update minRecoveryPoint to latest replayed point if minRecoveryPoint <
> lsn, same as currently false for *force*; 2 indicates same behaviour
> as current *force* as true.  Also we need to pass currentTLI parameter
> (lastCheckPoint.ThisTimeLineID) to this API to update
> minRecoveryPointTLI.  I have not tried this, but I think something on
> these lines should work.

0 is the behavior that you get when force = false, so it works as 1, no?
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Wed, Oct 26, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Oct 26, 2016 at 1:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> If you are inclined towards this solution, then I think what we need
>> to do is to change the API UpdateMinRecoveryPoint() such that it's
>> second parameter can take three values.  0 means update
>> minRecoveryPoint to passed lsn if minRecoveryPoint < lsn; 1 means
>> update minRecoveryPoint to latest replayed point if minRecoveryPoint <
>> lsn, same as currently false for *force*; 2 indicates same behaviour
>> as current *force* as true.  Also we need to pass currentTLI parameter
>> (lastCheckPoint.ThisTimeLineID) to this API to update
>> minRecoveryPointTLI.  I have not tried this, but I think something on
>> these lines should work.
>
> 0 is the behavior that you get when force = false, so it works as 1, no?
>

I don't think so.  Refer below code:

UpdateMinRecoveryPoint()
{
..
else if (force || minRecoveryPoint < lsn)
{
/*
* To avoid having to update the control file too often, we update it
* all the way to the last record being replayed, even though 'lsn'
* would suffice for correctness.  This also allows the 'force' case
* to not need a valid 'lsn' value.
..
}

If my understanding is right, then the changes proposed by you as
below is not what is intended here.  I think you need to do something
as I have mentioned above.

+ /*
+ * Update minRecoveryPoint just past the last redo checkpoint if
+ * necessary. This ensures that at next startup minRecoveryPoint will
+ * not be past the next point it would start at, preventing any
+ * weird behaviors with for example backups taken from standbys that
+ * rely on minRecoveryPoint as end backup location.
+ */
+ UpdateMinRecoveryPoint(RedoRecPtr, false);
+

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Wed, Oct 26, 2016 at 2:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> But yes, thinking *harder*, I agree that updating minRecoveryPoint
> just after the checkpoint record would be fine and removes the need to
> have more WAL than necessary in for a backup taken from a standby.
> That will also prevent cases where minRecoveryPoint is older than the
> recovery start point. On top of that the cost of an extra call to
> UpdateControlFile() looks cheap considering that CreateRestartPoint()
> is called only by the checkpointer or at shutdown.
>
> Just coding things this solution gives roughtly the attached? The TAP
> test passes btw.

I think that still leaves a race condition, right?  It's got to be
part of the SAME control file update that advances the redo pointer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Oct 27, 2016 at 2:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 26, 2016 at 2:06 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> But yes, thinking *harder*, I agree that updating minRecoveryPoint
>> just after the checkpoint record would be fine and removes the need to
>> have more WAL than necessary in for a backup taken from a standby.
>> That will also prevent cases where minRecoveryPoint is older than the
>> recovery start point. On top of that the cost of an extra call to
>> UpdateControlFile() looks cheap considering that CreateRestartPoint()
>> is called only by the checkpointer or at shutdown.
>>
>> Just coding things this solution gives roughtly the attached? The TAP
>> test passes btw.
>
> I think that still leaves a race condition, right?  It's got to be
> part of the SAME control file update that advances the redo pointer.

Right, thanks for double-checking... There is no meaning to do that
out of the ControlFileLock taken previously...
--
Michael

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Wed, Oct 26, 2016 at 6:13 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> If my understanding is right, then the changes proposed by you as
> below is not what is intended here.  I think you need to do something
> as I have mentioned above.

Ah OK I have spotted the subtility;
- 0 means update minRecoveryPoint to the LSN *passed* by the caller
- 1 means update minRecoveryPoint to the *latest replayed* LSN and not
the one of the caller
But if we do so that would mean an extra update of the control file
for nothing in CreateRestartPoint(), and that serves nothing.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Thu, Oct 27, 2016 at 9:51 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Oct 27, 2016 at 2:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Oct 26, 2016 at 2:06 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> But yes, thinking *harder*, I agree that updating minRecoveryPoint
>>> just after the checkpoint record would be fine and removes the need to
>>> have more WAL than necessary in for a backup taken from a standby.
>>> That will also prevent cases where minRecoveryPoint is older than the
>>> recovery start point. On top of that the cost of an extra call to
>>> UpdateControlFile() looks cheap considering that CreateRestartPoint()
>>> is called only by the checkpointer or at shutdown.
>>>
>>> Just coding things this solution gives roughtly the attached? The TAP
>>> test passes btw.
>>
>> I think that still leaves a race condition, right?  It's got to be
>> part of the SAME control file update that advances the redo pointer.
>
> Right, thanks for double-checking... There is no meaning to do that
> out of the ControlFileLock taken previously...
>

+ if (ControlFile->minRecoveryPoint < lastCheckPointRecPtr)
+ {
+ ControlFile->minRecoveryPoint = lastCheckPointRecPtr;
+ ControlFile->minRecoveryPointTLI = ThisTimeLineID;


This can create problem if the checkpoint record spans across multiple
segments, because you are updating minRecoveryPoint to start of
checkpoint record.  We need to update it to end+1 of checkpoint
record.  Please find attached patch which takes care of same.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Thu, Oct 27, 2016 at 7:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> This can create problem if the checkpoint record spans across multiple
> segments, because you are updating minRecoveryPoint to start of
> checkpoint record.  We need to update it to end+1 of checkpoint
> record.  Please find attached patch which takes care of same.

I gave up counting my mistakes on this thread, thanks. You should
update the comments of XLogCtlData for the new field
lastCheckPointEndPtr so as it is not used by the background writer but
when creating a new restart point to define the minimum recovery
point.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Thu, Oct 27, 2016 at 9:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Oct 27, 2016 at 7:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> This can create problem if the checkpoint record spans across multiple
>> segments, because you are updating minRecoveryPoint to start of
>> checkpoint record.  We need to update it to end+1 of checkpoint
>> record.  Please find attached patch which takes care of same.
>
> I gave up counting my mistakes on this thread, thanks. You should
> update the comments of XLogCtlData for the new field
> lastCheckPointEndPtr so as it is not used by the background writer but
> when creating a new restart point to define the minimum recovery
> point.

I committed and back-patched this with some additional work on the
comments, but I don't understand this remark.  That comment seems like
it should refer to the checkpointer in modern branches, but isn't that
point independent of this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUG] pg_basebackup from disconnected standby fails

От
Michael Paquier
Дата:
On Fri, Oct 28, 2016 at 1:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 27, 2016 at 9:05 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Oct 27, 2016 at 7:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> This can create problem if the checkpoint record spans across multiple
>>> segments, because you are updating minRecoveryPoint to start of
>>> checkpoint record.  We need to update it to end+1 of checkpoint
>>> record.  Please find attached patch which takes care of same.
>>
>> I gave up counting my mistakes on this thread, thanks. You should
>> update the comments of XLogCtlData for the new field
>> lastCheckPointEndPtr so as it is not used by the background writer but
>> when creating a new restart point to define the minimum recovery
>> point.
>
> I committed and back-patched this with some additional work on the
> comments, but I don't understand this remark.  That comment seems like
> it should refer to the checkpointer in modern branches, but isn't that
> point independent of this patch?
    * During recovery, we keep a copy of the latest checkpoint record here.
-    * Used by the background writer when it wants to create a restartpoint.
+    * lastCheckPointRecPtr points to start of checkpoint record and
+    * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
+    * background writer when it wants to create a restartpoint.

The patch committed introduces lastCheckPointEndPtr, which is not used
to decide if a restart point should be created or not. Only
lastCheckPointRecPtr fits with this purpose. lastCheckPointEndPtr is
used just to check if minRecoveryPoint should be pushed up or not. So
my point here would be to refine a bit this comment, in the shape of
that for example:
lastCheckPointRecPtr points to start of checkpoint record and
lastCheckPointEndPtr points to end+1 of checkpoint record. Both are
used by the checkpointer. lastCheckPointRecPtr is used when it wants
to create a restart point, and lastCheckPointEndPtr is used to
determine if the minimum recovery point should be updated or not.

If you think that's not an improvement, feel free to discard it. Still
it seems to me that the comment as currently presented is confusing.
-- 
Michael



Re: [BUG] pg_basebackup from disconnected standby fails

От
Amit Kapila
Дата:
On Fri, Oct 28, 2016 at 5:08 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Oct 28, 2016 at 1:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Oct 27, 2016 at 9:05 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Thu, Oct 27, 2016 at 7:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> This can create problem if the checkpoint record spans across multiple
>>>> segments, because you are updating minRecoveryPoint to start of
>>>> checkpoint record.  We need to update it to end+1 of checkpoint
>>>> record.  Please find attached patch which takes care of same.
>>>
>>> I gave up counting my mistakes on this thread, thanks. You should
>>> update the comments of XLogCtlData for the new field
>>> lastCheckPointEndPtr so as it is not used by the background writer but
>>> when creating a new restart point to define the minimum recovery
>>> point.
>>
>> I committed and back-patched this with some additional work on the
>> comments, but I don't understand this remark.  That comment seems like
>> it should refer to the checkpointer in modern branches, but isn't that
>> point independent of this patch?
>

Yeah, I also think so.

>      * During recovery, we keep a copy of the latest checkpoint record here.
> -    * Used by the background writer when it wants to create a restartpoint.
> +    * lastCheckPointRecPtr points to start of checkpoint record and
> +    * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
> +    * background writer when it wants to create a restartpoint.
>
> The patch committed introduces lastCheckPointEndPtr, which is not used
> to decide if a restart point should be created or not. Only
> lastCheckPointRecPtr fits with this purpose. lastCheckPointEndPtr is
> used just to check if minRecoveryPoint should be pushed up or not. So
> my point here would be to refine a bit this comment, in the shape of
> that for example:
> lastCheckPointRecPtr points to start of checkpoint record and
> lastCheckPointEndPtr points to end+1 of checkpoint record. Both are
> used by the checkpointer. lastCheckPointRecPtr is used when it wants
> to create a restart point, and lastCheckPointEndPtr is used to
> determine if the minimum recovery point should be updated or not.
>

lastCheckPointEndPtr is used *not* only to *determine* if the minimum
recovery point, but also used to *update* it.  Actually, if you see
the comment is somewhat generic "Used by the background writer when it
wants to create a restartpoint." and lastCheckPointEndPtr gets used
when we create a restartpoint.  I think there is no harm in further
improving the comment, but I see nothing wrong as it is.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUG] pg_basebackup from disconnected standby fails

От
Robert Haas
Дата:
On Thu, Oct 27, 2016 at 7:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> I committed and back-patched this with some additional work on the
>> comments, but I don't understand this remark.  That comment seems like
>> it should refer to the checkpointer in modern branches, but isn't that
>> point independent of this patch?
>
>      * During recovery, we keep a copy of the latest checkpoint record here.
> -    * Used by the background writer when it wants to create a restartpoint.
> +    * lastCheckPointRecPtr points to start of checkpoint record and
> +    * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
> +    * background writer when it wants to create a restartpoint.
>
> The patch committed introduces lastCheckPointEndPtr, which is not used
> to decide if a restart point should be created or not.

The comment doesn't say anything about those structure members being
used to decide "if a restart point should be created or not".  It just
says that they are used; it says nothing about the purpose for which
they are used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company