Re: [BUG] pg_basebackup from disconnected standby fails

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [BUG] pg_basebackup from disconnected standby fails
Дата
Msg-id 20160721.171950.15026313.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [BUG] pg_basebackup from disconnected standby fails  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: [BUG] pg_basebackup from disconnected standby fails  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
Sorry for the absense. I've reached here.

At Thu, 21 Jul 2016 12:20:30 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqR+krKK9fqwquSJnA2oB-RcQ6Mi8ybz3WL0kCOB5SSZEg@mail.gmail.com>
> On Thu, Jul 21, 2016 at 11:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Jul 21, 2016 at 7:28 AM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
> >> <michael.paquier@gmail.com> wrote:
> >>>>
> >>>> Yeah, I think that is totally different angle to fix this issue, so
> >>>> don't you think it is better to start a separate thread to discuss
> >>>> about it for 10.0 and mark this patch as ready for committer.
> >>>
> >>> I'd like to tackle this problem in 10.0, but that will strongly depend
> >>> on how my patches move on in CF1 and CF2.
> >>
> >> By the way, thank you for taking the time to provide input. I think
> >> we're in good shape here now.
> >>
> >
> > So, if I understand correctly, then we can mark the version posted by
> > you upthread [1] which includes a test along with Kyotaro's fix can be
> > marked as Ready for committer.  If so, then please change the status
> > of patch accordingly.
> 
> Oops. I thought you did it already. So done.

Thank you very much for the intensive discussion on this.

After some additional consideration, I attached two patches about
comment and documentation. 0001- is a patch related to the
previous patch that adds description on why we use replay end
point instead of minRecoveryPoint. And 0002- is a fix and an
additional mention on what would happen when pg_control is not
copied last during backup.


==== About the first patch.

Related to the previous patch, I found the following comment just
above where it applies.

xlog.c:10412
- * We return the current minimum recovery point as the backup end
- * location. Note that it can be greater than the exact backup end
- * location if the minimum recovery point is updated after the backup of
- * pg_control. This is harmless for current uses.

I haven't gave a notice on this but this seems to be necessary to
edit so as to mention this fix and the gap like the following.

+ * minRecoveryPoint can go behind the last checkpoint's redo location when
+ * the checkpoint writes out no buffer. This does no harm to performing a
+ * recovery but such inversion seems inconsistent from the view of the
+ * callers and prevents them from knowing WAL segments needed by sane
+ * calcuation. For the reason we return the last replayed point as the
+ * backup end location. Note that it can be greater than the exact backup
+ * end location or even the minimum recovery point of pg_control at the
+ * time. This is harmless for current uses.


==== About the second patch.

By the way, related to the following discussion in the upthread,

At Tue, 19 Jul 2016 14:13:36 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqRXMkR2rq7_Wa1aZZqBFVRUwU_5QTBr_kEQqWGUEjAaAQ@mail.gmail.com>
> The thing that is really annoying btw is that there will be always a
> gap between minRecoveryPoint and the actual moment where a backup
> finishes because there is no way to rely on the XLOG_BACKUP_END
> record. On top of that we can not be sure if pg_control has been
> backed up last or not. Which is why it would be cool to document that
> gap.

Even with out my previous patch, the gap between minRecoveryPoint
*when pg_control is backed up* and that (or the replay end) at
the end of backup is crucial and should be back-patched.

Addition to that, while I examined the documentation I found the
following description about pg_stop_backup in
continuous-archiving.html, which contradicts to its definition.

> In the same connection as before, issue the command:
> 
>   SELECT * FROM pg_stop_backup(false);
...
> The pg_stop_backup will return one row with three values.

AFAICS pg_stop_backup returns a single value of LSN. I don't know
where this comes from, but the attached patch fixes this and adds
a mention on the "gap". The original description mentioned
backup_label and tablespace_map but it seems not necessary. The
following is the new content rewritten by the patch.

| The pg_stop_backup will return the LSN when it is called. This
| LSN informs upto where the backup needs WAL segment files to
| complete. For a backup taken from a master, this function leaves
| a backup history file to inform the segment files needed and a
| server starts from the backup performes a recovery up to where a
| backup-end WAL record to reach a consistent state. But things are
| different for a backup taken from a standby. For the case, backup
| history file won't be created and recvoery is perfomed up to the
| Minimum-recovery-ending-location field in the pg_control file
| included in the backup instead. The location is properly stored
| in the backup if you copy the the pg_control file after all the
| other files as pg_basebackup does. Note that a hot standby
| started from a backup not taken in that manner will consider to
| have reached a consistent state mistakenly earlier.
| 
| Once the WAL segment files active during the backup are archived,
| you are done. The file contains the LSN returned by
| pg_stop_backup is the last segment that is required to form a
| complete set of backup files.

The "LSN" in the first line is intentionally ambiguated. It may
seems too much to write here.

> Another crazy idea would be to return pg_control as an extra
> return field of pg_stop_backup() and encourage users to write that
> back in the backup itself. This would allow closing any hole in the
> current logic for backups taken from live standbys: minRecoveryPoint
> would be updated directly to the last replayed LSN/TLI in the control
> file.

It seems to be doable at do_pg_*_backup level. non-exclusive
backup code does the same thing to backup_label but no usage
seen.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: asynchronous and vectorized execution