Discussion: reassure me that it's good to copy pg_control last in a base backup


reassure me that it's good to copy pg_control last in a base backup

From
Chapman Flack
Date:
I've been using a base backup script that takes special care to
have pg_control be the last file it grabs. And I see that
basebackup.c takes similar care:


https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/replication/basebackup.c;h=81203c9f5ac9dbf38da09e1ff55b29846c83f514;hp=2fa1f5461356a191559b93b591c9037c6c75b389;hb=8366c7803ec3d0591cf2d1226fea1fee947d56c3;hpb=74ab96a45ef6259aa6a86a781580edea8488511a

But I need to swallow my pride and admit I'm not sure how to
reason about this. I think I'm being spooked by language in the
"WAL Internals" documentation section:

"... the checkpoint's position is saved in the file pg_control.
Therefore, at the start of recovery, the server first reads pg_control
and then the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record."

From that description alone, I'd imagine a danger in redoing from a
base backup in which pg_control was copied last. What if another
checkpoint was made (after the one done by pg_start_backup) during
the course of the backup, and the late-copied pg_control refers to
it, but some of the files had been copied into the base backup
too early to reflect it?

Looking harder, I think I see that the special care to grab
pg_control last was introduced for the case of taking a base backup
from a standby, and perhaps only matters in that case. The long
discussion seems to be this one:

https://www.postgresql.org/message-id/201108050646.p756kHC5023570%40ccmds32.silk.ntts.co.jp

What I think I've gleaned is:

1. The description in the doc ("at the start of recovery, the server
   first reads pg_control and the checkpoint record") only applies to
   the kind of recovery that happens in an unexpected restart, using
   the files that are present; it's not the whole story for the kind
   of recovery that begins with a base backup.

2. In the case of recovery from a backup (that was taken from a master),
   both the start and end location in pg_control are disregarded, in
   favor of the backup label file and the backup end WAL record,
   respectively, so it doesn't matter a whit whether pg_control was
   copied early or late.

3. In recovery from a backup taken from a standby, there is a backup
   label file but no backup end WAL record, so the 'minimum recovery
   ending location' in pg_control has to be used, and that's why the
   fuss about copying pg_control last when backing up from a standby.
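
As an illustration only (not code from PostgreSQL or from this thread), the three points above can be condensed into a small Python sketch. The names `RecoveryPlan` and `plan_recovery` are hypothetical, and the strings simply restate the list; the thread does not say where crash recovery ends, so that field is left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryPlan:
    start_from: str          # where the redo start location is read from
    end_at: Optional[str]    # what marks the point where the data is consistent


def plan_recovery(has_backup_label: bool, taken_from_standby: bool = False) -> RecoveryPlan:
    """Restate points 1-3 above as a lookup (hypothetical helper)."""
    if not has_backup_label:
        # Point 1: restart using the files that are present (crash recovery).
        # The end condition for this case is not discussed in the thread.
        return RecoveryPlan("checkpoint location in pg_control", None)
    if not taken_from_standby:
        # Point 2: backup taken from a master; pg_control locations are not used.
        return RecoveryPlan("backup label file", "backup end WAL record")
    # Point 3: backup taken from a standby; no backup end WAL record exists.
    return RecoveryPlan("backup label file",
                        "minimum recovery ending location in pg_control")
```

Only the third case reads anything from pg_control that depends on when the file was copied, which is why the copy order matters there.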

Did I get that right? If so, would it be worth adding some words
to that paragraph in "WAL Internals", to clarify that the pg_control
checkpoint position is not relied on when starting recovery with
a backup label present, and therefore it isn't scary to copy pg_control
late in the backup?

It all seems to make sense ultimately, but took a lot of reading
and head scratching to get there.

-Chap


Re: reassure me that it's good to copy pg_control last in a basebackup

From
Michael Paquier
Date:
On Thu, Dec 21, 2017 at 10:48:49PM -0500, Chapman Flack wrote:
> From that description alone, I'd imagine a danger in redoing from a
> base backup in which pg_control was copied last. What if another
> checkpoint was made (after the one done by pg_start_backup) during
> the course of the backup, and the late-copied pg_control refers to
> it, but some of the files had been copied into the base backup
> too early to reflect it?

As long as you have a backup_label file to guarantee the start position
of recovery, that's not something to worry about. What would be bad is
to remove the backup_label file from a backup, which exposes you to
risks of corrupting an instance. This description stands for crash
recovery, where there is no backup_label file. Now you see why the
exclusive backup API can lead to problems? Imagine the case where
you take an exclusive backup and the instance from which a backup is
taken crashes, *with* a backup_label file on disk. Oops. That's one
reason behind non-exclusive backups, which is what pg_basebackup
uses as well.

> Looking harder, I think I see that the special care to grab
> pg_control last was introduced for the case of taking a base backup
> from a standby, and perhaps only matters in that case. The long
> discussion seems to be this one:
>
> https://www.postgresql.org/message-id/201108050646.p756kHC5023570%40ccmds32.silk.ntts.co.jp

Copying pg_control last in the backup matters only for backups taken from
standbys, where you want to maximize the LSN position for minRecoveryPoint
so that you minimize the risk of facing inconsistent data at
recovery. When taking a backup from a primary server, the WAL record
marking the end of the backup guarantees that a consistent point
has been reached, so it does not matter whether the control file is copied
first or last in this case.
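
A minimal Python sketch of the ordering described here, for a file-level backup script: list every file under the data directory and move the control file to the end of the copy order. The function name `backup_copy_order` is hypothetical. The sketch assumes the control file sits at `global/pg_control` relative to the data directory and, for brevity, applies none of the exclusions a real backup tool would need.

```python
import os

# Location of the control file relative to the data directory.
CONTROL_FILE = os.path.join("global", "pg_control")


def backup_copy_order(data_dir):
    """Return relative paths of all files under data_dir, pg_control last."""
    paths = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            paths.append(os.path.relpath(full, data_dir))
    paths.sort()  # deterministic order for everything else
    if CONTROL_FILE in paths:
        # Copy the control file after all other files.
        paths.remove(CONTROL_FILE)
        paths.append(CONTROL_FILE)
    return paths
```

A script would then copy the files in the returned order, for example with `shutil.copy2`, so that the control file in the backup is at least as recent as every other file.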

> What I think I've gleaned is:
>
> 1. The description in the doc ("at the start of recovery, the server
>    first reads pg_control and the checkpoint record") only applies to
>    the kind of recovery that happens in an unexpected restart, using
>    the files that are present; it's not the whole story for the kind
>    of recovery that begins with a base backup.

Yes, that's a crash recovery. But see the case I just described above
of an instance crashing while an exclusive backup is running.

> 2. In the case of recovery from a backup (that was taken from a master),
>    both the start and end location in pg_control are disregarded, in
>    favor of the backup label file and the backup end WAL record,
>    respectively, so it doesn't matter a whit whether pg_control was
>    copied early or late.

Yes.

> 3. In recovery from a backup taken from a standby, there is a backup
>    label file but no backup end WAL record, so the 'minimum recovery
>    ending location' in pg_control has to be used, and that's why the
>    fuss about copying pg_control last when backing up from a standby.

Yes.

> Did I get that right? If so, would it be worth adding some words
> to that paragraph in "WAL Internals", to clarify that the pg_control
> checkpoint position is not relied on when starting recovery with
> a backup label present, and therefore it isn't scary to copy pg_control
> late in the backup?

I would be interested in seeing a patch about that; people tend to
remove backup_label files too easily, so hardening the documentation
a bit could be an idea to dig into.
--
Michael


Re: reassure me that it's good to copy pg_control last in a basebackup

From
Chapman Flack
Date:
On 12/22/17 00:29, Michael Paquier wrote:
> exclusive backup API can lead to problems? Imagine the case where
> you take an exclusive backup and the instance from which a backup is
> taken crashes, *with* a backup_label file on disk. Oops. That's one
> reason behind non-exclusive backups, which is what pg_basebackup

I was noticing that terminology in the long backup-from-standby thread
I was reading, but it wasn't clear to me how the terms originated.
What's exclusive about pg_start_backup/copy/pg_stop_backup? And what's
nonexclusive about pg_basebackup (which, AFAICS, is following roughly
the same sequence under the hood)?

By the way, what does happen in that case? I'm guessing it wakes up,
sees the backup_label file, decides it's doing a PITR, and starts
replaying already-applied WAL from the start-of-backup checkpoint,
rather than from the most recent one? Oops.

-Chap


Re: reassure me that it's good to copy pg_control last in a basebackup

From
Michael Paquier
Date:
On Fri, Dec 22, 2017 at 12:46:01AM -0500, Chapman Flack wrote:
> I was noticing that terminology in the long backup-from-standby thread
> I was reading, but it wasn't clear to me how the terms originated.
> What's exclusive about pg_start_backup/copy/pg_stop_backup? And what's
> nonexclusive about pg_basebackup (which, AFAICS, is following roughly
> the same sequence under the hood)?

Only one exclusive backup can run at a time in a given PostgreSQL
server, as it uses the on-disk backup_label file to determine the state the
server is in, while non-exclusive backups can be run across many sessions,
at the cost that you need to keep the session alive for the duration
of the backup. pg_basebackup always uses non-exclusive backups, so it
never creates an on-disk backup_label file on the instance from which the
backup is taken; it writes the file itself instead. pg_start_backup and
pg_stop_backup include a set of optional arguments to control whether you want
to do a non-exclusive or an exclusive backup, the default being exclusive.
Non-exclusive backups can also be run while an exclusive backup is running.
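
For illustration, a non-exclusive backup driven from a script might look like the Python sketch below. It is a sketch under assumptions, not pg_basebackup's implementation: `cursor` is assumed to be a DB-API cursor using the `%s` parameter style (as psycopg2 does), `copy_data_dir` is a hypothetical callback that copies the data directory, and the SQL uses the `pg_start_backup(label, fast, exclusive)` and `pg_stop_backup(exclusive)` forms from the PostgreSQL releases current at the time of this thread.

```python
import os


def non_exclusive_backup(cursor, copy_data_dir, backup_dir, label="sketch"):
    """Run a non-exclusive backup; the same session must stay open throughout."""
    # Third argument false selects a non-exclusive backup, so the server
    # does not create a backup_label file in its own data directory.
    cursor.execute("SELECT pg_start_backup(%s, false, false)", (label,))

    # Hypothetical callback: copies the data directory into backup_dir.
    copy_data_dir(backup_dir)

    # In non-exclusive mode the label contents are returned to the caller.
    cursor.execute("SELECT labelfile, spcmapfile FROM pg_stop_backup(false)")
    labelfile, spcmapfile = cursor.fetchone()

    # The caller writes backup_label into the backup, not onto the server.
    with open(os.path.join(backup_dir, "backup_label"), "w", newline="") as f:
        f.write(labelfile)
    if spcmapfile:
        with open(os.path.join(backup_dir, "tablespace_map"), "w", newline="") as f:
            f.write(spcmapfile)
```

Because the label file exists only inside the backup, a crash of the source server in the middle of the backup leaves no backup_label behind in its data directory.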

> By the way, what does happen in that case? I'm guessing it wakes up,
> sees the backup_label file, decides it's doing a PITR, and starts
> replaying already-applied WAL from the start-of-backup checkpoint,
> rather than from the most recent one? Oops.

In short, yes. And it does not find the backup end record either.
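
A small Python sketch of that scenario, with made-up LSN values. The helper names are hypothetical, and the model is deliberately simplified to one rule: a checkpoint location taken from backup_label, if the file is present, overrides the one in pg_control.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (two hex numbers) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def redo_start(control_checkpoint, label_checkpoint=None):
    """Pick the checkpoint that recovery starts from (simplified model)."""
    if label_checkpoint is not None:
        # A backup_label on disk takes precedence over pg_control.
        return label_checkpoint
    return control_checkpoint


def rereplayed_bytes(control_checkpoint, label_checkpoint):
    """WAL bytes between the label's checkpoint and the latest checkpoint."""
    return parse_lsn(control_checkpoint) - parse_lsn(label_checkpoint)


# Example values only: the latest checkpoint is at 0/5000028, but a leftover
# backup_label from an exclusive backup still points at 0/2000028.
start = redo_start("0/5000028", "0/2000028")
extra = rereplayed_bytes("0/5000028", "0/2000028")
```

With these example values recovery starts from the older location and replays WAL that had already been applied before the crash.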
--
Michael
