Discussion: reassure me that it's good to copy pg_control last in a base backup
I've been using a base backup script that takes special care to have pg_control be the last file it grabs. And I see that basebackup.c takes similar care:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/replication/basebackup.c;h=81203c9f5ac9dbf38da09e1ff55b29846c83f514;hp=2fa1f5461356a191559b93b591c9037c6c75b389;hb=8366c7803ec3d0591cf2d1226fea1fee947d56c3;hpb=74ab96a45ef6259aa6a86a781580edea8488511a

But I need to swallow my pride and admit I'm not sure how to reason about this. I think I'm being spooked by language in the "WAL Internals" documentation section:

"... the checkpoint's position is saved in the file pg_control. Therefore, at the start of recovery, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the log position indicated in the checkpoint record."

From that description alone, I'd imagine a danger in redoing from a base backup in which pg_control was copied last. What if another checkpoint was made (after the one done by pg_start_backup) during the course of the backup, and the late-copied pg_control refers to it, but some of the files had been copied into the base backup too early to reflect it?

Looking harder, I think I see that the special care to grab pg_control last was introduced for the case of taking a base backup from a standby, and perhaps only matters in that case. The long discussion seems to be this one:

https://www.postgresql.org/message-id/201108050646.p756kHC5023570%40ccmds32.silk.ntts.co.jp

What I think I've gleaned is:

1. The description in the doc ("at the start of recovery, the server first reads pg_control and the checkpoint record") only applies to the kind of recovery that happens in an unexpected restart, using the files that are present; it's not the whole story for the kind of recovery that begins with a base backup.

2. In the case of recovery from a backup (that was taken from a master), both the start and end location in pg_control are disregarded, in favor of the backup label file and the backup end WAL record, respectively, so it doesn't matter a whit whether pg_control was copied early or late.

3. In recovery from a backup taken from a standby, there is a backup label file but no backup end WAL record, so the 'minimum recovery ending location' in pg_control has to be used, and that's why the fuss about copying pg_control last when backing up from a standby.

Did I get that right? If so, would it be worth adding some words to that paragraph in "WAL Internals", to clarify that the pg_control checkpoint position is not relied on when starting recovery with a backup label present, and therefore it isn't scary to copy pg_control late in the backup?

It all seems to make sense ultimately, but took a lot of reading and head scratching to get there.

-Chap
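A toy model may help pin down the three points in that summary. The sketch below is hypothetical: LSNs are plain integers, and `ControlFile`, `BackupLabel` and `recovery_plan` are invented names, not PostgreSQL's real structures. It only encodes which source, per the summary (and the replies below), supplies the replay start point and the consistency point in each case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ControlFile:
    # Hypothetical stand-ins for two pg_control fields; LSNs are plain ints.
    checkpoint_redo: int      # REDO pointer of the latest checkpoint it knows about
    min_recovery_point: int   # advanced by a standby as it flushes pages


@dataclass
class BackupLabel:
    start_redo: int           # REDO pointer of the checkpoint taken at backup start
    from_standby: bool        # whether the backup was taken from a standby


def recovery_plan(control: ControlFile,
                  label: Optional[BackupLabel],
                  backup_end_lsn: Optional[int]) -> Tuple[int, Optional[int]]:
    """Return (replay start LSN, LSN at which the data counts as consistent).

    None as the second element means "replay to the end of available WAL".
    """
    if label is None:
        # Crash recovery: pg_control's checkpoint is the only starting point.
        return control.checkpoint_redo, None
    if not label.from_standby:
        # Backup from a primary: start from backup_label, and the backup-end
        # WAL record marks consistency; pg_control's positions are ignored.
        return label.start_redo, backup_end_lsn
    # Backup from a standby: no backup-end record exists, so the
    # minRecoveryPoint in the *copied* pg_control is the consistency point.
    return label.start_redo, control.min_recovery_point
```

Only the standby branch reads anything from the copied pg_control, which is why the copy order matters there and nowhere else in this model.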
On Thu, Dec 21, 2017 at 10:48:49PM -0500, Chapman Flack wrote:
> From that description alone, I'd imagine a danger in redoing from a
> base backup in which pg_control was copied last. What if another
> checkpoint was made (after the one done by pg_start_backup) during
> the course of the backup, and the late-copied pg_control refers to
> it, but some of the files had been copied into the base backup
> too early to reflect it?

As long as you have a backup_label file to guarantee the start position of recovery, that's not something to worry about. What would be bad is to remove the backup_label file from a backup, which exposes you to the risk of corrupting an instance. That description applies to crash recovery, where there is no backup_label file.

Now you see why the exclusive backup API can lead to problems? Imagine the case where you take an exclusive backup and the instance from which a backup is taken crashes, *with* a backup_label file on disk. Oops. That's one reason behind non-exclusive backups, which is what pg_basebackup uses as well.

> Looking harder, I think I see that the special care to grab
> pg_control last was introduced for the case of taking a base backup
> from a standby, and perhaps only matters in that case. The long
> discussion seems to be this one:
>
> https://www.postgresql.org/message-id/201108050646.p756kHC5023570%40ccmds32.silk.ntts.co.jp

Copying pg_control last in the backup matters only for backups taken from standbys, where you want to maximize the LSN position recorded as minRecoveryPoint so that you minimize the risk of facing inconsistent data at recovery. When taking a backup from a primary server, the WAL record marking the end of the backup guarantees that a consistent point has been reached, so it does not matter whether the control file is copied first or last in this case.

> What I think I've gleaned is:
>
> 1. The description in the doc ("at the start of recovery, the server
> first reads pg_control and the checkpoint record") only applies to
> the kind of recovery that happens in an unexpected restart, using
> the files that are present; it's not the whole story for the kind
> of recovery that begins with a base backup.

Yes, that's crash recovery. But see the case I just described above of an instance crashing while an exclusive backup is running.

> 2. In the case of recovery from a backup (that was taken from a master),
> both the start and end location in pg_control are disregarded, in
> favor of the backup label file and the backup end WAL record,
> respectively, so it doesn't matter a whit whether pg_control was
> copied early or late.

Yes.

> 3. In recovery from a backup taken from a standby, there is a backup
> label file but no backup end WAL record, so the 'minimum recovery
> ending location' in pg_control has to be used, and that's why the
> fuss about copying pg_control last when backing up from a standby.

Yes.

> Did I get that right? If so, would it be worth adding some words
> to that paragraph in "WAL Internals", to clarify that the pg_control
> checkpoint position is not relied on when starting recovery with
> a backup label present, and therefore it isn't scary to copy pg_control
> late in the backup?

I would be interested in seeing a patch about that; people tend to remove backup_label files too easily, so hardening the documentation a bit could be an idea to dig into.
--
Michael
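The point about maximizing minRecoveryPoint can be illustrated with a small simulation. It assumes the rule that a standby advances minRecoveryPoint before it flushes a page carrying a given LSN, and it collapses the backup into steps of "one page flush, then one file copy". The function `simulate` and the file names are hypothetical; this is a sketch of the argument, not of basebackup.c.

```python
from typing import Dict, List, Tuple


def simulate(copy_order: List[str],
             writes: List[Tuple[str, int]]) -> Tuple[int, int]:
    """Toy model of a base backup taken from a standby.

    At step i the standby flushes writes[i] = (file, page_lsn), then the
    copier snapshots copy_order[i]; zip() stops at the shorter list.
    Returns (minRecoveryPoint in the copied pg_control,
             highest page LSN among the copied data files).
    """
    live_pages: Dict[str, int] = {}   # file -> LSN of its newest page on disk
    live_min_recovery_point = 0
    backup: Dict[str, int] = {}

    for (file, lsn), to_copy in zip(writes, copy_order):
        # Assumed standby rule: minRecoveryPoint is advanced to cover a page's
        # LSN before that page reaches disk, so it never trails the disk.
        live_min_recovery_point = max(live_min_recovery_point, lsn)
        live_pages[file] = lsn
        # The copier takes a snapshot of exactly one file this step.
        if to_copy == "pg_control":
            backup[to_copy] = live_min_recovery_point
        else:
            backup[to_copy] = live_pages.get(to_copy, 0)

    data_max = max(v for k, v in backup.items() if k != "pg_control")
    return backup["pg_control"], data_max


writes = [("base/1", 10), ("base/2", 20), ("base/1", 30)]
late = simulate(["base/1", "base/2", "pg_control"], writes)   # (30, 20)
early = simulate(["pg_control", "base/1", "base/2"], writes)  # (10, 20)
```

With pg_control copied last, its recorded minRecoveryPoint (30) covers every copied page. Copied first, it records 10 while a copied page already carries LSN 20, so recovery could declare consistency before replaying that change.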
On 12/22/17 00:29, Michael Paquier wrote:
> exclusive backup API can lead to problems? Imagine the case where
> you take an exclusive backup and the instance from which a backup is
> taken crashes, *with* a backup_label file on disk. Oops. That's one
> reason behind non-exclusive backups, which is what pg_basebackup

I was noticing that terminology in the long backup-from-standby thread I was reading, but it wasn't clear to me how the terms originated. What's exclusive about pg_start_backup/copy/pg_stop_backup? And what's nonexclusive about pg_basebackup (which, AFAICS, is following roughly the same sequence under the hood)?

By the way, what does happen in that case? I'm guessing it wakes up, sees the backup_label file, decides it's doing a PITR, and starts replaying already-applied WAL from the start-of-backup checkpoint, rather than from the most recent one? Oops.

-Chap
On Fri, Dec 22, 2017 at 12:46:01AM -0500, Chapman Flack wrote:
> I was noticing that terminology in the long backup-from-standby thread
> I was reading, but it wasn't clear to me how the terms originated.
> What's exclusive about pg_start_backup/copy/pg_stop_backup? And what's
> nonexclusive about pg_basebackup (which, AFAICS, is following roughly
> the same sequence under the hood)?

You can run only one exclusive backup at a time on a given PostgreSQL server, as it uses the on-disk backup_label file to determine the state the server is in, while non-exclusive backups can be run across many sessions, at the cost of having to keep the session alive for the duration of the backup. pg_basebackup always uses non-exclusive backups, so it never creates an on-disk backup_label file on the instance from which the backup is taken; instead, the file is written directly into the backup. pg_start_backup and pg_stop_backup accept optional arguments that control whether you get a non-exclusive or an exclusive backup, the default being exclusive. Non-exclusive backups can also be run while an exclusive backup is running.

> By the way, what does happen in that case? I'm guessing it wakes up,
> sees the backup_label file, decides it's doing a PITR, and starts
> replaying already-applied WAL from the start-of-backup checkpoint,
> rather than from the most recent one? Oops.

In short, yes. And it does not find the backup end record either.
--
Michael
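To make the exclusive/non-exclusive distinction and the crash scenario concrete, here is a hypothetical toy model. The class `Instance` and its methods are invented for illustration; the backup start point is simply the latest checkpoint's REDO LSN, whereas a real server requests a fresh checkpoint and tracks far more state.

```python
from typing import Dict, Optional, Set


class Instance:
    """Hypothetical toy model of a running server; LSNs are plain ints."""

    def __init__(self, checkpoint_redo: int) -> None:
        self.checkpoint_redo = checkpoint_redo   # what pg_control points at
        self.on_disk: Dict[str, int] = {}        # extra files in the data directory
        self.nonexclusive: Set[str] = set()      # sessions with a non-exclusive backup

    def start_backup(self, session: str, exclusive: bool) -> Optional[int]:
        start = self.checkpoint_redo             # stand-in for the backup's start checkpoint
        if exclusive:
            # Exclusive mode keeps its state in an on-disk backup_label,
            # so a second exclusive backup cannot start.
            if "backup_label" in self.on_disk:
                raise RuntimeError("an exclusive backup is already in progress")
            self.on_disk["backup_label"] = start
            return None
        # Non-exclusive mode keeps its state in the session and hands the
        # label contents back for the caller to put into the backup.
        self.nonexclusive.add(session)
        return start

    def stop_backup(self, session: str, exclusive: bool) -> None:
        if exclusive:
            del self.on_disk["backup_label"]
        else:
            self.nonexclusive.discard(session)

    def checkpoint(self, redo: int) -> None:
        self.checkpoint_redo = redo

    def crash_and_restart(self) -> int:
        """Return the LSN from which replay starts after a crash."""
        self.nonexclusive.clear()                # sessions do not survive a crash
        if "backup_label" in self.on_disk:
            # A leftover label makes startup behave as if restoring a backup:
            # replay goes back to the old start point (and later looks for a
            # backup end record that will not be there).
            return self.on_disk["backup_label"]
        return self.checkpoint_redo
```

Usage: start an exclusive backup at checkpoint 100, let a checkpoint at 500 happen, then crash. `crash_and_restart()` returns 100 rather than 500, which is the "Oops" case confirmed above; the same sequence with a non-exclusive backup returns 500.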