Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
От | Michael Paquier |
---|---|
Тема | Re: Requiring recovery.signal or standby.signal when recovering with a backup_label |
Дата | |
Msg-id | ZUGQJQjV3QqH0rRd@paquier.xyz обсуждение исходный текст |
Ответ на | Re: Requiring recovery.signal or standby.signal when recovering with a backup_label (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
(Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Requiring recovery.signal or standby.signal when recovering with a backup_label (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On Tue, Oct 31, 2023 at 08:28:07AM -0400, Robert Haas wrote: > On Mon, Oct 30, 2023 at 8:40 PM Michael Paquier <michael@paquier.xyz> wrote: >> As far as I know, there's one paragraph in the docs that implies this >> mode without giving an actual hint that this may be OK or not, so >> shrug: >> https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-TIPS >> "As with base backups, the easiest way to produce a standalone hot >> backup is to use the pg_basebackup tool. If you include the -X >> parameter when calling it, all the write-ahead log required to use the >> backup will be included in the backup automatically, and no special >> action is required to restore the backup." > > I see your point, but that's way too subtle. As far as I know, the > only actually-documented procedure for restoring is this one: > https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-PITR-RECOVERY > > That procedure actually is badly in need of some updating, IMHO, > because close to half of it is about moving your existing database > cluster out of the way, which may or may not be needed in the case of > any particular backup restore. Also, it unconditionally mentions > creating recovery.signal, with no mention of standby.signal. And > certainly not with neither. It also gives zero motivation for actually > doing this and says nothing useful about backup_label. > > Both recovery.signal and standby.signal are documented in > https://www.postgresql.org/docs/current/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY > but you'd have no real reason to look in a list of GUCs for > information about a file on disk. recovery.signal but not > standby.signal is mentioned in > https://www.postgresql.org/docs/current/warm-standby.html but nowhere > that I can find do we explicitly talk about running with at least one > of them. Point 7. of what you quote says to use one? True that this needs a refresh, and perhaps a bit fat warning about the fact that these are required if you want to fetch WAL from other sources than the local pg_wal/. Perhaps there may be a point of revisiting the default behavior of recovery_target_timeline in this case, I don't know. >> As you're telling me, and I've considered that as an option as well, >> perhaps we should just consider the presence of a backup_label file >> with no .signal files as a synonym of crash recovery? In the recovery >> path, currently the essence of the problem is when we do >> InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning >> that it should do archive recovery but we don't want it, and that does >> not really make sense. The rest of the code sort of implies that this >> is not a suported combination. So basically, my suggestion here, is >> to just replay WAL up to the end of what's in your local pg_wal/ and >> hope for the best, without TLI jumps, except that we'd do nothing. > > This sentence seems to be incomplete. I've re-read it, and it looks OK to me. What I mean with this paragraph are two things: - Remove InArchiveRecovery=true and ArchiveRecoveryRequested=false as a possible combination in the code. - Treat backup_label with no .signal file as the same as crash recovery, that: -- Does no TLI jump at the end of recovery. -- Expects all the WAL to be in pg_wal/. > But I was not saying we should treat the case where we have a > backup_label file like crash recovery. The real question here is why > we don't treat it fully like archive recovery. Timeline jump at the end of recovery? Archive recovery forces a TLI jump by default at the end of redo if there's a signal file, and some users may not want a TLI jump by default? > I don't know off-hand > what is different if I start the server with both backup_label and > recovery.signal vs. if I start it with only backup_label, but I > question whether there should be any difference at all. Perhaps we could do that, but note that backup_label is renamed to backup_label.old at the beginning of redo. The code has historically always enforced InArchiveRecovery=true when there's a backup label, and InArchiveRecovery=false where there is no backup label, so we don't get the same recovery behavior if a cluster is restarted while it was still performing recovery. I don't quite see how it is possible to make this code simpler without enforcing a policy to take care of this inconsistency. I've listed two of them on this thread: - Force the presence of a .recovery file when there is a backup_label, to force archive recovery. - Force crash recovery if there are no signal files but a backup_label, then a restart of a cluster that began a restore while it processed a backup would be confused: should it do crash recovery or archive recovery? My guess, based on what I read from the feedback of this thread, is that it could be more helpful to do the second thing, not the third one, because this is better with standalone backups: no TLI jumps and restore happens with all the local WAL in pg_wal/, without any GUCs to control how recovery should run. You are suggesting a third, hybrid, approach. Now note we have always checked for signal files before the backup_label. Recovery GUCs are checked only if there's one of the two signal files. It seems to me that what you are suggesting would make the code a bit harder to follow, actually, and more inconsistent with stable branches because we would need to check the control file contents *before* checking for the .signal files or backup_label to be able to see if archive recovery *should* happen, depending on if there's a backupEndPoint. >> A point of contention is if we'd better be stricter about satisfying >> backupEndPoint in such a case, but the redo code only wants to do >> something here when ArchiveRecoveryRequested is set (aka there's a >> .signal file set), and we would not want a TLI jump at the end of >> recovery, so I don't see an argument with caring about backupEndPoint >> in this case. > > This is a bit hard for me to understand, but I disagree strongly with > the idea that we should ever ignore a backup end point if we have one. Actually, while experimenting yesterday before sending my reply to you, I have noticed that redo cares about backupEndPoint even if you force crash recovery when there's only a backup_label file. There's a case in pg_basebackup that would fail, but that's accidental, AFAIK: https://www.postgresql.org/message-id/ZUBVKfL6FR6NOQyt%40paquier.xyz See in StartupXLOG(), around the comment "complain if we did not roll forward far enough to reach". This complains if archive recovery has been requested *or* if we retrieved a backup end LSN from the backup_label. -- Michael
Вложения
В списке pgsql-hackers по дате отправления: