Discussion: backup_label in a crash recovery
Hi,

When a crash occurs before calling pg_stop_backup(), the subsequent crash recovery causes the FATAL error and outputs the following HINT message.

    If you are not restoring from a backup, try removing the file "%s/backup_label".

I wonder why backup_label isn't automatically removed in normal crash recovery case. Is this for the fail-safe protection; prevent admin from restoring from a backup wrongly without creating recovery.conf? Or another?

If that's intentional, a clusterware for shared disk failover system should remove backup_label whenever doing failover. Otherwise, when a crash occurs during online-backup, the failover would fail.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
> When a crash occurs before calling pg_stop_backup(),
> the subsequent crash recovery causes the FATAL error
> and outputs the following HINT message.
>
> If you are not restoring from a backup, try removing the file
> "%s/backup_label".
>
> I wonder why backup_label isn't automatically removed
> in normal crash recovery case. Is this for the fail-safe
> protection; prevent admin from restoring from a backup
> wrongly without creating recovery.conf? Or another?
>
> If that's intentional, a clusterware for shared disk
> failover system should remove backup_label whenever
> doing failover. Otherwise, when a crash occurs during
> online-backup, the failover would fail.

I do not know if there is a good reason why the server does not ignore backup_label if recovery.conf is not present.

But as it is, any failover system should definitely remove backup_label.

Yours,
Laurenz Albe
Fujii Masao <masao.fujii@gmail.com> writes:
> I wonder why backup_label isn't automatically removed
> in normal crash recovery case.

Removing it automatically could be catastrophic if done incorrectly, no?

> If that's intentional, a clusterware for shared disk
> failover system should remove backup_label whenever
> doing failover.

It would be no less catastrophic if done incorrectly from outside the postmaster; see for example the problems people have had historically with startup scripts that think they should remove postmaster.pid.

regards, tom lane
Tom Lane wrote:
> > I wonder why backup_label isn't automatically removed
> > in normal crash recovery case.
>
> Removing it automatically could be catastrophic if done
> incorrectly, no?
>
> It would be no less catastrophic if done incorrectly from outside the
> postmaster; see for example the problems people have had historically
> with startup scripts that think they should remove postmaster.pid.

I beg to differ.

Removing postmaster.pid can lead to a corrupt database. Removing backup_label means that one of your backups will go wrong, and a subsequent pg_stop_backup() will throw an error.

If you have a cluster failover during an online backup, I think any reasonable person would suspect that the backup went wrong. And if nothing else does, the error on pg_stop_backup() will tell you.

Given a choice, I expect that everybody who is intent enough on availability to implement such a solution will want the database to come up if it can be done without data loss.

Is there a flaw in my reasoning?

Yours,
Laurenz Albe
>>>>> "Albe" == "Albe Laurenz" <laurenz.albe@wien.gv.at> writes:

Albe> Removing postmaster.pid can lead to a corrupt database.
Albe> Removing backup_label means that one of your backups will go
Albe> wrong, and a subsequent pg_stop_backup() will throw an error.

Albe> If you have a cluster failover during an online backup, I think
Albe> any reasonable person would suspect that the backup went wrong.
Albe> And if nothing else does, the error on pg_stop_backup() will
Albe> tell you.

[...]

Albe> Is there a flaw in my reasoning?

Yes.

Imagine the following scenario: the system crashed while pg_start_backup was in effect (so backup_label exists), and the postmaster is about to start up. i.e. you're at the point where you might naively imagine that you can delete the backup_label.

How do you distinguish between these two scenarios:

1) you're starting up in a data dir where you crashed in the middle of
   a backup

2) you're starting up in a data dir that is a restore of a base backup,
   but no recovery.conf has been created

(hint: you can't)

If in scenario 2, you remove the backup_label and proceed with recovery, there is a significant chance (depending on the timing, and if you didn't exclude pg_xlog from the backup) that recovery will _think_ it succeeds but actually leaves you with a corrupt data directory.

--
Andrew (irc:RhodiumToad)
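Andrew's argument can be made concrete by modelling only what the startup process can observe in the data directory. The sketch below is a hypothetical illustration, not PostgreSQL code: `DataDirState`, the two scenario constructors and `naive_should_remove_label` are invented names, and the model deliberately reduces the data directory to the two files discussed in the thread. It shows that both scenarios produce the same observable state, so any rule that looks only at the files must treat them the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDirState:
    """Hypothetical model of what the startup process can see on disk."""
    has_backup_label: bool
    has_recovery_conf: bool


def crashed_during_backup() -> DataDirState:
    # Scenario 1: pg_start_backup() created backup_label, then the server
    # crashed before pg_stop_backup() could remove it. No recovery.conf.
    return DataDirState(has_backup_label=True, has_recovery_conf=False)


def restored_without_recovery_conf() -> DataDirState:
    # Scenario 2: a base backup (which carries backup_label) was restored,
    # but nobody created recovery.conf.
    return DataDirState(has_backup_label=True, has_recovery_conf=False)


def naive_should_remove_label(state: DataDirState) -> bool:
    # Rule that looks only at the data directory: "a label without
    # recovery.conf means an interrupted backup, so the label is stale".
    return state.has_backup_label and not state.has_recovery_conf


if __name__ == "__main__":
    s1 = crashed_during_backup()
    s2 = restored_without_recovery_conf()
    # Equal states: no function of DataDirState alone can tell them apart.
    print(s1 == s2)  # prints: True
    # So the naive rule also removes the label in scenario 2, the unsafe case.
    print(naive_should_remove_label(s1), naive_should_remove_label(s2))  # prints: True True
```

Because the two states compare equal, the naive rule fires in scenario 2 as well, which is exactly the case Andrew warns can end in a silently corrupt data directory.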
[ after further thought... ]

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> How do you distinguish between these two scenarios:
> 1) you're starting up in a data dir where you crashed in the middle of
> a backup
> 2) you're starting up in a data dir that is a restore of a base backup,
> but no recovery.conf has been created
> (hint: you can't)

Hmm ... you cannot tell this if the postmaster just started, and I agree that removing backup_label in such a case is too risky. However, in a typical crash scenario the postmaster doesn't die; it just kills off and restarts its children, and in that scenario we do have additional knowledge, namely that the postmaster was already up. I think it could be safe and useful to forcibly remove backup_label before commencing recovery, *if* we know that the system had previously been in fully-operational status.

However, this raises the question: does a backend crash necessarily imply that an in-progress base backup has to be canceled and restarted from scratch? It's not clear to me why you wouldn't consider that the backup can keep going. So maybe what we really want here is not to remove the label file, but to have the postmaster signal to the recovery process that it knows this is a crash recovery and any backup_label should be ignored.

regards, tom lane
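Tom's suggestion can be modelled as a startup decision that takes one extra input besides the files in the data directory. The sketch below is a simplified, hypothetical model, not PostgreSQL code: `StartupMode`, `choose_startup_mode` and the `reached_normal_operation` flag are invented names, and it assumes the postmaster can pass such a flag to the recovery process. The `USE_BACKUP_LABEL` branch stands for today's fail-safe behaviour, which is the path that can end in the FATAL error Fujii reports.

```python
from enum import Enum


class StartupMode(Enum):
    ARCHIVE_RECOVERY = "archive recovery, as requested by recovery.conf"
    USE_BACKUP_LABEL = "trust backup_label (current fail-safe behaviour)"
    CRASH_RECOVERY = "ordinary crash recovery; any backup_label is ignored, not deleted"


def choose_startup_mode(has_backup_label: bool,
                        has_recovery_conf: bool,
                        reached_normal_operation: bool) -> StartupMode:
    """Hypothetical model of the decision proposed in the thread.

    reached_normal_operation stands for the extra knowledge a surviving
    postmaster has when it restarts crashed children: the cluster was already
    fully operational, so a backup_label on disk can only belong to an online
    backup in progress, not to a restored base backup.
    """
    if has_recovery_conf:
        # The administrator explicitly asked for archive recovery.
        return StartupMode.ARCHIVE_RECOVERY
    if has_backup_label and not reached_normal_operation:
        # Fresh postmaster start: scenario 1 and scenario 2 look identical,
        # so keep trusting the label rather than guessing.
        return StartupMode.USE_BACKUP_LABEL
    # No label at all, or the postmaster knows this is a crash restart and
    # tells the recovery process to ignore the label.
    return StartupMode.CRASH_RECOVERY


if __name__ == "__main__":
    # Child crash while the postmaster stayed up during an online backup.
    print(choose_startup_mode(True, False, True))
    # Fresh start on a restored base backup with no recovery.conf.
    print(choose_startup_mode(True, False, False))
```

Ignoring the label instead of deleting it keeps the file in place, which leaves room for the possibility Tom raises that the in-progress backup could keep going after the restart.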
Hi,

On Wed, Nov 4, 2009 at 12:01 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> 2) you're starting up in a data dir that is a restore of a base backup,
> but no recovery.conf has been created

Is scenario 2 (i.e., a normal crash recovery without recovery.conf) supported in postgres?

But anyway, it can happen through an administrator's operational error. So maybe backup_label should not be removed automatically, as a fail-safe protection for that case.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center