Discussion: backup_label and server start
If the postmaster is stopped with 'pg_ctl stop' while an online backup is in progress, the 'backup_label' file will remain in the data directory. There is no recovery.conf file present.

When the server is started again, it attempts to recover from the checkpoint marked in the backup_label file even if the shutdown was clean.

If the WAL file mentioned in backup_label is not in pg_xlog (it has already been archived and removed because there was enough database activity since pg_start_backup()), the startup process will fail with a message like this:

LOG:  could not open file "pg_xlog/000000020000000000000084" (log file 0, segment 132): No such file or directory
LOG:  invalid checkpoint record
PANIC:  could not locate required checkpoint record
HINT:  If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".

My question: Is it safe to just delete the file as the hint suggests?

I see the following comment in src/backend/access/transam/xlog.c:

/*
 * read_backup_label: check to see if a backup_label file is present
 *
 * If we see a backup_label during recovery, we assume that we are recovering
 * from a backup dump file, and we therefore roll forward from the checkpoint
 * identified by the label file, NOT what pg_control says.  This avoids the
 * problem that pg_control might have been archived one or more checkpoints
 * later than the start of the dump, and so if we rely on it as the start
 * point, we will fail to restore a consistent database state.

"We will fail to restore a consistent database state" sounds rather intimidating.

*If* - on the other hand - it is safe to follow the hint and remove the backup_label, wouldn't it be a good thing for the startup process to ignore (and rename) the backup_label file if no recovery.conf is present?

Or, alternatively, the backup_label file could be removed by a clean shutdown.

Thanks,
Laurenz Albe
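The situation described above can be checked before a restart is attempted. The following is only a minimal sketch, not anything PostgreSQL ships: the function name `check_data_dir` is hypothetical, and it assumes that the first line of backup_label has the form `START WAL LOCATION: <location> (file <24-hex-digit segment name>)`. It reports whether backup_label is present without recovery.conf and whether the segment it names is still in pg_xlog.

```python
import os
import re

# Assumed layout of the first line of backup_label, for example:
# START WAL LOCATION: 0/84000020 (file 000000020000000000000084)
_START_RE = re.compile(r"^START WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)")


def check_data_dir(data_dir):
    """Return a short status string describing what a restart would face.

    Hypothetical helper: it only inspects files and never modifies them.
    """
    label_path = os.path.join(data_dir, "backup_label")
    if not os.path.exists(label_path):
        return "no backup_label: normal startup from pg_control"
    if os.path.exists(os.path.join(data_dir, "recovery.conf")):
        return "backup_label and recovery.conf: archive recovery"

    # backup_label without recovery.conf: the case discussed in this thread.
    with open(label_path) as f:
        first_line = f.readline()
    match = _START_RE.match(first_line)
    if match is None:
        return "backup_label present but start segment not recognized"

    segment = match.group(1)
    if os.path.exists(os.path.join(data_dir, "pg_xlog", segment)):
        return "backup_label present, segment %s still in pg_xlog" % segment
    return "backup_label present, segment %s missing: startup would fail" % segment
```

The sketch deliberately stops at reporting. Whether removing backup_label is safe depends on whether the data directory is the original one or was restored from a backup, which a script cannot tell from the files alone.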
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes: > wouldn't it be a good thing > for the startup process to ignore (and rename) the backup_label > file if no recovery.conf is present? No, it certainly wouldn't. I don't see why we should simplify the bizarre case you're talking about at the price of putting land mines under the feet of people who are actually trying to do a restore. It hasn't lost any data for you, and it gave you a correct HINT, so I don't have a problem with the current behavior. regards, tom lane
On Tue, 2007-11-20 at 15:19 +0100, Albe Laurenz wrote:
> "We will fail to restore a consistent database state"
> sounds rather intimidating.

Well, how else should a warning of data loss sound? :-)

It's vaguely possible that the database state could be consistent, if the server were quiet when you stopped it. But that is unlikely *and* there is no way of knowing for certain, that is why we introduced pg_stop_backup() in the first place.

> *If* - on the other hand - it is safe to follow the hint
> and remove the backup_label, wouldn't it be a good thing
> for the startup process to ignore (and rename) the backup_label
> file if no recovery.conf is present?

The hint is telling you how to restart the original server, not a crafty way of cheating the process to allow you to use it for backup.

What are you trying to do?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
>> If the postmaster is stopped with 'pg_ctl stop' while an
>> online backup is in progress, the 'backup_label' file will remain
>> in the data directory.
[...]
>> the startup process will fail with a message like this:
[...]
>> PANIC: could not locate required checkpoint record
>> HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
>>
>> wouldn't it be a good thing
>> for the startup process to ignore (and rename) the backup_label
>> file if no recovery.conf is present?

Tom Lane replied:
> No, it certainly wouldn't.

Point taken. When backup_label is present and recovery.conf isn't, there is the risk that the data directory has been restored from an online backup, in which case using the latest available checkpoint would be detrimental.

> I don't see why we should simplify the bizarre case you're
> talking about

Well, it's not a bizarre case, it has happened twice here.

If somebody stops the postmaster while an online backup is in progress, there is no warning or nothing. Only the server will fail to restart.

One of our databases is running in a RedHat cluster, which in this case cannot failover to another node. And this can also happen during an online backup.

Simon Riggs replied:
> The hint is telling you how to restart the original server, not a crafty
> way of cheating the process to allow you to use it for backup.
>
> What are you trying to do?

You misunderstood me, I'm not trying to cheat anything, nor do I want to restore a backup that way.

All I want to do is restart a server after a clean shutdown.

How about my second suggestion:

Remove backup_label when the server shuts down cleanly. In that case an online backup in progress will not be useful anyway, and there is no need to recover on server restart.

What do you think?

Yours,
Laurenz Albe
On Wed, 2007-11-21 at 09:04 +0100, Albe Laurenz wrote:
> If somebody stops the postmaster while an online backup is
> in progress, there is no warning or nothing. Only the server
> will fail to restart.

Well, it seems best not to do this. There is always a need for a careful procedure to manually shutdown a live server, interlocking with other applications. ISTM like a manual procedure will resolve this for you.

If we remove the file in the place you suggest then an Archive Recovery will succeed when it should fail, with no possibility of a hint, which seems a worse error.

> All I want to do is restart a server after a clean shutdown.
>
> How about my second suggestion:
>
> Remove backup_label when the server shuts down cleanly.
> In that case an online backup in progress will not be useful
> anyway, and there is no need to recover on server restart.

That will make PITRs fail:

1. pg_start_backup()
2. backup
3. shutdown, removes backup_label
4. pg_stop_backup()

step 4 will now fail because of a missing backup_label file.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
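One way to read the "careful manual procedure" suggested above is a wrapper that refuses to stop the server while backup_label exists. The sketch below is only an illustration under that reading: the helper name `stop_command` is hypothetical, it treats the mere presence of backup_label as "backup in progress" (which is an assumption, since the file is also present in a freshly restored backup), and it only builds the `pg_ctl stop` argument list rather than running it.

```python
import os


def stop_command(data_dir, mode="smart"):
    """Build a 'pg_ctl stop' argument list, refusing while backup_label exists.

    Hypothetical operator-side guard, not part of PostgreSQL.
    """
    if mode not in ("smart", "fast", "immediate"):
        raise ValueError("unknown shutdown mode: %r" % mode)

    # Assumption: backup_label in the data directory means that
    # pg_start_backup() was called and pg_stop_backup() has not run yet.
    if os.path.exists(os.path.join(data_dir, "backup_label")):
        raise RuntimeError(
            "backup_label found in %s: an online backup appears to be in "
            "progress, finish it with pg_stop_backup() first" % data_dir
        )

    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]
```

A caller could pass the returned list to `subprocess.run` once the check has passed. The guard does nothing for shutdowns triggered outside the wrapper, for example by cluster software.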
On 21/11/2007, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2007-11-21 at 09:04 +0100, Albe Laurenz wrote:
> > If somebody stops the postmaster while an online backup is
> > in progress, there is no warning or nothing. Only the server
> > will fail to restart.
>
> Well, it seems best not to do this. There is always a need for a careful
> procedure to manually shutdown a live server, interlocking with other
> applications. ISTM like a manual procedure will resolve this for you.
>
> If we remove the file in the place you suggest then an Archive Recovery
> will succeed when it should fail, with no possibility of a hint, which
> seems a worse error.
>
> > All I want to do is restart a server after a clean shutdown.
> >
> > How about my second suggestion:
> >
> > Remove backup_label when the server shuts down cleanly.
> > In that case an online backup in progress will not be useful
> > anyway, and there is no need to recover on server restart.
>
> That will make PITRs fail:
>
> 1. pg_start_backup()
> 2. backup
> 3. shutdown, removes backup_label
> 4. pg_stop_backup()
>
> step 4 will now fail because of a missing backup_label file.

How about this, emit a warning on shutdown and fail to shutdown until the backup has finished.
Seems to me that either way you're sunk if you shut down a server while a backup is in progress. Your only way out is to work out whether to use the previous PITR backups plus logs or to remove the label. Doing it automatically would be very, very dangerous.
Peter.
On Wed, 2007-11-21 at 09:47 +0000, Peter Childs wrote:
> How about this, emit a warning on shutdown and fail to shutdown until
> the backup has finished.

That would be reasonable for -m smart shutdown.

We would then be treating the backup as a connection.

...but not for a fast shutdown.

Any comments against?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
>> If somebody stops the postmaster while an online backup is
>> in progress, there is no warning or nothing. Only the server
>> will fail to restart.
>
> Well, it seems best not to do this. There is always a need
> for a careful
> procedure to manually shutdown a live server, interlocking with other
> applications. ISTM like a manual procedure will resolve this for you.

You're arguing that there *should* be a manual intervention if a server was shut down while a backup was active.

> If we remove the file in the place you suggest then an Archive Recovery
> will succeed when it should fail, with no possibility of a hint, which
> seems a worse error.
>
>> How about my second suggestion:
>>
>> Remove backup_label when the server shuts down cleanly.
>> In that case an online backup in progress will not be useful
>> anyway, and there is no need to recover on server restart.
>
> That will make PITRs fail:
>
> 1. pg_start_backup()
> 2. backup
> 3. shutdown, removes backup_label
> 4. pg_stop_backup()
>
> step 4 will now fail because of a missing backup_label file.

Using the same kind of argument as you did above I would say that pg_stop_backup() *should* fail if the server restarted (and recovered!) in between - there was certainly something fishy going on during the online backup.

In your list, you left out step 3.5: restart the server. This step may fail if you do *not* remove the backup_label.

What is worse:
- Have pg_stop_backup() fail if the server was shut down during the backup
or
- Prevent the server from restarting at all without manual intervention.

I would say the latter.

Yours,
Laurenz Albe
On Wed, 2007-11-21 at 15:04 +0100, Albe Laurenz wrote:
> Simon Riggs wrote:
> >> If somebody stops the postmaster while an online backup is
> >> in progress, there is no warning or nothing. Only the server
> >> will fail to restart.
> >
> > Well, it seems best not to do this. There is always a need
> > for a careful
> > procedure to manually shutdown a live server, interlocking with other
> > applications. ISTM like a manual procedure will resolve this for you.
>
> You're arguing that there *should* be a manual intervention
> if a server was shutdown while a backup was active.

Shutting down the server was a manual action, so what is wrong in a manual action to recover from that mistake?

If the shutdown was automatic, then it needs to be properly scheduled so automatic actions do not conflict with one another.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
> That will make PITRs fail:
>
> 1. pg_start_backup()
> 2. backup
> 3. shutdown, removes backup_label
> 4. pg_stop_backup()
>
> step 4 will now fail because of a missing backup_label file.

Wait a minute: pg_stop_backup() will also fail in the current setup, because after recovery backup_label gets renamed to backup_label.old.

So what do we lose if we remove (or rename) backup_label on a clean server shutdown?

Yours,
Laurenz Albe
Simon Riggs wrote:
> On Wed, 2007-11-21 at 09:47 +0000, Peter Childs wrote:
>> How about this, emit a warning on shutdown and fail to shutdown until
>> the backup has finished.
>
> That would be reasonable for -m smart shutdown.
>
> We would then be treating the backup as a connection.
>
> ...but not for a fast shutdown.
>
> Any comments against?

No, that would be ok with me.

Anything that gets us out of the trap that you can shutdown a server without any warning and then cannot restart it without manual intervention.

What about: refuse shutdown for "smart" if a backup is in progress, but shutdown with a loud warning for "fast".

... I still don't know what's wrong with removing backup_label upon a clean server shutdown ...

Yours,
Laurenz Albe
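The policy proposed in this message can be written down as a small decision table. This is only a sketch of the proposal as worded in the thread, not of any PostgreSQL code: the function name `shutdown_decision` and the message texts are hypothetical, and the handling of any mode other than "smart" and "fast" (proceed without a warning) is an assumption, since the thread does not discuss it.

```python
def shutdown_decision(mode, backup_in_progress):
    """Return (proceed, message) for a shutdown request.

    Hypothetical model of the proposal: "smart" refuses while an online
    backup is in progress, "fast" proceeds but warns loudly.
    """
    if not backup_in_progress:
        return (True, "")
    if mode == "smart":
        # The backup is treated like an open connection: wait for it.
        return (False, "WARNING: online backup in progress, shutdown refused")
    if mode == "fast":
        return (True, "WARNING: online backup in progress, shutting down anyway")
    # Assumption: other modes are left unchanged by the proposal.
    return (True, "")
```

For example, `shutdown_decision("smart", True)` yields a refusal, while `shutdown_decision("fast", True)` proceeds and carries a warning message that the caller is expected to log.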
This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Added to TODO:

o Fix server restart problem when the server was shutdown during a PITR backup

  http://archives.postgresql.org/pgsql-hackers/2007-11/msg00800.php

--
Bruce Momjian <bruce@momjian.us>