Discussion: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.
[HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.
From
Michael Paquier
Date:
On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao <fujii@postgresql.org> wrote:
> Fix an assertion failure related to an exclusive backup.
>
> Previously multiple sessions could execute pg_start_backup() and
> pg_stop_backup() to start and stop an exclusive backup at the same time.
> This could trigger the assertion failure of
> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
> This happened because, even while pg_start_backup() was starting
> an exclusive backup, another session could run pg_stop_backup()
> concurrently and mark the backup as not-in-progress unconditionally.
>
> This patch introduces ExclusiveBackupState indicating the state of
> an exclusive backup. This state is used to ensure that there is only
> one session running pg_start_backup() or pg_stop_backup() at
> the same time, to avoid the assertion failure.

Please note that this commit message is not completely exact. This fix
does not only avoid triggering this assertion failure, it also makes
sure that no manual on-disk intervention is needed by the user to
remove a backup_label file after a failure of pg_stop_backup().

Before this patch, the exclusive backup counter in XLogCtl got
decremented before backup_label was removed. If an error occurred
after the counter was decremented, the shared memory counter would
have been at 0 with a backup_label file still on disk. Subsequent
attempts to run pg_start_backup() would have failed, and putting the
system back into a consistent state would have required an operator
to remove the backup_label file by hand.

The heart of the logic here is in the callback of pg_stop_backup()
when an error happens during the deletion of the backup_label file.

-- 
Michael
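The pre-patch ordering described above can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL code: the class `PrePatchServer`, its attributes, and the `fail_on_remove` switch are hypothetical names introduced for illustration, and the exact checks performed by the real pg_start_backup() are assumed rather than taken from the source.

```python
class PrePatchServer:
    """Toy model of the pre-patch ordering (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        # Stands in for the exclusive backup counter in shared memory.
        self.exclusive_backup = False
        # Stands in for the backup_label file on disk.
        self.backup_label_on_disk = False

    def start_backup(self):
        # Assumption: a start is refused if either the shared flag is set
        # or a backup_label file is already present.
        if self.exclusive_backup:
            raise RuntimeError("a backup is already in progress")
        if self.backup_label_on_disk:
            raise RuntimeError("backup_label already exists")
        self.exclusive_backup = True
        self.backup_label_on_disk = True

    def stop_backup(self, fail_on_remove=False):
        if not self.exclusive_backup:
            raise RuntimeError("exclusive backup not in progress")
        # Pre-patch ordering: shared state is cleared first ...
        self.exclusive_backup = False
        # ... so an error while removing the file leaves it behind.
        if fail_on_remove:
            raise OSError("could not remove backup_label")
        self.backup_label_on_disk = False
```

In this model, a `stop_backup(fail_on_remove=True)` call leaves `exclusive_backup` cleared while `backup_label_on_disk` stays set, so every later `start_backup()` is refused until the file flag is reset by hand. That mirrors the manual cleanup described in the message.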
On Tue, Jan 17, 2017 at 10:37 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao <fujii@postgresql.org> wrote:
>> Fix an assertion failure related to an exclusive backup.
>>
>> Previously multiple sessions could execute pg_start_backup() and
>> pg_stop_backup() to start and stop an exclusive backup at the same time.
>> This could trigger the assertion failure of
>> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
>> This happened because, even while pg_start_backup() was starting
>> an exclusive backup, another session could run pg_stop_backup()
>> concurrently and mark the backup as not-in-progress unconditionally.
>>
>> This patch introduces ExclusiveBackupState indicating the state of
>> an exclusive backup. This state is used to ensure that there is only
>> one session running pg_start_backup() or pg_stop_backup() at
>> the same time, to avoid the assertion failure.
>
> Please note that this commit message is not completely exact. This fix
> does not only avoid triggering this assertion failure, it also makes
> sure that no manual on-disk intervention is needed by the user to
> remove a backup_label file after a failure of pg_stop_backup(). Before
> this patch, what happened is that the exclusive backup counter in
> XLogCtl got decremented before removing backup_label. However, after
> the counter was decremented, if an error occurred, the shared memory
> counter would have been at 0 with a backup_label file on disk.
> Subsequent attempts to start pg_start_backup() would have failed, and
> putting the system back into a consistent state would have required
> an operator to remove by hand the backup_label file. The heart of the
> logic here is in the callback of pg_stop_backup() when an error
> happens during the deletion of the backup_label file.

With the patch, what happens if pg_stop_backup exits with an error
after removing the backup_label file but before resetting the backup
state to none?

Regards,

-- 
Fujii Masao
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.
From
Michael Paquier
Date:
On Tue, Jan 17, 2017 at 11:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> With the patch, what happens if pg_stop_backup exits with an error
> after removing backup_label file before resetting the backup state
> to none?

Removing the backup_label file is the last step that can raise an
error while the callback is set, and the counter is reset immediately
after.

-- 
Michael
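The ordering argued for in this reply can be sketched as a toy state machine. This is a Python illustration only, not the patch itself: `PatchedServer`, `BackupState`, and `fail_on_remove` are hypothetical names, the four state values are an assumption (the thread only names ExclusiveBackupState and a "none" state), and the idea that the error callback puts the state back to "in progress" is likewise assumed.

```python
from enum import Enum


class BackupState(Enum):
    # Assumed state values; the thread only confirms that a "none" state exists.
    NONE = 0
    STARTING = 1
    IN_PROGRESS = 2
    STOPPING = 3


class PatchedServer:
    """Toy model of the patched ordering (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.state = BackupState.NONE
        self.backup_label_on_disk = False  # stands in for the backup_label file

    def start_backup(self):
        if self.state is not BackupState.NONE:
            raise RuntimeError("a backup is already in progress")
        self.state = BackupState.STARTING
        self.backup_label_on_disk = True
        self.state = BackupState.IN_PROGRESS

    def stop_backup(self, fail_on_remove=False):
        # Only one session may move IN_PROGRESS -> STOPPING.
        if self.state is not BackupState.IN_PROGRESS:
            raise RuntimeError("exclusive backup not in progress")
        self.state = BackupState.STOPPING
        try:
            if fail_on_remove:
                raise OSError("could not remove backup_label")
            self.backup_label_on_disk = False
        except Exception:
            # Plays the role of the error callback: the file is still on
            # disk, so report the backup as in progress again.
            self.state = BackupState.IN_PROGRESS
            raise
        # Nothing that can fail sits between the removal and this reset.
        self.state = BackupState.NONE
```

In this sketch a failed removal leaves the state at `IN_PROGRESS` with the file flag still set, so `stop_backup()` can simply be retried. Once the removal succeeds, the only remaining step is a plain assignment, which is the point made in the reply above.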