Обсуждение: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

Поиск
Список
Период
Сортировка

[sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Andreas Seltenreich
Дата:
Hi,

testing with sqlsmith shows that the following assertion doesn't hold:
   FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

The triggering statements always contain a call to pg_start_backup with
the third argument 'true', i.e. it's trying to start an exlusive backup.

I didn't manage to put together a stand-alone testcase yet.

regards,
Andreas



Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Thu, Aug 4, 2016 at 2:19 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> testing with sqlsmith shows that the following assertion doesn't hold:
>
>     FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
>
> The triggering statements always contain a call to pg_start_backup with
> the third argument 'true', i.e. it's trying to start an exlusive backup.
>
> I didn't manage to put together a stand-alone testcase yet.

While I have not been able to trigger this assertion directly, I have
bumped into the fact that pg_stop_backup can reset unconditionally
XLogCtl->Insert.exclusiveBackup *before* pg_start_backup finishes or
even creates the backup_label file if it is set. So the in-memory
state of the backup is like there is no backups running at all
(including exclusive and non-exclusive), but there could be a
backup_label file present. In this state, it is not possible to
trigger pg_start_backup or pg_stop_backup again except if the
backup_label file is manually removed.

In do_pg_stop_backup, both steps would be better reversed, like in the
patch attached. So what we should actually do in pg_stop_backup is
first look at if the backup_label file exists, and then we reset the
in-memory flag as the last thing that do_pg_start_backup does is
writing the backup_label file. This does not close completely the
window though. After the backup_label file is created, it could still
be possible to trigger  the assertion if there is an error on the
tablespace map file.

This window exists as well on back-branches btw, this is not new to
9.6. Magnus, what do you think?
--
Michael

Вложения

Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Thu, Aug 4, 2016 at 4:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> This window exists as well on back-branches btw, this is not new to
> 9.6. Magnus, what do you think?

Actually, a completely water-proof solution is to use an in-memory
flag proper to exclusive backups during the time pg_start_backup() is
called, at the cost of taking WALInsertLockAcquireExclusive once again
at the end of do_pg_start_backup(), but it seems to me that it is an
acceptable cost because pg_start_backup is not a hot code path. Using
a separate flag has the advantage to provide to the user a better
error message when pg_stop_backup is used while pg_start_backup is
running as well.

Andreas, with the patch attached is the assertion still triggered?

Thoughts from others are welcome as well.
--
Michael

Вложения

Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Andreas, with the patch attached is the assertion still triggered?
>
> Thoughts from others are welcome as well.

Self review:     * of streaming base backups currently in progress. forcePageWrites is set     * to true when either of
theseis non-zero. lastBackupStart is the latest     * checkpoint redo location used as a starting point for an online
backup.
+     * exclusiveBackupStarted is true while pg_start_backup() is being called
+     * during an exclusive backup.     */    bool        exclusiveBackup;    int            nonExclusiveBackups;
XLogRecPtr   lastBackupStart;
 
+    bool        exclusiveBackupStarted;
It would be better to just use one variable that uses an enum to track
the following states of an exclusive backup: none, starting and
in-progress.
-- 
Michael



Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Kyotaro HORIGUCHI
Дата:
Hello,

While the exact cause of the situation is not known yet but we
have apparently forgot that pg_stop_backup can be called
simultaneously with pg_start_backup. It seems anyway to me that
pg_stop_backup should be called before the end of pg_start_backup
from their definition and what they do. And it should fix the
assertion failure.

On solution is exclusiveBackupStarted (I'd like to rename the
variable) on shared memory as the patch does.

Another place where we can have something with the same effect is
file system. We can create 'backup_label.tmp" at very early in
pg_start_backup and rename it to backup_label at the end of the
function. Anyway exclusive pg_stop_backup needs that. A defect of
that would be the remaining backup_label.tmp file after crash
during pg_start_backup. Renaming tmp to (none) is atomic enough
for this usage but I'm not sure if it's in a network file system.
exclusiveBackup is also used this kind of exclusion, this also is
replasable with the .tmp file.

But after some thoughts, it found to need to add a bunch of error
handling code for the file operations and it should become too
complex. So I abandoned it.


At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQeMQ8K3Vwh+T7Gq0O3i-3kiRyNPue9WRgqJQVuFAbZrQ@mail.gmail.com>
> On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > Andreas, with the patch attached is the assertion still triggered?
> >
> > Thoughts from others are welcome as well.
> 
> Self review:
>       * of streaming base backups currently in progress. forcePageWrites is set
>       * to true when either of these is non-zero. lastBackupStart is the latest
>       * checkpoint redo location used as a starting point for an online backup.
> +     * exclusiveBackupStarted is true while pg_start_backup() is being called
> +     * during an exclusive backup.
>       */
>      bool        exclusiveBackup;
>      int            nonExclusiveBackups;
>      XLogRecPtr    lastBackupStart;
> +    bool        exclusiveBackupStarted;
> It would be better to just use one variable that uses an enum to track
> the following states of an exclusive backup: none, starting and
> in-progress.

What I thought when I looked the first patch is that. Addition to
that I don't see the name exclusiveBackupStated is
appropriate.

One more argument is the necessity of the WALInsertLock at the
end of pg_start_backup. No other backend cannot reach there
concurrently and possible latency from cache coherency won't be a
matter for the purpose, I suppose. What do you think it is needed
for?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Andreas Seltenreich
Дата:
Michael Paquier writes:

> Andreas, with the patch attached is the assertion still triggered?
> [2. text/x-diff; base-backup-crash-v2.patch]

I didn't observe the crashes since applying this patch.  There should
have been about five by the amount of fuzzing done.

regards
Andreas



Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Michael Paquier writes:
>
>> Andreas, with the patch attached is the assertion still triggered?
>> [2. text/x-diff; base-backup-crash-v2.patch]
>
> I didn't observe the crashes since applying this patch.  There should
> have been about five by the amount of fuzzing done.

I have reworked the patch, replacing those two booleans with a single
enum. That's definitely clearer. Also, I have added this patch to the
CF to not lose track of it:
https://commitfest.postgresql.org/10/731/
Horiguchi-san, I have added you as a reviewer as you provided some
input. I hope you don't mind.
--
Michael

Вложения

Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Amit Kapila
Дата:
On Mon, Aug 22, 2016 at 7:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> Michael Paquier writes:
>>
>>> Andreas, with the patch attached is the assertion still triggered?
>>> [2. text/x-diff; base-backup-crash-v2.patch]
>>
>> I didn't observe the crashes since applying this patch.  There should
>> have been about five by the amount of fuzzing done.
>
> I have reworked the patch, replacing those two booleans with a single
> enum. That's definitely clearer. Also, I have added this patch to the
> CF to not lose track of it:
> https://commitfest.postgresql.org/10/731/
> Horiguchi-san, I have added you as a reviewer as you provided some
> input. I hope you don't mind.
>

Won't the similar problem exists for nonExclusiveBackups?  Basically
in similar situation, the count for nonExclusiveBackups will be
decremented and if it hits pg_start_backup_callback(), the following
Assertion can fail.
pg_start_backup_callback()
{
..
Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
..
}


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Mon, Aug 22, 2016 at 10:09 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Won't the similar problem exists for nonExclusiveBackups?  Basically
> in similar situation, the count for nonExclusiveBackups will be
> decremented and if it hits pg_start_backup_callback(), the following
> Assertion can fail.
> pg_start_backup_callback()
> {
> ..
> Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
> ..
> }

This cannot be hit for non-exclusive backups as pg_start_backup() and
pg_stop_backup() need to be launched from the same session: their call
will never overlap across session, and this overlap is the reason why
exclusive backups are exposed to the problem.
-- 
Michael



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Fujii Masao
Дата:
On Mon, Aug 22, 2016 at 10:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> Michael Paquier writes:
>>
>>> Andreas, with the patch attached is the assertion still triggered?
>>> [2. text/x-diff; base-backup-crash-v2.patch]
>>
>> I didn't observe the crashes since applying this patch.  There should
>> have been about five by the amount of fuzzing done.
>
> I have reworked the patch, replacing those two booleans with a single
> enum. That's definitely clearer. Also, I have added this patch to the
> CF to not lose track of it:
> https://commitfest.postgresql.org/10/731/
> Horiguchi-san, I have added you as a reviewer as you provided some
> input. I hope you don't mind.

You seem to add another entry for this patch into CommitFest.
Either of them needs to be removed.
https://commitfest.postgresql.org/10/698/

This patch prevents pg_stop_backup from starting while pg_start_backup
is running. OTOH, we also should prevent pg_start_backup from starting
until "previous" pg_stop_backup has completed? What happens if
pg_start_backup starts while pg_stop_backup is running?
As far as I read the current code, ISTM that there is no need to do that,
but it's better to do the double check.

Regards,

-- 
Fujii Masao



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> You seem to add another entry for this patch into CommitFest.
> Either of them needs to be removed.
> https://commitfest.postgresql.org/10/698/

Indeed. I just removed this one.

> This patch prevents pg_stop_backup from starting while pg_start_backup
> is running. OTOH, we also should prevent pg_start_backup from starting
> until "previous" pg_stop_backup has completed? What happens if
> pg_start_backup starts while pg_stop_backup is running?
> As far as I read the current code, ISTM that there is no need to do that,
> but it's better to do the double check.

I don't agree about doing that. The on-disk presence of the
backup_label file is what prevents pg_start_backup to run should
pg_stop_backup have already decremented its counters but not unlinked
yet the backup_label, so this insurance is really enough IMO. One good
reason to keep pg_stop_backup as-is in its failure handling is that if
for example it fails to remove the backup_label file, it is still
possible to do again a backup by just removing manually the existing
backup_label file *without* restarting the server. If you have an
in-memory state it would not be possible to fallback to this method
and you'd need a method to clean up the in-memory state.

Now, well, with the patch as well as HEAD, it could be possible that
the backup counters are decremented, but pg_stop_backup *fails* to
remove the backup_label file which would prevent any subsequent
pg_start_backup to run, because there is still a backup_label file, as
well as any other pg_stop_backup to run, because there is no backup
running per the in-memory state. We could improve the in-memory state
by:
- Having an extra exclusive backup status to mark an exclusive backup
as stopping, say EXCLUSIVE_BACKUP_STOPPING. Then mark the exclusive
backup status as such when do_pg_stop_backup starts.
- delaying counter decrementation in pg_stop_backup to the moment when
the backup_label file has been removed.
- Taking an extra time the exclusive WAL insert lock after
backup_label is removed, and decrement the counters.
- During the time the backup_label file is removed, we need an error
callback to switch back to BACKUP_RUNNING in case of error, similarly
to do_pg_start_backup.
Which is more or less the attached patch. Now, if pg_stop_backup
fails, this has the disadvantage to force a server restart to clean up
the in-memory state, instead of just removing manually the existing
backup_file.

For those reasons I still strongly recommend using v3, and not meddle
with the error handling of pg_stop_backup. Because, well, that's
actually not necessary, and this would just hurt the error handling of
exclusive backups.
--
Michael

Вложения

Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Fujii Masao
Дата:
On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> You seem to add another entry for this patch into CommitFest.
>> Either of them needs to be removed.
>> https://commitfest.postgresql.org/10/698/
>
> Indeed. I just removed this one.
>
>> This patch prevents pg_stop_backup from starting while pg_start_backup
>> is running. OTOH, we also should prevent pg_start_backup from starting
>> until "previous" pg_stop_backup has completed? What happens if
>> pg_start_backup starts while pg_stop_backup is running?
>> As far as I read the current code, ISTM that there is no need to do that,
>> but it's better to do the double check.
>
> I don't agree about doing that.

Hmm... my previous comment was confusing. I wanted to comment "it's better
*also for you* to do the double check whether we need to prevent pg_start_backup
while pg_stop_backup is running or not". Your answer seems to be almost the same
as mine, i.e., not necessary. Right?

Regards,

-- 
Fujii Masao



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Tue, Aug 30, 2016 at 1:44 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> You seem to add another entry for this patch into CommitFest.
>>> Either of them needs to be removed.
>>> https://commitfest.postgresql.org/10/698/
>>
>> Indeed. I just removed this one.
>>
>>> This patch prevents pg_stop_backup from starting while pg_start_backup
>>> is running. OTOH, we also should prevent pg_start_backup from starting
>>> until "previous" pg_stop_backup has completed? What happens if
>>> pg_start_backup starts while pg_stop_backup is running?
>>> As far as I read the current code, ISTM that there is no need to do that,
>>> but it's better to do the double check.
>>
>> I don't agree about doing that.
>
> Hmm... my previous comment was confusing. I wanted to comment "it's better
> *also for you* to do the double check whether we need to prevent pg_start_backup
> while pg_stop_backup is running or not". Your answer seems to be almost the same
> as mine, i.e., not necessary. Right?

Yes, that's not necessary to do more. In the worst case, say
pg_start_backup starts when pg_stop_backup is running. And
pg_stop_backup has decremented the backup counters, but not yet
removed the backup_label, then pg_start_backup would just choke on the
presence of the backup_label file.
-- 
Michael



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Kyotaro HORIGUCHI
Дата:
Hello,

(the header of this message is crafted so it might be isolate
this message from the thread)

The patch still applies on the current master with disaplacements.

> On Tue, Aug 30, 2016 at 1:44 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >>> You seem to add another entry for this patch into CommitFest.
> >>> Either of them needs to be removed.
> >>> https://commitfest.postgresql.org/10/698/
> >>
> >> Indeed. I just removed this one.
> >>
> >>> This patch prevents pg_stop_backup from starting while pg_start_backup
> >>> is running. OTOH, we also should prevent pg_start_backup from starting
> >>> until "previous" pg_stop_backup has completed? What happens if
> >>> pg_start_backup starts while pg_stop_backup is running?
> >>> As far as I read the current code, ISTM that there is no need to do that,
> >>> but it's better to do the double check.
> >>
> >> I don't agree about doing that.
> >
> > Hmm... my previous comment was confusing. I wanted to comment "it's better
> > *also for you* to do the double check whether we need to prevent pg_start_backup
> > while pg_stop_backup is running or not". Your answer seems to be almost the same
> > as mine, i.e., not necessary. Right?
> 
> Yes, that's not necessary to do more. In the worst case, say
> pg_start_backup starts when pg_stop_backup is running. And
> pg_stop_backup has decremented the backup counters, but not yet
> removed the backup_label, then pg_start_backup would just choke on the
> presence of the backup_label file

I don't see any problem on the state-transition of
exclusiveBackupState. For the following part

@@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,    {      /*
 * Check for existing backup label --- implies a backup is already
 
-       * running.  (XXX given that we checked exclusiveBackup above,
+       * running.  (XXX given that we checked exclusiveBackupState above,       * maybe it would be OK to just unlink
anysuch label file?)
 

The issue in the XXX comment should be settled by this
chance. PostgreSQL no longer (or should not) places the file by
mistake of itself. It is only possible by someone evil crafting
it, or crash-then-restarted. For the former it can be safely
removed with some notice. For the latter we should remove it
since the backup taken through the restarting is not
reliable. Addition to that, issueing pg_start_backup indicates
that the operator already have forgotten that he/she issued it
previously. So it seems to me that it can be removed with some
WARNING.

The error checking on exclusiveBackupState looks somewhat
redandunt but I don't come up with a better coding.

So, everything other than the XXX comment looks fine for me, and
this is rather straghtforward patch. So after deciding what to do
for the issue and anyone opposed to this patch, I'll make this
'Ready for commiter'.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Fri, Nov 4, 2016 at 5:36 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> On Tue, Aug 30, 2016 at 1:44 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
>> > <michael.paquier@gmail.com> wrote:
>> >> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> >>> You seem to add another entry for this patch into CommitFest.
>> >>> Either of them needs to be removed.
>> >>> https://commitfest.postgresql.org/10/698/
>> >>
>> >> Indeed. I just removed this one.
>> >>
>> >>> This patch prevents pg_stop_backup from starting while pg_start_backup
>> >>> is running. OTOH, we also should prevent pg_start_backup from starting
>> >>> until "previous" pg_stop_backup has completed? What happens if
>> >>> pg_start_backup starts while pg_stop_backup is running?
>> >>> As far as I read the current code, ISTM that there is no need to do that,
>> >>> but it's better to do the double check.
>> >>
>> >> I don't agree about doing that.
>> >
>> > Hmm... my previous comment was confusing. I wanted to comment "it's better
>> > *also for you* to do the double check whether we need to prevent pg_start_backup
>> > while pg_stop_backup is running or not". Your answer seems to be almost the same
>> > as mine, i.e., not necessary. Right?
>>
>> Yes, that's not necessary to do more. In the worst case, say
>> pg_start_backup starts when pg_stop_backup is running. And
>> pg_stop_backup has decremented the backup counters, but not yet
>> removed the backup_label, then pg_start_backup would just choke on the
>> presence of the backup_label file
>
> I don't see any problem on the state-transition of
> exclusiveBackupState. For the following part
>
> @@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
>      {
>        /*
>         * Check for existing backup label --- implies a back>
up is already
> -       * running.  (XXX given that we checked exclusiveBackup above,
> +       * running.  (XXX given that we checked exclusiveBackupState above,
>         * maybe it would be OK to just unlink any such label file?)
>
> The issue in the XXX comment should be settled by this
> chance. PostgreSQL no longer (or should not) places the file by
> mistake of itself. It is only possible by someone evil crafting
> it, or crash-then-restarted. For the former it can be safely
> removed with some notice. For the latter we should remove it
> since the backup taken through the restarting is not
> reliable. Addition to that, issueing pg_start_backup indicates
> that the operator already have forgotten that he/she issued it
> previously. So it seems to me that it can be removed with some
> WARNING.
>
> The error checking on exclusiveBackupState looks somewhat
> redandunt but I don't come up with a better coding.

Yes, that's on purpose to make the error messages more verbose for the user.

> So, everything other than the XXX comment looks fine for me, and
> this is rather straghtforward patch.

I agree that we could do something better, but that would be an
optimization, not a bug fix which is what this patch is aiming at.

> So after deciding what to do
> for the issue and anyone opposed to this patch, I'll make this
> 'Ready for commiter'.

Thanks for the input.

By the way, as long as I look at that, the patch applies cleanly on
master and 9.6 but not further down. If a committer shows up to push
this patch, I can prepare versions for back branches as needed.
-- 
Michael



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Sat, 5 Nov 2016 21:18:42 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqS8FQf3T3ZLw=v-FMMbUEM_kKcVzfxybTnG68u-SfZJ7Q@mail.gmail.com>
> > I don't see any problem on the state-transition of
> > exclusiveBackupState. For the following part
> >
> > @@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
> >      {
> >        /*
> >         * Check for existing backup label --- implies a back>
> up is already
> > -       * running.  (XXX given that we checked exclusiveBackup above,
> > +       * running.  (XXX given that we checked exclusiveBackupState above,
> >         * maybe it would be OK to just unlink any such label file?)
> >
> > The issue in the XXX comment should be settled by this
> > chance. PostgreSQL no longer (or should not) places the file by
> > mistake of itself. It is only possible by someone evil crafting
> > it, or crash-then-restarted. For the former it can be safely
> > removed with some notice. For the latter we should remove it
> > since the backup taken through the restarting is not
> > reliable. Addition to that, issueing pg_start_backup indicates
> > that the operator already have forgotten that he/she issued it
> > previously. So it seems to me that it can be removed with some
> > WARNING.
> > 
> > The error checking on exclusiveBackupState looks somewhat
> > redandunt but I don't come up with a better coding.
> 
> Yes, that's on purpose to make the error messages more verbose for the user.
> 
> > So, everything other than the XXX comment looks fine for me, and
> > this is rather straghtforward patch.
> 
> I agree that we could do something better, but that would be an
> optimization, not a bug fix which is what this patch is aiming at.

Ok, I understand that.

> > So after deciding what to do
> > for the issue and anyone opposed to this patch, I'll make this
> > 'Ready for commiter'.
> 
> Thanks for the input.
> 
> By the way, as long as I look at that, the patch applies cleanly on
> master and 9.6 but not further down. If a committer shows up to push
> this patch, I can prepare versions for back branches as needed.

Ok, the attached is the version just rebased to the current
master. It is clieanly appliable without whitespace errors on
master and 9.6 with some displacements but fails on 9.5.

I will mark this as "Ready for Committer".

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6cec027..a38692f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -463,6 +463,28 @@ typedef union WALInsertLockPadded} WALInsertLockPadded;/*
+ * State of an exclusive backup
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ */
+typedef enum ExclusiveBackupState
+{
+    EXCLUSIVE_BACKUP_NONE = 0,
+    EXCLUSIVE_BACKUP_STARTING,
+    EXCLUSIVE_BACKUP_STOPPING,
+    EXCLUSIVE_BACKUP_IN_PROGRESS
+} ExclusiveBackupState;
+
+/* * Shared state data for WAL insertion. */typedef struct XLogCtlInsert
@@ -503,13 +525,14 @@ typedef struct XLogCtlInsert    bool        fullPageWrites;    /*
-     * exclusiveBackup is true if a backup started with pg_start_backup() is
-     * in progress, and nonExclusiveBackups is a counter indicating the number
+     * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
+     * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
+     * it is done, and nonExclusiveBackups is a counter indicating the number     * of streaming base backups
currentlyin progress. forcePageWrites is set     * to true when either of these is non-zero. lastBackupStart is the
latest    * checkpoint redo location used as a starting point for an online backup.     */
 
-    bool        exclusiveBackup;
+    ExclusiveBackupState exclusiveBackupState;    int            nonExclusiveBackups;    XLogRecPtr
lastBackupStart;
@@ -848,6 +871,7 @@ static void xlog_outrec(StringInfo buf, XLogReaderState *record);#endifstatic void
xlog_outdesc(StringInfobuf, XLogReaderState *record);static void pg_start_backup_callback(int code, Datum arg);
 
+static void pg_stop_backup_callback(int code, Datum arg);static bool read_backup_label(XLogRecPtr *checkPointLoc,
           bool *backupEndRequired, bool *backupFromStandby);static bool read_tablespace_map(List **tablespaces);
 
@@ -9957,7 +9981,21 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
WALInsertLockAcquireExclusive();   if (exclusive)    {
 
-        if (XLogCtl->Insert.exclusiveBackup)
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("a backup is already starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("a backup is currently stopping")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_IN_PROGRESS)        {
WALInsertLockRelease();           ereport(ERROR,
 
@@ -9965,7 +10003,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 errmsg("a backup is already in progress"),                     errhint("Run pg_stop_backup() and try again.")));
}
 
-        XLogCtl->Insert.exclusiveBackup = true;
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING;    }    else
XLogCtl->Insert.nonExclusiveBackups++;
@@ -10220,7 +10258,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,        {
  /*             * Check for existing backup label --- implies a backup is already
 
-             * running.  (XXX given that we checked exclusiveBackup above,
+             * running.  (XXX given that we checked exclusiveBackupState above,             * maybe it would be OK to
justunlink any such label file?)             */            if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
 
@@ -10302,6 +10340,16 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback,(Datum) BoolGetDatum(exclusive));    /*
 
+     * Mark that start phase has correctly finished for an exclusive backup.
+     */
+    if (exclusive)
+    {
+        WALInsertLockAcquireExclusive();
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+        WALInsertLockRelease();
+    }
+
+    /*     * We're done.  As a convenience, return the starting WAL location.     */    if (starttli_p)
@@ -10319,8 +10367,8 @@ pg_start_backup_callback(int code, Datum arg)    WALInsertLockAcquireExclusive();    if
(exclusive)   {
 
-        Assert(XLogCtl->Insert.exclusiveBackup);
-        XLogCtl->Insert.exclusiveBackup = false;
+        Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING);
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;    }    else    {
@@ -10328,7 +10376,7 @@ pg_start_backup_callback(int code, Datum arg)        XLogCtl->Insert.nonExclusiveBackups--;
}
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false;
 
@@ -10337,6 +10385,19 @@ pg_start_backup_callback(int code, Datum arg)}/*
+ * Error cleanup callback for pg_stop_backup, should be used only for
+ * exclusive backups.
+ */
+static void
+pg_stop_backup_callback(int code, Datum arg)
+{
+    /* Update backup status on failure */
+    WALInsertLockAcquireExclusive();
+    XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+    WALInsertLockRelease();
+}
+
+/* * do_pg_stop_backup is the workhorse of the user-visible pg_stop_backup() * function. *
@@ -10399,19 +10460,103 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));    /*
 
-     * OK to update backup counters and forcePageWrites
+     * Remove the backup_label file for an exclusive backup. Before doing
+     * anything the status of the backup is switched to ensure that there
+     * is no concurrent operation during this operation. Backup counters
+     * are updated after this phase to be able to rollback in case of
+     * failure.     */
-    WALInsertLockAcquireExclusive();    if (exclusive)    {
-        if (!XLogCtl->Insert.exclusiveBackup)
+        WALInsertLockAcquireExclusive();
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)        {            WALInsertLockRelease();
          ereport(ERROR,                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exclusivebackup not in progress")));        }
 
-        XLogCtl->Insert.exclusiveBackup = false;
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exclusive backup currently starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exclusive backup currently stopping")));
+        }
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STOPPING;
+        WALInsertLockRelease();
+
+        /* remove backup_label */
+        PG_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) 0);
+        {
+            /*
+             * Read the existing label file into memory.
+             */
+            struct stat statbuf;
+            int            r;
+
+            if (stat(BACKUP_LABEL_FILE, &statbuf))
+            {
+                /* should not happen per the upper checks */
+                if (errno != ENOENT)
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not stat file \"%s\": %m",
+                                    BACKUP_LABEL_FILE)));
+                ereport(ERROR,
+                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                         errmsg("a backup is not in progress")));
+            }
+
+            lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
+            if (!lfp)
+            {
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+            }
+            labelfile = palloc(statbuf.st_size + 1);
+            r = fread(labelfile, statbuf.st_size, 1, lfp);
+            labelfile[statbuf.st_size] = '\0';
+
+            /*
+             * Close and remove the backup label file
+             */
+            if (r != 1 || ferror(lfp) || FreeFile(lfp))
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+            if (unlink(BACKUP_LABEL_FILE) != 0)
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not remove file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+
+            /*
+             * Remove tablespace_map file if present, it is created only if there
+             * are tablespaces.
+             */
+            unlink(TABLESPACE_MAP);
+        }
+        PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) 0);
+    }
+
+    /*
+     * OK to update backup counters and forcePageWrites
+     */
+    WALInsertLockAcquireExclusive();
+    if (exclusive)
+    {
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;    }    else    {
@@ -10425,66 +10570,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
XLogCtl->Insert.nonExclusiveBackups--;   }
 
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false;    }    WALInsertLockRelease();
 
-    if (exclusive)
-    {
-        /*
-         * Read the existing label file into memory.
-         */
-        struct stat statbuf;
-        int            r;
-
-        if (stat(BACKUP_LABEL_FILE, &statbuf))
-        {
-            if (errno != ENOENT)
-                ereport(ERROR,
-                        (errcode_for_file_access(),
-                         errmsg("could not stat file \"%s\": %m",
-                                BACKUP_LABEL_FILE)));
-            ereport(ERROR,
-                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("a backup is not in progress")));
-        }
-
-        lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
-        if (!lfp)
-        {
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-        }
-        labelfile = palloc(statbuf.st_size + 1);
-        r = fread(labelfile, statbuf.st_size, 1, lfp);
-        labelfile[statbuf.st_size] = '\0';
-
-        /*
-         * Close and remove the backup label file
-         */
-        if (r != 1 || ferror(lfp) || FreeFile(lfp))
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-        if (unlink(BACKUP_LABEL_FILE) != 0)
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not remove file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-
-        /*
-         * Remove tablespace_map file if present, it is created only if there
-         * are tablespaces.
-         */
-        unlink(TABLESPACE_MAP);
-    }
-    /*     * Read and parse the START WAL LOCATION line (this code is pretty crude,     * but we are not expecting any
variabilityin the file format).
 
@@ -10721,7 +10813,7 @@ do_pg_abort_backup(void)    Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
XLogCtl->Insert.nonExclusiveBackups--;
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false; 

Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Mon, Nov 7, 2016 at 6:18 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I will mark this as "Ready for Committer".

I have just noticed that Robert has switched this patch to "needs
review" by mistake (I think that there was a mistake with another
patch), so I have switched it back to "Ready for committer". I agree
with Horiguchi-san that this seems adapted, as it got some review and
some love.
-- 
Michael



Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

От
Michael Paquier
Дата:
On Thu, Nov 17, 2016 at 1:18 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Nov 7, 2016 at 6:18 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> I will mark this as "Ready for Committer".
>
> I have just noticed that Robert has switched this patch to "needs
> review" by mistake (I think that there was a mistake with another
> patch), so I have switched it back to "Ready for committer". I agree
> with Horiguchi-san that this seems adapted, as it got some review and
> some love.

Moved to next CF.
-- 
Michael



On Tue, Nov 8, 2016 at 11:18 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Sat, 5 Nov 2016 21:18:42 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqS8FQf3T3ZLw=v-FMMbUEM_kKcVzfxybTnG68u-SfZJ7Q@mail.gmail.com>
>> > I don't see any problem on the state-transition of
>> > exclusiveBackupState. For the following part
>> >
>> > @@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
>> >      {
>> >        /*
>> >         * Check for existing backup label --- implies a back>
>> up is already
>> > -       * running.  (XXX given that we checked exclusiveBackup above,
>> > +       * running.  (XXX given that we checked exclusiveBackupState above,
>> >         * maybe it would be OK to just unlink any such label file?)
>> >
>> > The issue in the XXX comment should be settled by this
>> > chance. PostgreSQL no longer (or should not) places the file by
>> > mistake of itself. It is only possible by someone evil crafting
>> > it, or crash-then-restarted. For the former it can be safely
>> > removed with some notice. For the latter we should remove it
>> > since the backup taken through the restarting is not
>> > reliable. Addition to that, issueing pg_start_backup indicates
>> > that the operator already have forgotten that he/she issued it
>> > previously. So it seems to me that it can be removed with some
>> > WARNING.
>> >
>> > The error checking on exclusiveBackupState looks somewhat
>> > redandunt but I don't come up with a better coding.
>>
>> Yes, that's on purpose to make the error messages more verbose for the user.
>>
>> > So, everything other than the XXX comment looks fine for me, and
>> > this is rather straghtforward patch.
>>
>> I agree that we could do something better, but that would be an
>> optimization, not a bug fix which is what this patch is aiming at.
>
> Ok, I understand that.
>
>> > So after deciding what to do
>> > for the issue and anyone opposed to this patch, I'll make this
>> > 'Ready for commiter'.
>>
>> Thanks for the input.
>>
>> By the way, as long as I look at that, the patch applies cleanly on
>> master and 9.6 but not further down. If a committer shows up to push
>> this patch, I can prepare versions for back branches as needed.
>
> Ok, the attached is the version just rebased to the current
> master. It is clieanly appliable without whitespace errors on
> master and 9.6 with some displacements but fails on 9.5.
>
> I will mark this as "Ready for Committer".

Thanks for updating the patch! Here are the review comments;

+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is exclusive backup in progress.

"there is exclusive" should be "there is no exclusive"?
ISTM that the description following "to be more" doesn't seem to be necessary.

+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and
+ * that is is not done yet.

Typo: "is is"

Isn't it better to mention "an exclusive backup" here? What about

EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an exclusive
backup.
EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive
backup.

I think that it's worth explaining why ExclusiveBackupState is necessary,
in the comment.

-     * exclusiveBackup is true if a backup started with pg_start_backup() is
-     * in progress, and nonExclusiveBackups is a counter indicating the number
+     * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
+     * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
+     * it is done, and nonExclusiveBackups is a counter indicating the number

Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
Basically it's better to explain fully how the state changes.

+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("a backup is already starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("a backup is currently stopping")));

Isn't it better to use "an exclusive backup" explicitly rather than
"a backup" here?

You changed the code so that pg_stop_backup() removes backup_label before
it marks the exclusive-backup-state as not-in-progress. Could you tell me
why you did this change? It's better to explain it in the comment.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Re: [sqlsmith]FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c",Line: 10200)

От
Kyotaro HORIGUCHI
Дата:
Thank you for the comment.

At Tue, 13 Dec 2016 12:49:12 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwHPZ8mwx=FcAxB2kaydhBjM-di7mxy6C2B3V5oWtddp_Q@mail.gmail.com>
> On Tue, Nov 8, 2016 at 11:18 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> By the way, as long as I look at that, the patch applies cleanly on
> >> master and 9.6 but not further down. If a committer shows up to push
> >> this patch, I can prepare versions for back branches as needed.
> >
> > Ok, the attached is the version just rebased to the current
> > master. It is clieanly appliable without whitespace errors on
> > master and 9.6 with some displacements but fails on 9.5.
> >
> > I will mark this as "Ready for Committer".
> 
> Thanks for updating the patch! Here are the review comments;
> 
> + * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
> + * running, to be more precise pg_start_backup() is not being executed for
> + * an exclusive backup and there is exclusive backup in progress.
> 
> "there is exclusive" should be "there is no exclusive"?
> ISTM that the description following "to be more" doesn't seem to be necessary.

I agree. One can know that by reading the description on
EXCLUSIVE_BACKUP_STARTING (and STOPPING, which is not mentioned here) 

> + * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
> + * that is is not done yet.
> + * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and
> + * that is is not done yet.
> 
> Typo: "is is"

Oops. Sorry for overlooking such a silly typo.

> Isn't it better to mention "an exclusive backup" here? What about
> 
> EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an exclusive
> backup.
> EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive
> backup.

It seems fine. Done in the attached version.

> I think that it's worth explaining why ExclusiveBackupState is necessary,
> in the comment.

The attached version contains the following comment.

| * State of an exclusive backup.
| *
| * EXCLUSIVE_BACKUP_STARTING and EXCLUSIVE_BACKUP_STOPPING blocks
| * pg_start_backup and pg_stop_backup to prevent a race condition. Both of the
| * functions returns error on these state.

Does this make sense?

> -     * exclusiveBackup is true if a backup started with pg_start_backup() is
> -     * in progress, and nonExclusiveBackups is a counter indicating the number
> +     * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
> +     * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
> +     * it is done, and nonExclusiveBackups is a counter indicating the number
> 
> Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
> Basically it's better to explain fully how the state changes.

Hmm, it is modfied as the following in the attached.

|  * exclusiveBackupState indicates state of an exlusive backup, see
|  * ExclusiveBackupState for the detail.

and the comment for ExclusiveBackupState explains the state
transition. Does this work for you?


> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                     errmsg("a backup is already starting")));
> +        }
> +        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
> +        {
> +            WALInsertLockRelease();
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                     errmsg("a backup is currently stopping")));
> 
> Isn't it better to use "an exclusive backup" explicitly rather than
> "a backup" here?

pg_stop_backup already says like that. To unify with them, the
messages of pg_start_backup is modifed as the following.
  exlusive backup is already starting  exlusive backup is currently stopping  exlusive backup is currently in progress

> You changed the code so that pg_stop_backup() removes backup_label before
> it marks the exclusive-backup-state as not-in-progress. Could you tell me
> why you did this change? It's better to explain it in the comment.

Previously the backup_label is the traffic signal for the backup
state and that is the cause of this problem. Now the
exclusiveBackupState takes the place of it, so it naturally
should be _NONE after all steps have been finished.  It seems to
me so natural so that no additional explanation is needed.

If they are done in the opposite order, the existence check of
backup_label still left in pg_start_backup may fire.

|       * Check for existing backup label --- implies a backup is already
|       * running.  (XXX given that we checked exclusiveBackupState above,
|       * maybe it would be OK to just unlink any such label file?)
|       */
|      if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
|      {
|        if (errno != ENOENT)
|          ereport(ERROR,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 084401d..fd2d809 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -463,6 +463,30 @@ typedef union WALInsertLockPadded} WALInsertLockPadded;/*
+ * State of an exclusive backup.
+ *
+ * EXCLUSIVE_BACKUP_STARTING and EXCLUSIVE_BACKUP_STOPPING blocks
+ * pg_start_backup and pg_stop_backup to prevent a race condition. Both of the
+ * functions returns error on these state.
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running. pg_start_backup() can be called only on this state.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an
+ * exlusive backup. Turns into EXCLUSIVE_BACKUP_IN_PROGRESS after finishing.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() completed
+ * preparation for exlucsive backup. pg_stop_backup() finishes the state.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an
+ * exclusive backup. Turns into EXCLUSIVE_BACKUP_NONE after finishing.
+ */
+typedef enum ExclusiveBackupState
+{
+    EXCLUSIVE_BACKUP_NONE = 0,
+    EXCLUSIVE_BACKUP_STARTING,
+    EXCLUSIVE_BACKUP_STOPPING,
+    EXCLUSIVE_BACKUP_IN_PROGRESS
+} ExclusiveBackupState;
+
+/* * Shared state data for WAL insertion. */typedef struct XLogCtlInsert
@@ -503,13 +527,14 @@ typedef struct XLogCtlInsert    bool        fullPageWrites;    /*
-     * exclusiveBackup is true if a backup started with pg_start_backup() is
-     * in progress, and nonExclusiveBackups is a counter indicating the number
-     * of streaming base backups currently in progress. forcePageWrites is set
-     * to true when either of these is non-zero. lastBackupStart is the latest
-     * checkpoint redo location used as a starting point for an online backup.
+     * exclusiveBackupState indicates state of an exlusive backup, see
+     * ExclusiveBackupState for the detail. nonExclusiveBackups is a counter
+     * indicating the number of streaming base backups currently in
+     * progress. forcePageWrites is set to true when either of these is
+     * non-zero. lastBackupStart is the latest checkpoint redo location used
+     * as a starting point for an online backup.     */
-    bool        exclusiveBackup;
+    ExclusiveBackupState exclusiveBackupState;    int            nonExclusiveBackups;    XLogRecPtr
lastBackupStart;
@@ -848,6 +873,7 @@ static void xlog_outrec(StringInfo buf, XLogReaderState *record);#endifstatic void
xlog_outdesc(StringInfobuf, XLogReaderState *record);static void pg_start_backup_callback(int code, Datum arg);
 
+static void pg_stop_backup_callback(int code, Datum arg);static bool read_backup_label(XLogRecPtr *checkPointLoc,
           bool *backupEndRequired, bool *backupFromStandby);static bool read_tablespace_map(List **tablespaces);
 
@@ -9957,15 +9983,29 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
WALInsertLockAcquireExclusive();   if (exclusive)    {
 
-        if (XLogCtl->Insert.exclusiveBackup)
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exlusive backup is already starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exlusive backup is currently stopping")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_IN_PROGRESS)        {
WALInsertLockRelease();           ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("a backup is already in progress"),
+                     errmsg("exlusive backup is currently in progress"),                     errhint("Run
pg_stop_backup()and try again.")));        }
 
-        XLogCtl->Insert.exclusiveBackup = true;
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING;    }    else
XLogCtl->Insert.nonExclusiveBackups++;
@@ -10220,7 +10260,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,        {
  /*             * Check for existing backup label --- implies a backup is already
 
-             * running.  (XXX given that we checked exclusiveBackup above,
+             * running.  (XXX given that we checked exclusiveBackupState above,             * maybe it would be OK to
justunlink any such label file?)             */            if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
 
@@ -10302,6 +10342,16 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback,(Datum) BoolGetDatum(exclusive));    /*
 
+     * Mark that start phase has correctly finished for an exclusive backup.
+     */
+    if (exclusive)
+    {
+        WALInsertLockAcquireExclusive();
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+        WALInsertLockRelease();
+    }
+
+    /*     * We're done.  As a convenience, return the starting WAL location.     */    if (starttli_p)
@@ -10319,8 +10369,8 @@ pg_start_backup_callback(int code, Datum arg)    WALInsertLockAcquireExclusive();    if
(exclusive)   {
 
-        Assert(XLogCtl->Insert.exclusiveBackup);
-        XLogCtl->Insert.exclusiveBackup = false;
+        Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING);
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;    }    else    {
@@ -10328,7 +10378,7 @@ pg_start_backup_callback(int code, Datum arg)        XLogCtl->Insert.nonExclusiveBackups--;
}
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false;
 
@@ -10337,6 +10387,19 @@ pg_start_backup_callback(int code, Datum arg)}/*
+ * Error cleanup callback for pg_stop_backup, should be used only for
+ * exclusive backups.
+ */
+static void
+pg_stop_backup_callback(int code, Datum arg)
+{
+    /* Update backup status on failure */
+    WALInsertLockAcquireExclusive();
+    XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+    WALInsertLockRelease();
+}
+
+/* * do_pg_stop_backup is the workhorse of the user-visible pg_stop_backup() * function. *
@@ -10399,19 +10462,103 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));    /*
 
-     * OK to update backup counters and forcePageWrites
+     * Remove the backup_label file for an exclusive backup. Before doing
+     * anything the status of the backup is switched to ensure that there
+     * is no concurrent operation during this operation. Backup counters
+     * are updated after this phase to be able to rollback in case of
+     * failure.     */
-    WALInsertLockAcquireExclusive();    if (exclusive)    {
-        if (!XLogCtl->Insert.exclusiveBackup)
+        WALInsertLockAcquireExclusive();
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)        {            WALInsertLockRelease();
          ereport(ERROR,                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exclusivebackup not in progress")));        }
 
-        XLogCtl->Insert.exclusiveBackup = false;
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exclusive backup currently starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exclusive backup currently stopping")));
+        }
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STOPPING;
+        WALInsertLockRelease();
+
+        /* remove backup_label */
+        PG_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) 0);
+        {
+            /*
+             * Read the existing label file into memory.
+             */
+            struct stat statbuf;
+            int            r;
+
+            if (stat(BACKUP_LABEL_FILE, &statbuf))
+            {
+                /* should not happen per the upper checks */
+                if (errno != ENOENT)
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not stat file \"%s\": %m",
+                                    BACKUP_LABEL_FILE)));
+                ereport(ERROR,
+                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                         errmsg("a backup is not in progress")));
+            }
+
+            lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
+            if (!lfp)
+            {
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+            }
+            labelfile = palloc(statbuf.st_size + 1);
+            r = fread(labelfile, statbuf.st_size, 1, lfp);
+            labelfile[statbuf.st_size] = '\0';
+
+            /*
+             * Close and remove the backup label file
+             */
+            if (r != 1 || ferror(lfp) || FreeFile(lfp))
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+            if (unlink(BACKUP_LABEL_FILE) != 0)
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not remove file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+
+            /*
+             * Remove tablespace_map file if present, it is created only if there
+             * are tablespaces.
+             */
+            unlink(TABLESPACE_MAP);
+        }
+        PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) 0);
+    }
+
+    /*
+     * OK to update backup counters and forcePageWrites
+     */
+    WALInsertLockAcquireExclusive();
+    if (exclusive)
+    {
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;    }    else    {
@@ -10425,66 +10572,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
XLogCtl->Insert.nonExclusiveBackups--;   }
 
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false;    }    WALInsertLockRelease();
 
-    if (exclusive)
-    {
-        /*
-         * Read the existing label file into memory.
-         */
-        struct stat statbuf;
-        int            r;
-
-        if (stat(BACKUP_LABEL_FILE, &statbuf))
-        {
-            if (errno != ENOENT)
-                ereport(ERROR,
-                        (errcode_for_file_access(),
-                         errmsg("could not stat file \"%s\": %m",
-                                BACKUP_LABEL_FILE)));
-            ereport(ERROR,
-                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("a backup is not in progress")));
-        }
-
-        lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
-        if (!lfp)
-        {
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-        }
-        labelfile = palloc(statbuf.st_size + 1);
-        r = fread(labelfile, statbuf.st_size, 1, lfp);
-        labelfile[statbuf.st_size] = '\0';
-
-        /*
-         * Close and remove the backup label file
-         */
-        if (r != 1 || ferror(lfp) || FreeFile(lfp))
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-        if (unlink(BACKUP_LABEL_FILE) != 0)
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not remove file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-
-        /*
-         * Remove tablespace_map file if present, it is created only if there
-         * are tablespaces.
-         */
-        unlink(TABLESPACE_MAP);
-    }
-    /*     * Read and parse the START WAL LOCATION line (this code is pretty crude,     * but we are not expecting any
variabilityin the file format).
 
@@ -10721,7 +10815,7 @@ do_pg_abort_backup(void)    Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
XLogCtl->Insert.nonExclusiveBackups--;
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false; 

On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for the review.

> Isn't it better to mention "an exclusive backup" here? What about
>
> EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an exclusive
> backup.
> EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive
> backup.

OK, changed this way.

> I think that it's worth explaining why ExclusiveBackupState is necessary,
> in the comment.

Changed like that at the top of the declaration of ExclusiveBackupState:
 * State of an exclusive backup, necessary to control concurrent activities
 * across sessions when working on exclusive backups.

> -     * exclusiveBackup is true if a backup started with pg_start_backup() is
> -     * in progress, and nonExclusiveBackups is a counter indicating the number
> +     * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
> +     * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
> +     * it is done, and nonExclusiveBackups is a counter indicating the number
>
> Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
> Basically it's better to explain fully how the state changes.

Let's simplify things instead and just say that the meaning of the values is
described where ExclusiveBackupState is declared.

> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                     errmsg("a backup is already starting")));
> +        }
> +        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
> +        {
> +            WALInsertLockRelease();
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                     errmsg("a backup is currently stopping")));
>
> Isn't it better to use "an exclusive backup" explicitly rather than
> "a backup" here?

Yes. It would be better to not touch the original in-progress messages
though.

> You changed the code so that pg_stop_backup() removes backup_label before
> it marks the exclusive-backup-state as not-in-progress. Could you tell me
> why you did this change? It's better to explain it in the comment.

That's actually mentioned in the comments :)

This is to ensure that there cannot be any other pg_stop_backup() running
in parallel and trying to remove the backup label file. The key point here
is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING,
that the removal of the backup_label file is kept last on purpose (that's
what HEAD does anyway), and that we can rollback to an in-progress state
in case of failure.

I have rewritten this block like that. Does that sound better?

+    * Remove backup_label for an exclusive backup. Before doing anything
+    * the status of the exclusive backup is switched to
+    * EXCLUSIVE_BACKUP_STOPPING to ensure that there cannot be concurrent
+    * operations. In case of failure, in which case the status of the
+    * exclusive backup is switched back to in-progress. The removal of the
+    * backup_label file is kept last as it is the critical piece proving
+    * if an exclusive backup is running or not in the event of a system
+    * crash.

Thanks,
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Thank you for the comment.

Er, it is good to see more people interested in this problem... As the
former author, still working actively on this patch, perhaps it would
be better if I continue to code this patch, no? It is a waste to step
on each other's toes.
-- 
Michael



Re: [HACKERS] Re: [sqlsmith]FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c",Line: 10200)

От
Kyotaro HORIGUCHI
Дата:
Ah, sorry for the confusion.

At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQJ=1HH0usLyBEDJ5s_kugs6CpVLU=eLvEMRWEGkGGTDg@mail.gmail.com>
> On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Thank you for the comment.
> 
> Er, it is good to see more people interested in this problem... As the
> former author, still working actively on this patch, perhaps it would
> be better if I continue to code this patch, no? It is a waste to step
> on each other's toes.

You are the author of this patch. I addressed that since the
comments are mainly on typos, not function. I think you do better
than me so the v6 from me should be abandoned.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





On Wed, Dec 14, 2016 at 9:25 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Ah, sorry for the confusion.
>
> At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQJ=1HH0usLyBEDJ5s_kugs6CpVLU=eLvEMRWEGkGGTDg@mail.gmail.com>
>> On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> > Thank you for the comment.
>>
>> Er, it is good to see more people interested in this problem... As the
>> former author, still working actively on this patch, perhaps it would
>> be better if I continue to code this patch, no? It is a waste to step
>> on each other's toes.
>
> You are the author of this patch. I addressed that since the
> comments are mainly on typos, not function. I think you do better
> than me so the v6 from me should be abandoned.

No problem. Thanks for the help!
-- 
Michael



On Tue, Dec 13, 2016 at 10:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Thanks for the review.

Thanks for the updated version of the patch!

>> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +                     errmsg("a backup is already starting")));
>> +        }
>> +        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
>> +        {
>> +            WALInsertLockRelease();
>> +            ereport(ERROR,
>> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +                     errmsg("a backup is currently stopping")));
>>
>> Isn't it better to use "an exclusive backup" explicitly rather than
>> "a backup" here?
>
> Yes. It would be better to not touch the original in-progress messages
> though.

On second thought, do users really want to distinguish those three errornous
states? I'm inclined to merge the checks for those three conditions into one,
that is,
       if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)       {           WALInsertLockRelease();
        ereport(ERROR,                   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("abackup is already in progress"),
 

Also it may be better to handle the similar checks in pg_stop_backup,
in the same way.

>> You changed the code so that pg_stop_backup() removes backup_label before
>> it marks the exclusive-backup-state as not-in-progress. Could you tell me
>> why you did this change? It's better to explain it in the comment.
>
> That's actually mentioned in the comments :)
>
> This is to ensure that there cannot be any other pg_stop_backup() running
> in parallel and trying to remove the backup label file. The key point here
> is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING,
> that the removal of the backup_label file is kept last on purpose (that's
> what HEAD does anyway), and that we can rollback to an in-progress state
> in case of failure.

Okay.

Regards,

-- 
Fujii Masao



On Wed, Dec 14, 2016 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Dec 13, 2016 at 10:24 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> Thanks for the review.
>
> Thanks for the updated version of the patch!
>
>>> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> +                     errmsg("a backup is already starting")));
>>> +        }
>>> +        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
>>> +        {
>>> +            WALInsertLockRelease();
>>> +            ereport(ERROR,
>>> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> +                     errmsg("a backup is currently stopping")));
>>>
>>> Isn't it better to use "an exclusive backup" explicitly rather than
>>> "a backup" here?
>>
>> Yes. It would be better to not touch the original in-progress messages
>> though.
>
> On second thought, do users really want to distinguish those three errornous
> states? I'm inclined to merge the checks for those three conditions into one,
> that is,
>
>         if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
>         {
>             WALInsertLockRelease();
>             ereport(ERROR,
>                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                      errmsg("a backup is already in progress"),
>
> Also it may be better to handle the similar checks in pg_stop_backup,
> in the same way.

Here is the updated version of the patch that I applied the above "merge" to.

Unfortunately this patch is not applied cleanly to old versions.
So we need to create more patches for back-patch.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
I sent this email one month ago but forgot to cc pgsql-hackers ;)
For the record, it is the set of patches attached that have been
pushed as 974ece5, and only Fujii-san has received them... Thanks for
committing the fix by the way!

On Thu, Dec 15, 2016 at 3:30 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On second thought, do users really want to distinguish those three errornous
>> states? I'm inclined to merge the checks for those three conditions into one,
>> that is,
>>
>>         if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
>>         {
>>             WALInsertLockRelease();
>>             ereport(ERROR,
>>                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>                      errmsg("a backup is already in progress"),
>>
>> Also it may be better to handle the similar checks in pg_stop_backup,
>> in the same way.
>
> Here is the updated version of the patch that I applied the above "merge" to.

Okay.

> Unfortunately this patch is not applied cleanly to old versions.
> So we need to create more patches for back-patch.

Attached are patches for all supported branches. I have tweaked the
following comment for master/9.6, that's why I am re-sending it:
+    * (see comments of ExclusiveBackupState for more details).

Getting master and 9.6 was straight-forward. 9.5 showed a small
conflict. 9.4 bigger conflicts but the code blocks are
straight-forward to place except that the tablespace map does not
exist there, and that we need to be careful that the same flow is used
for the removal of the backup_label file. At 9.3, conflicts are caused
because of changes of APIs for WAL insert locking. Finally 9.2 showed
no conflicts with 9.3, which was a bit surprising.
--
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения