Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Дата
Msg-id 20161108.111858.10741575.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)",File: "xlog.c", Line: 10200)
Список pgsql-hackers
Hello,

At Sat, 5 Nov 2016 21:18:42 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqS8FQf3T3ZLw=v-FMMbUEM_kKcVzfxybTnG68u-SfZJ7Q@mail.gmail.com>
> > I don't see any problem on the state-transition of
> > exclusiveBackupState. For the following part
> >
> > @@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
> >      {
> >        /*
> >         * Check for existing backup label --- implies a back>
> up is already
> > -       * running.  (XXX given that we checked exclusiveBackup above,
> > +       * running.  (XXX given that we checked exclusiveBackupState above,
> >         * maybe it would be OK to just unlink any such label file?)
> >
> > The issue in the XXX comment should be settled by this
> > chance. PostgreSQL no longer (or should not) places the file by
> > mistake of itself. It is only possible by someone evil crafting
> > it, or crash-then-restarted. For the former it can be safely
> > removed with some notice. For the latter we should remove it
> > since the backup taken through the restarting is not
> > reliable. Addition to that, issueing pg_start_backup indicates
> > that the operator already have forgotten that he/she issued it
> > previously. So it seems to me that it can be removed with some
> > WARNING.
> > 
> > The error checking on exclusiveBackupState looks somewhat
> > redandunt but I don't come up with a better coding.
> 
> Yes, that's on purpose to make the error messages more verbose for the user.
> 
> > So, everything other than the XXX comment looks fine for me, and
> > this is rather straghtforward patch.
> 
> I agree that we could do something better, but that would be an
> optimization, not a bug fix which is what this patch is aiming at.

Ok, I understand that.

> > So after deciding what to do
> > for the issue and anyone opposed to this patch, I'll make this
> > 'Ready for commiter'.
> 
> Thanks for the input.
> 
> By the way, as long as I look at that, the patch applies cleanly on
> master and 9.6 but not further down. If a committer shows up to push
> this patch, I can prepare versions for back branches as needed.

Ok, the attached is the version just rebased to the current
master. It is clieanly appliable without whitespace errors on
master and 9.6 with some displacements but fails on 9.5.

I will mark this as "Ready for Committer".

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6cec027..a38692f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -463,6 +463,28 @@ typedef union WALInsertLockPadded} WALInsertLockPadded;/*
+ * State of an exclusive backup
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ */
+typedef enum ExclusiveBackupState
+{
+    EXCLUSIVE_BACKUP_NONE = 0,
+    EXCLUSIVE_BACKUP_STARTING,
+    EXCLUSIVE_BACKUP_STOPPING,
+    EXCLUSIVE_BACKUP_IN_PROGRESS
+} ExclusiveBackupState;
+
+/* * Shared state data for WAL insertion. */typedef struct XLogCtlInsert
@@ -503,13 +525,14 @@ typedef struct XLogCtlInsert    bool        fullPageWrites;    /*
-     * exclusiveBackup is true if a backup started with pg_start_backup() is
-     * in progress, and nonExclusiveBackups is a counter indicating the number
+     * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
+     * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
+     * it is done, and nonExclusiveBackups is a counter indicating the number     * of streaming base backups
currentlyin progress. forcePageWrites is set     * to true when either of these is non-zero. lastBackupStart is the
latest    * checkpoint redo location used as a starting point for an online backup.     */
 
-    bool        exclusiveBackup;
+    ExclusiveBackupState exclusiveBackupState;    int            nonExclusiveBackups;    XLogRecPtr
lastBackupStart;
@@ -848,6 +871,7 @@ static void xlog_outrec(StringInfo buf, XLogReaderState *record);#endifstatic void
xlog_outdesc(StringInfobuf, XLogReaderState *record);static void pg_start_backup_callback(int code, Datum arg);
 
+static void pg_stop_backup_callback(int code, Datum arg);static bool read_backup_label(XLogRecPtr *checkPointLoc,
           bool *backupEndRequired, bool *backupFromStandby);static bool read_tablespace_map(List **tablespaces);
 
@@ -9957,7 +9981,21 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
WALInsertLockAcquireExclusive();   if (exclusive)    {
 
-        if (XLogCtl->Insert.exclusiveBackup)
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("a backup is already starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("a backup is currently stopping")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_IN_PROGRESS)        {
WALInsertLockRelease();           ereport(ERROR,
 
@@ -9965,7 +10003,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 errmsg("a backup is already in progress"),                     errhint("Run pg_stop_backup() and try again.")));
}
 
-        XLogCtl->Insert.exclusiveBackup = true;
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING;    }    else
XLogCtl->Insert.nonExclusiveBackups++;
@@ -10220,7 +10258,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,        {
  /*             * Check for existing backup label --- implies a backup is already
 
-             * running.  (XXX given that we checked exclusiveBackup above,
+             * running.  (XXX given that we checked exclusiveBackupState above,             * maybe it would be OK to
justunlink any such label file?)             */            if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
 
@@ -10302,6 +10340,16 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback,(Datum) BoolGetDatum(exclusive));    /*
 
+     * Mark that start phase has correctly finished for an exclusive backup.
+     */
+    if (exclusive)
+    {
+        WALInsertLockAcquireExclusive();
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+        WALInsertLockRelease();
+    }
+
+    /*     * We're done.  As a convenience, return the starting WAL location.     */    if (starttli_p)
@@ -10319,8 +10367,8 @@ pg_start_backup_callback(int code, Datum arg)    WALInsertLockAcquireExclusive();    if
(exclusive)   {
 
-        Assert(XLogCtl->Insert.exclusiveBackup);
-        XLogCtl->Insert.exclusiveBackup = false;
+        Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING);
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;    }    else    {
@@ -10328,7 +10376,7 @@ pg_start_backup_callback(int code, Datum arg)        XLogCtl->Insert.nonExclusiveBackups--;
}
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false;
 
@@ -10337,6 +10385,19 @@ pg_start_backup_callback(int code, Datum arg)}/*
+ * Error cleanup callback for pg_stop_backup, should be used only for
+ * exclusive backups.
+ */
+static void
+pg_stop_backup_callback(int code, Datum arg)
+{
+    /* Update backup status on failure */
+    WALInsertLockAcquireExclusive();
+    XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+    WALInsertLockRelease();
+}
+
+/* * do_pg_stop_backup is the workhorse of the user-visible pg_stop_backup() * function. *
@@ -10399,19 +10460,103 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));    /*
 
-     * OK to update backup counters and forcePageWrites
+     * Remove the backup_label file for an exclusive backup. Before doing
+     * anything the status of the backup is switched to ensure that there
+     * is no concurrent operation during this operation. Backup counters
+     * are updated after this phase to be able to rollback in case of
+     * failure.     */
-    WALInsertLockAcquireExclusive();    if (exclusive)    {
-        if (!XLogCtl->Insert.exclusiveBackup)
+        WALInsertLockAcquireExclusive();
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)        {            WALInsertLockRelease();
          ereport(ERROR,                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exclusivebackup not in progress")));        }
 
-        XLogCtl->Insert.exclusiveBackup = false;
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exclusive backup currently starting")));
+        }
+        if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+        {
+            WALInsertLockRelease();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                     errmsg("exclusive backup currently stopping")));
+        }
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STOPPING;
+        WALInsertLockRelease();
+
+        /* remove backup_label */
+        PG_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) 0);
+        {
+            /*
+             * Read the existing label file into memory.
+             */
+            struct stat statbuf;
+            int            r;
+
+            if (stat(BACKUP_LABEL_FILE, &statbuf))
+            {
+                /* should not happen per the upper checks */
+                if (errno != ENOENT)
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not stat file \"%s\": %m",
+                                    BACKUP_LABEL_FILE)));
+                ereport(ERROR,
+                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                         errmsg("a backup is not in progress")));
+            }
+
+            lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
+            if (!lfp)
+            {
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+            }
+            labelfile = palloc(statbuf.st_size + 1);
+            r = fread(labelfile, statbuf.st_size, 1, lfp);
+            labelfile[statbuf.st_size] = '\0';
+
+            /*
+             * Close and remove the backup label file
+             */
+            if (r != 1 || ferror(lfp) || FreeFile(lfp))
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+            if (unlink(BACKUP_LABEL_FILE) != 0)
+                ereport(ERROR,
+                        (errcode_for_file_access(),
+                         errmsg("could not remove file \"%s\": %m",
+                                BACKUP_LABEL_FILE)));
+
+            /*
+             * Remove tablespace_map file if present, it is created only if there
+             * are tablespaces.
+             */
+            unlink(TABLESPACE_MAP);
+        }
+        PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) 0);
+    }
+
+    /*
+     * OK to update backup counters and forcePageWrites
+     */
+    WALInsertLockAcquireExclusive();
+    if (exclusive)
+    {
+        XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;    }    else    {
@@ -10425,66 +10570,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
XLogCtl->Insert.nonExclusiveBackups--;   }
 
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false;    }    WALInsertLockRelease();
 
-    if (exclusive)
-    {
-        /*
-         * Read the existing label file into memory.
-         */
-        struct stat statbuf;
-        int            r;
-
-        if (stat(BACKUP_LABEL_FILE, &statbuf))
-        {
-            if (errno != ENOENT)
-                ereport(ERROR,
-                        (errcode_for_file_access(),
-                         errmsg("could not stat file \"%s\": %m",
-                                BACKUP_LABEL_FILE)));
-            ereport(ERROR,
-                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("a backup is not in progress")));
-        }
-
-        lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
-        if (!lfp)
-        {
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-        }
-        labelfile = palloc(statbuf.st_size + 1);
-        r = fread(labelfile, statbuf.st_size, 1, lfp);
-        labelfile[statbuf.st_size] = '\0';
-
-        /*
-         * Close and remove the backup label file
-         */
-        if (r != 1 || ferror(lfp) || FreeFile(lfp))
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-        if (unlink(BACKUP_LABEL_FILE) != 0)
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not remove file \"%s\": %m",
-                            BACKUP_LABEL_FILE)));
-
-        /*
-         * Remove tablespace_map file if present, it is created only if there
-         * are tablespaces.
-         */
-        unlink(TABLESPACE_MAP);
-    }
-    /*     * Read and parse the START WAL LOCATION line (this code is pretty crude,     * but we are not expecting any
variabilityin the file format).
 
@@ -10721,7 +10813,7 @@ do_pg_abort_backup(void)    Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
XLogCtl->Insert.nonExclusiveBackups--;
-    if (!XLogCtl->Insert.exclusiveBackup &&
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&        XLogCtl->Insert.nonExclusiveBackups ==
0)   {        XLogCtl->Insert.forcePageWrites = false; 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Do we need use more meaningful variables to replace 0 in catalog head files?
Следующее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: Re: Re: BUG #13755: pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled