Re: standby recovery fails (tablespace related) (tentative patchand discussion)

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: standby recovery fails (tablespace related) (tentative patchand discussion)
Дата
Msg-id 20190422.164027.33866403.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: standby recovery fails (tablespace related) (tentative patchand discussion)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: standby recovery fails (tablespace related) (tentative patchand discussion)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Paul Guo <pguo@pivotal.io>)
Список pgsql-hackers
Oops! The comment in the previous patch is wrong.

At Mon, 22 Apr 2019 16:15:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20190422.161513.258021727.horiguchi.kyotaro@lab.ntt.co.jp>
> At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <pguo@pivotal.io> wrote in
<CAEET0ZGpUrMGUzfyzVF9FuSq+zb=QovYa2cvyRnDOTvZ5vXxTw@mail.gmail.com>
> > Please see my replies inline. Thanks.
> > 
> > On Fri, Apr 19, 2019 at 12:38 PM Asim R P <apraveen@pivotal.io> wrote:
> > 
> > > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <pguo@pivotal.io> wrote:
> > > >
> > > > create db with tablespace
> > > > drop database
> > > > drop tablespace.
> > >
> > > Essentially, that sequence of operations causes crash recovery to fail
> > > if the "drop tablespace" transaction was committed before crashing.
> > > This is a bug in crash recovery in general and should be reproducible
> > > without configuring a standby.  Is that right?
> > >
> > 
> > No. In general, checkpoint is done for drop_db/create_db/drop_tablespace on
> > master.
> > That makes the file/directory update-to-date if I understand the related
> > code correctly.
> > For standby, checkpoint redo does not ensure that.
> 
> That's right partly. As you must have seen, fast shutdown forces
> restartpoint for the last checkpoint and it prevents the problem
> from happening. Anyway it seems to be a problem.
> 
> > > Your patch creates missing directories in the destination.  Don't we
> > > need to create the tablespace symlink under pg_tblspc/?  I would
> > >
> > 
> >  'create db with tablespace' redo log does not include the tablespace real
> > directory information.
> > Yes, we could add in it into the xlog, but that seems to be an overdesign.
> 
> But I don't think creating directory that is to be removed just
> after is a wanted solution. The directory most likely to be be
> removed just after.
> 
> > > prefer extending the invalid page mechanism to deal with this, as
> > > suggested by Ashwin off-list.  It will allow us to avoid creating
> > 
> > directories and files only to remove them shortly afterwards when the
> > > drop database and drop tablespace records are replayed.
> > >
> > >
> > 'invalid page' mechanism seems to be more proper for missing pages of a
> > file. For
> > missing directories, we could, of course, hack to use that (e.g. reading
> > any page of
> > a relfile in that database) to make sure the tablespace create code
> > (without symlink)
> > safer (It assumes those directories will be deleted soon).
> > 
> > More feedback about all of the previous discussed solutions is welcome.
> 
> It doesn't seem to me that the invalid page mechanism is
> applicable in straightforward way, because it doesn't consider
> simple file copy.
> 
> Drop failure is ignored any time. I suppose we can ignore the
> error to continue recovering as far as recovery have not reached
> consistency. The attached would work *at least* your case, but I
> haven't checked this covers all places where need the same
> treatment.

The comment for the new function XLogMakePGDirectory is wrong:

+ * There is a possibility that WAL replay causes a creation of the same
+ * directory left by the previous crash. Issuing ERROR prevents the caller
+ * from continuing recovery.

The correct one is:

+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.

It is fixed in the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..01331f0da9 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -522,6 +522,44 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
     return buffer;
 }
 
+/*
+ * XLogMakePGDirectory
+ *
+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.
+ *
+ * To prevent that case, this function issues WARNING instead of ERROR on
+ * error if consistency is not reached yet.
+ */
+int
+XLogMakePGDirectory(const char *directoryName)
+{
+    int ret;
+
+    ret = MakePGDirectory(directoryName);
+
+    if (ret != 0)
+    {
+        int elevel = ERROR;
+
+        /*
+         * We might get error trying to create existing directory that is to
+         * be removed just after.  Don't issue ERROR in the case. Recovery
+         * will stop if we again failed after reaching consistency.
+         */
+        if (InRecovery && !reachedConsistency)
+            elevel = WARNING;
+
+        ereport(elevel,
+                (errcode_for_file_access(),
+                 errmsg("could not create directory \"%s\": %m", directoryName)));
+        return ret;
+    }
+
+    return 0;
+}
+
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
  * return type is Relation.
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 30f6200a86..0216270dd3 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,11 +22,11 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#include "access/xlogutils.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-
 /*
  * copydir: copy a directory
  *
@@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse)
     char        fromfile[MAXPGPATH * 2];
     char        tofile[MAXPGPATH * 2];
 
-    if (MakePGDirectory(todir) != 0)
-        ereport(ERROR,
-                (errcode_for_file_access(),
-                 errmsg("could not create directory \"%s\": %m", todir)));
+    /*
+     * We might have to skip copydir to continue recovery. See the function
+     * for details.
+     */
+    if (XLogMakePGDirectory(todir) != 0)
+        return;
 
     xldir = AllocateDir(fromdir);
 
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 0ab5ba62f5..46a7596315 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record,
 
 extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                        BlockNumber blkno, ReadBufferMode mode);
+extern int XLogMakePGDirectory(const char *directoryName);
 
 extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
 extern void FreeFakeRelcacheEntry(Relation fakerel);

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Konstantin Knizhnik
Дата:
Сообщение: Re: block-level incremental backup
Следующее
От: Amit Langote
Дата:
Сообщение: display of variables in EXPLAIN VERBOSE