Re: standby recovery fails (tablespace related) (tentative patchand discussion)

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: standby recovery fails (tablespace related) (tentative patchand discussion)
Дата
Msg-id 20190422.161513.258021727.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Paul Guo <pguo@pivotal.io>)
Ответы Re: standby recovery fails (tablespace related) (tentative patchand discussion)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
Hello.

At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <pguo@pivotal.io> wrote in
<CAEET0ZGpUrMGUzfyzVF9FuSq+zb=QovYa2cvyRnDOTvZ5vXxTw@mail.gmail.com>
> Please see my replies inline. Thanks.
> 
> On Fri, Apr 19, 2019 at 12:38 PM Asim R P <apraveen@pivotal.io> wrote:
> 
> > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <pguo@pivotal.io> wrote:
> > >
> > > create db with tablespace
> > > drop database
> > > drop tablespace.
> >
> > Essentially, that sequence of operations causes crash recovery to fail
> > if the "drop tablespace" transaction was committed before crashing.
> > This is a bug in crash recovery in general and should be reproducible
> > without configuring a standby.  Is that right?
> >
> 
> No. In general, checkpoint is done for drop_db/create_db/drop_tablespace on
> master.
> That makes the file/directory update-to-date if I understand the related
> code correctly.
> For standby, checkpoint redo does not ensure that.

That's right partly. As you must have seen, fast shutdown forces
restartpoint for the last checkpoint and it prevents the problem
from happening. Anyway it seems to be a problem.

> > Your patch creates missing directories in the destination.  Don't we
> > need to create the tablespace symlink under pg_tblspc/?  I would
> >
> 
>  'create db with tablespace' redo log does not include the tablespace real
> directory information.
> Yes, we could add in it into the xlog, but that seems to be an overdesign.

But I don't think creating directory that is to be removed just
after is a wanted solution. The directory most likely to be be
removed just after.

> > prefer extending the invalid page mechanism to deal with this, as
> > suggested by Ashwin off-list.  It will allow us to avoid creating
> 
> directories and files only to remove them shortly afterwards when the
> > drop database and drop tablespace records are replayed.
> >
> >
> 'invalid page' mechanism seems to be more proper for missing pages of a
> file. For
> missing directories, we could, of course, hack to use that (e.g. reading
> any page of
> a relfile in that database) to make sure the tablespace create code
> (without symlink)
> safer (It assumes those directories will be deleted soon).
> 
> More feedback about all of the previous discussed solutions is welcome.

It doesn't seem to me that the invalid page mechanism is
applicable in straightforward way, because it doesn't consider
simple file copy.

Drop failure is ignored any time. I suppose we can ignore the
error to continue recovering as far as recovery have not reached
consistency. The attached would work *at least* your case, but I
haven't checked this covers all places where need the same
treatment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..0bc63f48da 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -522,6 +522,44 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
     return buffer;
 }
 
+/*
+ * XLogMakePGDirectory
+ *
+ * There is a possibility that WAL replay causes a creation of the same
+ * directory left by the previous crash. Issuing ERROR prevents the caller
+ * from continuing recovery.
+ *
+ * To prevent that case, this function issues WARNING instead of ERROR on
+ * error if consistency is not reached yet.
+ */
+int
+XLogMakePGDirectory(const char *directoryName)
+{
+    int ret;
+
+    ret = MakePGDirectory(directoryName);
+
+    if (ret != 0)
+    {
+        int elevel = ERROR;
+
+        /*
+         * We might get error trying to create existing directory that is to
+         * be removed just after.  Don't issue ERROR in the case. Recovery
+         * will stop if we again failed after reaching consistency.
+         */
+        if (InRecovery && !reachedConsistency)
+            elevel = WARNING;
+
+        ereport(elevel,
+                (errcode_for_file_access(),
+                 errmsg("could not create directory \"%s\": %m", directoryName)));
+        return ret;
+    }
+
+    return 0;
+}
+
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
  * return type is Relation.
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 30f6200a86..0216270dd3 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,11 +22,11 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#include "access/xlogutils.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-
 /*
  * copydir: copy a directory
  *
@@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse)
     char        fromfile[MAXPGPATH * 2];
     char        tofile[MAXPGPATH * 2];
 
-    if (MakePGDirectory(todir) != 0)
-        ereport(ERROR,
-                (errcode_for_file_access(),
-                 errmsg("could not create directory \"%s\": %m", todir)));
+    /*
+     * We might have to skip copydir to continue recovery. See the function
+     * for details.
+     */
+    if (XLogMakePGDirectory(todir) != 0)
+        return;
 
     xldir = AllocateDir(fromdir);
 
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 0ab5ba62f5..46a7596315 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record,
 
 extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                        BlockNumber blkno, ReadBufferMode mode);
+extern int XLogMakePGDirectory(const char *directoryName);
 
 extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
 extern void FreeFakeRelcacheEntry(Relation fakerel);

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: [PATCH v1] Add \echo_stderr to psql
Следующее
От: Konstantin Knizhnik
Дата:
Сообщение: Re: block-level incremental backup