Re: standby recovery fails (tablespace related) (tentative patchand discussion)

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: standby recovery fails (tablespace related) (tentative patchand discussion)
Дата
Msg-id 20190424.171354.11597092.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: standby recovery fails (tablespace related) (tentative patchand discussion)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Paul Guo <pguo@pivotal.io>)
Список pgsql-hackers
Mmm. I posted to wrong thread. Sorry.

At Tue, 23 Apr 2019 16:39:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20190423.163949.36763221.horiguchi.kyotaro@lab.ntt.co.jp>
> At Tue, 23 Apr 2019 13:31:58 +0800, Paul Guo <pguo@pivotal.io> wrote in
<CAEET0ZEcwz57z2yfWRds43b3TfQPPDSWmbjGmD43xRxLT41NDg@mail.gmail.com>
> > Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this database
> > create redo error, but I suspect some other kind of redo, which depends on
> > the files under the directory (they are not copied since the directory is
> > not created) and also cannot be covered by the invalid page mechanism,
> > could fail. Thanks.
> 
> If recovery starts from just after tablespace creation, that's
> simple. The Symlink to the removed tablespace is already removed
> in the case. Hence server innocently create files directly under
> pg_tblspc, not in the tablespace. Finally all files that were
> supposed to be created in the removed tablespace are removed
> later in recovery.
> 
> If recovery starts from recalling page in a file that have been
> in the tablespace, XLogReadBufferExtended creates one (perhaps
> directly in pg_tblspc as described above) and the files are
> removed later in recoery the same way to above. This case doen't
> cause FATAL/PANIC during recovery even in master.
> 
> XLogReadBufferExtended@xlogutils.c
> | * Create the target file if it doesn't already exist.  This lets us cope
> | * if the replay sequence contains writes to a relation that is later
> | * deleted.  (The original coding of this routine would instead suppress
> | * the writes, but that seems like it risks losing valuable data if the
> | * filesystem loses an inode during a crash.  Better to write the data
> | * until we are actually told to delete the file.)
> 
> So buffered access cannot be a problem for the reason above. The
> remaining possible issue is non-buffered access to files in
> removed tablespaces. This is what I mentioned upthread:
> 
> me> but I haven't checked this covers all places where need the same
> me> treatment.

RM_DBASE_ID is fixed by the patch.

XLOG/XACT/CLOG/MULTIXACT/RELMAP/STANDBY/COMMIT_TS/REPLORIGIN/LOGICALMSG:
  - are not relevant.

HEAP/HEAP2/BTREE/HASH/GIN/GIST/SEQ/SPGIST/BRIN/GENERIC:
  - Resources works on buffer is not affected.

SMGR:
  - Both CREATE and TRUNCATE seems fine.

TBLSPC:
  - We don't nest tablespace directories. No Problem.

I don't find a similar case.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




From bc97e195f21af5d715d85424efc21fcbe8bb902c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 22 Apr 2019 20:59:15 +0900
Subject: [PATCH 4/5] Fix failure of standby startup caused by tablespace
 removal

When standby restarts after a crash after drop of a tablespace,
there's a possibility that recovery fails trying an object-creation in
already removed tablespace directory. Allow recovery to continue by
ignoring the error if not reaching consistency point.
---
 src/backend/access/transam/xlogutils.c | 34 ++++++++++++++++++++++++++++++++++
 src/backend/commands/tablespace.c      | 12 ++++++------
 src/backend/storage/file/copydir.c     | 12 +++++++-----
 src/include/access/xlogutils.h         |  1 +
 4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..75cdb882cd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -522,6 +522,40 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
     return buffer;
 }
 
+/*
+ * XLogMakePGDirectory
+ *
+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.
+ *
+ * To prevent that case, this function issues WARNING instead of ERROR on
+ * error if consistency is not reached yet.
+ */
+int
+XLogMakePGDirectory(const char *directoryName)
+{
+    int ret;
+
+    ret = MakePGDirectory(directoryName);
+
+    if (ret != 0)
+    {
+        int elevel = ERROR;
+
+        /* Don't issue ERROR for this failure before reaching consistency. */
+        if (InRecovery && !reachedConsistency)
+            elevel = WARNING;
+
+        ereport(elevel,
+                (errcode_for_file_access(),
+                 errmsg("could not create directory \"%s\": %m", directoryName)));
+        return ret;
+    }
+
+    return 0;
+}
+
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
  * return type is Relation.
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 66a70871e6..c9fb2af015 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -303,12 +303,6 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
                 (errcode(ERRCODE_INVALID_NAME),
                  errmsg("tablespace location cannot contain single quotes")));
 
-    /* Reject tablespaces in the data directory. */
-    if (is_in_data_directory(location))
-        ereport(ERROR,
-                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-                 errmsg("tablespace location must not be inside the data directory")));
-
     /*
      * Check that location isn't too long. Remember that we're going to append
      * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'.  In the relative path case, we
@@ -323,6 +317,12 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
                  errmsg("tablespace location \"%s\" is too long",
                         location)));
 
+    /* Reject tablespaces in the data directory. */
+    if (is_in_data_directory(location))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                 errmsg("tablespace location must not be inside the data directory")));
+
     /*
      * Disallow creation of tablespaces named "pg_xxx"; we reserve this
      * namespace for system purposes.
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 30f6200a86..0216270dd3 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,11 +22,11 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#include "access/xlogutils.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-
 /*
  * copydir: copy a directory
  *
@@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse)
     char        fromfile[MAXPGPATH * 2];
     char        tofile[MAXPGPATH * 2];
 
-    if (MakePGDirectory(todir) != 0)
-        ereport(ERROR,
-                (errcode_for_file_access(),
-                 errmsg("could not create directory \"%s\": %m", todir)));
+    /*
+     * We might have to skip copydir to continue recovery. See the function
+     * for details.
+     */
+    if (XLogMakePGDirectory(todir) != 0)
+        return;
 
     xldir = AllocateDir(fromdir);
 
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 0ab5ba62f5..46a7596315 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record,
 
 extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                        BlockNumber blkno, ReadBufferMode mode);
+extern int XLogMakePGDirectory(const char *directoryName);
 
 extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
 extern void FreeFakeRelcacheEntry(Relation fakerel);
-- 
2.16.3


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Regression test PANICs with master-standby setup on samemachine
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: [PATCH v1] Show whether tables are logged in \dt+