Re: standby recovery fails (tablespace related) (tentative patchand discussion)
| От | Kyotaro HORIGUCHI |
|---|---|
| Тема | Re: standby recovery fails (tablespace related) (tentative patchand discussion) |
| Дата | |
| Msg-id | 20190424.171354.11597092.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
| Ответ на | Re: standby recovery fails (tablespace related) (tentative patchand discussion) (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
| Ответы |
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
|
| Список | pgsql-hackers |
Mmm. I posted to wrong thread. Sorry.
At Tue, 23 Apr 2019 16:39:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20190423.163949.36763221.horiguchi.kyotaro@lab.ntt.co.jp>
> At Tue, 23 Apr 2019 13:31:58 +0800, Paul Guo <pguo@pivotal.io> wrote in
<CAEET0ZEcwz57z2yfWRds43b3TfQPPDSWmbjGmD43xRxLT41NDg@mail.gmail.com>
> > Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this database
> > create redo error, but I suspect some other kind of redo, which depends on
> > the files under the directory (they are not copied since the directory is
> > not created) and also cannot be covered by the invalid page mechanism,
> > could fail. Thanks.
>
> If recovery starts from just after tablespace creation, that's
> simple. The Symlink to the removed tablespace is already removed
> in the case. Hence server innocently create files directly under
> pg_tblspc, not in the tablespace. Finally all files that were
> supposed to be created in the removed tablespace are removed
> later in recovery.
>
> If recovery starts from recalling page in a file that have been
> in the tablespace, XLogReadBufferExtended creates one (perhaps
> directly in pg_tblspc as described above) and the files are
> removed later in recoery the same way to above. This case doen't
> cause FATAL/PANIC during recovery even in master.
>
> XLogReadBufferExtended@xlogutils.c
> | * Create the target file if it doesn't already exist. This lets us cope
> | * if the replay sequence contains writes to a relation that is later
> | * deleted. (The original coding of this routine would instead suppress
> | * the writes, but that seems like it risks losing valuable data if the
> | * filesystem loses an inode during a crash. Better to write the data
> | * until we are actually told to delete the file.)
>
> So buffered access cannot be a problem for the reason above. The
> remaining possible issue is non-buffered access to files in
> removed tablespaces. This is what I mentioned upthread:
>
> me> but I haven't checked this covers all places where need the same
> me> treatment.
RM_DBASE_ID is fixed by the patch.
XLOG/XACT/CLOG/MULTIXACT/RELMAP/STANDBY/COMMIT_TS/REPLORIGIN/LOGICALMSG:
- are not relevant.
HEAP/HEAP2/BTREE/HASH/GIN/GIST/SEQ/SPGIST/BRIN/GENERIC:
- Resources works on buffer is not affected.
SMGR:
- Both CREATE and TRUNCATE seems fine.
TBLSPC:
- We don't nest tablespace directories. No Problem.
I don't find a similar case.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From bc97e195f21af5d715d85424efc21fcbe8bb902c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 22 Apr 2019 20:59:15 +0900
Subject: [PATCH 4/5] Fix failure of standby startup caused by tablespace
removal
When standby restarts after a crash after drop of a tablespace,
there's a possibility that recovery fails trying an object-creation in
already removed tablespace directory. Allow recovery to continue by
ignoring the error if not reaching consistency point.
---
src/backend/access/transam/xlogutils.c | 34 ++++++++++++++++++++++++++++++++++
src/backend/commands/tablespace.c | 12 ++++++------
src/backend/storage/file/copydir.c | 12 +++++++-----
src/include/access/xlogutils.h | 1 +
4 files changed, 48 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..75cdb882cd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -522,6 +522,40 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
return buffer;
}
+/*
+ * XLogMakePGDirectory
+ *
+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.
+ *
+ * To prevent that case, this function issues WARNING instead of ERROR on
+ * error if consistency is not reached yet.
+ */
+int
+XLogMakePGDirectory(const char *directoryName)
+{
+ int ret;
+
+ ret = MakePGDirectory(directoryName);
+
+ if (ret != 0)
+ {
+ int elevel = ERROR;
+
+ /* Don't issue ERROR for this failure before reaching consistency. */
+ if (InRecovery && !reachedConsistency)
+ elevel = WARNING;
+
+ ereport(elevel,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", directoryName)));
+ return ret;
+ }
+
+ return 0;
+}
+
/*
* Struct actually returned by XLogFakeRelcacheEntry, though the declared
* return type is Relation.
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 66a70871e6..c9fb2af015 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -303,12 +303,6 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
(errcode(ERRCODE_INVALID_NAME),
errmsg("tablespace location cannot contain single quotes")));
- /* Reject tablespaces in the data directory. */
- if (is_in_data_directory(location))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("tablespace location must not be inside the data directory")));
-
/*
* Check that location isn't too long. Remember that we're going to append
* 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we
@@ -323,6 +317,12 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
errmsg("tablespace location \"%s\" is too long",
location)));
+ /* Reject tablespaces in the data directory. */
+ if (is_in_data_directory(location))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablespace location must not be inside the data directory")));
+
/*
* Disallow creation of tablespaces named "pg_xxx"; we reserve this
* namespace for system purposes.
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 30f6200a86..0216270dd3 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,11 +22,11 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/xlogutils.h"
#include "storage/copydir.h"
#include "storage/fd.h"
#include "miscadmin.h"
#include "pgstat.h"
-
/*
* copydir: copy a directory
*
@@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse)
char fromfile[MAXPGPATH * 2];
char tofile[MAXPGPATH * 2];
- if (MakePGDirectory(todir) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not create directory \"%s\": %m", todir)));
+ /*
+ * We might have to skip copydir to continue recovery. See the function
+ * for details.
+ */
+ if (XLogMakePGDirectory(todir) != 0)
+ return;
xldir = AllocateDir(fromdir);
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 0ab5ba62f5..46a7596315 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record,
extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
BlockNumber blkno, ReadBufferMode mode);
+extern int XLogMakePGDirectory(const char *directoryName);
extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
extern void FreeFakeRelcacheEntry(Relation fakerel);
--
2.16.3
В списке pgsql-hackers по дате отправления: