Re: recovery modules

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: recovery modules
Дата
Msg-id Y830qTRZJP6XGuOF@paquier.xyz
обсуждение исходный текст
Ответ на Re: recovery modules  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: recovery modules  (Nathan Bossart <nathandbossart@gmail.com>)
Список pgsql-hackers
On Tue, Jan 17, 2023 at 08:44:27PM -0800, Nathan Bossart wrote:
> Thanks.  Here's a rebased version of the last patch.

Thanks for the rebase.

The final state of the documentation is as follows:
51. Archive and Recovery Modules
    51.1. Archive Module Initialization Functions
    51.2. Archive Module Callbacks
    51.3. Recovery Module Initialization Functions
    51.4. Recovery Module Callbacks

I am not completely sure whether this grouping is the best thing to
do.  Wouldn't it be better to move that into two different
sub-sections instead?  One layout suggestion:
51. WAL Modules
  51.1. Archive Modules
    51.1.1. Archive Module Initialization Functions
    51.1.2. Archive Module Callbacks
  51.2. Recovery Modules
    51.2.1. Recovery Module Initialization Functions
    51.2.2. Recovery Module Callbacks

Putting both of them in the same section sounds like a good idea per
the symmetry that one would likely have between the code paths of
archiving and recovery, so as they share some common knowledge.

This kinds of comes back to the previous suggestion to rename
basic_archive to something like wal_modules, that covers both
archiving and recovery.  I does not seem like this would overlap with
RMGRs, which is much more internal anyway.

     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-     errmsg("must specify restore_command when standby mode is not enabled")));
+     errmsg("must specify restore_command or a restore_library that defines "
+            "a restore callback when standby mode is not enabled")));
This is long.  Shouldn't this be split into an errdetail() to list all
the options at hand?

-   if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
-       ereport(ERROR,
-               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                errmsg("both archive_command and archive_library set"),
-                errdetail("Only one of archive_command, archive_library may be set.")));
+   CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library",
+                              XLogArchiveCommand, "archive_command");

The introduction of this routine could be a patch on its own, as it
impacts the archiving path.

+       CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library",
+                                  archiveCleanupCommand, "archive_cleanup_command");
+       if (strcmp(prevRestoreLibrary, restoreLibrary) != 0 ||
+           strcmp(prevArchiveCleanupCommand, archiveCleanupCommand) != 0)
+       {
+           call_recovery_module_shutdown_cb(0, (Datum) 0);
+           LoadRecoveryCallbacks();
+       }
+
+       pfree(prevRestoreLibrary);
+       pfree(prevArchiveCleanupCommand);

Hm..  The callers of CheckMutuallyExclusiveGUCs() with the new ERROR
paths they introduce need a close lookup.  As far as I can see this
concerns four areas depending on the three restore commands
(restore_command and recovery_end command for startup,
archive_cleanup_command for the checkpointer):
- Startup process initialization, as of validateRecoveryParameters()
where the postmaster GUCs for the recovery target are evaluated.  This
one is an early stage which is fine.
- Startup reloading, as of StartupRereadConfig().  This code could
involve a WAL receiver restart depending on a change in the slot
change or in primary_conninfo, and force at the same time an ERROR
because of conflicting recovery library and command configuration.
This one should be safe because pendingWalRcvRestart would just be
considered later on by the startup process itself while waiting for
WAL to become available.  Still this could deserve a comment?  Even if
there is a misconfiguration, a reload on a standby would enforce a
FATAL in the startup process, taking down the whole server.
- Checkpointer initialization, as of CheckpointerMain().  A
configuration failure in this code path, aka server startup, causes
the server to loop infinitely on FATAL with the misconfiguration
showing up all the time..  This is a problem.
- Last comes the checkpointer GUC reloading, as of
HandleCheckpointerInterrupts(), with a second problem.  This
introduces a failure path where ConfigReloadPending is processed at
the same time as ShutdownRequestPending based on the way it is coded,
interacting with what would be a normal shutdown in some cases?  And
actually, if you enforce a misconfiguration on reload, the
checkpointer reports an error but it does not enforce a process
restart, hence it keeps around the new, incorrect, configuration while
waiting for a new checkpoint to happen once restore_library and
archive_cleanup_command are set.  This could lead to surprises, IMO.
Upgrading to a FATAL in this code path triggers an infinite loop, like
the startup path.

If the archive_cleanup_command ballback of a restore library triggers
a FATAL, it seems to me that it would continuously trigger a server
restart, actually.  Perhaps that's something to document, in
comparison to the safe fallbacks of the shell command where we don't
force an ERROR to give priority to the stability of the checkpointer.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Remove source code display from \df+?
Следующее
От: Isaac Morland
Дата:
Сообщение: Re: Remove source code display from \df+?