Re: Weird failure with latches in curculio on v15

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Weird failure with latches in curculio on v15
Дата
Msg-id 20230204112034.egwp52oa2f5i3crk@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Weird failure with latches in curculio on v15  (Nathan Bossart <nathandbossart@gmail.com>)
Список pgsql-hackers
Hi,

On 2023-02-03 10:54:17 -0800, Nathan Bossart wrote:
> @@ -146,7 +146,25 @@ ExecuteRecoveryCommand(const char *command, const char *commandName,
>       */
>      fflush(NULL);
>      pgstat_report_wait_start(wait_event_info);
> +
> +    /*
> +     * PreRestoreCommand() is used to tell the SIGTERM handler for the startup
> +     * process that it is okay to proc_exit() right away on SIGTERM.  This is
> +     * done for the duration of the system() call because there isn't a good
> +     * way to break out while it is executing.  Since we might call proc_exit()
> +     * in a signal handler here, it is extremely important that nothing but the
> +     * system() call happens between the calls to PreRestoreCommand() and
> +     * PostRestoreCommand().  Any additional code must go before or after this
> +     * section.
> +     */
> +    if (exitOnSigterm)
> +        PreRestoreCommand();
> +
>      rc = system(command);
> +
> +    if (exitOnSigterm)
> +        PostRestoreCommand();
> +
>      pgstat_report_wait_end();
>
>      if (rc != 0)

It's somewhat weird that we now call the startup-process specific
PreRestoreCommand/PostRestoreCommand() in other processes than the
startup process. Gated on a variable that's not immediately obviously
tied to being in the startup process.

I think at least we ought to add comments to PreRestoreCommand /
PostRestoreCommand that they need to be robust against being called
outside of the startup process, and similarly a comment in
ExecuteRecoveryCommand(), explaining that all this stuff just works in
the startup process.


> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index bcd23542f1..503eb1a5a6 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -19,6 +19,8 @@
>   */
>  #include "postgres.h"
>  
> +#include <unistd.h>
> +
>  #include "access/xlog.h"
>  #include "access/xlogrecovery.h"
>  #include "access/xlogutils.h"
> @@ -121,7 +123,17 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
>      int            save_errno = errno;
>  
>      if (in_restore_command)
> -        proc_exit(1);
> +    {
> +        /*
> +         * If we are in a child process (e.g., forked by system() in
> +         * shell_restore()), we don't want to call any exit callbacks.  The
> +         * parent will take care of that.
> +         */
> +        if (MyProcPid == (int) getpid())
> +            proc_exit(1);
> +        else
> +            _exit(1);

I think it might be worth adding something like
  const char msg[] = "StartupProcShutdownHandler() called in child process";
  write(STDERR_FILENO, msg, sizeof(msg));
to this path. Otherwise it might end up being a very hard to debug path.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Exit walsender before confirming remote flush in logical replication
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher