Re: Reopen logfile on SIGHUP

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Reopen logfile on SIGHUP
Дата
Msg-id 20180416.115435.28153375.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Reopen logfile on SIGHUP  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Ответы Re: Reopen logfile on SIGHUP  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Список pgsql-hackers
At Thu, 12 Apr 2018 17:23:42 +0300, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote in
<f9f32301-53b4-74cb-335a-c293911aed41@postgrespro.ru>
> On 11.04.2018 00:00, Tom Lane wrote:
> > So we need a mechanism that's narrowly targeted
> > to reopening the logfile, without SIGHUP'ing the entire database.
> 
> We can send SIGUSR1 to the syslogger process. To make its pid easier
> to find out, it can be published in "$PGDATA/logging_collector.pid",
> as suggested by Grigory. The attached patch does this. It also adds a
> brief description of how to use this with logrotate.

FWIW I'm not a fan of officially exposing logging collector PID
and let users send SIGUSR1 directly to the postmaster's internal
process. (It seems to me more unusual than pg_terminate_backed.)
We can provide a new command "pg_ctl logrotate" to hide the
details. (It cannot be executed by root, though.)

> > Point 2: Depending on how you've got the log filenames configured,
> > setting rotation_requested may result in a change in log filename
> 
> If logrotate only needs the file to be reopened, syslogger's rotation
> does just than when using a static log file name. I imagine logrotate
> can be configured to do something useful with changing file names,
> too. It is a matter of keeping the configuration of syslogger and
> logrotate consistent.

Seems fine for me.

> > BTW, another thing that needs to be considered is the interaction with
> > rotation_disabled.  Right now we automatically drop that on SIGHUP,
> > but
> > I'm unclear on whether it should be different for logrotate requests.

I feel the same, an explicit request from user ought to reset (or
ignore) it. (By the way, logrorate_disabled cannot be reset
without reloading config..)

> The SIGUSR1 path is supposed to be used by automated tools. In a
> sense, it is an automatic rotation, the difference being that it
> originates from an external tool and not from syslogger itself. So, it
> sounds plausible that the rotation request shouldn't touch the
> rotation_disabled flag, and should be disabled by it, just like the
> automatic rotation.
> 
> Still, this leads us to a scenario where we can lose logs:
> 1. postgres is configured to use a static file name. logrotate is
> configured to move the file, send SIGUSR1 to postgres syslogger, gzip
> the file and delete it.
> 2. logrotate starts the rotation. It moves the file and signals
> postgres to reopen it.
> 3. postgres fails to reopen the file because there are too many files
> open (ENFILE/EMFILE), which is a normal occurrence on heavily loaded
> systems. Or it doesn't open the new file because the rotation_disable
> flag is set. It continues logging to the old file.
> 4. logrotate has no way to detect this failure, so it gzips the file
> and unlinks it.
> 5. postgres continues writing to the now unlinked file, and we lose an
> arbitrary amount of logs until the next successful rotation.
> 
> With dynamic file names, logrotate can be told to skip open files, so
> that it doesn't touch our log file if we haven't switched to the new
> one. With a static file name, the log file is always open, so this
> method doesn't work. I'm not sure how to make this work reliably.

The loss is unavoidable by any means since logrotate works that
way by design. It doesn't care whether its peer did the work as
expected. Someone wants to avoid the loss can use copytruncate
for another kind of small loss that can happen at every rotation
time and we don't need to change anything in the case. Those who
want more reliability ought to use the PostgreSQL's genuine
logging mechanism:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 08dc9ba031..7cadd71af0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -83,6 +83,7 @@ extern uint32 bootstrap_data_checksum_version;
 #define RECOVERY_COMMAND_DONE    "recovery.done"
 #define PROMOTE_SIGNAL_FILE        "promote"
 #define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"
+#define LOGROTATE_SIGNAL_FILE    "logrotate"
 
 
 /* User-settable parameters */
@@ -12225,6 +12226,24 @@ CheckPromoteSignal(void)
     return false;
 }
 
+/*
+ * Check to see if a log rotate request has arrived. Should be
+ * called by postmaster after receiving SIGUSR1.
+ */
+bool
+CheckLogrotateSignal(void)
+{
+    struct stat stat_buf;
+
+    if (stat(LOGROTATE_SIGNAL_FILE, &stat_buf) == 0)
+    {
+        unlink(LOGROTATE_SIGNAL_FILE);
+        return true;
+    }
+
+    return false;
+}
+
 /*
  * Wake up startup process to replay newly arrived WAL, or to notice that
  * failover has been requested.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d948369f3e..d4f0f5c853 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5092,7 +5092,8 @@ sigusr1_handler(SIGNAL_ARGS)
         signal_child(PgArchPID, SIGUSR1);
     }
 
-    if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE) &&
+    if ((CheckLogrotateSignal() ||
+         CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE)) &&
         SysLoggerPID != 0)
     {
         /* Tell syslogger to rotate logfile */
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 143021de05..e4968f7b6f 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -59,6 +59,7 @@ typedef enum
     STOP_COMMAND,
     RESTART_COMMAND,
     RELOAD_COMMAND,
+    LOGROTATE_COMMAND,
     STATUS_COMMAND,
     PROMOTE_COMMAND,
     KILL_COMMAND,
@@ -100,6 +101,7 @@ static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char backup_file[MAXPGPATH];
 static char promote_file[MAXPGPATH];
+static char logrotate_file[MAXPGPATH];
 
 #ifdef WIN32
 static DWORD pgctl_start_type = SERVICE_AUTO_START;
@@ -125,6 +127,7 @@ static void do_restart(void);
 static void do_reload(void);
 static void do_status(void);
 static void do_promote(void);
+static void do_logrotate(void);
 static void do_kill(pgpid_t pid);
 static void print_msg(const char *msg);
 static void adjust_data_dir(void);
@@ -1171,6 +1174,62 @@ do_promote(void)
         print_msg(_("server promoting\n"));
 }
 
+/*
+ * log rotate
+ */
+
+static void
+do_logrotate(void)
+{
+    FILE       *logrotatefile;
+    pgpid_t        pid;
+
+    pid = get_pgpid(false);
+
+    if (pid == 0)                /* no pid file */
+    {
+        write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file);
+        write_stderr(_("Is server running?\n"));
+        exit(1);
+    }
+    else if (pid < 0)            /* standalone backend, not postmaster */
+    {
+        pid = -pid;
+        write_stderr(_("%s: cannot promote server; "
+                       "single-user server is running (PID: %ld)\n"),
+                     progname, pid);
+        exit(1);
+    }
+
+    snprintf(logrotate_file, MAXPGPATH, "%s/logrotate", pg_data);
+
+    if ((logrotatefile = fopen(logrotate_file, "w")) == NULL)
+    {
+        write_stderr(_("%s: could not create log rotate signal file \"%s\": %s\n"),
+                     progname, logrotate_file, strerror(errno));
+        exit(1);
+    }
+    if (fclose(logrotatefile))
+    {
+        write_stderr(_("%s: could not write log rotate signal file \"%s\": %s\n"),
+                     progname, logrotate_file, strerror(errno));
+        exit(1);
+    }
+
+    sig = SIGUSR1;
+    if (kill((pid_t) pid, sig) != 0)
+    {
+        write_stderr(_("%s: could not send log rotate signal (PID: %ld): %s\n"),
+                     progname, pid, strerror(errno));
+        if (unlink(logrotate_file) != 0)
+            write_stderr(_("%s: could not remove logrotate signal file \"%s\": %s\n"),
+                         progname, logrotate_file, strerror(errno));
+        exit(1);
+    }
+
+    print_msg(_("commanded to rotate log file\n"));
+}
+
 
 /*
  *    utility routines
@@ -2332,6 +2391,8 @@ main(int argc, char **argv)
                 ctl_command = RESTART_COMMAND;
             else if (strcmp(argv[optind], "reload") == 0)
                 ctl_command = RELOAD_COMMAND;
+            else if (strcmp(argv[optind], "logrotate") == 0)
+                ctl_command = LOGROTATE_COMMAND;
             else if (strcmp(argv[optind], "status") == 0)
                 ctl_command = STATUS_COMMAND;
             else if (strcmp(argv[optind], "promote") == 0)
@@ -2442,6 +2503,9 @@ main(int argc, char **argv)
         case PROMOTE_COMMAND:
             do_promote();
             break;
+        case LOGROTATE_COMMAND:
+            do_logrotate();
+            break;
         case KILL_COMMAND:
             do_kill(killproc);
             break;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..31d1fb4d80 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -280,6 +280,7 @@ extern void GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch);
 extern void RemovePromoteSignalFiles(void);
 
 extern bool CheckPromoteSignal(void);
+extern bool CheckLogrotateSignal(void);
 extern void WakeupRecovery(void);
 extern void SetWalWriterSleeping(bool sleeping);


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Gotchas about pg_verify_checksums
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..