Discussion: [PATCH] better systemd integration
I have written a couple of patches to improve the integration of the postgres daemon with systemd. The setup that is shipped with Red Hat- and Debian-family packages at the moment is just an imitation of the old shell scripts, relying on polling by pg_ctl for readiness, with various custom pieces of complexity for handling custom port numbers and such. In the first patch, my proposal is to use sd_notify() calls from libsystemd to notify the systemd daemon directly when startup is completed. This is the recommended low-overhead solution that is now being adopted by many other server packages. It allows us to cut out pg_ctl completely from the startup configuration and makes the startup configuration manageable by non-wizards. An example is included in the patch. The second patch improves integration with the system journal managed by systemd. This is a facility that captures a daemon's standard output and error and records it in configurable places, including syslog. The patch adds a new log_destination that is like stderr but marks up the output so that systemd knows the severity. With that in place, users can choose to do away with the postgres log file management and let systemd do it. The third patch is technically unrelated but arose while I was working on this. It improves error reporting when the data directory is missing.
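As a rough illustration of what the readiness notification in the first patch involves, here is a minimal sketch of the datagram protocol behind sd_notify(), written without libsystemd. This is not the code from the patch, which calls sd_notify() directly. The function name notify_sketch is hypothetical, and the sketch assumes a Linux system where the service manager passes the socket path in the NOTIFY_SOCKET environment variable.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for sd_notify(0, state).
 * Returns 0 if NOTIFY_SOCKET is not set (not running under a notifying
 * service manager), 1 if the message was sent, or a negative errno value.
 */
static int
notify_sketch(const char *state)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    size_t      path_len;
    socklen_t   addr_len;
    ssize_t     sent;
    int         fd;
    int         result;

    if (path == NULL || path[0] == '\0')
        return 0;

    /* Only absolute paths and abstract sockets ("@name") are valid. */
    path_len = strlen(path);
    if ((path[0] != '/' && path[0] != '@') ||
        path_len >= sizeof(addr.sun_path))
        return -EINVAL;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, path_len);
    if (path[0] == '@')
        addr.sun_path[0] = '\0';    /* abstract socket namespace */
    addr_len = (socklen_t) (offsetof(struct sockaddr_un, sun_path) + path_len);

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -errno;

    /* The whole state string, e.g. "READY=1", goes out as one datagram. */
    sent = sendto(fd, state, strlen(state), 0,
                  (struct sockaddr *) &addr, addr_len);
    result = (sent < 0) ? -errno : 1;
    close(fd);
    return result;
}
```

Called as notify_sketch("READY=1") once startup has completed, this is harmless when the server is started by hand, because the environment variable is then absent and nothing is sent.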
Attachments
Peter Eisentraut <peter_e@gmx.net> writes:
> I have written a couple of patches to improve the integration of the
> postgres daemon with systemd.
Seems like a generally reasonable thing to do. systemd is probably not going away (unfortunately IMO, but there it is).
> The second patch improves integration with the system journal managed by
> systemd. This is a facility that captures a daemon's standard output
> and error and records it in configurable places, including syslog. The
> patch adds a new log_destination that is like stderr but marks up the
> output so that systemd knows the severity. With that in place, users
> can choose to do away with the postgres log file management and let
> systemd do it.
One of the benefits of the log collector is that it's able to do something reasonably sane with random stderr output that might be generated by libraries linked into PG (starting with glibc...). If someone sets things up as you're suggesting, what will systemd do with unlabeled output lines? Or in other words, how much value-add is there really from this markup?
Also, it looks like the markup is intended to be per-line, but as you've coded this a possibly-multi-line error report will only have a prefix on the first line.
regards, tom lane
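To make the per-line point concrete, here is a small sketch of what prefixing every line of a multi-line report could look like. It is only an illustration, not code from the patch; the helper name write_with_line_prefix is hypothetical, and it assumes the severity prefix is a string such as "<5>" placed at the start of each output line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write msg to out, repeating prefix at the start of
 * every line so that a line-oriented reader sees the same severity on each
 * one. Returns the number of lines written.
 */
static int
write_with_line_prefix(FILE *out, const char *prefix, const char *msg)
{
    int         lines = 0;

    while (*msg != '\0')
    {
        const char *nl = strchr(msg, '\n');
        size_t      len = (nl != NULL) ? (size_t) (nl - msg) : strlen(msg);

        fputs(prefix, out);
        fwrite(msg, 1, len, out);
        fputc('\n', out);
        lines++;

        /* Skip past the line just written and its newline, if any. */
        msg += len;
        if (*msg == '\n')
            msg++;
    }
    return lines;
}
```

With only the first line prefixed, each continuation line reaches the journal as a separate, unlabeled entry, which is the behavior questioned above.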
Peter Eisentraut wrote:
> I have written a couple of patches to improve the integration of the
> postgres daemon with systemd.
Great. Is anything happening with these patches, or? They've been inactive for quite a while now.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/18/16 10:58 AM, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
>> I have written a couple of patches to improve the integration of the
>> postgres daemon with systemd.
>
> Great. Is anything happening with these patches, or? They've been
> inactive for quite a while now.
Well, they are waiting for someone to review them.
2016-01-21 3:33 GMT+01:00 Peter Eisentraut <peter_e@gmx.net>:
> On 1/18/16 10:58 AM, Alvaro Herrera wrote:
> > Peter Eisentraut wrote:
> >> I have written a couple of patches to improve the integration of the
> >> postgres daemon with systemd.
> >
> > Great. Is anything happening with these patches, or? They've been
> > inactive for quite a while now.
> Well, they are waiting for someone to review them.
I have read some basic material about systemd, and these patches look correct. I'll test them next week.
Regards
Pavel
Hi
2015-11-17 15:08 GMT+01:00 Peter Eisentraut <peter_e@gmx.net>:
> I have written a couple of patches to improve the integration of the
> postgres daemon with systemd.
> The setup that is shipped with Red Hat- and Debian-family packages at
> the moment is just an imitation of the old shell scripts, relying on
> polling by pg_ctl for readiness, with various custom pieces of
> complexity for handling custom port numbers and such.
> In the first patch, my proposal is to use sd_notify() calls from
> libsystemd to notify the systemd daemon directly when startup is
> completed. This is the recommended low-overhead solution that is now
> being adopted by many other server packages. It allows us to cut out
> pg_ctl completely from the startup configuration and makes the startup
> configuration manageable by non-wizards. An example is included in the
> patch.
> The second patch improves integration with the system journal managed by
> systemd. This is a facility that captures a daemon's standard output
> and error and records it in configurable places, including syslog. The
> patch adds a new log_destination that is like stderr but marks up the
> output so that systemd knows the severity. With that in place, users
> can choose to do away with the postgres log file management and let
> systemd do it.
> The third patch is technically unrelated but arose while I was working
> on this. It improves error reporting when the data directory is missing.
2. all tests passed
The issues:
1. configure lacks a test for the systemd integration, so compilation fails:
postmaster.o postmaster.c
postmaster.c:91:31: fatal error: systemd/sd-daemon.h: No such file or directory
3. PostgreSQL is able to write to the systemd log, but a multiline entry was stored with different priorities:
do $$ begin raise warning 'NAZDAREK****'; end $$;
first line
{
"__CURSOR" : "s=cac797bc03f242febea9f32357bba773;i=b4a5;b=e8d5b3df2ebf46dd86c39046b326bd32;m=1cb792a63b;t=52a4f3ad40860;x=57014959bf6e3481",
"__REALTIME_TIMESTAMP" : "1453894661310560",
"__MONOTONIC_TIMESTAMP" : "123338925627",
"_BOOT_ID" : "e8d5b3df2ebf46dd86c39046b326bd32",
"SYSLOG_FACILITY" : "3",
"_UID" : "1001",
"_GID" : "1001",
"_CAP_EFFECTIVE" : "0",
"_SELINUX_CONTEXT" : "system_u:system_r:init_t:s0",
"_MACHINE_ID" : "b8299a722638414a8776d3e130e228e4",
"_HOSTNAME" : "localhost.localdomain",
"_SYSTEMD_SLICE" : "system.slice",
"_TRANSPORT" : "stdout",
"SYSLOG_IDENTIFIER" : "postgres",
"_PID" : "3150",
"_COMM" : "postgres",
"_EXE" : "/usr/local/pgsql/bin/postgres",
"_CMDLINE" : "/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data -c log_destination=systemd",
"_SYSTEMD_CGROUP" : "/system.slice/postgresql.service",
"_SYSTEMD_UNIT" : "postgresql.service",
"PRIORITY" : "5",
"MESSAGE" : "WARNING: NAZDAREK****"
}
second line
{
"__CURSOR" : "s=cac797bc03f242febea9f32357bba773;i=b4a6;b=e8d5b3df2ebf46dd86c39046b326bd32;m=1cb792a882;t=52a4f3ad40aa6;x=ae9801b2ecbd4da3",
"__REALTIME_TIMESTAMP" : "1453894661311142",
"__MONOTONIC_TIMESTAMP" : "123338926210",
"_BOOT_ID" : "e8d5b3df2ebf46dd86c39046b326bd32",
"PRIORITY" : "6",
"SYSLOG_FACILITY" : "3",
"_UID" : "1001",
"_GID" : "1001",
"_CAP_EFFECTIVE" : "0",
"_SELINUX_CONTEXT" : "system_u:system_r:init_t:s0",
"_MACHINE_ID" : "b8299a722638414a8776d3e130e228e4",
"_HOSTNAME" : "localhost.localdomain",
"_SYSTEMD_SLICE" : "system.slice",
"_TRANSPORT" : "stdout",
"SYSLOG_IDENTIFIER" : "postgres",
"_PID" : "3150",
"_COMM" : "postgres",
"_EXE" : "/usr/local/pgsql/bin/postgres",
"_CMDLINE" : "/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data -c log_destination=systemd",
"_SYSTEMD_CGROUP" : "/system.slice/postgresql.service",
"_SYSTEMD_UNIT" : "postgresql.service",
"MESSAGE" : "CONTEXT: PL/pgSQL function inline_code_block line 1 at RAISE"
}
Is it expected?
Second issue:
The mapping between PostgreSQL levels and journal levels is shifted by one:
+ case DEBUG1:
+ systemd_log_prefix = "<7>" /* SD_DEBUG */;
+ break;
+ case LOG:
+ case COMMERROR:
+ case INFO:
+ systemd_log_prefix = "<6>" /* SD_INFO */;
+ break;
+ case NOTICE:
+ case WARNING:
+ systemd_log_prefix = "<5>" /* SD_NOTICE */;
+ break;
+ case ERROR:
+ systemd_log_prefix = "<4>" /* SD_WARNING */;
+ break;
+ case FATAL:
+ systemd_log_prefix = "<3>" /* SD_ERR */;
+ break;
+ case PANIC:
Is it expected?
This is a little bit unexpected (though it may be correct).
When I filter on "warnings", I get errors as well, and so on. I understand that the two systems are not compatible, but these differences should be well documented.
I didn't find any other issues. It is working without any problems.
Regards
Pavel
On 1/27/16 7:02 AM, Pavel Stehule wrote:
> The issues:
>
> 1. configure missing systemd integration test, compilation fails:
>
> postmaster.o postmaster.c
> postmaster.c:91:31: fatal error: systemd/sd-daemon.h: No such file or
> directory
Updated patch attached that fixes this by adding additional checking in configure.
> 3. PostgreSQL is able to write to systemd log, but multiline entry was
> stored with different priorities
Yeah, as Tom had already pointed out, this doesn't work as I had imagined it. I'm withdrawing this part of the patch for now. I'll come back to it later.
> Second issue:
>
> Mapping of levels between pg and journal levels is moved by1
This is the same as how the "syslog" destination works.
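The level mapping under discussion can be written out as a small lookup, which makes the one-step shift easier to see. This is a sketch only: the SketchLevel enum is a hypothetical stand-in for PostgreSQL's real elog level constants, and the PANIC entry is an assumption, since the quoted diff is cut off at that case; it follows the statement above that the mapping matches the "syslog" destination.

```c
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's elog levels (not the real values). */
typedef enum
{
    SK_DEBUG,
    SK_LOG,
    SK_COMMERROR,
    SK_INFO,
    SK_NOTICE,
    SK_WARNING,
    SK_ERROR,
    SK_FATAL,
    SK_PANIC
} SketchLevel;

/* Return the "<N>" prefix that marks a line's severity for the journal. */
static const char *
journal_prefix_for(SketchLevel level)
{
    switch (level)
    {
        case SK_DEBUG:
            return "<7>";       /* SD_DEBUG */
        case SK_LOG:
        case SK_COMMERROR:
        case SK_INFO:
            return "<6>";       /* SD_INFO */
        case SK_NOTICE:
        case SK_WARNING:
            return "<5>";       /* SD_NOTICE */
        case SK_ERROR:
            return "<4>";       /* SD_WARNING */
        case SK_FATAL:
            return "<3>";       /* SD_ERR */
        case SK_PANIC:
            return "<2>";       /* assumed: SD_CRIT, as for syslog */
    }
    return "<2>";               /* unknown levels treated as most severe */
}
```

Under this mapping a PostgreSQL WARNING is stored at notice priority and an ERROR at warning priority, so a filter such as `journalctl -p warning` would show PostgreSQL errors but not PostgreSQL warnings.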
Attachments
Hi
2016-01-28 3:50 GMT+01:00 Peter Eisentraut <peter_e@gmx.net>:
> On 1/27/16 7:02 AM, Pavel Stehule wrote:
> > The issues:
> >
> > 1. configure missing systemd integration test, compilation fails:
> >
> > postmaster.o postmaster.c
> > postmaster.c:91:31: fatal error: systemd/sd-daemon.h: No such file or
> > directory
> Updated patch attached that fixes this by adding additional checking in
> configure.
You sent only the rebased code of the previous version. I didn't find any additional checks.
> > 3. PostgreSQL is able to write to systemd log, but multiline entry was
> > stored with different priorities
> Yeah, as Tom had already pointed out, this doesn't work as I had
> imagined it. I'm withdrawing this part of the patch for now. I'll come
> back to it later.
ok
> > Second issue:
> >
> > Mapping of levels between pg and journal levels is moved by1
> This is the same as how the "syslog" destination works.
I didn't find any related code in PostgreSQL. Can you help me, please?
Regards
Pavel
> > Second issue:
> >
> > Mapping of levels between pg and journal levels is moved by1
> This is the same as how the "syslog" destination works.
I understand this logic, but I am missing any documentation of it.
Regards
Pavel
On 1/28/16 4:00 AM, Pavel Stehule wrote:
> Hi
>
> 2016-01-28 3:50 GMT+01:00 Peter Eisentraut <peter_e@gmx.net>:
> > On 1/27/16 7:02 AM, Pavel Stehule wrote:
> > > The issues:
> > >
> > > 1. configure missing systemd integration test, compilation fails:
> > >
> > > postmaster.o postmaster.c
> > > postmaster.c:91:31: fatal error: systemd/sd-daemon.h: No such file or
> > > directory
> >
> > Updated patch attached that fixes this by adding additional checking in
> > configure.
>
> You sent only rebased code of previous version. I didn't find additional
> checks.
Oops. Here is the actual new code.
Attachments
Hi Peter,
thanks for working on this, I'm looking forward to make Debian's pg_*cluster tools work with that (and hopefully be able to remove tons of legacy code).
If a cluster is configured for non-hot-standby replication, the READY=1 seems to never happen. Did you check if that doesn't trigger any timeouts which would make the unit "fail" or the like?
@@ -2787,6 +2800,10 @@ reaper(SIGNAL_ARGS)
             ereport(LOG,
                     (errmsg("database system is ready to accept connections")));
+#ifdef USE_SYSTEMD
+            sd_notify(0, "READY=1");
+#endif
+
             continue;
         }
@@ -4930,6 +4947,10 @@ sigusr1_handler(SIGNAL_ARGS)
         ereport(LOG,
                 (errmsg("database system is ready to accept read only connections")));
+#ifdef USE_SYSTEMD
+        sd_notify(0, "READY=1");
+#endif
+
         pmState = PM_HOT_STANDBY;
         /* Some workers may be scheduled to start now */
         StartWorkerNeeded = true;
Also, I'm wondering how hard it would be to get socket activation work with that? (I wouldn't necessarily recommend that for production use, but on my desktop it would certainly be helpful not to have all those 8.4/9.0/.../9.6 clusters running all the time doing nothing.)
Christoph
--
cb@df7cb.de | http://www.df7cb.de/
I wonder if instead of HAVE_SYSTEMD at each callsite we shouldn't instead have a pg_sd_notify() call that's a no-op when not systemd.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
> > You sent only rebased code of previous version. I didn't find additional
> > checks.
>
> Oops. Here is the actual new code.
The new test is working as expected.
I did a lot of tests, and this code works perfectly in single-server mode and in slave hot-standby mode.
It doesn't work in standby-only mode:
[root@dhcppc1 pavel]# systemctl start pg2.service
Job for pg2.service failed because a timeout was exceeded. See "systemctl status pg2.service" and "journalctl -xe" for details.
The default timeout on FC is 90 seconds. That may not be enough for large servers with large shared buffers and many checkpoint segments. This should be mentioned in the service file.
Regards
Pavel
On 1/28/16 9:46 AM, Christoph Berg wrote:
> If a cluster is configured for non-hot-standby replication, the
> READY=1 seems to never happen. Did you check if that doesn't trigger
> any timeouts with would make the unit "fail" or the like?
As Pavel showed, it doesn't work for that. I'll look into that.
> Also, I'm wondering how hard it would be to get socket activation work
> with that? (I wouldn't necessarily recommend that for production use,
> but on my desktop it would certainly be helpful not to have all those
> 8.4/9.0/.../9.6 clusters running all the time doing nothing.)
I had looked into socket activation, and it looks feasible, but it's a separate feature. I couldn't really think of a strong use case, but what you describe makes sense.
On 1/28/16 10:08 AM, Alvaro Herrera wrote:
> I wonder if instead of HAVE_SYSTEMD at each callsite we shouldn't
> instead have a pg_sd_notify() call that's a no-op when not systemd.
We do this for other optional features as well, and I think it keeps the code clearest, especially if the ifdef'ed sections are short.
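For comparison, the wrapper alternative raised in the quoted question might look roughly like the sketch below. It was not adopted in this thread; the body is a guess at the obvious shape, using the USE_SYSTEMD symbol from the posted diffs, and it assumes libsystemd's sd-daemon.h is available whenever that symbol is defined.

```c
#ifdef USE_SYSTEMD
#include <systemd/sd-daemon.h>
#endif

/*
 * Sketch of a wrapper that callers can use unconditionally.
 * Without USE_SYSTEMD it compiles to a no-op that reports "nothing sent".
 */
static int
pg_sd_notify(const char *state)
{
#ifdef USE_SYSTEMD
    return sd_notify(0, state);
#else
    (void) state;               /* unused in non-systemd builds */
    return 0;
#endif
}
```

The trade-off is between a single conditional hidden in one helper and a short, visible #ifdef at each of the few call sites, which is the style preferred in the reply above.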
On 1/29/16 4:15 PM, Pavel Stehule wrote:
> Hi
>
> > > You sent only rebased code of previous version. I didn't find additional
> > > checks.
> >
> > Oops. Here is the actual new code.
>
> New test is working as expected
>
> I did lot of tests - and this code works perfect in single server mode,
> and with slave hot-standby mode.
>
> It doesn't work with only standby mode
Yeah, I hadn't thought of that. How about this change in addition:
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2e7f1d7..d983a50 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4933,6 +4933,11 @@ sigusr1_handler(SIGNAL_ARGS)
         if (XLogArchivingAlways())
             PgArchPID = pgarch_start();
+#ifdef USE_SYSTEMD
+        if (!EnableHotStandby)
+            sd_notify(0, "READY=1");
+#endif
+
         pmState = PM_RECOVERY;
     }
     if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
> Default timeout on FC is 90 sec - it is should not to be enough for
> large servers with large shared buffers and high checkpoint segments. It
> should be mentioned in service file.
Good point. I think we should set TimeoutSec=0 in the suggested service file.
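Taken together with the two earlier call sites, the proposed rule for when to report readiness can be summarized in a small decision function. This is an illustrative sketch, not postmaster code: the StartupEvent enum and the function name should_notify_ready are hypothetical, and the three events correspond to the three places where the posted diffs call sd_notify().

```c
#include <stdbool.h>

/* Hypothetical names for the three points at which the diffs notify. */
typedef enum
{
    EV_ACCEPTING_CONNECTIONS,   /* "ready to accept connections" */
    EV_ACCEPTING_READ_ONLY,     /* "ready to accept read only connections" */
    EV_RECOVERY_STARTED         /* postmaster enters PM_RECOVERY */
} StartupEvent;

/* Decide whether READY=1 should be sent for the given startup event. */
static bool
should_notify_ready(StartupEvent event, bool hot_standby_enabled)
{
    switch (event)
    {
        case EV_ACCEPTING_CONNECTIONS:
        case EV_ACCEPTING_READ_ONLY:
            return true;
        case EV_RECOVERY_STARTED:
            /*
             * A standby without hot standby never accepts connections, so
             * this is the only point at which it can report readiness.
             */
            return !hot_standby_enabled;
    }
    return false;
}
```

A standby with hot standby enabled stays silent when recovery begins and reports readiness later, once read-only connections are accepted.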
2016-01-30 22:38 GMT+01:00 Peter Eisentraut <peter_e@gmx.net>:
> On 1/29/16 4:15 PM, Pavel Stehule wrote:
> > Hi
> >
> > > > You sent only rebased code of previous version. I didn't find additional
> > > > checks.
> > >
> > > Oops. Here is the actual new code.
> >
> > New test is working as expected
> >
> > I did lot of tests - and this code works perfect in single server mode,
> > and with slave hot-standby mode.
> >
> > It doesn't work with only standby mode
> Yeah, I hadn't though of that. How about this change in addition:
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 2e7f1d7..d983a50 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4933,6 +4933,11 @@ sigusr1_handler(SIGNAL_ARGS)
>          if (XLogArchivingAlways())
>              PgArchPID = pgarch_start();
> +#ifdef USE_SYSTEMD
> +        if (!EnableHotStandby)
> +            sd_notify(0, "READY=1");
> +#endif
> +
>          pmState = PM_RECOVERY;
>      }
>      if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
> > Default timeout on FC is 90 sec - it is should not to be enough for
> > large servers with large shared buffers and high checkpoint segments. It
> > should be mentioned in service file.
> Good point. I think we should set TimeoutSec=0 in the suggested service file.
Probably no other value is safe.
Pavel
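For readers following along, a service file along the lines discussed might look like the fragment below. It is a sketch rather than the example shipped with the patch: the binary and data directory paths are the ones visible in the journal output earlier in the thread, while the Description, User and WantedBy values are assumptions.

```ini
[Unit]
# Assumed description text
Description=PostgreSQL database server

[Service]
# The server reports readiness itself through sd_notify()
Type=notify
# Assumed account name
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
# As suggested above, so that a long recovery is not cut off by the default timeout
TimeoutSec=0

[Install]
WantedBy=multi-user.target
```

With Type=notify, `systemctl start` returns only after the READY=1 message arrives, which is why the timeout setting matters for servers with a long startup.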
Hi
> index 2e7f1d7..d983a50 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4933,6 +4933,11 @@ sigusr1_handler(SIGNAL_ARGS)
>          if (XLogArchivingAlways())
>              PgArchPID = pgarch_start();
>
> +#ifdef USE_SYSTEMD
> +        if (!EnableHotStandby)
> +            sd_notify(0, "READY=1");
> +#endif
> +
>          pmState = PM_RECOVERY;
>      }
>      if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
>
I cannot check it this week, but this should work. In standby-only mode, the "ready" state starts when the slave is able to get WAL records or segments from the server. I will test it next week.
regards
Pavel
> > Default timeout on FC is 90 sec - it is should not to be enough for
> > large servers with large shared buffers and high checkpoint segments. It
> > should be mentioned in service file.
>
> Good point. I think we should set TimeoutSec=0 in the suggested service file.
I've committed this. Thanks for checking.