Обсуждение: postgresql systemd service fails to start only on boot but notmanually

Поиск
Список
Период
Сортировка

postgresql systemd service fails to start only on boot but notmanually

От
Doron Behar
Дата:
Hello everyone,

I'm having trouble with my PostgreSQL Systemd service and I wonder
whether this is a problem with the software or some problem with the
packaging or build.

My server fails to start PostgreSQL only on boot, if I restart it
manually afterwards it doesn't have any problem starting. Here is the
log extracted from the journal:

```
2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 2018-09-21 20:46:21 CEST
2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; canceled
2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; canceled
2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept connections
2018-09-21 20:49:10.486 CEST [306] LOG:  received fast shutdown request
2018-09-21 20:49:10.497 CEST [306] LOG:  aborting any active transactions
2018-09-21 20:49:10.498 CEST [637] FATAL:  terminating autovacuum process due to administrator command
2018-09-21 20:49:10.500 CEST [306] LOG:  worker process: logical replication launcher (PID 636) exited with code 1
2018-09-21 20:49:10.501 CEST [349] LOG:  shutting down
2018-09-21 20:49:10.518 CEST [306] LOG:  database system is shut down
```

Any Ideas? I'd really like to fix this since now after every reboot I have to
remember to manually restart it.

Thanks.


Re: postgresql systemd service fails to start only on boot but notmanually

От
Adrian Klaver
Дата:
On 9/22/18 3:44 AM, Doron Behar wrote:
> Hello everyone,
> 
> I'm having trouble with my PostgreSQL Systemd service and I wonder
> whether this is a problem with the software or some problem with the
> packaging or build.
> 
> My server fails to start PostgreSQL only on boot, if I restart it
> manually afterwards it doesn't have any problem starting. Here is the
> log extracted from the journal:
> 
> ```
> 2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address "127.0.0.1", port 5432
> 2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
> 2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 2018-09-21 20:46:21 CEST
> 2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; canceled
> 2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; canceled
> 2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept connection > 2018-09-21 20:49:10.486 CEST
[306]LOG:  received fast shutdown request
 
> 2018-09-21 20:49:10.497 CEST [306] LOG:  aborting any active transactions
> 2018-09-21 20:49:10.498 CEST [637] FATAL:  terminating autovacuum process due to administrator command
> 2018-09-21 20:49:10.500 CEST [306] LOG:  worker process: logical replication launcher (PID 636) exited with code 1
> 2018-09-21 20:49:10.501 CEST [349] LOG:  shutting down
> 2018-09-21 20:49:10.518 CEST [306] LOG:  database system is shut down
> ```
> 
> Any Ideas? I'd really like to fix this since now after every reboot I have to
> remember to manually restart it.

Linux distro and version?

Assuming Postgres version 10+ given logical replication warning. Still 
actual version would be nice.

How did you install Postgres?

Where did systemd script come from?

What is in the systemd script?

What does the log show when you do a successful manual start?

What does the system log show when the Postgres reboot startup fails?




> 
> Thanks.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: postgresql systemd service fails to start only on boot but notmanually

От
Christoph Moench-Tegeder
Дата:
## Doron Behar (doron.behar@gmail.com):

> My server fails to start PostgreSQL only on boot, if I restart it
> manually afterwards it doesn't have any problem starting. Here is the
> log extracted from the journal:
> 
> ```
> 2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address "127.0.0.1", port 5432
> 2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
> 2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 2018-09-21 20:46:21 CEST
> 2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; canceled
> 2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; canceled

This would indicate that your machine is overloaded during start -
perhaps there's just too much being started at the same time?
ObRant: that's what happens if people take "system startup duration"
as a benchmark and optimize for that - sure, running one clumsy shell
script after another isn't effective usage of today's systems,
but starting eight dozens programs all at once may have other
side effects. Really, with the hardware taking small ages to find
it's own arse before even loading the boot loader, those few seconds
weren't worth optimizing - and if people reboot their computers so
often that startup time takes a measurable toll on their productive
day, perhaps they should rather spend their time thinking about their
usage pattern than "optimizing" the startup process.

So, now that I've got that off my chest... your machine propably tries to
do too much at the same time when booting: the worker processes take
longer than 90 seconds to start. Slow CPU or storage maybe?

> 2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept connections
> 2018-09-21 20:49:10.486 CEST [306] LOG:  received fast shutdown request

And in the mean time, systemd has lost it's patience, declares the
start as failed and terminates the process group. (The default systemd
timeout is 90 seconds, at least in some releases of systemd, so
this fits quite nicely).

You could try to work around this by increasing TimeoutStartSec
in postgresql's systemd unit (or even globally), which perhaps
only hides the problem until the next service suddenly doesn't
start anymore.
You could move postgresql to the end of the boot order by
adding "After=..." to the Unit section of the systemd service
file, the value behind "After=" being all the other services in
the same target, which should reduce parallelism and improve
PostgreSQL's startup behaviour.
A more advanced variant of that would be to create a new
systemd target, make that start "After" multiuser.target
or even graphical.target (depending on your setup), make sure
it "Requires" the current default systemd target and make
postgresql the only additional service in that target.
(This would be the cleanest solution, but you should get some
grasp of systemd and how your specific distribution uses it
before meddling with the default targets; I don't know every
distribution/version variant of systemd integration, so I
can't give that specific instructions here).
Or you figure out what the heck your machine is running
during startup any why it is that slow, and try to fix that.

Regards,
Christoph

-- 
Spare Space


Re: postgresql systemd service fails to start only on boot but notmanually

От
Doron Behar
Дата:
On Sat, Sep 22, 2018 at 07:14:33AM -0700, Adrian Klaver wrote:
> 
> Linux distro and version?

Arch Linux

    $ uname -a
    Linux vps 4.18.9-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 19 21:19:17 UTC 2018 x86_64 GNU/Linux

It's a VPS with one CPU core on it and 2G RAM - not very much I know,
hosted on OVH

> 
> Assuming Postgres version 10+ given logical replication warning. Still
> actual version would be nice.
> 
> How did you install Postgres?

With the package manager and the files used in it's build are viewable
from here:
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/postgresql

> 
> Where did systemd script come from?
> 
> What is in the systemd script?

The systemd service was installed with the package

    [Unit]
    Description=PostgreSQL database server
    After=network.target
    
    [Service]
    Type=notify
    TimeoutSec=120
    User=postgres
    Group=postgres
    
    Environment=PGROOT=/var/lib/postgres
    
    SyslogIdentifier=postgres
    PIDFile=/var/lib/postgres/data/postmaster.pid
    RuntimeDirectory=postgresql
    RuntimeDirectoryMode=755
    
    ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGROOT}/data
    ExecStart=/usr/bin/postgres -D ${PGROOT}/data
    ExecReload=/bin/kill -HUP ${MAINPID}
    KillMode=mixed
    KillSignal=SIGINT
    
    # Due to PostgreSQL's use of shared memory, OOM killer is often overzealous in
    # killing Postgres, so adjust it downward
    OOMScoreAdjust=-200
    
    # Additional security-related features
    PrivateTmp=true
    ProtectHome=true
    ProtectSystem=full
    NoNewPrivileges=true
    
    [Install]
    WantedBy=multi-user.target

> 
> What does the log show when you do a successful manual start?

    2018-09-22 09:38:44.470 CEST [15251] LOG:  listening on IPv4 address "127.0.0.1", port 5432
    2018-09-22 09:38:44.472 CEST [15251] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
    2018-09-22 09:38:44.485 CEST [15253] LOG:  database system was shut down at 2018-09-21 20:49:10 CEST
    2018-09-22 09:38:44.490 CEST [15251] LOG:  database system is ready to accept connections

> 
> What does the system log show when the Postgres reboot startup fails?

This was posted in my 1st message.


Re: postgresql systemd service fails to start only on boot but notmanually

От
Doron Behar
Дата:
On Sat, Sep 22, 2018 at 04:58:18PM +0200, Christoph Moench-Tegeder wrote:
> ## Doron Behar (doron.behar@gmail.com):
> 
> > My server fails to start PostgreSQL only on boot, if I restart it
> > manually afterwards it doesn't have any problem starting. Here is the
> > log extracted from the journal:
> > 
> > ```
> > 2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address "127.0.0.1", port 5432
> > 2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
> > 2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 2018-09-21 20:46:21 CEST
> > 2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; canceled
> > 2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; canceled
> 
> This would indicate that your machine is overloaded during start -
> perhaps there's just too much being started at the same time?
> ObRant: that's what happens if people take "system startup duration"
> as a benchmark and optimize for that - sure, running one clumsy shell
> script after another isn't effective usage of today's systems,
> but starting eight dozens programs all at once may have other
> side effects. Really, with the hardware taking small ages to find
> it's own arse before even loading the boot loader, those few seconds
> weren't worth optimizing - and if people reboot their computers so
> often that startup time takes a measurable toll on their productive
> day, perhaps they should rather spend their time thinking about their
> usage pattern than "optimizing" the startup process.
> 
> So, now that I've got that off my chest... your machine propably tries to
> do too much at the same time when booting: the worker processes take
> longer than 90 seconds to start. Slow CPU or storage maybe?
> 
> > 2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept connections
> > 2018-09-21 20:49:10.486 CEST [306] LOG:  received fast shutdown request
> 
> And in the mean time, systemd has lost it's patience, declares the
> start as failed and terminates the process group. (The default systemd
> timeout is 90 seconds, at least in some releases of systemd, so
> this fits quite nicely).
> 
> You could try to work around this by increasing TimeoutStartSec
> in postgresql's systemd unit (or even globally), which perhaps
> only hides the problem until the next service suddenly doesn't
> start anymore.
> You could move postgresql to the end of the boot order by
> adding "After=..." to the Unit section of the systemd service
> file, the value behind "After=" being all the other services in
> the same target, which should reduce parallelism and improve
> PostgreSQL's startup behaviour.
> A more advanced variant of that would be to create a new
> systemd target, make that start "After" multiuser.target
> or even graphical.target (depending on your setup), make sure
> it "Requires" the current default systemd target and make
> postgresql the only additional service in that target.
> (This would be the cleanest solution, but you should get some
> grasp of systemd and how your specific distribution uses it
> before meddling with the default targets; I don't know every
> distribution/version variant of systemd integration, so I
> can't give that specific instructions here).
> Or you figure out what the heck your machine is running
> during startup any why it is that slow, and try to fix that.
> 
> Regards,
> Christoph

Thanks for your very detailed answer, that helped me a lot. I've
increased `TimeoutSec=` to infinity in the systemd service since it was
set initially to 120 seconds which apparently wasn't enough for my poor
VPS with 2G RAM and 1 CPU core. That worked great, I still feel like I
have slow startups but at least PostgreSQL doesn't totally fail to start
on boot.

I'll try to debug the slow startups on my own, thanks again for
everything!

Doron.