Discussion: Automatically starting postmaster after system crash
Sorry if this is a FAQ, but I couldn't find it.

If my (RH 7.1) system crashes PostgreSQL does not restart automatically because the shared memory segment identifier and the .pid file remains, as a manual start explains:

% pg_ctl start
pg_ctl: Another postmaster may be running. Trying to start postmaster anyway.
Found a pre-existing shared memory block (ID 693600256) still in use.
If you're sure there are no old backends still running, remove the shared memory block with ipcrm(1), or just delete "/var/lib/pgsql/data/postmaster.pid".
pg_ctl: cannot start postmaster
Examine the log output.

What is the "proper" way of ensuring (as far as possible) that PostgreSQL starts automatically after a crash? Is it sufficient (and safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot scripts?

Allan.
Allan Engelhardt <allane@cybaea.com> writes:
> If my (RH 7.1) system crashes PostgreSQL does not restart automatically
> because the shared memory segment identifier and the .pid file remains,
That's kinda hard to believe; how would a shared memory segment survive
a system crash?
> % pg_ctl start
> pg_ctl: Another postmaster may be running. Trying to start postmaster
> anyway.
> Found a pre-existing shared memory block (ID 693600256) still in use.
Darn, I thought we had fixed that class of problems. Would you try
tracing through SharedMemoryIsInUse() to figure out why it thinks that?
It could be that there's some platform-specific variation of shmctl()
behavior that we need to cater for.
> What is the "proper" way of ensuring (as far as possible) that
> PostgreSQL starts automatically after a crash? Is it sufficient (and
> safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
> scripts?
You can do that if you want, but MHO is that this is a bug we need to
fix.
regards, tom lane
Tom Lane wrote:
> Allan Engelhardt <allane@cybaea.com> writes:
>> If my (RH 7.1) system crashes PostgreSQL does not restart automatically
>> because the shared memory segment identifier and the .pid file remains,
> That's kinda hard to believe; how would a shared memory segment survive
> a system crash?
I don't think they can. Some options:
(1) PostgreSQL keeps a reference to it somewhere and can get confused...
(2) Red Hat's script for starting PostgreSQL at boot time, which (a) ran, (b) failed, and [Arrrgh! I *must* fix that stupid script ;-P] (c) directs all pg_ctl output (out+err) to /dev/null, somehow fubared the system.
> Darn, I thought we had fixed that class of problems. Would you try
> tracing through SharedMemoryIsInUse() to figure out why it thinks that?
> It could be that there's some platform-specific variation of shmctl()
> behavior that we need to cater for.
Uhm, my system doesn't crash *that* often... :-)
Seriously: I tried to reproduce using SysRq+S, SysRq+B and couldn't. I think I have seen enough fsck for one night, so I might give it a rest...
>> What is the "proper" way of ensuring (as far as possible) that
>> PostgreSQL starts automatically after a crash? Is it sufficient (and
>> safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
>> scripts?
> You can do that if you want, but MHO is that this is a bug we need to
> fix.
I'll see what I can do about reproducing it...
Allan
Allan Engelhardt <allane@cybaea.com> writes:
> Tom Lane wrote:
>> That's kinda hard to believe; how would a shared memory segment survive
>> a system crash?
> I don't think they can. Some options:
> (1) PostgreSQL keeps a reference to it somewhere and can get confused...
Indeed, there is a reference to the old segment in the postmaster.pid
file. At startup, if there's a postmaster.pid file, Postgres checks to
see that the indicated shared memory segment is gone or at least has no
processes attached to it. (This is a defense against the possibility
that the old postmaster died but there are still backends running in
the database.) Evidently, that check is mistakenly thinking that there
*is* still a shmem seg with attached processes. Question is why?
> Seriously: I tried to reproduce using SysRq+S, SysRq+B and couldn't. I
> think I have seen enough fsck for one night, so I might give it a rest...
You might try just kill -9'ing the postmaster, rather than physically
rebooting your system.
regards, tom lane
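
For anyone who does put the 'rm -f $PGDATA/postmaster.pid' workaround in a boot script, a slightly more cautious variant is to remove the file only when the process it names is gone. The following is a minimal sketch in Python, not code from PostgreSQL or from anyone in this thread. It assumes the first line of postmaster.pid holds the postmaster's PID, and the function names (read_pid, process_exists, remove_if_stale) are hypothetical. It checks only the PID, so it is weaker than the server's own shared memory check and would not notice orphaned backends. If the PID has been reused by an unrelated process after a reboot, it errs on the side of leaving the file in place.

```python
import os
from pathlib import Path


def read_pid(pid_file):
    """Return the positive PID on the first line of pid_file, else None.

    Assumption: the first line of the file is the postmaster's PID.
    A missing, empty, or malformed file yields None.
    """
    try:
        first_line = Path(pid_file).read_text().splitlines()[0]
        pid = int(first_line.strip())
    except (OSError, IndexError, ValueError):
        return None
    return pid if pid > 0 else None


def process_exists(pid):
    """Probe a PID with signal 0, which tests existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but belongs to another user
    return True


def remove_if_stale(pid_file):
    """Delete pid_file only if its recorded PID is not a live process.

    Returns True when the file was removed, False when it was left alone
    (live process, or file missing/unreadable).
    """
    pid = read_pid(pid_file)
    if pid is None or process_exists(pid):
        return False
    os.remove(pid_file)
    return True
```

A boot script could call remove_if_stale(os.path.join(os.environ["PGDATA"], "postmaster.pid")) before running pg_ctl start. As noted above, this only works around the symptom and does not explain why the shared memory check misfires.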