Discussion: Possible better pg_ctl start/stop handling?

Possible better pg_ctl start/stop handling?

From: "Joshua D. Drake"
Date:

Hello,

Interesting problem with pg_ctl. We have run into this consistently, as I
am sure a lot of other people have. If PostgreSQL is not shut down
correctly, the postmaster.pid file is left in PGDATA. This of course
causes problems starting up (and it should).

However, it seems that pg_ctl, if issued a stop, should be able to remove
the file. Below is a specific example:

bash-3.00$ bin/pg_ctl -D data start
pg_ctl: another postmaster may be running; trying to start postmaster anyway
LOG:  could not load root certificate file "root.crt": No such file or directory
DETAIL:  Will not verify client certificates.
FATAL:  pre-existing shared memory block (key 5432001, ID 19202077) is still in use
HINT:  If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcclean", "ipcrm", or just delete the file "postmaster.pid".
pg_ctl: could not start postmaster
Examine the log output.
bash-3.00$ bin/pg_ctl -D data stop
pg_ctl: could not send stop signal (PID: 10180): No such process
bash-3.00$

As we can see pg_ctl knows that the PID does not exist. If the PID does 
not exist is it safe to assume that we can remove the file? So that we 
may start again?
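For illustration, a minimal Python sketch of the kind of liveness probe involved; it is not pg_ctl's code, and the name `pid_is_alive` is hypothetical. Sending signal 0 makes kill(2) perform its existence and permission checks without delivering anything, so ESRCH ("No such process", as in the stop attempt above) means no process currently has that PID. Note what it cannot tell you: whether the PID now belongs to an unrelated program, or whether orphaned backends are still running.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if some process currently has this PID."""
    if pid <= 0:
        # kill() gives 0 and negative PIDs special meanings
        # (process groups, all processes); refuse them here.
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0: existence and permission checks only, nothing is sent.
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH, "No such process": nobody has this PID.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True


print(pid_is_alive(os.getpid()))  # True: this interpreter is running
```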

Sincerely,

Joshua D. Drake


-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


Re: Possible better pg_ctl start/stop handling?

From: Tom Lane
Date:

"Joshua D. Drake" <jd@commandprompt.com> writes:
> FATAL:  pre-existing shared memory block (key 5432001, ID 19202077) is 
> still in use
> HINT:  If you're sure there are no old server processes still running, 
> remove the shared memory block with the command "ipcclean", "ipcrm", or 
> just delete the file "postmaster.pid".

> As we can see pg_ctl knows that the PID does not exist. If the PID does 
> not exist is it safe to assume that we can remove the file? So that we 
> may start again?

The error message is warning you that there appear to still be live
backends in the data directory, even though the original postmaster
process is gone (crashed?).  If that is the case, forcibly starting a
new postmaster is a fine recipe for creating unrecoverable data
corruption.  So having pg_ctl auto-remove the file is horribly dangerous
and is NOT going to happen.
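One way to look, from outside, at the evidence described here is the attach count of the old shared memory segment: a segment that live processes are still attached to has a nonzero `nattch`. The minimal sketch below assumes Linux, where `/proc/sysvipc/shm` lists System V segments under a header row that includes `shmid` and `nattch` columns. The function names are hypothetical, and this is not the postmaster's own test.

```python
from typing import Optional


def shm_attach_count(sysvipc_text: str, shmid: int) -> Optional[int]:
    """Return nattch for a System V shm ID, or None if no such segment.

    sysvipc_text is the contents of Linux's /proc/sysvipc/shm. Columns
    are located by header name instead of by position.
    """
    lines = sysvipc_text.splitlines()
    if not lines:
        return None
    header = lines[0].split()
    id_col = header.index("shmid")
    nattch_col = header.index("nattch")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(id_col, nattch_col):
            continue  # skip blank or truncated rows
        if int(fields[id_col]) == shmid:
            return int(fields[nattch_col])
    return None


def read_attach_count(shmid: int, path: str = "/proc/sysvipc/shm") -> Optional[int]:
    """Read the live table and look up one segment (Linux only)."""
    with open(path) as f:
        return shm_attach_count(f.read(), shmid)
```

For example, `read_attach_count(19202077)` returns `None` if no segment with that ID exists, and otherwise the number of processes still attached to it.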

How did you get into this state anyway?
        regards, tom lane


Re: Possible better pg_ctl start/stop handling?

From: "Joshua D. Drake"
Date:

Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> 
>>FATAL:  pre-existing shared memory block (key 5432001, ID 19202077) is 
>>still in use
>>HINT:  If you're sure there are no old server processes still running, 
>>remove the shared memory block with the command "ipcclean", "ipcrm", or 
>>just delete the file "postmaster.pid".
> 
> 
>>As we can see pg_ctl knows that the PID does not exist. If the PID does 
>>not exist is it safe to assume that we can remove the file? So that we 
>>may start again?
> 
> 
> The error message is warning you that there appear to still be live
> backends in the data directory, even though the original postmaster
> process is gone (crashed?). 

Yes, I am aware of that. My actual point was that pg_ctl tests whether
the process is alive when you issue the stop. It comes back with an
error saying the PID is no longer available to kill.

I was just wondering if we could make pg_ctl a little smarter, is all.
If pg_ctl can't start because the pid file exists, it could test for the
existence of the pid; if the pid does not exist, test for the existence
of **any** postgres process (grep? egad...); if none exists, overwrite
the pid file and start?
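A rough sketch of the proposed fallback, written only to make the proposal concrete. It assumes Linux `/proc`, uses the process names `postgres` and `postmaster` as a stand-in for the "grep", and the function names are hypothetical. A name scan like this can match unrelated processes with the same name and can miss processes hidden from the current user, which is part of why such a check is weaker than the postmaster's own shared-memory test.

```python
import os
from typing import Iterable


def any_process_named(names: Iterable[str], proc_root: str = "/proc") -> bool:
    """Return True if any visible process has a short name in `names`.

    Heuristic stand-in for "grep for any postgres process": reads
    /proc/<pid>/comm for every numeric entry under proc_root.
    """
    wanted = set(names)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() in wanted:
                    return True
        except OSError:
            continue  # process exited meanwhile, or is not readable
    return False


def proposed_may_overwrite_pid_file(recorded_pid_alive: bool,
                                    proc_root: str = "/proc") -> bool:
    """The proposal: overwrite only if the recorded PID is gone AND no
    process that looks like PostgreSQL is visible."""
    if recorded_pid_alive:
        return False
    return not any_process_named({"postgres", "postmaster"}, proc_root)
```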

> If that is the case, forcibly starting a
> new postmaster is a fine recipe for creating unrecoverable data
> corruption.  So having pg_ctl auto-remove the file is horribly dangerous
> and is NOT going to happen.

Please understand that my suggestion was not made lightly. I recognize
the dangers here very well (I have had to deal with customers who have
done it).

> 
> How did you get into this state anyway?

Power off on a dev machine ;)

Sincerely,

Joshua D. Drake


> 
>             regards, tom lane




Re: Possible better pg_ctl start/stop handling?

From: Tom Lane
Date:

"Joshua D. Drake" <jd@commandprompt.com> writes:
> I was just wondering if we could make pg_ctl a little smarter as all.
> If pg_ctl can't start because the pid file exists, test for the 
> existence of the pid, if the pid does not exist test for the existence
> of **any** postgres process (grep? egad...), if none exists overwrite 
> the pid file and start?

This cannot be any smarter than the existing test in the postmaster,
and is most likely to be much stupider.


>> How did you get into this state anyway?

> Power off on a dev machine ;)

Does the dev machine run more than one postmaster?  I've occasionally
seen similar issues when restarting a clutch of dev postmasters ---
the kernel may assign a shmem id to one of them that belonged to another
one in the previous cycle, and if you already started that other one
then the second gets confused.  8.0 and up have a test that should deal
correctly with this; what version did you see failing exactly?
        regards, tom lane
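To make the key/ID distinction concrete: the key (5432001 in the log above) is chosen by the postmaster, apparently as port * 1000 + 1 for port 5432, while the ID (19202077) is assigned by the kernel. The sketch below is a hypothetical illustration of the failure mode described, not the check added in 8.0: it classifies a (key, ID) pair left in a stale pid file against a map of live segments, e.g. built from `ipcs -m` output. The key formula is an assumption inferred from the log, not a documented interface.

```python
from typing import Dict


def first_shmem_key(port: int) -> int:
    """Assumed key-selection rule: port * 1000 + 1 (5432 -> 5432001)."""
    return port * 1000 + 1


def classify_recorded_segment(recorded_key: int, recorded_id: int,
                              live: Dict[int, int]) -> str:
    """Classify a (key, ID) pair left in a stale pid file.

    `live` maps currently existing shm IDs to their keys. IDs come from
    the kernel, so after a restart the recorded ID may simply belong to
    some other program's segment.
    """
    if recorded_id not in live:
        return "gone"             # no segment with that ID exists now
    if live[recorded_id] != recorded_key:
        return "id reused"        # same ID, but a different key
    return "same key and id"      # looks like the old segment itself
```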


Re: Possible better pg_ctl start/stop handling?

From: "Joshua D. Drake"
Date:

>>Power off on a dev machine ;)
> 
> 
> Does the dev machine run more than one postmaster?

No.
> I've occasionally
> seen similar issues when restarting a clutch of dev postmasters ---
> the kernel may assign a shmem id to one of them that belonged to another
> one in the previous cycle, and if you already started that other one
> then the second gets confused.  8.0 and up have a test that should deal
> correctly with this; what version did you see failing exactly?

This is on my personal dev machine and I am running 8.1Dev.

Sincerely,

Joshua D. Drake


> 
>             regards, tom lane




Re: Possible better pg_ctl start/stop handling?

From: Tom Lane
Date:

"Joshua D. Drake" <jd@commandprompt.com> writes:
>> Does the dev machine run more than one postmaster?

> No.

Hmm, it should be pretty impossible to see this if the machine's just
been rebooted and there are no other postmasters running.  If you can
replicate it, could you send along the output of "ipcs -m -a" along
with the contents of the postmaster.pid file?  Also, what's the platform
exactly?
        regards, tom lane
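For anyone collecting this information, a minimal sketch that parses postmaster.pid. It assumes the layout of this era: line 1 the PID (assumed negative for a standalone backend), line 2 the data directory, and an optional line 3 holding the shared memory key and ID. That layout is an assumption; later releases use a longer, different layout that this sketch does not handle. `PidFileInfo` and `parse_postmaster_pid` are hypothetical names.

```python
from typing import NamedTuple, Optional


class PidFileInfo(NamedTuple):
    pid: int
    data_dir: Optional[str]
    shmem_key: Optional[int]
    shmem_id: Optional[int]


def parse_postmaster_pid(text: str) -> PidFileInfo:
    """Parse postmaster.pid text under an assumed 8.x-era layout:
    line 1 PID, line 2 data directory, optional line 3 "key id"."""
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError("empty pid file")
    # Assumption: a negative PID marks a standalone backend.
    pid = abs(int(lines[0]))
    data_dir = lines[1].strip() if len(lines) > 1 else None
    key = shm_id = None
    if len(lines) > 2:
        parts = lines[2].split()
        if len(parts) == 2:
            key, shm_id = int(parts[0]), int(parts[1])
    return PidFileInfo(pid, data_dir, key, shm_id)
```

Comparing `shmem_id` against the `shmid` column of the `ipcs` output shows whether the recorded segment still exists.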


Re: Possible better pg_ctl start/stop handling?

From: "Joshua D. Drake"
Date:

Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> 
>>>Does the dev machine run more than one postmaster?
> 
> 
>>No.
> 
> 
> Hmm, it should be pretty impossible to see this if the machine's just
> been rebooted

It wasn't a reboot; it was a total power loss and then startup.
> and there are no other postmasters running.  If you can
> replicate it, could you send along the output of "ipcs -m -a" along
> with the contents of the postmaster.pid file?

I will give it a shot a little later today.
> Also, what's the platform
> exactly?

FC3 (Fedora Core 3).

J


> 
>             regards, tom lane

