Discussion: Possible better pg_ctl start/stop handling?
Hello,

Interesting problem with pg_ctl. We have run into this consistently, as I am sure a lot of other people have. If PostgreSQL does not get shut down correctly, the postmaster.pid file is still in PGDATA. This of course causes problems starting up (and it should). However, it seems that pg_ctl, if issued a stop, should be able to remove the file. Below is a specific example:

    bash-3.00$ bin/pg_ctl -D data start
    pg_ctl: another postmaster may be running; trying to start postmaster anyway
    LOG: could not load root certificate file "root.crt": No such file or directory
    DETAIL: Will not verify client certificates.
    FATAL: pre-existing shared memory block (key 5432001, ID 19202077) is still in use
    HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcclean", "ipcrm", or just delete the file "postmaster.pid".
    pg_ctl: could not start postmaster
    Examine the log output.
    bash-3.00$ bin/pg_ctl -D data stop
    pg_ctl: could not send stop signal (PID: 10180): No such process
    bash-3.00$

As we can see, pg_ctl knows that the PID does not exist. If the PID does not exist, is it safe to assume that we can remove the file, so that we may start again?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
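For reference, the "No such process" that pg_ctl stop reports is the text of the ESRCH error returned by kill(). Below is a minimal C sketch of the liveness probe being proposed, assuming the PID sits on the first line of postmaster.pid. The helper names read_pid_file and pid_is_alive are hypothetical and are not pg_ctl's actual code. Note that this only tells you whether the postmaster's own PID exists; it says nothing about orphaned backends that may still be attached to shared memory.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read the PID from the first line of a
 * postmaster.pid-style lock file. Returns the PID, or -1 if the file
 * cannot be opened or does not start with a number.
 */
static long read_pid_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    long  pid = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1)
        pid = -1;
    fclose(fp);
    return pid;
}

/*
 * Hypothetical helper: probe a PID with signal 0. No signal is delivered,
 * but the kernel still checks whether the process exists.
 *   1  -> process exists (EPERM also means it exists, just not ours)
 *   0  -> kill() failed with ESRCH, i.e. "No such process"
 *  -1  -> some other error
 */
static int pid_is_alive(pid_t pid)
{
    assert(pid > 0);    /* pid <= 0 would address process groups (or all processes) */

    if (kill(pid, 0) == 0 || errno == EPERM)
        return 1;
    if (errno == ESRCH)
        return 0;
    return -1;
}
```

Fed the stale file from the session above, pid_is_alive() would return 0 for PID 10180, which is precisely the situation discussed in the replies: the postmaster is gone, but that alone does not prove the data directory is unused.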
"Joshua D. Drake" <jd@commandprompt.com> writes:
> FATAL: pre-existing shared memory block (key 5432001, ID 19202077) is still in use
> HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcclean", "ipcrm", or just delete the file "postmaster.pid".

> As we can see pg_ctl knows that the PID does not exist. If the PID does not exist is it safe to assume that we can remove the file? So that we may start again?

The error message is warning you that there appear to still be live backends in the data directory, even though the original postmaster process is gone (crashed?). If that is the case, forcibly starting a new postmaster is a fine recipe for creating unrecoverable data corruption. So having pg_ctl auto-remove the file is horribly dangerous and is NOT going to happen.

How did you get into this state anyway?

regards, tom lane
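The point about live backends can be made concrete: the postmaster's PID being gone does not mean the shared memory segment is free, because surviving backends stay attached to it. A minimal sketch, assuming System V shared memory and taking the segment ID from the FATAL message (ID 19202077 above), is to ask the kernel for the segment's attach count via shmctl(IPC_STAT). The helper shmem_segment_in_use is hypothetical; it is not the postmaster's actual test, which also has to cope with the kernel handing a recycled ID to a different segment (a raw attach count cannot tell whose segment an ID now names).

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not the postmaster's actual code): report whether a
 * System V shared memory segment may still be in use.
 *   0 -> segment no longer exists, or no process is attached to it
 *   1 -> processes are still attached, or we could not tell
 */
static int shmem_segment_in_use(int shmid)
{
    struct shmid_ds buf;

    assert(shmid >= 0);

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
    {
        if (errno == EINVAL || errno == EIDRM)
            return 0;   /* no such segment any more */
        return 1;       /* e.g. EACCES: err on the side of "in use" */
    }

    /* Orphaned backends keep the segment attached after the postmaster dies. */
    return buf.shm_nattch > 0;
}
```

Any unexpected shmctl error is treated as "in use", erring on the side of refusing to start, since a wrong guess here risks the corruption described above. The nattch column printed by ipcs -m reports the same attach count.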
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>
>> FATAL: pre-existing shared memory block (key 5432001, ID 19202077) is still in use
>> HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcclean", "ipcrm", or just delete the file "postmaster.pid".
>
>> As we can see pg_ctl knows that the PID does not exist. If the PID does not exist is it safe to assume that we can remove the file? So that we may start again?
>
> The error message is warning you that there appear to still be live backends in the data directory, even though the original postmaster process is gone (crashed?).

Yes, I am aware of that. My actual point was that pg_ctl tests whether the process is alive when you issue the stop, and it comes back with the error that the PID is no longer available to kill.

I was just wondering if we could make pg_ctl a little smarter, is all. If pg_ctl can't start because the pid file exists: test for the existence of the pid; if the pid does not exist, test for the existence of **any** postgres process (grep? egad...); if none exists, overwrite the pid file and start?

> If that is the case, forcibly starting a new postmaster is a fine recipe for creating unrecoverable data corruption. So having pg_ctl auto-remove the file is horribly dangerous and is NOT going to happen.

Please understand that my thought did not come lightly. I recognize the dangers here very well (I have had to deal with customers who have done it).

> How did you get into this state anyway?

Power off on a dev machine ;)

Sincerely,

Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
> I was just wondering if we could make pg_ctl a little smarter as all.
> If pg_ctl can't start because the pid file exists, test for the existence of the pid, if the pid does not exist test for the existence of **any** postgres process (grep? egad...), if none exists overwrite the pid file and start?

This cannot be any smarter than the existing test in the postmaster, and is most likely to be much stupider.

>> How did you get into this state anyway?

> Power off on a dev machine ;)

Does the dev machine run more than one postmaster? I've occasionally seen similar issues when restarting a clutch of dev postmasters --- the kernel may assign a shmem id to one of them that belonged to another one in the previous cycle, and if you already started that other one then the second gets confused. 8.0 and up have a test that should deal correctly with this; what version did you see failing exactly?

regards, tom lane
>> Power off on a dev machine ;)
>
> Does the dev machine run more than one postmaster?

No.

> I've occasionally seen similar issues when restarting a clutch of dev postmasters --- the kernel may assign a shmem id to one of them that belonged to another one in the previous cycle, and if you already started that other one then the second gets confused. 8.0 and up have a test that should deal correctly with this; what version did you see failing exactly?

This is on my personal dev machine and I am running 8.1Dev.

Sincerely,

Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
>> Does the dev machine run more than one postmaster?

> No.

Hmm, it should be pretty impossible to see this if the machine's just been rebooted and there are no other postmasters running. If you can replicate it, could you send along the output of "ipcs -m -a" along with the contents of the postmaster.pid file? Also, what's the platform exactly?

regards, tom lane
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>
>>> Does the dev machine run more than one postmaster?
>
>> No.
>
> Hmm, it should be pretty impossible to see this if the machine's just been rebooted

It wasn't a reboot; it was a total power loss and then startup.

> and there are no other postmasters running. If you can replicate it, could you send along the output of "ipcs -m -a" along with the contents of the postmaster.pid file?

I will give it a shot a little later today.

> Also, what's the platform exactly?

FC3.

J