Discussion: why not kill -9 postmaster
It's always said that you shouldn't kill -9 the postmaster.
What's the reason not to do it? Why is it so strictly prohibited?
Thanks,
~Harpreet.
Harpreet Dhaliwal writes:

> Its always said that don't kill -9 postmaster.
> Whats the reason not to do it. Why is it so strictly prohibited?

,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
| It is best not to use SIGKILL to shut down the server. Doing so will
| prevent the server from releasing shared memory and semaphores,
| which may then have to be done manually before a new server can be
| started. Furthermore, SIGKILL kills the postmaster process without
| letting it relay the signal to its subprocesses, so it will be
| necessary to kill the individual subprocesses by hand as well.
`----

regards,
andreas
On 10/20/06 05:27, Andreas Seltenreich wrote:
> Harpreet Dhaliwal writes:
>
>> Its always said that don't kill -9 postmaster.
>> Whats the reason not to do it. Why is it so strictly prohibited?
>
> ,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
> | It is best not to use SIGKILL to shut down the server. Doing so will
> | prevent the server from releasing shared memory and semaphores,
> | which may then have to be done manually before a new server can be
> | started. Furthermore, SIGKILL kills the postmaster process without
> | letting it relay the signal to its subprocesses, so it will be
> | necessary to kill the individual subprocesses by hand as well.
> `----

But it can't be fatal, can it? After all, that's what a system
crash is, right?

--
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid? For example, it is "common sense"
to white-power racists that whites are superior to blacks, and that
those with brown skins are mud people.
However, that "common sense" is obviously wrong.
Ron Johnson writes:

> On 10/20/06 05:27, Andreas Seltenreich wrote:
>> ,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
>> | It is best not to use SIGKILL to shut down the server. Doing so will
>> | prevent the server from releasing shared memory and semaphores,
>> | which may then have to be done manually before a new server can be
>> | started. Furthermore, SIGKILL kills the postmaster process without
>> | letting it relay the signal to its subprocesses, so it will be
>> | necessary to kill the individual subprocesses by hand as well.
>> `----
>
> But it can't be fatal, can it?

While it could be fixed by hand, the list archives tell that it was
fatal enough for some to shoot themselves in their feet.

> After all, that's what a system crash is, right?

A system crash is safer in that it won't leave orphaned child
processes or IPC/synchronization resources around, making it more
comparable to a SIGQUIT than a SIGKILL.

regards,
andreas
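For readers comparing SIGQUIT with SIGKILL, here is a rough sketch of how the documented shutdown modes map to signals: SIGTERM is the "smart" shutdown, SIGINT the "fast" one, and SIGQUIT the "immediate" one. The function names `signal_for_mode` and `stop_command` are invented for this illustration, and the second function only prints the `pg_ctl` command instead of running it.

```shell
#!/bin/sh
# Map a PostgreSQL shutdown mode to the signal the postmaster expects.
# Sketch only: both function names are hypothetical helpers.
signal_for_mode() {
    case "$1" in
        smart)     echo TERM ;;   # refuse new connections, let sessions finish
        fast)      echo INT ;;    # abort open transactions, then shut down
        immediate) echo QUIT ;;   # exit at once; WAL is replayed on next start
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the pg_ctl command for a data directory and mode.
stop_command() {
    sig=$(signal_for_mode "$2") || return 1
    echo "pg_ctl -D $1 stop -m $2   # sends SIG$sig to the postmaster"
}
```

Calling `stop_command /path/to/data fast` (the path is a placeholder) prints the command you would run. There is deliberately no mode that maps to KILL: SIGKILL cannot be caught, so the postmaster gets no chance to relay it or to release its IPC resources.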
On Friday, 20 October 2006 13:12, Ron Johnson wrote:
> But it can't be fatal, can it? After all, that's what a system
> crash is, right?

Perhaps we should add another tip not to crash the system.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Andreas Seltenreich wrote:
> Ron Johnson writes:
>
>> On 10/20/06 05:27, Andreas Seltenreich wrote:
>>> ,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
>>> | It is best not to use SIGKILL to shut down the server. Doing so will
>>> | prevent the server from releasing shared memory and semaphores,
>>> | which may then have to be done manually before a new server can be
>>> | started. Furthermore, SIGKILL kills the postmaster process without
>>> | letting it relay the signal to its subprocesses, so it will be
>>> | necessary to kill the individual subprocesses by hand as well.
>>> `----
>> But it can't be fatal, can it?
>
> While it could be fixed by hand, the list archives tell that it was
> fatal enough for some to shoot themselves in their feet.
>
>> After all, that's what a system crash is, right?
>
> A system crash is safer in that it won't leave orphaned child
> processes or IPC/synchronization resources around, making it more
> comparable to a SIGQUIT than a SIGKILL.
>

The one thing worse than kill -9 the postmaster is pulling the power
cord out of the server. Which is what makes UPS's so good.

If your server is changing the data file on disk and you pull the power
cord, what chance do you expect of reading that data file again?

While every attempt is made to make the server as reliable as possible
and to be able to recover as much as possible when things go wrong,
abrupt stops (whether from kill -9 or other) at the worst time will make
you dig out your backup copies or spend hours or days manually fixing
what is left to get as much data as you can.

If you are testing and developing that probably won't matter, but what
would it cost you or your company if you lost all the data in your
database? What about lost productivity during the time spent recovering?
Is it worth risking all that?

--
Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz
Shane Ambler wrote:
> The one thing worse than kill -9 the postmaster is pulling the power
> cord out of the server. Which is what makes UPS's so good.
>
> If your server is changing the data file on disk and you pull the power
> cord, what chance do you expect of reading that data file again?

1. That's what we have WAL for. The only thing that can really kill
you is the use of non-battery-backed write cache.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
>> If your server is changing the data file on disk and you pull the power
>> cord, what chance do you expect of reading that data file again?
>
> 1. That's what we have WAL for. The only thing that can really kill
> you is the use of non-battery-backed write cache.

Just for information: I had to suffer numerous BSODs (blue screens of death) on a W2k3 Server running PostgreSQL 8.0 and 8.1 for Windows.
Every time, the database restarted without data loss and without operator intervention.
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Python: the only language with more web frameworks than keywords.
On Fri, Oct 20, 2006 at 10:56:09PM +0930, Shane Ambler wrote:

Someone in the thread mentioned having to clean up shared mem.
I've had to do this often with oracle:

root# ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xe97c83ac 5505024    oracle     640        807403520  10
0x0052f649 3538945    postgresql 600        10461184   2

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xfb5e028c 25690112   oracle     640        154
0x0052f649 17629185   postgresq  600        17
0x0052f64a 17661954   postgresq  600        17
0x0052f64b 17694723   postgresq  600        17
0x0052f64c 17727492   postgresq  600        17
0x0052f64d 17760261   postgresq  600        17
0x0052f64e 17793030   postgresq  600        17
0x0052f64f 17825799   postgresq  600        17

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

$ ipcrm shm 2588672
resource(s) deleted

This remove example was not from the above shared mem report.
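Before removing anything with ipcrm, it helps to know which segments no process is attached to any more. The following is a minimal sketch, assuming the Linux `ipcs -m` column order shown in the report above (key, shmid, owner, perms, bytes, nattch); the function name `unattached_segments` is made up for this example.

```shell
#!/bin/sh
# Print the shmid of every shared memory segment that belongs to the
# given owner and has no attached processes (nattch = 0).
# Reads `ipcs -m` style output on stdin; header lines are skipped because
# only data rows start with a hexadecimal key (0x...).
unattached_segments() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner && $6 == 0 { print $2 }'
}
```

Typical use would be `ipcs -m | unattached_segments postgres`, run as a user allowed to see the segments. A segment with a non-zero nattch still has live backends attached, so it should be left alone. Note also that ipcs truncates long owner names, as the report above shows, so the owner string has to match what ipcs actually prints.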
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Shane Ambler wrote:
>> The one thing worse than kill -9 the postmaster is pulling the power
>> cord out of the server. Which is what makes UPS's so good.
>>
>> If your server is changing the data file on disk and you pull the power
>> cord, what chance do you expect of reading that data file again?

> 1. That's what we have WAL for. The only thing that can really kill
> you is the use of non-battery-backed write cache.

The important distinction here is "will you lose data" vs "can you start
a new server without tedious manual intervention" (ipcrm etc). kill -9
won't lose data, but you may have to clean up after it. And, as Andreas
already noted, some people have been seen to mess up the manual
intervention part badly enough to cause data loss by themselves.
Personally I think the TIP that's really needed is "never remove
postmaster.pid by hand".

regards, tom lane
On 10/20/06, Shane Ambler <pgsql@007marketing.com> wrote:
>> After all, that's what a system crash is, right?
>
> A system crash is safer in that it won't leave orphaned child
> processes or IPC/synchronization resources around, making it more
> comparable to a SIGQUIT than a SIGKILL.
>
> The one thing worse than kill -9 the postmaster is pulling the power
> cord out of the server. Which is what makes UPS's so good.

Well, I think that pulling the power cord is much safer than killing -9
the postmaster. If you pull the plug, then during bootup PostgreSQL
will just replay every committed transaction, so there won't be any
data loss or downtime.
If you kill -9 postmaster... well, it's messy. ;-))) I feel safer when
everything goes down at the same time. ;)

> If your server is changing the data file on disk and you pull the power
> cord, what chance do you expect of reading that data file again?

With PostgreSQL? I expect to read all committed transactions. And
those not committed... well, they weren't committed in the first place,
so I won't see them anyway.
This all assumes that you are running your DB with fsync on,
on a reliable filesystem, and that your hardware doesn't lie to you about
fsyncing data (and it's best if you have a battery for the controller's cache).
Regards,
Dawid
Dawid Kuroczko wrote:
> On 10/20/06, Shane Ambler <pgsql@007marketing.com> wrote:
>> The one thing worse than kill -9 the postmaster is pulling the power
>> cord out of the server. Which is what makes UPS's so good.
>
>
> Well, I think that pulling the power cord is much safer than killing -9
> the postmaster. If you pull the plug, then during bootup postgresql
> will just replay every COMMITed transaction, so there won't be any
> dataloss or downtime.

If you kill -9 the postmaster the system can still finish sending
changes to disk and close the file but pulling the power cord can stop a
write in the middle of a block giving you half new data and half old
data in the one file.

It's all a matter of timing.

--
Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz
Shane Ambler wrote:
> Dawid Kuroczko wrote:
> >On 10/20/06, Shane Ambler <pgsql@007marketing.com> wrote:
>
> >>The one thing worse than kill -9 the postmaster is pulling the power
> >>cord out of the server. Which is what makes UPS's so good.
> >
> >
> >Well, I think that pulling the power cord is much safer than killing -9
> >the postmaster. If you pull the plug, then during bootup postgresql
> >will just replay every COMMITed transaction, so there won't be any
> >dataloss or downtime.
>
> If you kill -9 the postmaster the system can still finish sending
> changes to disk and close the file but pulling the power cord can stop a
> write in the middle of a block giving you half new data and half old
> data in the one file.

That case is protected against in the WAL code. That's what we save
whole page images for.

The only difference between kill -9 postmaster and abrupt shutdown, is
that on the former case there may be backends that continue to run and
commit transactions. Those will still be WAL-logged though.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Sat, Oct 21, 2006 at 12:20:35AM +0930, Shane Ambler wrote:
> If you kill -9 the postmaster the system can still finish sending
> changes to disk and close the file but pulling the power cord can stop a
> write in the middle of a block giving you half new data and half old
> data in the one file.

Well, if you kill -9 the postmaster, all the connections stay alive and
keep processing tuples and writing to disk, except the coordination is
gone. Some queues won't be processed, some signals will be ignored, and if
the postmaster pid gets reused you'll have some fun. In particular, the
sinval-queue processing would break, which could lead to some
interesting issues. But I expect any number of issues to start
occurring.

Half-written disk blocks are a solved problem; PostgreSQL will recover
from that without blinking.

> It's all a matter of timing.

Pulling the plug is *way* safer; it's a known quantity. As Tom said,
killing the postmaster needs cleanup, and some people screw up the
cleanup enough to corrupt their own data.

Now: killall -9 postgres (kill the parents, all the clients,
autovacuum, bgwriter, etc.) all in one go is much more like a crash. But
that's not what's being discussed here.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
> Well, if you kill -9 the postmaster all the connections stay alive and
> stay processing tuples and writing to disk, except the coordination is
> gone.

The postmaster isn't involved in any critical inter-backend coordination.
If you kill -9 the postmaster *and then kill or wait out all the
backends*, you won't lose data. This is not a desirable long-term
operating mode, because it cripples autovacuum and some other things,
but it's not dangerous.

The only really serious risk I'm aware of in this scenario is:

1. DBA does "kill -9" postmaster, but some backends are still alive and
processing.

2. DBA tries to start new postmaster, gets message about "shared memory
segment still in use".

3. DBA does "rm postmaster.pid" (this is the step that qualifies him
as an idiot).

4. DBA starts new postmaster. Since the interlock file is gone, it
starts up without any awareness that there are old backends still alive.

At this point, you have two separate sets of backends that are not
communicating (they're using two different shared memory segments)
but they are munging the same data files. It will not take long
to turn the data files into irrecoverable hash --- for just one
reason, transaction numbering will diverge between the two sets of
backends.

regards, tom lane
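The scenario above goes wrong at step 3, so one possible habit is to inspect the interlock file rather than delete it. The sketch below assumes only that the first line of postmaster.pid holds the postmaster's PID; the function name `pidfile_status` is hypothetical, and the function never modifies or removes the file.

```shell
#!/bin/sh
# Report whether the process recorded in DATADIR/postmaster.pid is alive.
# Prints "no-pidfile", "running PID" or "stale PID". Read-only by design.
pidfile_status() {
    pidfile="$1/postmaster.pid"
    [ -f "$pidfile" ] || { echo "no-pidfile"; return 0; }
    pid=$(head -n 1 "$pidfile" | tr -d ' ')
    # kill -0 delivers no signal; it only tests whether the PID exists
    # and whether we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}
```

Two caveats apply. First, `kill -0` also fails when the process belongs to another user, so this should run as the database owner. Second, and more important for the scenario above, a "stale" result says nothing about orphaned backends: they can still be alive and attached to the old shared memory segment, so check the process list (for example with ps) and wait for or stop them before starting a new postmaster.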
After all that discussion that took place while I was sleeping, I have a few more questions haunting me.
Sometimes, or rather most of the time, when I start postgres using pg_ctl, it says another postmaster is running. Being totally naive about the hazards of kill -9 postmaster, I simply used to kill -9 all postmaster-related process IDs.
Now, what should I do to safely get rid of the postmaster that is already running?
Also, even though it says the postmaster is still running, I can't start pgAdmin, because it complains that the postgres server is not running.
Another thing that worries me is the importance of postmaster.pid.
What happens if I simply do rm postmaster.pid after killing all the postmaster processes?
How big a pain in the neck is that going to be?
Thanks,
~Harpreet
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On 10/20/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Shane Ambler wrote:
> >> The one thing worse than kill -9 the postmaster is pulling the power
> >> cord out of the server. Which is what makes UPS's so good.
> >>
> >> If your server is changing the data file on disk and you pull the power
> >> cord, what chance do you expect of reading that data file again?
>
> > 1. That's what we have WAL for. The only thing that can really kill
> > you is the use of non-battery-backed write cache.
>
> The important distinction here is "will you lose data" vs "can you start
> a new server without tedious manual intervention" (ipcrm etc). kill -9
> won't lose data, but you may have to clean up after it. And, as Andreas
> already noted, some people have been seen to mess up the manual
> intervention part badly enough to cause data loss by themselves.
> Personally I think the TIP that's really needed is "never remove
> postmaster.pid by hand".
>

When the machine crashes, don't you have to remove the pid file by
hand to get the Postgres to start? I seem to remember having to do
that....

- Ian Never-Say-Never Harding

"Ian Harding" <iharding@destinydata.com> writes:
> On 10/20/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Personally I think the TIP that's really needed is "never remove
>> postmaster.pid by hand".

> When the machine crashes, don't you have to remove the pid file by
> hand to get the Postgres to start? I seem to remember having to do
> that....

Given a properly written startup script and a reasonably recent
postmaster, that shouldn't be necessary. In any case, retrying the
startup script is a *far* safer habit to develop than manually removing
the pidfile (and putting an "rm" into the script itself is folly of the
first magnitude).

regards, tom lane
What type of startup script are you talking about here?
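The thread does not show an actual startup script, so the following is only a guess at the "retry, don't rm" habit recommended above. `retry_start` is a hypothetical helper: it runs whatever start command it is given a limited number of times and never touches postmaster.pid.

```shell
#!/bin/sh
# Retry a start command a few times instead of deleting postmaster.pid.
# Usage: retry_start ATTEMPTS DELAY_SECONDS command [args...]
retry_start() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0            # the command reported success
        fi
        echo "start attempt $n failed" >&2
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"      # give old processes time to exit
        fi
        n=$((n + 1))
    done
    return 1                    # give up; a human should investigate
}
```

An init script might call it as `retry_start 3 5 pg_ctl -D /path/to/data -w start`, where the data directory is a placeholder. If every attempt fails, the sketch stops and reports failure, which leaves the decision to the operator rather than "fixing" the interlock file automatically.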