Re: Shutting down a warm standby database in 8.2beta3
From | Stephen Harris |
---|---|
Subject | Re: Shutting down a warm standby database in 8.2beta3 |
Date | |
Msg-id | 20061122185623.GA23202@pugwash.spuddy.org |
In reply to | Re: [GENERAL] Shutting down a warm standby database in 8.2beta3 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Shutting down a warm standby database in 8.2beta3
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-hackers |
On Mon, Nov 20, 2006 at 11:20:41AM -0500, Tom Lane wrote:
>
>     kill(child_pid, SIGxxx);
> #ifdef HAVE_SETSID
>     kill(-child_pid, SIGxxx);
> #endif
>
> In the normal case where the child has already completed setsid(), the
> extra signal sent to it should do no harm.  In the startup race

Hmm. It looks like something more than this may be needed. The postgres recovery process appears to be ignoring the signal.

I ran the whole database in its own process group (ksh runs processes in their own process group by default, so pg_ctl became the session leader and everything under pg_ctl stayed in that process group).

  % ps -o pid,ppid,pgid,args -g 29141 | sort
    PID  PPID  PGID COMMAND
  29145     1 29141 /local/apps/postgres/8.2.b3.0/solaris/bin/postgres
  29146 29145 29141 /local/apps/postgres/8.2.b3.0/solaris/bin/postgres
  29147 29145 29141 /local/apps/postgres/8.2.b3.0/solaris/bin/postgres
  29501 29147 29141 sh -c /export/home/swharris/rr 000000010000000100000057 pg_xlog/RECOVERYXLOG
  29502 29501 29141 /bin/ksh -p /export/home/swharris/rr 000000010000000100000057 pg_xlog/RECOVERYX
  29537 29502 29141 sleep 5

I did

  kill -QUIT -29141 ; sleep 1 ; touch /export/home/swharris/archives/STOP_SWEH_RECOVERY

This sent the QUIT signal to all those processes. The shell script ignores it and so tries to start again, so the 'touch' command tells it to exit(1) rather than loop again.

The log file (the timestamp entries are from my 'rr' program so I can see what it's doing)...

To start with we see a normal recovery:

  Wed Nov 22 13:41:20 EST 2006: Attempting to restore 000000010000000100000056
  Wed Nov 22 13:41:25 EST 2006: Finished 000000010000000100000056
  LOG:  restored log file "000000010000000100000056" from archive
  Wed Nov 22 13:41:25 EST 2006: Attempting to restore 000000010000000100000057
  Wed Nov 22 13:41:25 EST 2006: Waiting for file to become available

Now I send the kill signal...

  LOG:  received immediate shutdown request

We can see that the sleep process got it!

  /export/home/swharris/rr[37]: 29537 Quit(coredump)

And my script detects the trigger file

  Wed Nov 22 13:43:51 EST 2006: End of recovery trigger file found

Now database recovery appears to continue as normal; the postgres recovery processes are still running, despite having received SIGQUIT

  LOG:  could not open file "pg_xlog/000000010000000100000057" (log file 1, segment 87): No such file or directory
  LOG:  redo done at 1/56000070
  Wed Nov 22 13:43:51 EST 2006: Attempting to restore 000000010000000100000056
  Wed Nov 22 13:43:55 EST 2006: Finished 000000010000000100000056
  LOG:  restored log file "000000010000000100000056" from archive
  LOG:  archive recovery complete
  LOG:  database system is ready
  LOG:  logger shutting down

pg_xlog now contains 000000010000000100000056 and 000000010000000100000057

A similar sort of thing happens if I use SIGTERM rather than SIGQUIT

I'm out of here in an hour, so for all you US based people, have a good Thanksgiving holiday!

-- 

rgds
Stephen
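
The behaviour of the 'rr' restore script described in the message (copy the requested segment if it is in the archive, otherwise wait, and give up once a trigger file appears) could be sketched roughly as below. This is not the actual script, which the message does not show. The function name `restore_segment`, the trigger file name `STOP_RECOVERY`, and the optional poll-interval argument are all assumptions made for illustration, and signal handling is left out entirely.

```shell
#!/bin/sh
# Hypothetical stand-in for the 'rr' restore script; every name here is illustrative.

# restore_segment ARCHIVE_DIR SEGMENT DEST [POLL_SECONDS]
# Returns 0 once SEGMENT has been copied to DEST,
# or 1 if the trigger file shows up while the segment is still missing.
restore_segment() {
    archive_dir=$1
    segment=$2
    dest=$3
    poll=${4:-5}                          # the ps listing shows 'sleep 5'
    trigger="$archive_dir/STOP_RECOVERY"  # assumed trigger file name
    waiting=0

    echo "$(date): Attempting to restore $segment"
    while :; do
        # A segment that is already archived is restored even if the
        # trigger file exists, matching the final restore in the log.
        if [ -f "$archive_dir/$segment" ]; then
            cp "$archive_dir/$segment" "$dest" || return 1
            echo "$(date): Finished $segment"
            return 0
        fi
        if [ -f "$trigger" ]; then
            echo "$(date): End of recovery trigger file found"
            return 1
        fi
        # Report the wait only once, then keep polling quietly.
        if [ "$waiting" -eq 0 ]; then
            echo "$(date): Waiting for file to become available"
            waiting=1
        fi
        sleep "$poll"
    done
}
```

A wrapper would call it as `restore_segment /path/to/archives "$1" "$2"` and exit with its return value, where the two arguments are the segment name and destination path visible in the ps output. A non-zero exit is what tells the recovery process that no further segment is coming.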