Discussion: standby shutdown
I need to do OS maintenance on a system running a standby in recovery using pg_standby. I've gotten mixed results with doing the stop/start with pg_ctl. Sometimes the standby is not recoverable. What should the shutdown script look like if the standby is to be usable on the other side? The log seems to indicate that interruption of recovery is bad. What should the script look for to avoid that?

"-m i" left it in this state:

2009-05-01 00:42:15 EDT,0, LOG: database system was interrupted while in recovery at log time 2009-05-01 00:34:06 EDT
2009-05-01 00:42:15 EDT,0, HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2009-05-01 00:42:15 EDT,0, LOG: could not open file "pg_xlog/00000001000000000000000A" (log file 0, segment 10): No such file or directory
2009-05-01 00:42:15 EDT,0, LOG: invalid primary checkpoint record
2009-05-01 00:42:15 EDT,0, LOG: could not open file "pg_xlog/000000010000000000000009" (log file 0, segment 9): No such file or directory
2009-05-01 00:42:15 EDT,0, LOG: invalid secondary checkpoint record
2009-05-01 00:42:15 EDT,0, PANIC: could not locate a valid checkpoint record
2009-05-01 00:42:14 EDT,0, LOG: startup process (PID 25121) was terminated by signal 6: Aborted
2009-05-01 00:42:14 EDT,0, LOG: aborting startup due to startup process failure
On Friday 01 May 2009 09:14:40 Ray Stell wrote:
> I need to do OS maintenance on a system running a standby in recovery
> using pg_standby. I've gotten mixed results with doing the stop/start
> with pg_ctl. Sometimes the standby is not recoverable. What should
> the shutdown script look like if the standby is to be usable on the
> other side? The log seems to indicate that interruption of recovery
> is bad. What should the script look for to avoid that?
>
> "-m i" left it in this state:
>

First tip, don't use -i, it isn't needed. I'd try without any flag and work up if needed. Be patient unless you can't afford to be.

> 2009-05-01 00:42:15 EDT,0, LOG: database system was interrupted while in recovery at log time 2009-05-01 00:34:06 EDT
> 2009-05-01 00:42:15 EDT,0, HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> 2009-05-01 00:42:15 EDT,0, LOG: could not open file "pg_xlog/00000001000000000000000A" (log file 0, segment 10): No such file or directory
> 2009-05-01 00:42:15 EDT,0, LOG: invalid primary checkpoint record
> 2009-05-01 00:42:15 EDT,0, LOG: could not open file "pg_xlog/000000010000000000000009" (log file 0, segment 9): No such file or directory
> 2009-05-01 00:42:15 EDT,0, LOG: invalid secondary checkpoint record
> 2009-05-01 00:42:15 EDT,0, PANIC: could not locate a valid checkpoint record
> 2009-05-01 00:42:14 EDT,0, LOG: startup process (PID 25121) was terminated by signal 6: Aborted
> 2009-05-01 00:42:14 EDT,0, LOG: aborting startup due to startup process failure

This indicates it couldn't find the files it was looking for rather than something being necessarily broken. This makes me wonder about your xlog retention policy; it sounds like you might be deleting xlogs more aggressively than you should be. I'd suggest you look into the %r option for pg_standby.

HTH.

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com
On Sat, May 02, 2009 at 12:13:42AM -0400, Robert Treat wrote:
>
> First tip, don't use -i, it isn't needed. I'd try without any flag and work
> up if needed. Be patient unless you can't afford to be.

Thanks, Robert. I assume you mean don't use -m i; I don't see a -i anywhere.

So, you are saying just shut down, but it is my experience that this might not work, and I need to script this for the operator to do the maintenance. Here I get:

$ pg_ctl stop -D /data/pgsql/alerts_oamp/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

and ps says the postmaster is still up:

ps -ef | grep 502
502 12914 1 0 08:20 pts/0 00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
502 12915 12914 0 08:20 ? 00:00:00 postgres: logger process
502 12916 12914 0 08:20 ? 00:00:01 postgres: startup process waiting for 00000001000000000000000E
502 13064 12916 0 08:29 ? 00:00:00 sh -c /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000001000000000000000E pg_xlog/RECOVERYXLOG 00000001000000000000000C >> /home/postgresql/log/alerts_oamp/recovery.log
502 13065 13064 0 08:29 ? 00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000001000000000000000E pg_xlog/RECOVERYXLOG 00000001000000000000000C

Seems like I need more than patience here, but I don't know what. The log doesn't help much:

2009-05-04 08:20:53 EDT,0, LOG: restored log file "00000001000000000000000C" from archive
2009-05-04 08:20:53 EDT,0, LOG: restored log file "00000001000000000000000D" from archive
2009-05-04 08:20:52 EDT,0, LOG: received smart shutdown request

Should I loop in the shutdown script on "pg_ctl stop" until the postmaster pid goes away?

> This indicates it couldn't find the files it was looking for rather than
> something being necessarily broken. This makes me wonder about your xlog
> retention policy; it sounds like you might be deleting xlogs more aggressively
> than you should be.
> I'd suggest you look into the %r option for pg_standby.
> HTH.

I'll be the first to admit I don't really know the details of this. I'm just taking the default, %r. If the default is too "aggressive" to allow recovery, then why is that the default? Shouldn't some behavior that allows for recovery be the default?
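A note on %r for readers following along: according to the pg_standby documentation, %r is not a value the administrator picks. The server substitutes it with the name of the WAL file that contains the last valid restart point, and pg_standby uses it to remove archive files that logically precede that file. The sketch below only prints a restore_command line of that shape; make_restore_command is a hypothetical helper name, and the archive path in the usage note is the one visible in the ps output above.

```shell
#!/bin/sh
# Sketch only: print a recovery.conf restore_command line for pg_standby.
# make_restore_command is a hypothetical helper, not part of PostgreSQL.
# %f, %p and %r are emitted literally; the server expands them later:
#   %f = name of the WAL file being requested
#   %p = path the file should be copied to
#   %r = file containing the last valid restart point
make_restore_command() {
    archive_dir="$1"
    printf "restore_command = 'pg_standby %s %%f %%p %%r'\n" "$archive_dir"
}
```

Calling `make_restore_command /data/pgsql/wals/alerts_oamp` prints `restore_command = 'pg_standby /data/pgsql/wals/alerts_oamp %f %p %r'`, which matches the argument order seen in the ps listing.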
On Mon, May 04, 2009 at 09:06:48AM -0400, Ray Stell wrote:
>
> So, you are saying just shutdown, but it is my experience that this might not
> work and I need to script this for the operator to do the maintenance.

Could not get the standby to go down without flags, which makes me nervous since Robert seems to indicate that it should. I tried "-m f" and this seemed to work. I'm able to restart recovery. I suppose there might still be some timing issues with xlogs deletion waiting to blow my foot off.

What value do people pass to %r as a general rule? But more importantly, how do you measure what will not fail?
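The escalation described in this thread (try a plain smart shutdown first, fall back to "-m f", and avoid "-m i") could be scripted roughly as below. This is a sketch under assumptions, not a tested operator script: it assumes pg_ctl is on the PATH, relies on pg_ctl's own built-in wait when stopping, and treats any non-zero exit from "pg_ctl status" as "no server running". PG_CTL and stop_standby are illustrative names that do not come from the thread.

```shell
#!/bin/sh
# Sketch: stop a warm standby, escalating from smart to fast shutdown.
# PG_CTL and stop_standby are illustrative names, not from the thread.
PG_CTL="${PG_CTL:-pg_ctl}"

# stop_standby DATADIR
# Returns 0 once no server is running, 1 if fast shutdown also failed.
stop_standby() {
    datadir="$1"

    # Assumption: a non-zero status exit means there is nothing to stop.
    if ! "$PG_CTL" status -D "$datadir" >/dev/null 2>&1; then
        echo "no running server found"
        return 0
    fi

    # First attempt: smart shutdown; pg_ctl waits and reports failure itself.
    if "$PG_CTL" stop -D "$datadir" -m smart; then
        echo "stopped with smart shutdown"
        return 0
    fi

    # Second attempt: fast shutdown, which is what worked in the thread.
    if "$PG_CTL" stop -D "$datadir" -m fast; then
        echo "stopped with fast shutdown"
        return 0
    fi

    # Immediate mode is deliberately not attempted; leave that to a human.
    echo "standby did not stop; not escalating to -m immediate" >&2
    return 1
}
```

An operator script would call `stop_standby /data/pgsql/alerts_oamp` and abort the maintenance if it returns non-zero, rather than looping on "pg_ctl stop" indefinitely.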
On Mon, 2009-05-04 at 11:22 -0400, Ray Stell wrote:
> On Mon, May 04, 2009 at 09:06:48AM -0400, Ray Stell wrote:
> >
> > So, you are saying just shutdown, but it is my experience that this might not
> > work and I need to script this for the operator to do the maintenance.
>
> Could not get the standby to go down without flags, which makes me
> nervous since Robert seems to indicate that it should. I tried "-m f"
> and this seemed to work. I'm able to restart recovery. I suppose
> there might still be some timing issues with xlogs deletion waiting to
> blow my foot off.
>

What version of PostgreSQL is this? 8.2+ should shut down properly without issue. 8.1 does have issues.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
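For answering the version question from a script, one option is to read the PG_VERSION file that PostgreSQL keeps in the data directory, which records the major version the cluster was initialized with. A small sketch follows; cluster_major_version is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report the major version recorded in a cluster's data directory.
# cluster_major_version is a hypothetical helper, not a PostgreSQL tool.
cluster_major_version() {
    datadir="$1"
    if [ -r "$datadir/PG_VERSION" ]; then
        # PG_VERSION holds a single line such as "8.3".
        cat "$datadir/PG_VERSION"
    else
        echo "cannot read $datadir/PG_VERSION" >&2
        return 1
    fi
}
```

On the system in this thread it would be called as `cluster_major_version /data/pgsql/alerts_oamp`. The binary path /usr/local/pgsql836 in the earlier ps output hints at 8.3.6, but that is an inference from the path and not something the thread states.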