Discussion: standby shutdown

standby shutdown

From
Ray Stell
Date:
I need to do OS maintenance on a system running a standby in recovery
using pg_standby.  I've gotten mixed results with doing the stop/start
with pg_ctl.  Sometimes the standby is not recoverable.  What should
the shutdown script look like if the standby is to be usable on the
other side?  The log seems to indicate that interruption of recovery
is bad.  What should the script look for to avoid that?

"-m i" left it in this state:

2009-05-01 00:42:15 EDT,0, LOG:  database system was interrupted while in recovery at log time 2009-05-01 00:34:06 EDT
2009-05-01 00:42:15 EDT,0, HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2009-05-01 00:42:15 EDT,0, LOG:  could not open file "pg_xlog/00000001000000000000000A" (log file 0, segment 10): No such file or directory
2009-05-01 00:42:15 EDT,0, LOG:  invalid primary checkpoint record
2009-05-01 00:42:15 EDT,0, LOG:  could not open file "pg_xlog/000000010000000000000009" (log file 0, segment 9): No such file or directory
2009-05-01 00:42:15 EDT,0, LOG:  invalid secondary checkpoint record
2009-05-01 00:42:15 EDT,0, PANIC:  could not locate a valid checkpoint record
2009-05-01 00:42:14 EDT,0, LOG:  startup process (PID 25121) was terminated by signal 6: Aborted
2009-05-01 00:42:14 EDT,0, LOG:  aborting startup due to startup process failure


Re: standby shutdown

From
Robert Treat
Date:
On Friday 01 May 2009 09:14:40 Ray Stell wrote:
> I need to do OS maintenance on a system running a standby in recovery
> using pg_standby.  I've gotten mixed results with doing the stop/start
> with pg_ctl.  Sometimes the standby is not recoverable.  What should
> the shutdown script look like if the standby is to be usable on the
> other side?  The log seems to indicate that interruption of recovery
> is bad.  What should the script look for to avoid that?
>
> "-m i" left it in this state:
>

First tip, don't use -i, it isn't needed. I'd try without any flag and work
up if needed. Be patient unless you can't afford to be.

> 2009-05-01 00:42:15 EDT,0, LOG:  database system was interrupted while in recovery at log time 2009-05-01 00:34:06 EDT
> 2009-05-01 00:42:15 EDT,0, HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> 2009-05-01 00:42:15 EDT,0, LOG:  could not open file "pg_xlog/00000001000000000000000A" (log file 0, segment 10): No such file or directory
> 2009-05-01 00:42:15 EDT,0, LOG:  invalid primary checkpoint record
> 2009-05-01 00:42:15 EDT,0, LOG:  could not open file "pg_xlog/000000010000000000000009" (log file 0, segment 9): No such file or directory
> 2009-05-01 00:42:15 EDT,0, LOG:  invalid secondary checkpoint record
> 2009-05-01 00:42:15 EDT,0, PANIC:  could not locate a valid checkpoint record
> 2009-05-01 00:42:14 EDT,0, LOG:  startup process (PID 25121) was terminated by signal 6: Aborted
> 2009-05-01 00:42:14 EDT,0, LOG:  aborting startup due to startup process failure

This indicates it couldn't find the files it was looking for rather than
something being necessarily broken. This makes me wonder about your xlog
retention policy; it sounds like you might be deleting xlogs more aggressively
than you should be. I'd suggest you look into the %r option for pg_standby.
HTH.

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

Re: standby shutdown

From
Ray Stell
Date:
On Sat, May 02, 2009 at 12:13:42AM -0400, Robert Treat wrote:
>
> First tip, don't use -i, it isn't needed. I'd try without any flag and work
> up if needed. Be patient unless you can't afford to be.



Thanks, Robert. I assume you mean don't use -m i; I don't see a -i anywhere.

So, you are saying just shut down, but it is my experience that this might not
work, and I need to script this for the operator to do the maintenance.

Here I get:

$ pg_ctl stop -D /data/pgsql/alerts_oamp/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

and ps says the postmaster is still up:

ps -ef | grep 502
502      12914     1  0 08:20 pts/0    00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
502      12915 12914  0 08:20 ?        00:00:00 postgres: logger process
502      12916 12914  0 08:20 ?        00:00:01 postgres: startup process   waiting for 00000001000000000000000E
502      13064 12916  0 08:29 ?        00:00:00 sh -c /usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp 00000001000000000000000E pg_xlog/RECOVERYXLOG 00000001000000000000000C >> /home/postgresql/log/alerts_oamp/recovery.log
502      13065 13064  0 08:29 ?        00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000001000000000000000E pg_xlog/RECOVERYXLOG 00000001000000000000000C


Seems like I need more than patience here, but I don't know what.  The log doesn't help much:

2009-05-04 08:20:53 EDT,0, LOG:  restored log file "00000001000000000000000C" from archive
2009-05-04 08:20:53 EDT,0, LOG:  restored log file "00000001000000000000000D" from archive
2009-05-04 08:20:52 EDT,0, LOG:  received smart shutdown request

Should I loop in the shutdown script on "pg_ctl stop" until the postmaster pid goes away?



> This indicates it couldn't find the files it was looking for rather than
> something being necessarily broken. This makes me wonder about your xlog
> retention policy; it sounds like you might be deleting xlogs more aggressively
> than you should be. I'd suggest you look into the %r option for pg_standby.
> HTH.


I'll be the first to admit I don't really know the details of this.
I'm just taking the default, %r.  If the default is too "aggressive"
to allow recovery, then why is that the default?  Shouldn't some
behavior that allows for recovery be the default?
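
For context on %r: in restore_command the server replaces %r with the name of the WAL file that contains the last valid restart point, and pg_standby accepts it as its optional fourth argument so that it knows which older files in the archive directory can be removed. Judging only from the ps listing above, the recovery.conf entry presumably looks roughly like the sketch below. This is a reconstruction from that listing, not a file posted in the thread.

```
# recovery.conf (sketch reconstructed from the ps output; an assumption,
# not the poster's actual file)
# %f = WAL file requested, %p = destination path, %r = file holding the
# last valid restart point (older archive files are eligible for cleanup)
restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'
```

With %r present, pg_standby only removes archive files that logically precede the restart point, so files needed to restart recovery should still be in the archive directory after a clean stop.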

Re: standby shutdown

From
Ray Stell
Date:
On Mon, May 04, 2009 at 09:06:48AM -0400, Ray Stell wrote:
>
> So, you are saying just shutdown, but it is my experience that this might not
> work and I need to script this for the operator to do the maintenance.


Could not get the standby to go down without flags, which makes me
nervous since Robert seems to indicate that it should.   I tried "-m f"
and this seemed to work.  I'm able to restart recovery.  I suppose
there might still be some timing issues with xlogs deletion waiting to
blow my foot off.

What value do people pass to %r as a general rule?
But more importantly, how do you measure what will not fail?
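
A minimal sketch of an operator shutdown script along the lines discussed so far: try a smart shutdown first, escalate to "-m fast" if it times out, and never fall back to "-m immediate", since that is what left the standby unrecoverable above. The names PG_CTL, PGDATA and STOP_TIMEOUT and the 60-second default are illustrative assumptions. The -w and -t options are standard pg_ctl options, but -t is not present in older releases, so check "pg_ctl --help" on the installed version.

```shell
#!/bin/sh
# Sketch only: stop a warm standby, escalating from smart to fast shutdown.
# PG_CTL, PGDATA and STOP_TIMEOUT are hypothetical, overridable settings.
PG_CTL="${PG_CTL:-pg_ctl}"
PGDATA="${PGDATA:-/data/pgsql/alerts_oamp}"

stop_standby() {
    # Deliberately no "immediate" mode here: it skips a clean shutdown.
    for mode in smart fast; do
        # -w waits for the server to stop; -t bounds the wait in seconds.
        if "$PG_CTL" stop -D "$PGDATA" -m "$mode" -w -t "${STOP_TIMEOUT:-60}"; then
            echo "stopped with -m $mode"
            return 0
        fi
        echo "shutdown with -m $mode did not complete" >&2
    done
    return 1
}
```

An operator script would call "stop_standby || exit 1" before starting maintenance, so a standby that refuses to stop is reported instead of being killed.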

Re: standby shutdown

From
"Joshua D. Drake"
Date:
On Mon, 2009-05-04 at 11:22 -0400, Ray Stell wrote:
> On Mon, May 04, 2009 at 09:06:48AM -0400, Ray Stell wrote:
> >
> > So, you are saying just shutdown, but it is my experience that this might not
> > work and I need to script this for the operator to do the maintenance.
>
>
> Could not get the standby to go down without flags, which makes me
> nervous since Robert seems to indicate that it should.   I tried "-m f"
> and this seemed to work.  I'm able to restart recovery.  I suppose
> there might still be some timing issues with xlogs deletion waiting to
> blow my foot off.
>

What version of PostgreSQL is this? 8.2+ should shut down properly without
issue. 8.1 does have issues.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997