Re: standby waiting for what?

From: Ray Stell
Subject: Re: standby waiting for what?
Msg-id: 20090306161908.GA27527@cns.vt.edu
In response to: Re: standby waiting for what?  (Ray Stell <stellr@cns.vt.edu>)
List: pgsql-admin

On Wed, Mar 04, 2009 at 03:14:51PM -0500, Ray Stell wrote:
> On Wed, Mar 04, 2009 at 03:06:12PM -0500, Ray Stell wrote:
> > Testing pg_standby in 8.3.6.  I've gotten this standby into some sort of
> > bind.  It seems like it may be waiting for some WAL.   How can I tell
> > what it is waiting on?  I don't really know how this works, so I may
> > say something silly.
>
> The standby log says:
>
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG:  database system was interrupted; last known up at 2009-03-04 12:20:29 EST
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG:  starting archive recovery
> ,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG:  restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'
>
>
> alerts_oamp]$ cat postmaster.pid
> 2510
> /data/pgsql/alerts_oamp
>   5498001   4194312
>
> alerts_oamp]$ ps -ef | grep 1005
> 1005       903   901  0 10:10 ?        00:00:00 sshd: postgresql@pts/0
> 1005       904   903  0 10:10 pts/0    00:00:00 -bash
> 1005      1016  1013  0 10:21 ?        00:00:00 sshd: postgresql@pts/1
> 1005      1017  1016  0 10:21 pts/1    00:00:00 -bash
> 1005      2510     1  0 12:23 pts/0    00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
> 1005      2511  2510  0 12:23 ?        00:00:00 postgres: logger process
> 1005      2512  2510  0 12:23 ?        00:00:00 postgres: startup process
> 1005      2520  2512  0 12:23 ?        00:00:00 sh -c /usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log
> 1005      2521  2520  0 12:23 ?        00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000
> 1005      2615  1017  0 12:27 pts/1    00:00:00 tail -f alerts_oamp-2009-03-04_122301.log
> 1005      3271   904  0 15:11 pts/0    00:00:00 ps -ef
> 1005      3272   904  0 15:11 pts/0    00:00:00 grep 1005
>
> alerts_oamp]$ ls -l /data/pgsql/wals/alerts_oamp/
> total 114828
> -rw------- 1 postgresql postgresql 16777216 Mar  4 11:28 00000002000000000000001A
> -rw------- 1 postgresql postgresql 16777216 Mar  4 11:29 00000002000000000000001B
> -rw------- 1 postgresql postgresql 16777216 Mar  4 12:24 00000002000000000000001C
> -rw------- 1 postgresql postgresql 16777216 Mar  4 12:25 00000002000000000000001D
> -rw------- 1 postgresql postgresql 16777216 Mar  4 12:26 00000002000000000000001E
> -rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 00000002000000000000001F
> -rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 000000020000000000000020
>
> any ideas what this guy is hurt by?


I stumbled onto the source of the problem.  I hope somebody who knows the code can explain.
I decided to bounce the primary just to see if it would make a difference in the standby.
The primary would not restart:

,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,2,2009-03-06 10:34:01 EST,0, LOG:  could not open file "pg_xlog/00000002000000000000001C" (log file 0, segment 28): No such file or directory
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,3,2009-03-06 10:34:01 EST,0, LOG:  invalid checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,4,2009-03-06 10:34:01 EST,0, PANIC:  could not locate required checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,5,2009-03-06 10:34:01 EST,0, HINT:  If you are not restoring from a backup, try removing the file "/data/pgsql/alerts_oamp/backup_label".
,3093,,2009-03-06 10:34:01.910 EST,49b14269.c15,1,2009-03-06 10:34:01 EST,0, LOG:  startup process (PID 3095) was terminated by signal 6: Aborted
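
For anyone matching the file name in that first LOG line against the "(log file 0, segment 28)" part: a WAL segment name is 24 hex digits, 8 each for timeline, log file and segment. Here is a minimal sketch that splits the name; decode_wal_name is a made-up helper for illustration, not anything shipped with PostgreSQL.

```python
import re

# A WAL segment file name is 24 uppercase hex digits:
# 8 for the timeline, 8 for the log file, 8 for the segment.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def decode_wal_name(name):
    """Return (timeline, log_file, segment) as integers.

    Hypothetical helper for illustration only.
    """
    match = WAL_NAME.match(name)
    if match is None:
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline, log_file, segment = (int(part, 16) for part in match.groups())
    return timeline, log_file, segment


if __name__ == "__main__":
    # The file the primary could not open on restart.
    print(decode_wal_name("00000002000000000000001C"))  # (2, 0, 28)
```

So 00000002000000000000001C is timeline 2, log file 0, segment 0x1C = 28, which agrees with the message.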

So I removed backup_label and restarted the primary, then rebuilt the standby, and all is well.  Why did
that file muck up the standby and change the value pg was passing to pg_standby?
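
One thing visible in the earlier ps output: the file name handed to pg_standby was 00000002000000000000001C.00512178.backup, which has the form of a backup history file name rather than a plain WAL segment name, and nothing by that name is in the archive directory listing. A rough sketch of that check follows. The listing is hard-coded from the ls output (a real check would read the directory), and classify and check_request are hypothetical helpers, not part of pg_standby.

```python
# Archive contents copied from the ls -l output in the quoted message.
ARCHIVE_LISTING = [
    "00000002000000000000001A",
    "00000002000000000000001B",
    "00000002000000000000001C",
    "00000002000000000000001D",
    "00000002000000000000001E",
    "00000002000000000000001F",
    "000000020000000000000020",
]

# First file-name argument seen on the pg_standby command line in ps.
REQUESTED = "00000002000000000000001C.00512178.backup"


def classify(name):
    """Classify an archive file name by its suffix (illustrative only)."""
    if name.endswith(".backup"):
        return "backup history file"
    if name.endswith(".history"):
        return "timeline history file"
    return "WAL segment"


def check_request(requested, listing):
    """Return (kind of file requested, whether it is in the archive)."""
    return classify(requested), requested in listing


if __name__ == "__main__":
    kind, present = check_request(REQUESTED, ARCHIVE_LISTING)
    print("%s: %s, present in archive: %s" % (REQUESTED, kind, present))
```

This only shows that the requested name and the archive contents do not line up; it does not explain why the server asked for that file, which is the part I am hoping someone can fill in.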

Thanks, looking forward to 8.4!
