pitr replica dies on startup

From: Jeff Frost
Subject: pitr replica dies on startup
Msg-id: Pine.LNX.4.64.0708311523260.11463@discord.home.frostconsultingllc.com
Replies: Re: pitr replica dies on startup
List: pgsql-admin

Hi guys, I've inherited a PITR continuous-recovery master/standby server pair.
Continuous recovery and the loading of the xlogs seem to work fine, but when
I tested bringing the replica up, it fell over with signal 6 (SIGABRT).

Here's an excerpt from the log, with the log level set to debug5:

DEBUG:  max_safe_fds = 984, usable_fds = 1000, already_open = 6
LOG:  database system was interrupted at 2007-08-31 15:09:40 PDT
LOG:  starting archive recovery
LOG:  restore_command = "/data/pg/data/recovery.py /data/pg/wal/%f "%p""
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001.history "pg_xlog/RECOVERYHISTORY""
DEBUG:  could not restore file "00000001.history" from archive: return code 0
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004C.00000020.backup "pg_xlog/RECOVERYHISTORY""
LOG:  restored log file "00000001000000000000004C.00000020.backup" from archive
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004C "pg_xlog/RECOVERYXLOG""
LOG:  restored log file "00000001000000000000004C" from archive
LOG:  checkpoint record is at 0/4C000020
LOG:  redo record is at 0/4C000020; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 0/1222; next OID: 24576
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  automatic recovery in progress
LOG:  redo starts at 0/4C000070
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004D "pg_xlog/RECOVERYXLOG""
LOG:  restored log file "00000001000000000000004D" from archive
DEBUG:  recovery restart point at 0/4D000020
CONTEXT:  xlog redo checkpoint: redo 0/4D000020; undo 0/0; tli 1; xid 0/1232; oid 24576; multi 1; offset 0; online
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004E "pg_xlog/RECOVERYXLOG""
LOG:  restored log file "00000001000000000000004E" from archive
DEBUG:  recovery restart point at 0/4E000020
CONTEXT:  xlog redo checkpoint: redo 0/4E000020; undo 0/0; tli 1; xid 0/1242; oid 24576; multi 1; offset 0; online
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004F "pg_xlog/RECOVERYXLOG""
LOG:  restored log file "00000001000000000000004F" from archive
DEBUG:  recovery restart point at 0/4F000020
CONTEXT:  xlog redo checkpoint: redo 0/4F000020; undo 0/0; tli 1; xid 0/1252; oid 24576; multi 1; offset 0; online
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/000000010000000000000050 "pg_xlog/RECOVERYXLOG""
DEBUG:  could not restore file "000000010000000000000050" from archive: return code 256
LOG:  could not open file "pg_xlog/000000010000000000000050" (log file 0, segment 80): No such file or directory
LOG:  redo done at 0/4F000070
DEBUG:  executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004F "pg_xlog/RECOVERYXLOG""
DEBUG:  could not restore file "00000001000000000000004F" from archive: return code 256
PANIC:  could not open file "pg_xlog/00000001000000000000004F" (log file 0, segment 79): No such file or directory
DEBUG:  reaping dead processes
LOG:  startup process (PID 31211) was terminated by signal 6
LOG:  aborting startup due to startup process failure
DEBUG:  proc_exit(1)
DEBUG:  shmem_exit(1)
DEBUG:  exit(1)
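
The "return code" figures in this excerpt look like the raw status returned by C's system(), not the restore script's own exit code. Under the usual POSIX/Linux wait-status encoding, 256 decodes to exit code 1, and 0 means the script exited successfully. The 0 logged for 00000001.history would then mean the script reported success even though no file apparently turned up. A minimal Python sketch of that decoding follows. It is Unix-only, since it relies on os.WIFEXITED and os.WEXITSTATUS, and the helper name restore_exit_code is made up for illustration:

```python
import os
from typing import Optional


def restore_exit_code(status: int) -> Optional[int]:
    """Decode a raw wait status, as returned by C's system(), into an exit code.

    Returns None when the child did not exit normally (e.g. it was killed by a signal).
    """
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None


# "return code" values as they appear in the restore-command lines of the log.
for status in (0, 256):
    print(f"return code {status} -> exit status {restore_exit_code(status)}")
```

On Linux this prints "return code 0 -> exit status 0" and "return code 256 -> exit status 1".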

You can see the rest of the log file here:

http://pgsql.privatepaste.com/841mOR5gzm

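Everything seems happy, except that it asks for the 4F log file more than
once: after "redo done at 0/4F000070" it requests 00000001000000000000004F
again, that restore fails with return code 256, and the startup process
PANICs.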
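
The next sketch maps the WAL locations in the log onto segment file names. It assumes the default 16 MB segment size and names files as timeline, log id and segment number, each written as 8 uppercase hex digits. Under those assumptions, "redo done at 0/4F000070" falls inside 00000001000000000000004F (log file 0, segment 79), which is the same segment that is then requested a second time. The function names are hypothetical:

```python
from typing import Tuple

# Assumption: default WAL segment size of 16 MB (it is fixed at build time).
XLOG_SEG_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> Tuple[int, int]:
    """Split an 'X/Y' WAL location (both parts hex) into (log id, byte offset)."""
    log_id, offset = lsn.split("/")
    return int(log_id, 16), int(offset, 16)


def wal_segment_name(timeline: int, lsn: str) -> str:
    """Return the name of the WAL segment file containing the given location."""
    log_id, offset = parse_lsn(lsn)
    segment = offset // XLOG_SEG_SIZE
    # Timeline, log id and segment number, each as 8 uppercase hex digits.
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


# "redo done at 0/4F000070" on timeline 1 ("tli 1" in the CONTEXT lines).
name = wal_segment_name(1, "0/4F000070")
print(name, "-> log file", int(name[8:16], 16), "segment", int(name[16:], 16))
# prints: 00000001000000000000004F -> log file 0 segment 79
```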

Directory permissions look good:

total 45
-rw------- 1 postgres postgres     4 Aug 31 11:13 PG_VERSION
-rw------- 1 postgres postgres   167 Aug 31 15:09 backup_label.old
drwx------ 5 postgres postgres   120 Aug 31 14:59 base
drwx------ 2 postgres postgres   768 Aug 31 15:08 global
drwx------ 2 postgres postgres    72 Aug 31 11:13 pg_clog
-rw------- 1 postgres postgres  3279 Aug 31 11:16 pg_hba.conf
-rw------- 1 postgres postgres  1460 Aug 31 11:14 pg_ident.conf
drwx------ 4 postgres postgres    96 Aug 31 11:13 pg_multixact
drwx------ 2 postgres postgres    72 Aug 31 11:13 pg_subtrans
drwx------ 2 postgres postgres    48 Aug 31 11:13 pg_tblspc
drwx------ 2 postgres postgres    48 Aug 31 11:13 pg_twophase
drwx------ 3 postgres postgres   112 Aug 31 15:24 pg_xlog
-rw-r--r-- 1 postgres postgres 15370 Aug 31 14:57 postgresql.conf
-rw------- 1 postgres postgres    35 Aug 31 15:09 postmaster.opts
-rw-r--r-- 1 postgres postgres    67 Aug 31 13:40 recovery.conf
-rwxr-xr-x 1 postgres postgres  1384 Aug 31 13:40 recovery.py

I verified that the files are really being copied into
pg_xlog/RECOVERYXLOG.
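
The contents of recovery.py aren't shown here. For comparison, below is a hypothetical minimal restore script with the same calling shape as the restore_command in the log (`recovery.py <archived file> <destination>`). It copies rather than moves the archived segment, so the same segment can be handed out again if the server asks for it twice, as it does for 4F. It exits nonzero when the requested file is missing, which is what PostgreSQL's documentation asks of a restore command. It does not wait for segments that have not yet been archived, which a continuously recovering standby's script would have to add:

```python
#!/usr/bin/env python3
"""Hypothetical restore_command helper: recovery.py <archived file> <destination>."""
import os
import shutil
import sys


def restore(src: str, dst: str) -> int:
    """Copy one archived WAL file to the path PostgreSQL asked for.

    Returns 0 only when the file was copied, and 1 when it is missing from
    the archive, so the server can tell "not found" apart from success.
    """
    if not os.path.isfile(src):
        return 1
    # Write to a temporary name first, then rename, so the server never sees a
    # partially written file. The archive copy is left in place (copy, not
    # move), so the same segment can be restored again if requested twice.
    tmp = dst + ".tmp"
    shutil.copyfile(src, tmp)
    os.replace(tmp, dst)
    return 0


def main(argv: list) -> int:
    """Command-line entry point; argv is sys.argv-style."""
    if len(argv) != 3:
        print("usage: recovery.py <archived file> <destination>", file=sys.stderr)
        return 2
    return restore(argv[1], argv[2])
```

Used as a restore_command, the file would end with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard so the exit status reaches the server.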

This is 8.2.4 on Gentoo; I observed the same behavior with 8.2.3, also on
Gentoo.

We've md5sum'd the postgres binary and diffed the output of pg_config;
everything is the same.

Postgres starts up fine in a newly initdb'd directory, and it still starts
fine after I copy the conf files from the 'broken' directory into that new
directory. But if I replicate into the new directory, postgres exits with
signal 6 just the same.

Conversely, using a default postgresql.conf file in the 'broken' directory
leaves it just as broken.

The master can start/stop just fine with no problems.

What else should I look at? I do have a core file; the backtrace is here:

http://pgsql.privatepaste.com/8f0WDAAzz9

Any suggestions much appreciated!

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954
