question about warm standby databases in 8.2.5

Поиск
Список
Период
Сортировка
От Brett Neumeier
Тема question about warm standby databases in 8.2.5
Дата
Msg-id 5f668d330712102043m17a391c7xeac6ba135ff673c2@mail.gmail.com
обсуждение исходный текст
Ответы Re: question about warm standby databases in 8.2.5
Re: question about warm standby databases in 8.2.5
Список pgsql-general
Hi,

I set up a warm standby failover system on Redhat, using built-from-source postgresql 8.2.5 on (of course) both the master and standby systems.

The setup of the system was very easy, and the recovery script we have in place on the standby system correctly copies in the archived WAL log files, which are then applied.

What seems odd is what happens when we abort the continuous recovery so the standby database becomes primary.

It seems that the recovery command always copies the source WAL file (with a name like 00000001000000020000009C) to a file path "pg_xlog/RECOVERYXLOG", which is fine. However, then when we abort recovery, postgresql seems to expect that the most recent WAL log should be in pg_xlog with its original filename, e.g., the 0....9C filename from above.

This seems broken -- if the WAL file should wind up in the pg_xlog directory with the 0...9C name, why isn't postgresql copying it there?

Here are the log messages that show what I'm talking about. Note that everything is fine for quite a while; then we triggered the standby database to come online before 0...B4 was archived...and postgresql then bails out because 0...B3 (which has already been restored) doesn't exist!

We're working around this, for now, by having the recovery command script copy archived WAL files to the specified location pg_xlog/RECOVERYXLOG, and also to the pg_xlog directory with the file's original basename. But that seems awfully sloppy, and isn't the process documented in the manual.

Advice is eagerly solicited!

LOG:  starting archive recovery
LOG:  restore_command = "/home/pgsql/bin/recover_script.rb %f %p"
LOG:  restored log file "0000000100000002000000A1.001FAD68.backup " from archive
LOG:  restored log file "0000000100000002000000A1" from archive
LOG:  checkpoint record is at 2/A11FAD68
LOG:  redo record is at 2/A11FAD68; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 0/82464990; next OID: 45282
LOG:  next MultiXactId: 28; next MultiXactOffset: 55
LOG:  automatic recovery in progress
LOG:  redo starts at 2/A11FADB0
LOG:  restored log file "0000000100000002000000A2" from archive
[a bunch of similar messages omitted]
LOG:  restored log file "0000000100000002000000B3" from archive
LOG:  could not open file "pg_xlog/0000000100000002000000B4" (log file 2, segment 180): No such file or directory
LOG:  redo done at 2/B354BDD0
PANIC:  could not open file "pg_xlog/0000000100000002000000B3" (log file 2, segment 179): No such file or directory
LOG:  startup process (PID 17604) was terminated by signal 6
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted while in recovery at log time 2007-12-10 16:57:42 EST
HINT:  If this has occurred more than once some data may be corrupted and you may need to choose an earlier recovery target.

Cheers,

bn

--
Brett Neumeier (bneumeier@gmail.com)

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: partitioned table query question
Следующее
От: "Trevor Talbot"
Дата:
Сообщение: Re: partitioned table query question