Re: WAL Restore process during recovery

From	Fujii Masao
Subject	Re: WAL Restore process during recovery
Date	23 January 2012, 11:24:09
Msg-id	CAHGQGwFS-8i6sH0sFShy2xKiJBh=5-HSsSZeUZJSJKQZVQMN-Q@mail.gmail.com
In reply to	Re: WAL Restore process during recovery (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: WAL Restore process during recovery (Simon Riggs <simon@2ndQuadrant.com>) Re: WAL Restore process during recovery (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

On Fri, Jan 20, 2012 at 7:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> Requested update
>
> Thanks! Will review.

StartChildProcess() needs code that emits an error when the fork of
walrestore fails.
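
For illustration, per-process fork-failure reporting of this kind could look
like the sketch below. It is a standalone simplification, not the postmaster
code: ChildKind, fork_failure_message() and start_child() are hypothetical
names, the message wording for walrestore is an assumption, and the real
postmaster reports through ereport() rather than stderr.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the postmaster's child process types. */
typedef enum { CHILD_STARTUP, CHILD_WALRECEIVER, CHILD_WALRESTORE } ChildKind;

/* Return the log text for a failed fork of the given child kind. */
const char *
fork_failure_message(ChildKind kind)
{
    switch (kind)
    {
        case CHILD_STARTUP:
            return "could not fork startup process";
        case CHILD_WALRECEIVER:
            return "could not fork WAL receiver process";
        case CHILD_WALRESTORE:
            /* The case that would be added for the new process (assumed wording). */
            return "could not fork WAL restore process";
    }
    return "could not fork process";
}

/* Fork a child; on failure, log which process could not be started. */
pid_t
start_child(ChildKind kind)
{
    pid_t pid = fork();

    if (pid < 0)
    {
        int save_errno = errno;     /* keep errno intact across the call below */

        fprintf(stderr, "LOG: %s: %s\n",
                fork_failure_message(kind), strerror(save_errno));
        return -1;
    }
    if (pid == 0)
        _exit(0);   /* child: a real process would enter its main loop here */
    return pid;     /* parent: hand the child's PID back to the caller */
}
```

The point of the switch is only that every child kind, including the new one,
has its own failure message, so a failed fork is never silent.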

In reaper(), the following comment needs to be updated because an unexpected
exit (including FATAL) is treated as a crash in the patch.
    /*
     * Was it the wal restore? If exit status is zero (normal) or one
     * (FATAL exit), we assume everything is all right just like normal
     * backends.
     */
    if (pid == WalRestorePID)

Why does walrestore need to be invoked even when restore_command is
not specified? It seems to be useless. We invoke walreceiver only when
primary_conninfo is specified now. Similarly we should invoke walrestore
only when restore_command is specified?
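
A minimal sketch of the suggested condition, written as standalone C rather
than as postmaster code. RecoverySettings, setting_is_set(),
should_start_walreceiver() and should_start_walrestore() are hypothetical
names used only to show the two checks side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two relevant recovery settings. */
typedef struct RecoverySettings
{
    const char *primary_conninfo;   /* NULL or "" when unset */
    const char *restore_command;    /* NULL or "" when unset */
} RecoverySettings;

/* A setting counts as specified only if it is a non-empty string. */
bool
setting_is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* walreceiver is only useful when there is a primary to connect to. */
bool
should_start_walreceiver(const RecoverySettings *s)
{
    return setting_is_set(s->primary_conninfo);
}

/* By analogy, walrestore is only useful when there is a command to run. */
bool
should_start_walrestore(const RecoverySettings *s)
{
    return setting_is_set(s->restore_command);
}
```

With only primary_conninfo set, this starts walreceiver and leaves walrestore
alone; with only restore_command set, it does the opposite.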

When I set up the file-based log-shipping environment using pg_standby,
ran "pgbench -i -s2", waited for walrestore to restore at least one WAL
file, and created the trigger file, I encountered the following error on
the standby.
sby LOG: startup process requests 000000010000000000000003 from archive
trigger file found: smart failover
sby LOG: startup process sees last file was 000000010000000000000003
sby FATAL: could not rename file "pg_xlog/RECOVERYXLOG" to "pg_xlog/000000010000000000000003": No such file or directory
sby LOG: startup process (PID 11079) exited with exit code 1
sby LOG: terminating any other active server processes

When I set up streaming replication with setting restore_command,
I got the following messages repeatedly. The WAL file name was always
"000000000000000000000000".
sby1 LOG: walrestore checking for next file to restore
sby1 LOG: restore of 000000000000000000000000 is already complete, so sleep

In PostmasterStateMachine(), the following comment needs to mention WALRestore.
     * PM_WAIT_READONLY state ends when we have no regular backends that
     * have been started during recovery. We kill the startup and
     * walreceiver processes and transition to PM_WAIT_BACKENDS. Ideally,

In walrestore.c, the following comments seem to be incorrect. At least
an unexpected exit of WALRestore doesn't start a recovery cycle in the patch.
 * If the WAL restore exits unexpectedly, the postmaster treats that the same
 * as a backend crash: shared memory may be corrupted, so remaining backends
 * should be killed by SIGQUIT and then a recovery cycle started.

In walrestore.c
+ * Main entry point for walrestore process
+ *
+ * This is invoked from BootstrapMain, which has already created the basic
+ * execution environment, but not enabled signals yet.

BootstrapMain() doesn't exist, and it should be changed to
AuxiliaryProcessMain(). This is not a fault of the patch. There are the
same typos in bgwriter.c, walwriter.c and checkpointer.c.

In walrestore.c
+ * SIGUSR1 is presently unused; keep it spare in case someday we want this
+ * process to participate in ProcSignal signalling.

The above comment is incorrect because SIGUSR1 is presently used.

+ /*
+ * From here on, elog(ERROR) should end with exit(1), not send
+ * control back to the sigsetjmp block above
+ */
+ ExitOnAnyError = true;

The above is not required because sigsetjmp is not used in walrestore.c

+ /* Normal exit from the walwriter is here */
+ proc_exit(0); /* done */

Typo: s/walwriter/walrestore

I've not reviewed the patch enough yet. Will review the patch tomorrow again.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
