Re: Cascade replication

From	Simon Riggs
Subject	Re: Cascade replication
Date	6 July 2011, 12:41:38
Msg-id	CA+U5nMLf==9gHmXZW3ioCb06r-nw6Lwcsybk75C6LpBbnnqOSQ@mail.gmail.com
In reply to	Re: Cascade replication (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: Cascade replication
List	pgsql-hackers

On Wed, Jul 6, 2011 at 12:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Jul 6, 2011 at 4:53 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Jul 6, 2011 at 2:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> 1. De-archive the file to RECOVERYXLOG
>>>> 2. If RECOVERYXLOG is valid, remove a pre-existing one and rename
>>>>    RECOVERYXLOG to the correct name
>>>> 3. Replay the file with the correct name
>>>
>>> Yes please, that makes sense.
>
> In #2, if the server is killed with SIGKILL just after removing a pre-existing
> file and before renaming RECOVERYXLOG, we lose the file with correct name.
> Even in this case, we would be able to restore it from the archive, but what if
> unfortunately the archive is unavailable? We would lose the file infinitely. So
> we should introduce the following safeguard?
>
>    2'. If RECOVERYXLOG is valid, move a pre-existing file to pg_xlog/backup,
>        rename RECOVERYXLOG to the correct name, and remove the pre-existing
>        file from pg_xlog/backup
>
>        Currently we give up a recovery if there is the target file in
>        neither the archive nor pg_xlog. But, if we adopt the above safeguard,
>        in that case, we should try to read the file from also pg_xlog/backup.
>
> In #2, there is another problem; walsender might have the pre-existing file
> open, so the startup process would need to request walsenders to close the
> file before removing (or renaming) it, wait for new file to appear and open it
> again. This might make the code complicated. Does anyone have better
> approach?

The risk you describe already exists in current code.

I regard it as a non-risk. The unlink() and the rename() are executed
consecutively, so the gap between them is small. The chance of a
SIGKILL in that gap at the same time as losing the archive seems low,
and we can always get that file from the master again if we are
streaming. Any code you add to "fix" this will get executed so rarely
it probably won't work when we need it to.
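
To make the window concrete, here is a minimal sketch of step 2 as quoted
above, written in Python for brevity rather than in the backend's C. It is
an illustration only: the function name, the "non-empty file" validity
check, and the segment name in the demo are assumptions, not anything
taken from the patch.

```python
import os
import tempfile


def install_restored_segment(xlog_dir, segment_name, temp_name="RECOVERYXLOG"):
    """Replace any pre-existing segment with the freshly de-archived file.

    Returns True if the restored file was installed under segment_name.
    """
    restored = os.path.join(xlog_dir, temp_name)
    target = os.path.join(xlog_dir, segment_name)

    # Stand-in for "RECOVERYXLOG is valid" (assumption: non-empty file).
    if not os.path.isfile(restored) or os.path.getsize(restored) == 0:
        return False

    if os.path.exists(target):
        os.unlink(target)  # remove the pre-existing file
    # A SIGKILL between these two calls leaves no file under the correct
    # name; only RECOVERYXLOG remains in the directory.
    os.rename(restored, target)  # give the restored file its correct name
    return True


# Small demo in a throwaway directory.
demo_dir = tempfile.mkdtemp()
segment = "000000010000000000000003"  # hypothetical segment name
with open(os.path.join(demo_dir, segment), "wb") as f:
    f.write(b"old contents")
with open(os.path.join(demo_dir, "RECOVERYXLOG"), "wb") as f:
    f.write(b"restored contents")

installed = install_restored_segment(demo_dir, segment)
remaining = sorted(os.listdir(demo_dir))
with open(os.path.join(demo_dir, segment), "rb") as f:
    final_contents = f.read()
```

The only point at which the correctly named file is missing is between the
unlink and the rename, which is the gap discussed above.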

In the current scheme we restart archiving from the last restartpoint,
which exists only on the archive. This new patch improves upon this by
keeping the most recent files locally, so we are less exposed in the
case of archive unavailability. So this patch already improves things
and we don't need any more than that. No extra code please, IMHO.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
