Re: PITR potentially broken in 9.2

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: PITR potentially broken in 9.2
Дата
Msg-id CA+U5nMKOsUWKP++QH=0nMff27t-KbLTEz560CzXz_upn-k1LDQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PITR potentially broken in 9.2  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: PITR potentially broken in 9.2  (Andres Freund <andres@2ndquadrant.com>)
Re: PITR potentially broken in 9.2  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-bugs
On 5 December 2012 02:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> But the key is, the database was not actually consistent at that
>>> point, and so opening hot standby was a dangerous thing to do.
>>>
>>> The bug that allowed the database to open early (the original topic if
>>> this email chain) was masking this secondary issue.
>
>> Could you check whether the attached patch fixes the behaviour?
>
> Yeah, I had just come to the same conclusion: the bug is not with
> Heikki's fix, but with the pause logic.  The comment says that it
> shouldn't pause unless users can connect to un-pause, but the actual
> implementation of that test is several bricks shy of a load.

OK, I've had a look at this now.

Andres correctly identified the bug above, which was that the backup
end is noted by the XLOG_BACKUP_END record. recoveryStopsHere() was
not changed when that record type was added, so it ignores the
situation that we are waiting for end of backup and yet it stop
anyway. So I concur with Jeff that there is a bug and think that
Andres has provided the clue to a fix. I'll patch that now. Aboriginal
bug extends back to 9.0.

Pause has got nothing at all to do with this, even if there are other
problems there.

> Your patch is better, but it's still missing a bet: what we need to be
> sure of is not merely that we *could* have told the postmaster to start
> hot standby, but that we *actually have done so*.  Otherwise, it's
> flow-of-control-dependent whether it works or not; we could fail if the
> main loop hasn't gotten to CheckRecoveryConsistency since the conditions
> became true.  Looking at the code in CheckRecoveryConsistency, testing
> LocalHotStandbyActive seems appropriate.
>
> I also thought it was pretty dangerous that this absolutely critical
> test was not placed in recoveryPausesHere() itself, rather than relying
> on the call sites to remember to do it.
>
> So the upshot is that I propose a patch more like the attached.

I've reworked pause logic along the lines you suggest. Attached here
for further discussion.

> This is not a regression because the pause logic is broken this same
> way since 9.1.  So I no longer think that we need a rewrap.

I think we do need a rewrap, since the bug is not in pause logic.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "Duggirala, Manikanth (TCS)"
Дата:
Сообщение: PostgreSQL v8.1.11 compatibility with OS 2008 R2
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PITR potentially broken in 9.2