Re: Fast promotion failure

From: Amit Kapila
Subject: Re: Fast promotion failure
Msg-id: 007201ce4fbe$0fa8bbe0$2efa33a0$@kapila@huawei.com
In reply to: Re: Fast promotion failure (Heikki Linnakangas <hlinnakangas@vmware.com>)
Replies: Re: Fast promotion failure
List: pgsql-hackers
On Monday, May 13, 2013 1:13 PM Heikki Linnakangas wrote:
> On 13.05.2013 06:07, Amit Kapila wrote:
> > On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
> >> Heikki said in the first message in this thread that he suspected
> >> the cause of the failure he had seen to be the wrong TLI on which
> >> the checkpointer runs. Nevertheless, the patch you suggested to me
> >> looks like it fixes it. Moreover, (one of?) the failures from the
> >> same cause looks fixed by the patch.
> >
> > There were 2 problems:
> > 1. There was an issue in the walsender logic due to which, after
> >    promotion, it hits an assertion or error in some cases.
> > 2. During fast promotion, the checkpoint gets created with the wrong TLI.
> >
> > He has provided 2 different patches,
> > fix-standby-promotion-assert-fail-2.patch and
> > fast-promotion-quick-fix.patch.
> > Of the two, he has already committed
> > fix-standby-promotion-assert-fail-2.patch
> > (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f4975c99e52984f7ee81b47d137b5b4751)
> 
> That's correct.
> 
> >> Is the point of this discussion that the patch may leave some
> >> glitch in the timing of timeline-related changes, and that Heikki
> >> saw a symptom of that?
> >
> > AFAIU, the committed patch leaves a gap in the overall scenario,
> > which is the fast promotion issue.
> 
> Right, the fast promotion issue is still there.
> 
> Just to get us all on the same page again: Does anyone see a problem
> with a fresh git checkout, with the fast-promotion-quick-fix.patch
> applied?
> (http://www.postgresql.org/message-id/51894942.4080500@vmware.com). If
> you do, please speak up. As far as I know, the already-committed patch,
> together with fast-promotion-quick-fix.patch, should fix all known
> issues (*).
> 
> I haven't committed a fix for the issue I reported in this thread,
> because I'm not 100% sure what the right fix for it would be.
> fast-promotion-quick-fix.patch seems to do the trick, but at least the
> comments need to be updated, and I'm not sure if there are some related
> corner cases that it doesn't handle. Simon?

The provided patch will unnecessarily call InitXLOGAccess() twice for the
end-of-recovery checkpoint. That doesn't matter for performance, but the
purpose of calling LocalSetXLogInsertAllowed() and InitXLOGAccess() is
almost the same, or am I missing something?
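To make the "called twice" point concrete, here is a toy model in C. It is not PostgreSQL source: all `toy_*` names and variables are hypothetical, and it assumes (as the remark above suggests) that the LocalSetXLogInsertAllowed()-like step itself refreshes the process-local timeline the same way InitXLOGAccess() does. Under that assumption the second refresh is harmless but redundant.

```c
#include <assert.h>

/* Hypothetical stand-ins for shared and process-local state; not PostgreSQL code. */
static unsigned shared_tli = 1;        /* timeline published at the end of recovery */
static unsigned local_tli = 1;         /* this process's cached timeline */
static int local_insert_allowed = -1;  /* -1 = not yet decided, 1 = inserts forced on */
static int refresh_calls = 0;          /* counts how often the local cache is refreshed */

/* Toy analogue of InitXLOGAccess(): copy the shared timeline into the local cache. */
static void toy_init_xlog_access(void)
{
    local_tli = shared_tli;
    refresh_calls++;
}

/* Toy analogue of LocalSetXLogInsertAllowed(): allow inserts, then refresh the cache too. */
static void toy_local_set_insert_allowed(void)
{
    assert(local_insert_allowed == -1);
    local_insert_allowed = 1;
    toy_init_xlog_access();
}

/* End-of-recovery path with both calls: the refresh runs twice, with the same result. */
static unsigned toy_end_of_recovery_checkpoint_tli(void)
{
    toy_local_set_insert_allowed();
    toy_init_xlog_access();   /* redundant: the cache was just refreshed above */
    return local_tli;
}
```

Because the refresh only copies shared state, calling it a second time cannot change the outcome; it only costs an extra copy.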

One more thing: I think that after fast promotion, CreateCheckPoint() should
either set the timeline or give an error before it reaches the check you
mentioned in your initial mail:

    if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
        elog(ERROR, "can't create a checkpoint during recovery");
Shouldn't it set the timeline in the above check (RecoveryInProgress()), or
when RecoveryInProgress() is called before CreateCheckPoint()?
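The suggestion amounts to "refresh the cached timeline at the moment a process notices recovery has ended". A minimal toy sketch of that idea in C follows; it is not PostgreSQL source, every `toy_*` name is hypothetical, and the checkpoint "record" is reduced to the timeline it would carry (0 stands in for the elog(ERROR) path).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared state, written when recovery ends (not PostgreSQL code). */
struct toy_shared {
    bool recovery_in_progress;
    unsigned timeline;
};

/* Hypothetical per-process cache of the shared state. */
struct toy_local {
    bool recovery_in_progress;  /* cached copy of the shared flag */
    unsigned timeline;          /* timeline this process would stamp on a checkpoint */
};

/* RecoveryInProgress()-like check: on first observing the end of recovery,
 * reload the timeline (analogous to calling InitXLOGAccess() at that point). */
static bool toy_recovery_in_progress(const struct toy_shared *sh, struct toy_local *lo)
{
    if (!lo->recovery_in_progress)
        return false;
    lo->recovery_in_progress = sh->recovery_in_progress;
    if (!lo->recovery_in_progress)
        lo->timeline = sh->timeline;
    return lo->recovery_in_progress;
}

/* CreateCheckPoint()-like entry: returns the checkpoint's timeline, or 0 for
 * "can't create a checkpoint during recovery". */
static unsigned toy_create_checkpoint(const struct toy_shared *sh, struct toy_local *lo,
                                      bool end_of_recovery)
{
    if (toy_recovery_in_progress(sh, lo) && !end_of_recovery)
        return 0;
    return lo->timeline;
}
```

In this model, a process that passes through the transition inside the check picks up the new timeline, while one whose cached flag was already cleared without a refresh keeps stamping the old timeline, which is the kind of stale-TLI checkpoint discussed in this thread.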

With Regards,
Amit Kapila.
