Re: Fast promotion failure
From | Amit Kapila |
---|---|
Subject | Re: Fast promotion failure |
Date | |
Msg-id | 007201ce4fbe$0fa8bbe0$2efa33a0$@kapila@huawei.com |
In reply to | Re: Fast promotion failure (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: Fast promotion failure |
List | pgsql-hackers |

On Monday, May 13, 2013 1:13 PM Heikki Linnakangas wrote:
> On 13.05.2013 06:07, Amit Kapila wrote:
> > On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
> >> Heikki said in the first message in this thread that he suspected
> >> the cause of the failure he had seen to be wrong TLI on which
> >> checkpointer runs. Nevertheless, the patch you suggested for me
> >> looks fixing it. Moreover (one of?) the failure from the same
> >> cause looks fixed with the patch.
> >
> > There were 2 problems:
> > 1. There was some issue in walsender logic due to which after
> >    promotion in some cases it hits assertion or error
> > 2. During fast promotion, checkpoint gets created with wrong TLI
> >
> > He has provided 2 different patches
> > fix-standby-promotion-assert-fail-2.patch and
> > fast-promotion-quick-fix.patch.
> > Among 2, he has already committed
> > fix-standby-promotion-assert-fail-2.patch
> > (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f4975c99e52984f7ee81b47d137b5b4751)
>
> That's correct.
>
> >> Is the point of this discussion that the patch may leave out some
> >> glitch about timing of timeline-related changing and Heikki saw an
> >> egress of that?
> >
> > AFAIU, the committed patch has some gap in overall scenario which is
> > the fast promotion issue.
>
> Right, the fast promotion issue is still there.
>
> Just to get us all on the same page again: Does anyone see a problem
> with a fresh git checkout, with the fast-promotion-quick-fix.patch
> applied?
> (http://www.postgresql.org/message-id/51894942.4080500@vmware.com). If
> you do, please speak up. As far as I know, the already-committed patch,
> together with fast-promotion-quick-fix.patch, should fix all known
> issues (*).
>
> I haven't committed a fix for the issue I reported in this thread,
> because I'm not 100% on what the right fix for it would be.
> fast-promotion-quick-fix.patch seems to do the trick, but at least the
> comments need to be updated, and I'm not sure if there are some related
> corner cases that it doesn't handle. Simon?

The patch provided will unnecessarily call InitXLOGAccess() twice for the
end-of-recovery checkpoint. It doesn't matter with respect to performance,
but the purpose of calling LocalSetXLogInsertAllowed() and InitXLOGAccess()
will be almost the same, or am I missing something?

One more thing: I think that after fast promotion, CreateCheckPoint() should
either set the timeline or give an error before it reaches the check you
mentioned in your initial mail:

    if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
        elog(ERROR, "can't create a checkpoint during recovery");

Shouldn't it set the timeline in the above check (RecoveryInProgress()) or
when RecoveryInProgress() is called before CreateCheckPoint()?

With Regards,
Amit Kapila.
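
To make the question above concrete, here is a toy model of the idea, not PostgreSQL code. Every name in it (toy_promote, toy_recovery_in_progress, toy_create_checkpoint, the shared_* and local_* variables, and the flag value) is a hypothetical stand-in, and the "error" is modelled as a return value of 0 rather than elog(ERROR). It only sketches one way the process-local timeline could be picked up at the moment the recovery check first notices that recovery has ended, so the checkpoint is never stamped with a stale TLI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of this is the real xlog.c. */
typedef unsigned int TimeLineID;

#define TOY_CHECKPOINT_END_OF_RECOVERY 0x0002   /* illustrative flag value */

/* "Shared memory" state, written by the process that performs promotion. */
static bool       shared_recovery_in_progress = true;
static TimeLineID shared_tli = 1;

/* Process-local cache, as seen by the checkpointer in this toy model. */
static bool       local_recovery_in_progress = true;
static TimeLineID local_tli = 0;                /* 0 means "not set yet" */

/* Promotion: switch to a new timeline and end recovery. */
void toy_promote(TimeLineID new_tli)
{
    shared_tli = new_tli;
    shared_recovery_in_progress = false;
}

/*
 * Recovery check that also refreshes the local timeline the first time it
 * observes that recovery has ended. This is the behaviour being asked about.
 */
bool toy_recovery_in_progress(void)
{
    if (!local_recovery_in_progress)
        return false;

    local_recovery_in_progress = shared_recovery_in_progress;
    if (!local_recovery_in_progress)
        local_tli = shared_tli;     /* pick up the post-promotion TLI here */

    return local_recovery_in_progress;
}

/* Returns the TLI the checkpoint would carry, or 0 if it is refused. */
TimeLineID toy_create_checkpoint(int flags)
{
    if (toy_recovery_in_progress() &&
        (flags & TOY_CHECKPOINT_END_OF_RECOVERY) == 0)
        return 0;                   /* real code would raise an error */

    /* End-of-recovery checkpoint: the timeline must be set explicitly. */
    if (local_tli == 0)
        local_tli = shared_tli;

    assert(local_tli != 0);         /* never checkpoint without a timeline */
    return local_tli;
}
```

In this model, calling toy_create_checkpoint(0) before toy_promote() is refused, while the same call after toy_promote(2) returns timeline 2, because the recovery check itself refreshed the local value. Whether the real fix belongs in the recovery check or in CreateCheckPoint() is exactly the open question in the mail.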