Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby

From Fujii Masao
Subject Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Date
Msg-id CAHGQGwHWbQpsQ7k=x8x=7gYuU1rOvRNdtxtdpx9+XONpMA48ww@mail.gmail.com
In response to BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby  (amit.kapila@huawei.com)
Responses Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-bugs
On Fri, Sep 14, 2012 at 12:21 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, September 13, 2012 10:32 PM Fujii Masao wrote:
> On Thu, Sep 13, 2012 at 9:21 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 12.09.2012 22:03, Fujii Masao wrote:
>>>
>>> On Wed, Sep 12, 2012 at 8:47 PM,<amit.kapila@huawei.com>  wrote:
>>>>
>>>> The following bug has been logged on the website:
>>>>
>>>> Bug reference:      7533
>>>> Logged by:          Amit Kapila
>>>> Email address:      amit.kapila@huawei.com
>>>> PostgreSQL version: 9.2.0
>>>> Operating system:   Suse
>>>> Description:
>>>>
>>>> M host is primary, S host is standby and CS host is cascaded standby.
>>>>
>>
>
>
>>> Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
>>> misplaced. It's called after we got a record from ReadRecord, but *before*
>>> replaying it (rm_redo). Even if replaying record X makes the system
>>> consistent, we won't check and notice that until we have fetched record X+1.
>>> In this particular test case, record X is a shutdown checkpoint record, but
>>> it could as well be a running-xacts record, or the record that reaches
>>> minRecoveryPoint.
>>
>>> Does the problem go away if you just move the CheckRecoveryConsistency()
>>> call *after* rm_redo (attached)?
>
>> No, at least in my case. When recovery starts at a shutdown checkpoint
>> record and there is no record following the shutdown checkpoint, recovery
>> enters a wait state before entering the main redo apply loop. That is,
>> recovery starts waiting for a new WAL record to arrive, in ReadRecord just
>> before the redo loop. So moving the CheckRecoveryConsistency() call after
>> rm_redo cannot fix the problem which I reported. To fix the problem, we
>> need to make the recovery reach the consistent point before the redo loop,
>> i.e., in the CheckRecoveryConsistency() call just before the redo loop.
>
> I think maybe in that case we need both fixes, as the problem I reported can be fixed with Heikki's patch.

Agreed. And we should just add the CheckRecoveryConsistency() call after rm_redo
rather than moving it, as you suggested upthread.
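
To make the ordering issue concrete, here is a toy model in C. It is only a
sketch, not the xlog.c code: ToyRecovery, toy_check_consistency() and
toy_run_redo() are made-up names, WAL records are plain integers, and
"replaying record N makes the system consistent" is reduced to a comparison.
It shows that with only the check before rm_redo, consistency reached by the
last available record is not noticed until another record is fetched, while
an additional check after rm_redo notices it right away.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct {
    int  replayed;       /* number of the last record replayed, 0 = none */
    int  consistent_at;  /* replaying this record makes the system consistent */
    bool reached;        /* set once the check notices consistency */
    int  noticed_after;  /* records fetched when it was noticed, -1 = never */
} ToyRecovery;

/* Stand-in for CheckRecoveryConsistency(): only looks at replayed state. */
static void toy_check_consistency(ToyRecovery *r, int fetched)
{
    if (!r->reached && r->replayed >= r->consistent_at) {
        r->reached = true;
        r->noticed_after = fetched;
    }
}

/*
 * Replays records 1..nrecords and returns how many records had been fetched
 * when consistency was noticed, or -1 if it was never noticed. When the
 * records run out, a real standby would block waiting for more WAL.
 */
static int toy_run_redo(int nrecords, int consistent_at, bool check_after_redo)
{
    ToyRecovery r = {0, consistent_at, false, -1};

    assert(nrecords >= 0);

    toy_check_consistency(&r, 0);            /* check just before the loop */

    for (int rec = 1; rec <= nrecords; rec++) {   /* "ReadRecord" returned rec */
        toy_check_consistency(&r, rec);      /* existing check, before replay */
        r.replayed = rec;                    /* "rm_redo" */
        if (check_after_redo)
            toy_check_consistency(&r, rec);  /* additional check, after replay */
    }
    return r.noticed_after;
}
```

For example, toy_run_redo(3, 3, false) never notices consistency because no
fourth record arrives, whereas toy_run_redo(3, 3, true) notices it after the
third record. With nrecords = 0 the loop body never runs, so in this model
only the check before the loop can matter, whichever variant is used.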

Regards,

--
Fujii Masao
