Re: Standby corruption after master is restarted

From: Michael Paquier
Subject: Re: Standby corruption after master is restarted
Msg-id: 20180427010411.GF3419@paquier.xyz
In reply to: Re: Standby corruption after master is restarted  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List: pgsql-bugs
On Fri, Apr 27, 2018 at 09:49:08AM +0900, Kyotaro HORIGUCHI wrote:
> Thank you for letting me know about that. Is there any way to know how a
> bug report has been concluded? Or should I search -hackers for
> a corresponding thread?

Keeping an eye on the list of patches for bugs in the CF app, and
looking at the list of open items, is what I do now.  For this
particular issue my memory just happened to serve me well, as it is hard
to tell from the titles that both are the same issue.  Good thing I
looked at your patch as well.

> At Thu, 26 Apr 2018 21:13:48 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> <20180426121348.GA2365@paquier.xyz>
>> On Thu, Apr 26, 2018 at 07:53:04PM +0900, Kyotaro HORIGUCHI wrote:
>>> I think this behavior is a bug. XLogReadRecord accounts for this
>>> case, but palloc_extended() breaks it. So the attached patch adds a
>>> new flag MCXT_ALLOC_NO_PARAMERR to palloc_extended(), and
>>> allocate_recordbuf calls it with the new flag. That alone fixes
>>> the problem. However, the patch also frees the read-state buffer
>>> when facing erroneous records, since such records can leave a
>>> too-large buffer allocated.
>>
>> This problem is already discussed here:
>> https://commitfest.postgresql.org/18/1516/
>>
>> And here is the thread:
>> https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
>>
>> Tsunakawa-san and I discussed a couple of approaches.  Extending
>> palloc_extended so that an incorrect length does not result in an error
>> is also something that crossed my mind, but the length handling is
>> different on the backend and the frontend, so I discarded the idea you
>> are proposing here and instead relied on a check with AllocSizeIsValid,
>> which gives a simpler patch:
>> https://www.postgresql.org/message-id/20180314052753.GA16179%40paquier.xyz
>
> Yeah, perhaps all the approaches in the thread came to my mind, but
> I chose a different one. I'm fine with the approach in the thread.

Okay, cool.
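
As a rough illustration of the AllocSizeIsValid idea discussed above, here
is a minimal standalone sketch. It is not the patch from the linked thread.
RecordReader and reader_reserve() are hypothetical names invented for this
example, malloc() stands in for palloc_extended(), and only the size limit
and the shape of the validity check mirror PostgreSQL's MaxAllocSize and
AllocSizeIsValid definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same limit and check as PostgreSQL's MaxAllocSize / AllocSizeIsValid. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)
#define ALLOC_SIZE_IS_VALID(size) ((size_t) (size) <= MAX_ALLOC_SIZE)

/* Hypothetical stand-in for the reader state; not the real XLogReaderState. */
typedef struct RecordReader
{
    char   *buf;        /* buffer holding the record being read */
    size_t  buf_size;   /* current size of buf, in bytes */
} RecordReader;

/*
 * Make sure the buffer can hold record_len bytes.
 *
 * Returns false, rather than raising an error, when the length taken from
 * the record header is implausibly large or when allocation fails.  The
 * caller can then treat the record as invalid.  The existing buffer is
 * left untouched on failure.
 */
static bool
reader_reserve(RecordReader *reader, uint32_t record_len)
{
    char   *newbuf;

    assert(reader != NULL);

    /* A garbage length must not reach the allocator at all. */
    if (!ALLOC_SIZE_IS_VALID(record_len))
        return false;

    /* Current buffer is already large enough. */
    if (record_len <= reader->buf_size)
        return true;

    newbuf = malloc(record_len);
    if (newbuf == NULL)
        return false;

    free(reader->buf);
    reader->buf = newbuf;
    reader->buf_size = record_len;
    return true;
}
```

The point of checking before allocating is that a bogus length read from a
recycled or partially written WAL segment is reported as an invalid record
instead of being raised as an allocation error.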

>> This got no interest from committers yet unfortunately, but I think that
>> this is a real problem which should be back-patched :(
>
> Several other WAL-related fixes are also waiting to be picked up..

Yeah, simply ignoring corrupted 2PC files at redo is no fun, and neither
is breaking the promise of replication slots.  Let's just make sure that
everything is properly tracked and listed, that's the least we can do.
--
Michael
