Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

From: Kyotaro Horiguchi
Subject: Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Msg-id: 20190924.124619.248088532.horikyota.ntt@gmail.com
In reply to: Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers

Hello.

At Tue, 24 Sep 2019 10:40:19 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190924014019.GB2012@paquier.xyz>
> On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
> >> Is this an issue?
> >> Can we fix like this?
> >> Thanks!
> >> 
> > 
> > I do think it is a valid issue. No opinion on the fix yet, though.
> > The report was sent on saturday, so patience ;-)
> 
> And for some others it was even a longer weekend.  Anyway, the problem
> can be reproduced if you apply the attached which introduces a failure
> point, and then if you run the following commands:
> create table aa as select 1;
> delete from aa;
> \! touch /tmp/truncate_flag
> vacuum aa;
> \! rm /tmp/truncate_flag
> vacuum aa; -- panic on standby
> 
> This also points out that there are other things to worry about than
> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> to an ERROR, and this happens before the physical truncation is done
> but after the WAL record is replayed on the standby, so any failures
> happening at the truncation phase before the work is done would be a

Indeed.

> problem.  However we are talking about failures which should not
> happen and these are elog() calls.  It would be tempting to add a
> critical section here, but we could still have problems if we have a
> failure after the WAL record has been flushed, which means that it
> would be replayed on the standby, and the surrounding comments are

Agreed.

> clear about that.  In short, as a matter of safety I'd like to think
> that what you are suggesting is rather acceptable (aka hold interrupts
> before the WAL record is written and release after the physical
> truncate), so as truncation avoids failures possible to avoid.
> 
> Do others have thoughts to share on the matter?

Agreed on the concept, but does the patch work as described? It
seems that a query cancel cannot fire during the held-off section
anyway, since there is no CHECK_FOR_INTERRUPTS() call there.
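
To make the question concrete, here is a minimal standalone sketch of the
behaviour in question. It is a toy model, not backend code: the helper names
(hold_interrupts, resume_interrupts, request_cancel, check_for_interrupts,
run_section) are hypothetical stand-ins, and the only assumption carried over
from PostgreSQL is that a pending cancel is acted on solely at an explicit
check, and is skipped there while the hold-off count is nonzero.

```c
#include <stdbool.h>

/* Toy stand-ins for the backend's interrupt state (assumed model). */
static volatile bool InterruptPending = false;
static volatile unsigned int InterruptHoldoffCount = 0;

static void hold_interrupts(void)   { InterruptHoldoffCount++; }
static void resume_interrupts(void) { InterruptHoldoffCount--; }

/* What a signal handler would do: only set a flag, never abort directly. */
static void request_cancel(void) { InterruptPending = true; }

/*
 * Model of an explicit interrupt check. Returns true if a pending cancel
 * was serviced here (the real code would raise an ERROR at this point).
 */
static bool check_for_interrupts(void)
{
    if (!InterruptPending || InterruptHoldoffCount > 0)
        return false;
    InterruptPending = false;
    return true;
}

/*
 * Model of the "write WAL record, then truncate" section with a cancel
 * arriving in the middle. Returns true if the cancel fired inside it.
 */
static bool run_section(bool use_holdoff, bool check_inside)
{
    bool fired_inside = false;

    /* Start from a clean state so calls are independent. */
    InterruptPending = false;
    InterruptHoldoffCount = 0;

    if (use_holdoff)
        hold_interrupts();

    /* ... WAL record written here ... */
    request_cancel();               /* cancel request arrives mid-section */
    if (check_inside)
        fired_inside = check_for_interrupts();
    /* ... physical truncation done here ... */

    if (use_holdoff)
        resume_interrupts();

    return fired_inside;
}
```

In this model run_section(false, false) and run_section(true, false) behave
identically: with no check inside the section, the cancel stays pending until
the next check after it, with or without the hold-off. The hold-off only
changes the outcome when something inside the section does perform a check.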

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


