Re: Problem with synchronous replication

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Problem with synchronous replication
Дата
Msg-id 20191030014511.GE1590@paquier.xyz
обсуждение исходный текст
Ответ на Re: Problem with synchronous replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Problem with synchronous replication  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Problem with synchronous replication  (lingce.ldm <lingce.ldm@alibaba-inc.com>)
Список pgsql-hackers
On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>> I recently discovered two possible bugs about synchronous replication.
>>
>> 1. SyncRepCleanupAtProcExit may delete an element that has been deleted
>> SyncRepCleanupAtProcExit first checks whether the queue is detached, if it is not detached,
>> acquires the SyncRepLock lock and deletes it. If this element has been deleted by walsender,
>> it will be deleted repeatedly, SHMQueueDelete will core with a segment fault.
>>
>> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then check
>> whether the queue is detached or not.
>
> I think you're right here.

Indeed.  Looking at the surroundings we expect some code paths to hold
SyncRepLock in exclusive or shared mode but we don't actually check
that the lock is hold.  So let's add some assertions while on it.

> This is not right. It is in transaction commit so it is in a
> HOLD_INTERRUPTS section. ProcessInterrupt does not respond to
> cancel/die interrupts thus the ereport should return.

Yeah.  There is an easy way to check after that: InterruptHoldoffCount
needs to be strictly positive.

My suggestions are attached.  Any thoughts?
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Problem with synchronous replication
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Make StringInfo available to frontend code.