Re: [HACKERS] More race conditions in logical replication

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: [HACKERS] More race conditions in logical replication
Дата
Msg-id 20170725174223.cf2z3venmka4gxta@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: [HACKERS] More race conditions in logical replication  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Ответы Re: [HACKERS] More race conditions in logical replication  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] More race conditions in logical replication  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
Petr Jelinek wrote:
> On 25/07/17 01:33, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> >> I'm back at looking into this again, after a rather exhausting week.  I
> >> think I have a better grasp of what is going on in this code now,
> > 
> > Here's an updated patch, which I intend to push tomorrow.
> 
> I think the ConditionVariablePrepareToSleep() call in
> ReplicationSlotAcquire() needs to be done inside the mutex lock
> otherwise there is tiny race condition where the process which has slot
> acquired might release the slot between the mutex unlock and the
> ConditionVariablePrepareToSleep() call which means we'll never get
> signaled and ConditionVariableSleep() will wait forever.

Hmm, yeah, that's not good.  However, I didn't like the idea of putting
it inside the locked area, as it's too much code.  Instead I added just
before acquiring the spinlock.  We cancel the sleep unconditionally
later on if we didn't need to sleep after all.

As I see it, we need to backpatch at least parts of this patch.  I've
received reports that in earlier branches pglogical and BDR can
sometimes leave slots behind when removing nodes, and I have a hunch
that it can be explained by the bugs being fixed here.  Now we cannot
use condition variables in back-branches, so we'll need to figure out
how to actually do it ...

> As a side note, the ConditionVariablePrepareToSleep()'s comment could be
> improved because currently it says the only advantage is that we skip
> double-test in the beginning of ConditionVariableSleep(). But that's not
> true, it's essential for preventing race conditions like the one above
> because it puts the current process into waiting list so we can be sure
> it will be signaled on broadcast once ConditionVariablePrepareToSleep()
> has been called.

Hmm.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Testlib.pm vs msys
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: [HACKERS] Create language syntax is not proper in pg_dumpall andnot working using pg_upgrade