Re: race condition in sync rep

From	Robert Haas
Subject	Re: race condition in sync rep
Date	28 March 2011, 01:27:22
Msg-id	AANLkTikE0KVCNOB4o=ZqAX=8TQ9CjvAzriZML=oeNgpk@mail.gmail.com
In response to	Re: race condition in sync rep (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

On Sun, Mar 27, 2011 at 7:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Are the master and standby on same system or are they separated by a network?
>
> I'm surprised that a network roundtrip takes less time than the
> backend takes to mark clog and then queue for the SyncRepLock.

When I first noticed that it was slow (really hanging, though I failed
to realize it) with fsync=off, I had both clusters on the same system.  I
didn't know what was going on at that point, and wasn't specifically
looking for this bug - I was actually testing some other aspect of the
behavior and hit upon it by accident.  Then while I was at PG East I
realized there was a race condition. (I think I actually realized it
while I was dreaming about PostgreSQL; if you think dreaming about
PostgreSQL is a sign that something is seriously wrong with me, you
are likely correct.)  Just to convince myself that I wasn't making
things up, I then stuck a sleep(1) in right before the sync rep wait,
for testing purposes, which of course made it trivial to demonstrate
the hang; I again ran both clusters on a single system (a different
machine this time), but of course with the sleep in there it wouldn't
have mattered.  Then later
I realized that the race condition and the fsync=off were probably the
same problem, so I wrote up the email that way.

If your point is that I never demonstrated with sync rep between two
different systems, I agree.  I suspect it could be done, but you'd
probably have to load the master down pretty heavily while keeping the
load on the standby very light - or possibly it would work to just run
a single-threaded test for a really long time, but I don't know
because I haven't tried it.  I'm actually not that interested in
quantifying the exact probability of this happening under any given
set of circumstances; it seems like enough that it's been found and
fixed.  If something in the phrasing of my original email gave
offence, it wasn't intended to: in particular, the use of the word
"nasty" was intended to convey "difficult to find; tricky".  I think
my fear that it would prove difficult to fix also may have affected
that word choice; I didn't anticipate it being resolved so quickly and
with such a small patch.

I am doing my best to help fix the things that I believe to be bugs in
the code without pissing anybody off.  Clearly, at least in your case,
that doesn't seem to have been entirely successful, but in all honesty
it's not for lack of trying.  I really, really want to get this
release out the door and get back to writing code and doing
CommitFests; but I also want it to be good (as I'm sure you do as
well) and I think we're not quite there yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
