Re: Reducing the cost of sinval messaging

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Reducing the cost of sinval messaging
Дата
Msg-id 2763.1414457645@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Reducing the cost of sinval messaging  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Reducing the cost of sinval messaging
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 27, 2014 at 4:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Neither of those messages seem to me to bear on this point.  AFAICS,
>> the unlocked hasMessages test has a race condition, which the comment
>> just above it argues isn't a problem in practice.  What I propose above
>> would have an absolutely indistinguishable race condition, which again
>> wouldn't be a problem in practice if you believe the comment's argument.

> Yes, the read from hasMessages has a race condition, which is in fact
> not a problem in practice if you believe the comment's argument.
> However, your proposed formulation has a second race condition, which
> is discussed in those emails, and which is precisely why we rejected
> the design you're now proposing three years ago.  That second race
> condition is that the read from stateP->nextMsgNum doesn't have to
> happen at the same time as the read from stateP->resetState. No CPU or
> compiler barrier separates them, so either can happen before the
> other. SICleanupQueue updates both values, so you have the problem
> first described precisely by Noah:

> # (In theory, you could
> # hit a case where the load of resetState gives an ancient "false" just as the
> # counters wrap to match.

> While it practically seems very unlikely that this would ever really
> happen, some guy named Tom Lane told us we should worry about it, so I
> did.

That argument is nonsense.  I complained about a lack of close analysis,
but with close analysis I think this is perfectly safe; or at least no
less safe than what's there now, with its not terribly bulletproof
assumption that a suitable memory barrier has happened recently.

You'll grant that reads and writes of the integer msgNum variables are
atomic, I trust (if not, we've got lots of bugs elsewhere).  So what we
have to worry about does not include totally-corrupted values, but only
stale values, including possibly out-of-order reads.  Ignoring the
possibility of MSGNUMWRAPAROUND wraparound for a moment, only our own
process ever changes nextMsgNum, so we should certainly see a current
value of that.  So the argument for a hazard is that maxMsgNum could have
been advanced 2^32 times since our process last read messages, and then we
come along and see that value, but we *don't* see our resetState flag set
even though that happened 2^32 - MAXNUMMESSAGES messages ago, and was
certainly flushed into shared memory shortly thereafter.  That's nonsense.
It's especially nonsense if you assume, as the existing code already does,
that the reading process "recently" executed a read barrier.

Now, as to what happens when MSGNUMWRAPAROUND wraparound occurs: that code
decrements all msgNum variables whether they belong to hopelessly-behind
backends or not.  That's why the time until a possible chance match is
2^32 messages and not something else.  However, since the proposed new
unlocked test might come along and see a partially-complete wraparound
update, it could see either nextMsgNum or maxMsgNum offset by
MSGNUMWRAPAROUND compared to the other.  What would most likely happen is
that this would make for a false decision that they're unequal, resulting
in a useless trip through the locked part of SIGetDataEntries, which is
harmless.  If our backend had fallen behind by exactly 2^32-MSGNUMWRAPAROUND
messages or 2^32+MSGNUMWRAPAROUND messages, we could make a false
conclusion that nextMsgNum and maxMsgNum are equal --- but, again, our
resetState must have become set a billion or so sinval messages back, and
it's just not credible that we're not going to see that flag set.

So I still say that the current code is wasting a lot of cycles for
no actual gain in safety.

> Personally, I think trying to further optimize this is most likely a
> waste of time.  I'm quite sure that there's a whole lot of stuff in
> the sinval code that could be further optimized, but it executes so
> infrequently that I believe it doesn't matter.

Maybe not today, but it's not hard to imagine usage patterns where this
would result in O(N^2) total work in the sinval code.

> If you've got some evidence that this is a real problem as opposed to
> a hypothetical one, we could certainly reopen the debate about whether
> ignoring the possibility that the loads from stateP->nextMsgNum and
> stateP->resetState will be widely-enough spaced in time for a
> quarter-million messages to be queued up in the meantime is
> sufficiently unlikely to ignore.

It's a billion messages, not a quarter million...

> However, absent evidence of a
> tangible performance problem, I'd be quite disinclined to throw the
> guaranteed safety of the current approach out the window.

I fail to find any "guaranteed safety" in the current approach.  It's
dependent on the assumption of a recently-executed read barrier in the
reading backend, which assumption is sufficient for what I'm proposing
as well.

Another thought here is that the whole thing becomes moot if we stick a
read barrier into the "unlocked" test.  We did not have that option during
the last go-round with this code, but I'm inclined to think we ought to
revisit sinvaladt.c with that tool in our belts anyway; replacing the
spinlock around maxMsgNum with suitable read/write barriers has been on
the to-do list since forever.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Следующее
От: David G Johnston
Дата:
Сообщение: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT