I wrote:
> We could ameliorate this if there were a way to acquire ownership of the
> cache line without necessarily winning the spinlock. I'm imagining
> that we insert a "dummy" locked instruction just ahead of the xchgb,
> which touches the spinlock in such a way as to not change its state.
I tried this, using this tas code:
static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;
	register slock_t _dummy = 0;

	/* Use a locking test before trying to take the spinlock */
	/* xchg implies a LOCK prefix, so no need to say LOCK for it */
	__asm__ __volatile__(
		"	lock			\n"
		"	xaddb	%2,%1	\n"
		"	xchgb	%0,%1	\n"
:		"+q"(_res), "+m"(*lock), "+q"(_dummy)
:
:		"memory", "cc");
	return (int) _res;
}
At least on Opteron, it's a loser. The previous best results (with
slock-no-cmpb and spin-delay patches) were
	1 31s	2 42s	4 51s	8 100s
and with this instead of slock-no-cmpb,
	1 33s	2 45s	4 55s	8 104s
The xadd may indeed be helping in terms of protecting the xchg from
end-of-timeslice --- the rate of select() delays is really tiny, one
every few seconds, which is better than I saw before. But the extra
cost of the extra locked operation isn't getting repaid overall.
Feel free to try it on other hardware, but it doesn't look promising.
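For anyone wanting to try the idea on non-x86 hardware, here is a rough
portable sketch of the same sequence using the GCC/Clang __atomic builtins
rather than inline assembly. This is not the code that was benchmarked
above; the names tas_with_dummy and tas_unlock and the one-byte slock_t
typedef are assumptions made for illustration. Note also that a compiler
is allowed to lower an atomic add of zero to something other than a locked
read-modify-write, so the generated assembly should be checked before
trusting any timings.

```c
/* Assumption: a one-byte lock word, matching the byte-sized xaddb/xchgb. */
typedef unsigned char slock_t;

/*
 * Hypothetical portable analogue of the tas() shown earlier.
 * Returns 0 if the lock was acquired, nonzero if it was already held.
 */
static inline int
tas_with_dummy(volatile slock_t *lock)
{
	/* Dummy atomic read-modify-write: adding 0 leaves the lock state unchanged */
	(void) __atomic_fetch_add(lock, 0, __ATOMIC_SEQ_CST);

	/* The real test-and-set: store 1 and return the previous value */
	return (int) __atomic_exchange_n(lock, 1, __ATOMIC_SEQ_CST);
}

/* Hypothetical release helper: mark the lock free again. */
static inline void
tas_unlock(volatile slock_t *lock)
{
	__atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

As with the assembly version, a caller would spin on tas_with_dummy()
until it returns 0 and call tas_unlock() when done.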
BTW, I also determined that on that 4-way Opteron box, the integer
modulo idea doesn't make any difference --- that is, spin-delay and
what Michael called spin-delay-2 are the same speed. I think I had
tried the modulo before adding the variable spin delay, and it did
help in that configuration; but most likely, it was just helping by
stretching out the amount of time spent looping before entering the
kernel. So we can drop that idea too.
regards, tom lane