On Thu, Jun 23, 2011 at 5:35 PM, Florian Pflug <fgp@phlo.org> wrote:
>> Well, I'm sure there is some effect, but my experiments seem to
>> indicate that it's not a very important one. Again, please feel free
>> to provide contrary evidence. I think the basic issue is that - in
>> the best possible case - padding the LWLocks so that you don't have
>> two locks sharing a cache line can reduce contention on the busier
>> lock by at most 2x. (The less busy lock may get a larger reduction
>> but that may not help you much.) If what you really need is for
>> the contention to decrease by 1000x, you're just not really moving the
>> needle.
>
> Agreed. OTOH, adding a few dummy entries to the LWLocks array to separate
> the most heavily contested LWLocks from the others might still be
> worthwhile.

Hey, if we can show that it works, sign me up.

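
For anyone who wants to try the padding experiment, here is a minimal sketch of the idea, not the actual LWLock layout. The names DemoLock, PaddedDemoLock and DEMO_CACHE_LINE_SIZE are hypothetical, and the 64-byte line size is an assumption that varies by platform. It also shows where the 2x bound comes from: when two locks share a line, the busier one generates at least half of the traffic on that line, so moving its neighbor away removes at most half of the interference.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size; the real value is platform-dependent. */
#define DEMO_CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a lock; not PostgreSQL's LWLock struct. */
typedef struct DemoLock
{
    int state;
    int waiters;
} DemoLock;

/* Pad each lock out to a full cache line so neighbors never share one. */
typedef union PaddedDemoLock
{
    DemoLock lock;
    char     pad[DEMO_CACHE_LINE_SIZE];
} PaddedDemoLock;

_Static_assert(sizeof(PaddedDemoLock) == DEMO_CACHE_LINE_SIZE,
               "padded lock must fill exactly one cache line");

/* Both arrays start on a cache line boundary (C11 _Alignas). */
static _Alignas(DEMO_CACHE_LINE_SIZE) DemoLock       raw_locks[4];
static _Alignas(DEMO_CACHE_LINE_SIZE) PaddedDemoLock padded_locks[4];

/* Returns 1 if both addresses fall within the same cache line. */
static int
same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t) a / DEMO_CACHE_LINE_SIZE) ==
           ((uintptr_t) b / DEMO_CACHE_LINE_SIZE);
}
```

With this layout, same_cache_line(&raw_locks[0], &raw_locks[1]) is true, while the same check on padded_locks is false. Whether that separation shows up in a benchmark is exactly the open question.
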
>> That's why the basic fast-relation-lock patch helps so much:
>> it replaces a system where every lock request results in contention
>> with a system where NONE of them do.
>>
>> I tried rewriting the LWLocks using CAS. It actually seems to make
>> things slightly worse on the tests I've done so far, perhaps because I
>> didn't make it respect spins_per_delay. Perhaps fetch-and-add would
>> be better, but I'm not holding my breath. Everything I'm seeing so
>> far leads me to the belief that we need to get rid of the contention
>> altogether, not just contend more quickly.
>
> Is there a patch available? How did you do the slow path (i.e. the
> case where there's contention and you need to block)? It seems to
> me that without some kernel support like futexes it's impossible
> to do better than LWLocks already do, because any simpler scheme
> like
>
>     while (atomic_inc_post(lock) > 0) {
>         atomic_dec(lock);
>         block(lock);
>     }
>
> for the shared-locker case suffers from a race condition (the lock
> might be released before you actually block()).

Attached...

> The idea would be to start out with something trivial like the above.
> Maybe with an #if for compilers which have something like GCC's
> __sync_synchronize(). We could then gradually add implementations
> for specific architectures, hopefully done by people who actually
> own the hardware and can test.

Yes. But if we go that route, then we have to also support a code
path for architectures for which we don't have that support. That's
going to be more work, so I don't want to do it until we have a case
where there is a good, clear benefit.
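
To make the two-code-path cost concrete, here is a rough sketch of what the compile-time split might look like. All of the demo_* names and the DEMO_FORCE_FALLBACK switch are hypothetical. The fast path uses GCC's legacy __sync builtins. The fallback here uses a pthread mutex purely for illustration; that is an assumption, and a real fallback would more likely sit on top of whatever lock primitive the platform already has.

```c
#include <assert.h>

/* Use the GCC builtins when available, unless the fallback is forced. */
#if defined(__GNUC__) && !defined(DEMO_FORCE_FALLBACK)
#define DEMO_HAVE_SYNC_BUILTINS 1
#endif

#ifdef DEMO_HAVE_SYNC_BUILTINS

typedef struct demo_atomic
{
    volatile int value;
} demo_atomic;

static void
demo_atomic_init(demo_atomic *a, int v)
{
    a->value = v;
    __sync_synchronize();       /* full barrier so the store is visible */
}

/* Atomically add delta and return the value held before the add. */
static int
demo_fetch_add(demo_atomic *a, int delta)
{
    return __sync_fetch_and_add(&a->value, delta);
}

#else                           /* no atomics: emulate with a mutex */

#include <pthread.h>

typedef struct demo_atomic
{
    int             value;
    pthread_mutex_t mutex;
} demo_atomic;

static void
demo_atomic_init(demo_atomic *a, int v)
{
    a->value = v;
    pthread_mutex_init(&a->mutex, NULL);
}

/* Same contract as above, but serialized through the mutex. */
static int
demo_fetch_add(demo_atomic *a, int delta)
{
    int old;

    pthread_mutex_lock(&a->mutex);
    old = a->value;
    a->value += delta;
    pthread_mutex_unlock(&a->mutex);
    return old;
}

#endif
```

Both branches have to be kept working and tested, and the fallback path is no faster than taking a lock in the first place, which is the extra work I'd rather not sign up for without a demonstrated win.
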
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company