Hi,
On 2016-07-25 15:02:41 +1200, Thomas Munro wrote:
> > Any thoughts on this approach, or better ways to solve this problem?
> > How valuable is the branch-free property of the circular push and
> > delete operations?
I think it's generally valuable, but I suspect that it doesn't really
matter much for lwlocks, given that this is already the slow path, where
we enter the kernel and such. I suspect that the performance difference
here is a more due to code movement than anything.
> Next I wondered if it would be a bad idea to use slower relocatable
> dlist heads for all LWLocks. Obviously LWLocks are used all over the
> place and it would be bad to slow them down, but I wondered if maybe
> dlist manipulation might not be a very significant part of their work.
I'm pretty sure thats's the case.
> So I put a LWLock and a counter in shared memory, and had N background
> workers run a tight loop that locks, increments and unlocks
> concurrently until the counter reaches 1 billion. This creates
> different degrees of contention and wait list sizes. The worker loop
> looks like this:
>
> while (!finished)
> {
> LWLockAcquire(&control->lock, LW_EXCLUSIVE);
> ++control->counter;
> if (control->counter >= control->goal)
> finished = true;
> LWLockRelease(&control->lock);
> }
You really need shared lwlocks here as well, because exclusive lwlocks
will only ever trigger a single list manipulation (basically a pop from
the wait queue), further minimizing the list manipulation overhead.
I think the better fix here would acutally be to get rid of a pointer
based list here, and a) replace the queue with integer offsets b) make
the queue lock-free. But I understand that you might not want to tackle
that :P
Greetings,
Andres Freund