Re: Wait free LW_SHARED acquisition

From: Heikki Linnakangas
Subject: Re: Wait free LW_SHARED acquisition
Date:
Msg-id: 52453DCC.1070201@vmware.com
In reply to: Re: Wait free LW_SHARED acquisition  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: Wait free LW_SHARED acquisition  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 27.09.2013 10:21, Andres Freund wrote:
> Hi,
>
> On 2013-09-27 10:14:46 +0300, Heikki Linnakangas wrote:
>> On 27.09.2013 01:55, Andres Freund wrote:
>>> We have had several customers running postgres on bigger machines report
>>> problems on busy systems. Most recently one where a fully cached
>>> workload completely stalled in s_lock()s due to the *shared* lwlock
>>> acquisition in BufferAlloc() around the buffer partition lock.
>>>
>>> Increasing the padding to a full cacheline helps make the partitioning
>>> of the partition space actually effective (before it's essentially
>>> halved on a read-mostly workload), but that still leaves one with very
>>> hot spinlocks.
>>>
>>> So the goal is to have LWLockAcquire(LW_SHARED) never block unless
>>> somebody else holds an exclusive lock. To produce enough appetite for
>>> reading the rest of the long mail, here's some numbers on a
>>> pgbench -j 90 -c 90 -T 60 -S (-i -s 10) on a 4xE5-4620
>>>
>>> master + padding: tps = 146904.451764
>>> master + padding + lwlock: tps = 590445.927065
>>
>> How does that compare with simply increasing NUM_BUFFER_PARTITIONS?
>
> Heaps better. In the case causing this investigation lots of the pages
> with hot spinlocks were simply the same ones over and over again;
> partitioning the lockspace won't help much there.
> That's not exactly an uncommon scenario since often enough there's a
> small amount of data hit very frequently and lots more that is accessed
> only infrequently. E.g. recently inserted data and such tends to be very hot.

I see. So if only a few buffers are really hot, I'm assuming the problem 
isn't just the buffer partition lock, but also the spinlock on the 
buffer header, and the buffer content lwlock. Yeah, improving LWLocks 
would be a nice wholesale solution to that. I don't see any fundamental 
flaw in your algorithm. Nevertheless, I'm going to throw in a couple of 
other ideas:

* Keep a small 4-5 entry cache in each backend of the most recently 
accessed buffer lookups. Before searching for a buffer in the 
SharedBufHash, check the local cache first (a rough sketch of what I 
mean follows after these ideas).

* To pin a buffer, use an atomic fetch-and-add instruction to increase 
the refcount. PinBuffer() also increases usage_count, but you could do 
that without holding a lock; it doesn't need to be accurate. (There's a 
second sketch below for this one.)
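
To sketch what I mean by the first idea (purely illustrative C, not 
actual PostgreSQL code; the BufTag struct and the function names are 
made-up stand-ins for BufferTag and the real lookup path):

#include <string.h>

#define LOCAL_BUF_CACHE_SIZE 4

typedef struct BufTag            /* stand-in for the real BufferTag */
{
    unsigned int    rel;
    unsigned int    fork;
    unsigned int    block;
} BufTag;

typedef struct LocalBufCacheEntry
{
    BufTag      tag;
    int         buf_id;          /* -1 means the slot is unused */
} LocalBufCacheEntry;

static LocalBufCacheEntry local_buf_cache[LOCAL_BUF_CACHE_SIZE] =
{
    {{0, 0, 0}, -1}, {{0, 0, 0}, -1}, {{0, 0, 0}, -1}, {{0, 0, 0}, -1}
};

/* Check the local cache first; return the buffer id, or -1 on a miss. */
static int
local_buf_cache_lookup(const BufTag *tag)
{
    int         i;

    for (i = 0; i < LOCAL_BUF_CACHE_SIZE; i++)
    {
        if (local_buf_cache[i].buf_id >= 0 &&
            memcmp(&local_buf_cache[i].tag, tag, sizeof(BufTag)) == 0)
            return local_buf_cache[i].buf_id;
    }
    return -1;                   /* miss: go to SharedBufHash as before */
}

/* Remember a successful lookup, overwriting slots round-robin. */
static void
local_buf_cache_remember(const BufTag *tag, int buf_id)
{
    static int  next = 0;

    local_buf_cache[next].tag = *tag;
    local_buf_cache[next].buf_id = buf_id;
    next = (next + 1) % LOCAL_BUF_CACHE_SIZE;
}

The tricky part is of course invalidation: a cached entry goes stale 
when the buffer is evicted and re-tagged, so the entries could only be 
hints that are re-verified against the buffer header before use.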

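And for the second idea, a minimal sketch using C11 atomics (the struct 
and field names are made up, this is not the actual buffer header 
layout):

#include <stdatomic.h>

#define BUF_MAX_USAGE_COUNT 5

typedef struct BufHeaderSketch   /* not the real buffer header */
{
    atomic_uint refcount;        /* pin count */
    atomic_uint usage_count;     /* clock-sweep hint, may be sloppy */
} BufHeaderSketch;

static void
pin_buffer_sketch(BufHeaderSketch *buf)
{
    unsigned int usage;

    /* one atomic RMW instead of taking the buffer header spinlock */
    atomic_fetch_add_explicit(&buf->refcount, 1, memory_order_acq_rel);

    /*
     * usage_count only steers the clock sweep, so an unlocked
     * read-check-store that occasionally loses an update is fine.
     */
    usage = atomic_load_explicit(&buf->usage_count, memory_order_relaxed);
    if (usage < BUF_MAX_USAGE_COUNT)
        atomic_store_explicit(&buf->usage_count, usage + 1,
                              memory_order_relaxed);
}

static void
unpin_buffer_sketch(BufHeaderSketch *buf)
{
    atomic_fetch_sub_explicit(&buf->refcount, 1, memory_order_acq_rel);
}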

One problem with your patch is going to be making it also work without 
the CAS and fetch-and-add instructions. Those are probably present on 
all the architectures we support, but it'll take some effort to get the 
architecture-specific code done. Until it's all done, it would be good 
to be able to fall back to plain spinlocks, which we already have 
(roughly along the lines of the sketch below). Also, when someone ports 
PostgreSQL to a new architecture in the future, it would be helpful if 
you didn't need to write all the architecture-specific code immediately 
just to get it to compile.
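
Just to illustrate the kind of fallback I have in mind (again only a 
sketch; the HAVE_NATIVE_FETCH_ADD symbol and the lock layout are made 
up for illustration):

#include <stdatomic.h>

typedef struct LWLockSketch
{
    atomic_flag     mutex;       /* stand-in for the existing slock_t */
    unsigned int    state;       /* shared/exclusive lock word */
} LWLockSketch;

#if defined(HAVE_NATIVE_FETCH_ADD)   /* hypothetical configure symbol */

static inline unsigned int
lwlock_fetch_add(LWLockSketch *lock, unsigned int add)
{
    /* native atomic (gcc builtin here), used on supported platforms */
    return __sync_fetch_and_add(&lock->state, add);
}

#else   /* no native atomic: emulate it with the spinlock we already have */

static inline unsigned int
lwlock_fetch_add(LWLockSketch *lock, unsigned int add)
{
    unsigned int    old;

    while (atomic_flag_test_and_set_explicit(&lock->mutex,
                                             memory_order_acquire))
        ;                        /* spin, much like s_lock() does today */

    old = lock->state;
    lock->state += add;

    atomic_flag_clear_explicit(&lock->mutex, memory_order_release);
    return old;
}

#endif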

Did you benchmark your patch against the compare-and-swap patch I posted 
earlier 
(http://www.postgresql.org/message-id/519A3587.80603@vmware.com)? Just 
on a theoretical level, I would expect your patch to scale better, since 
it uses fetch-and-add instead of compare-and-swap for acquiring a shared 
lock (a sketch contrasting the two is below). But in practice it might 
be a wash.
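
For what it's worth, here's a toy illustration of why I'd expect the 
fetch-and-add variant to behave better under read contention (the 
lock-word layout is invented, it's not either patch):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXCLUSIVE_BIT   ((uint32_t) 1 << 31)   /* a writer holds the lock */

/* CAS version: concurrent readers can force each other to retry */
static bool
shared_acquire_cas(_Atomic uint32_t *lockword)
{
    uint32_t    old = atomic_load(lockword);

    for (;;)
    {
        if (old & EXCLUSIVE_BIT)
            return false;        /* exclusive holder: would have to block */
        if (atomic_compare_exchange_weak(lockword, &old, old + 1))
            return true;         /* got the shared lock */
        /* the CAS failed because the word changed under us; 'old' has
         * been refreshed by the failed CAS, so loop and retry */
    }
}

/* FAA version: one RMW per reader, no retry loop among readers */
static bool
shared_acquire_faa(_Atomic uint32_t *lockword)
{
    uint32_t    old = atomic_fetch_add(lockword, 1);

    if (old & EXCLUSIVE_BIT)
    {
        /* a writer holds the lock: undo our increment and go wait */
        atomic_fetch_sub(lockword, 1);
        return false;
    }
    return true;
}

Under heavy shared contention the CAS loop keeps retrying on the same 
cache line, while the fetch-and-add version does a bounded amount of 
work per acquisition.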

- Heikki


