Re: Lockless StrategyGetBuffer() clock sweep

From: Amit Kapila
Subject: Re: Lockless StrategyGetBuffer() clock sweep
Msg-id: CAA4eK1JUPn1rV0ep5DR74skcv+RRK7i2inM1X01ajG+gCX-hMw@mail.gmail.com
In reply to: Re: Lockless StrategyGetBuffer() clock sweep (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Oct 30, 2014 at 5:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> On 2014-10-30 10:23:56 +0530, Amit Kapila wrote:
> > I have a feeling that this might also have some regression at higher
> > loads (like scale_factor = 5000, shared_buffers = 8GB,
> > client_count = 128, 256) for the similar reasons as bgreclaimer patch,
> > means although both reduces contention around spin lock, however
> > it moves contention somewhere else.  I have yet to take data before
> > concluding anything (I am just waiting for your other patch (wait free
> > LW_SHARED) to be committed).
>
> I have a hard time to see how this could be. In the uncontended case the
> number of cachelines touched and the number of atomic operations is
> exactly the same. In the contended case the new implementation does far
> fewer atomic ops - and doesn't do spinning.
>
> What's your theory?

I have observed that reducing the contention in one path doesn't always
lead to a performance/scalability gain; rather, the contention shifts to
another lock if one exists.  This is the reason we have to work on reducing
contention around both the BufFreeList lock and the buffer mapping locks
together.  I have taken some performance data, and it seems this patch
exhibits behaviour similar to the bgreclaimer patch; I believe resolving the
contention around dynahash can improve the situation (Robert's chash patch
could be helpful).
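
For context, the approach under discussion replaces the spinlock-protected
advance of the clock hand with a single atomic increment.  A rough sketch of
that idea in plain C11 atomics (illustrative only; NBUFFERS and the counter
name are stand-ins, not the actual patch code):

#include <stdatomic.h>
#include <stdint.h>

#define NBUFFERS (1024 * 1024)      /* stand-in for NBuffers (8GB / 8kB) */

/* Monotonically increasing victim counter shared by all backends. */
static atomic_uint_fast64_t next_victim;

/*
 * Advance the clock hand by one tick.  Each caller performs exactly one
 * atomic operation and never spins; contending backends simply receive
 * successive counter values.
 */
static inline uint32_t
clock_sweep_tick(void)
{
    uint64_t victim = atomic_fetch_add(&next_victim, 1);

    return (uint32_t) (victim % NBUFFERS);  /* map counter onto a buffer id */
}

Whether that single fetch-add contends less than the spinlock at 128 or 256
clients is what the numbers below are meant to show.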


Performance Data
------------------------------
Configuration and Db Details
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
max_connections = 300
shared_buffers = 8GB
checkpoint_segments = 30
checkpoint_timeout = 15min
Client Count = number of concurrent sessions and threads (e.g. -c 8 -j 8)
Duration of each individual run = 5 mins
Test mode = pgbench read-only (-M prepared)
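
For reference, each data point (say, scale factor 5000 at 128 clients)
corresponds roughly to an invocation like the following; the database name
and the initialization step are my placeholders, not taken from this mail:

pgbench -i -s 5000 postgres
pgbench -S -M prepared -c 128 -j 128 -T 300 postgres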

The data below is the median of 3 runs; for the individual run data, see the
document attached to this mail.

Scale_Factor = 1000
Patch_ver/Client_Count     128       256
HEAD                    265502    201689
Patch                   283448    224888


Scale_Factor = 5000
Patch_ver/Client_Count     128       256
HEAD                    190435    177477
Patch                   171485    167794

The above data indicates that there is a performance gain at scale factor
1000; however, there is a regression at scale factor 5000.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com