Re: Scaling shared buffer eviction

From: Amit Kapila
Subject: Re: Scaling shared buffer eviction
Date:
Msg-id: CAA4eK1+QEcKVaZRi5ipmCnyubFs7tdsjckLgXk_xV-oTCB6KOA@mail.gmail.com
In reply to: Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, Sep 3, 2014 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 3, 2014 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >> +    while (tmp_num_to_free > 0)
> >>
> >> I am not sure it's a good idea for this value to be fixed at loop
> >> start and then just decremented.
> >
> > It is based on the idea what bgwriter does for num_to_scan and
> > calling it once has advantage that we need to take freelist_lck
> > just once.
>
> Right, we shouldn't call it every loop iteration.  However, consider
> this scenario: there are no remaining buffers on the list and the high
> watermark is 2000.  We add 2000 buffers to the list.  But by the time
> we get done, other backends have already done 500 more allocations, so
> now there are only 1500 buffers on the list.  If this should occur, we
> should add an additional 500 buffers to the list before we consider
> sleeping.  We want bgreclaimer to be able to run continuously if the
> demand for buffers is high enough.

It's not difficult to handle such cases, but it can also have a downside
for cases where the demand from backends is not high.
Consider the above case: if instead of 500 more allocations the backends
do just 5 more, then bgreclaimer will again have to go through the list
and move 5 buffers, and the same can happen again by the time it moves
those 5 buffers.  Another point to keep in mind here is that in this
loop we are also reducing the usage_count of buffers in case we don't
find a buffer with usage_count = 0.  OTOH, if we let bgreclaimer go to
sleep after it moves the initially identified buffers, then the backend
which first finds that the number of buffers in the freelist has fallen
below the low water mark can wake bgreclaimer.
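
To make the protocol I have in mind concrete, here is a standalone toy
sketch (plain pthreads, not the patch code; the watermark numbers and all
names such as reclaim_requested are only illustrative).  The reclaimer
refills the freelist up to the high watermark and then sleeps; a backend
that notices the freelist dropping below the low watermark wakes it:

#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

#define HIGH_WATERMARK 2000
#define LOW_WATERMARK  500

static int  freelist_len = 0;
static bool reclaim_requested = true;
static pthread_mutex_t freelist_lck = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reclaim_wakeup = PTHREAD_COND_INITIALIZER;

/* bgreclaimer side: refill the freelist to HIGH_WATERMARK, then sleep. */
static void *
reclaimer_main(void *arg)
{
    (void) arg;
    for (;;)
    {
        int num_to_free;

        pthread_mutex_lock(&freelist_lck);
        while (!reclaim_requested)
            pthread_cond_wait(&reclaim_wakeup, &freelist_lck);
        reclaim_requested = false;
        num_to_free = HIGH_WATERMARK - freelist_len;
        pthread_mutex_unlock(&freelist_lck);

        /* In the real patch this loop runs the clock sweep, decrementing
         * usage_count until a victim with usage_count == 0 is found; here
         * we only count buffers moved. */
        while (num_to_free-- > 0)
        {
            pthread_mutex_lock(&freelist_lck);
            freelist_len++;
            pthread_mutex_unlock(&freelist_lck);
        }
    }
    return NULL;
}

/* Backend side: take a buffer from the freelist and wake the reclaimer
 * once the list falls below LOW_WATERMARK. */
static void
backend_get_buffer(void)
{
    pthread_mutex_lock(&freelist_lck);
    if (freelist_len > 0)
        freelist_len--;
    if (freelist_len < LOW_WATERMARK && !reclaim_requested)
    {
        reclaim_requested = true;
        pthread_cond_signal(&reclaim_wakeup);
    }
    pthread_mutex_unlock(&freelist_lck);
}

int
main(void)
{
    pthread_t tid;
    int       i;

    pthread_create(&tid, NULL, reclaimer_main, NULL);
    for (i = 0; i < 100000; i++)
        backend_get_buffer();

    pthread_mutex_lock(&freelist_lck);
    printf("freelist length at exit: %d\n", freelist_len);
    pthread_mutex_unlock(&freelist_lck);
    return 0;
}

The point of the sketch is only that a lost wakeup is avoided because the
"reclaim requested" flag is set under the same lock that protects the
freelist length, so the reclaimer never sleeps while a request is pending.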


> >> In freelist.c, it seems like a poor idea to have two spinlocks as
> >> consecutive structure members; they'll be in the same cache line,
> >> leading to false sharing.  If we merge them into a single spinlock,
> >> does that hurt performance?
> >
> > I have kept them separate so that backends searching for a buffer
> > in freelist doesn't contend with bgreclaimer (while doing clock sweep)
> > or clock sweep being done by other backends.  I think it will be bit
> > tricky to devise a test where this can hurt, however it doesn't seem
> > too bad to have two separate locks in this case.
>
> It's not.  But if they are in the same cache line, they will behave
> almost like one lock, because the CPU will lock the entire cache line
> for each atomic op.  See Tom's comments upthread.

I think to avoid having them in the same cache line, we might need to
add some padding (at least 72 bytes), as the structure size including both
spinlocks is 56 bytes on a PPC64 machine and the cache line size there is
128 bytes.  I have also taken performance data by keeping them further
apart as suggested by you upthread and by introducing padding, but the
difference in performance is less than 1.5% (at 64 and 128 client counts),
which might also be due to run-to-run variation.  So to proceed we have
the options below:

a. Use two spinlocks as in the patch, but keep them as far apart as
possible.  This might not have an advantage compared to what the patch
currently does, but in the future we can add padding to get the benefit
if possible (currently on PPC64 it doesn't show any noticeable advantage;
however, on some other machine it might).  A rough sketch of this layout
follows after these options.

b. Use only one spinlock.  This can have a disadvantage in certain cases
as mentioned upthread, but those might not be the usual cases, so for now
we can treat them as lower priority and choose this option.
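
For option (a), the layout I have in mind looks roughly like this.  This
is only a sketch to show the padding idea, not the structure from the
patch; apart from freelist_lck, the field names (victimbuf_lck,
nextVictimBuffer, etc.) are placeholders:

#define CACHE_LINE_SIZE 128   /* PPC64 figure mentioned above */

typedef struct BufferStrategyControlSketch
{
    /* fields touched by backends taking a buffer from the freelist */
    slock_t     freelist_lck;
    int         firstFreeBuffer;
    int         lastFreeBuffer;

    /*
     * Padding so that the clock-sweep lock below lands on a different
     * cache line.  The "at least 72 bytes" figure above assumes the
     * structure itself starts on a cache-line boundary; a full line of
     * padding is the safe upper bound.
     */
    char        pad[CACHE_LINE_SIZE];

    /* fields touched during the clock sweep */
    slock_t     victimbuf_lck;
    uint32      nextVictimBuffer;
} BufferStrategyControlSketch;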

Another point in this regard is that I have to make use of a volatile
pointer to prevent code rearrangement in this case.
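
For reference, the idiom looks roughly like this (field names are
illustrative): since spinlock acquire/release are not compiler barriers
at present, the shared structure is accessed through a volatile-qualified
pointer so the compiler cannot move the protected loads and stores out of
the critical section:

    volatile BufferStrategyControl *strategy = StrategyControl;
    int     buf_id;

    SpinLockAcquire(&strategy->freelist_lck);
    buf_id = strategy->firstFreeBuffer;   /* must stay inside the lock */
    SpinLockRelease(&strategy->freelist_lck);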


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
