Re: Page replacement algorithm in buffer cache

From	Merlin Moncure
Subject	Re: Page replacement algorithm in buffer cache
Date	April 2, 2013, 00:55:30
Msg-id	CAHyXU0zvW_Yv3Q5i+5DzTjaGmvjUZS+vRiVGq-WwNAj8A=iMmg@mail.gmail.com
In reply to	Re: Page replacement algorithm in buffer cache (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: Page replacement algorithm in buffer cache (Jim Nasby <jim@nasby.net>)
List	pgsql-hackers


On Mon, Apr 1, 2013 at 4:09 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
>> On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> > On Friday, March 22, 2013, Ants Aasma wrote:
>> >>
>> >> On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure <mmoncure@gmail.com>
>> >> wrote:
>> >> > well if you do a non-locking test first you could at least avoid some
>> >> > cases (and, if you get the answer wrong, so what?) by jumping to the
>> >> > next buffer immediately.  if the non locking test comes good, only
>> >> > then do you do a hardware TAS.
>> >> >
>> >> > you could in fact go further and dispense with all locking in front of
>> >> > usage_count, on the premise that it's only advisory and not a real
>> >> > refcount.  so you only then lock if/when it's time to select a
>> >> > candidate buffer, and only then when you did a non locking test first.
>> >> >  this would of course require some amusing adjustments to various
>> >> > logical checks (usage_count <= 0, heh).
>> >>
>> >> Moreover, if the buffer happens to miss a decrement due to a data
>> >> race, there's a good chance that the buffer is heavily used and
>> >> wouldn't need to be evicted soon anyway. (if you arrange it to be a
>> >> read-test-inc/dec-store operation then you will never go out of
>> >> bounds) However, clocksweep and usage_count maintenance is not what is
>> >> causing contention because that workload is distributed. The issue is
>> >> pinning and unpinning.
>> >
>> >
>> > That is one of multiple issues.  Contention on the BufFreelistLock is
>> > another one.  I agree that usage_count maintenance is unlikely to become a
>> > bottleneck unless one or both of those is fixed first (and maybe not even
>> > then)
>>
>> usage_count manipulation is not a bottleneck but that is irrelevant.
>> It can be affected by other page contention which can lead to priority
>> inversion.  I don't believe there is any reasonable argument that
>> sitting and spinning while holding the BufFreelistLock is a good idea.
>
> In my experience the mere fact of (unlockedly, but still) accessing all the
> buffer headers can cause noticeable slowdowns in write only/mostly workloads with
> big amounts of shmem.
> Due to the write-only nature, large numbers of buffers have similar
> usagecounts (since they are infrequently touched after the initial insertion)
> and there are no free ones around, so the search for a buffer frequently runs
> through *all* buffer headers multiple times until it has decremented all usagecounts
> to 0. Then comes a period where free buffers are found easily (since all
> usagecounts from the current sweep point onwards are zero). After that it
> starts all over.
> I now have seen that scenario multiple times :(

Interesting -- I was thinking about that too, but it's a separate
problem with a different trigger.  Maybe a bailout should be in there
so that after X usage_count adjustments the sweeper summarily does an
eviction, or maybe the "max" declines from 5 once per hundred buffers
inspected or some such.
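
To make the first idea concrete, here is a rough sketch of a sweep that bails out after X usage_count adjustments. It is a toy model and not the bufmgr code: ToyBuffer, ToySweep, toy_sweep_victim and max_adjustments are names made up for illustration, and there is no locking or pinning protocol here at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer header; NOT PostgreSQL's BufferDesc. */
typedef struct ToyBuffer {
    int usage_count;   /* advisory popularity counter, 0..5 in this toy */
    int pinned;        /* nonzero while some backend is using the buffer */
} ToyBuffer;

/* Hypothetical sweep state; max_adjustments is the "X" from the text. */
typedef struct ToySweep {
    ToyBuffer *buffers;
    size_t     nbuffers;
    size_t     hand;             /* next buffer the clock hand inspects */
    int        max_adjustments;  /* decrements allowed per victim search */
} ToySweep;

/*
 * Return the index of the buffer to evict, or -1 if every buffer is pinned.
 * A plain clock sweep decrements usage_count until it finds a zero.  This
 * variant stops decrementing after max_adjustments and summarily takes the
 * next unpinned buffer, whatever its usage_count is.
 */
long toy_sweep_victim(ToySweep *s)
{
    int    adjustments = 0;
    size_t pinned_run = 0;   /* consecutive pinned buffers seen so far */

    assert(s != NULL && s->nbuffers > 0);

    for (;;) {
        size_t     idx = s->hand;
        ToyBuffer *buf = &s->buffers[idx];

        s->hand = (s->hand + 1) % s->nbuffers;

        if (buf->pinned) {
            /* A full lap of pinned buffers means nothing can be evicted. */
            if (++pinned_run >= s->nbuffers)
                return -1;
            continue;
        }
        pinned_run = 0;

        /* Normal hit (count already zero) or bailout (budget exhausted). */
        if (buf->usage_count == 0 || adjustments >= s->max_adjustments)
            return (long) idx;

        buf->usage_count--;
        adjustments++;
    }
}
```

With every usage_count at the maximum, the plain sweep needs several full laps over all headers before it finds a zero; with the bailout, a search touches at most max_adjustments + 1 unpinned headers. The cost is that a buffer with a nonzero count can get evicted, which seems tolerable if the count is only advisory anyway.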

merlin
