Re: Clock sweep not caching enough B-Tree leaf pages?
From: Merlin Moncure
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: CAHyXU0zTai=AR_utJO0KpcGF=RJQhr-EzYzcAfLL_kjgRqBXcw@mail.gmail.com
In response to: Re: Clock sweep not caching enough B-Tree leaf pages? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Clock sweep not caching enough B-Tree leaf pages? (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Tue, Apr 15, 2014 at 11:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that the basic problem here is that usage counts increase when
> buffers are referenced, but they decrease when buffers are evicted, and
> those two things are not in any necessary way connected to each other.
> In particular, if no eviction is happening, reference counts will
> converge to the maximum value. I've read a few papers about algorithms
> that attempt to segregate the list of buffers into "hot" and "cold"
> lists, and an important property of such algorithms is that they
> mustn't be allowed to make everything hot. It's easy to be too
> simplistic here: an algorithm that requires that no more than half the
> list be hot will fall over badly on a workload where the working set
> exceeds the available cache and the really hot portion of the working
> set is 60% of the available cache. So you need a more sophisticated
> algorithm than that. But that core property that not all buffers can
> be hot must somehow be preserved, and our algorithm doesn't.

A while back you sketched out an idea that did something like that: hotly accessed buffers became 'perma-pinned', such that they no longer participated in the clock sweep for eviction, and a side-line process did a two-stage eviction (IIRC) from the super-hot stack in order to mitigate locking. This idea had a couple of nice properties:

1) Very hot buffers no longer get refcounted, reducing spinlock contention (which has been documented in real-world workloads).

2) The eviction loop shrinks. Although you still have to check the 'very hot' flag, that's an unlocked check (again, IIRC) and no further processing is done.

The downside of this approach was complexity, and the difficulty of testing for edge cases. I would like to point out, though, that while I/O efficiency gains are nice, I think contention issues are the bigger fish to fry.
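To make the shape of that idea concrete, here is a minimal sketch (hypothetical code, not the actual patch or PostgreSQL's buffer manager): a per-buffer "very hot" flag that the sweep tests with a plain unlocked read, so perma-pinned buffers are skipped without touching usage_count or the header spinlock. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer descriptor: a "very hot" flag set by
 * some separate promotion mechanism, plus the usual clock-sweep count. */
typedef struct BufDesc
{
    atomic_bool very_hot;     /* perma-pinned: sweep skips this buffer */
    atomic_int  usage_count;  /* ordinary clock-sweep usage count */
} BufDesc;

/* Clock-sweep victim test. Very hot buffers are rejected by an
 * unlocked flag read before any shared-cacheline write happens. */
static bool
sweep_can_evict(BufDesc *buf)
{
    /* Unlocked check: very hot buffers don't participate in the sweep. */
    if (atomic_load_explicit(&buf->very_hot, memory_order_relaxed))
        return false;

    /* Ordinary clock sweep: decrement usage_count toward zero. */
    int uc = atomic_load(&buf->usage_count);

    while (uc > 0)
    {
        /* On failure, uc is reloaded and we retry. */
        if (atomic_compare_exchange_weak(&buf->usage_count, &uc, uc - 1))
            return false;       /* decremented; not yet a victim */
    }
    return true;                /* usage_count was zero: evictable */
}
```

The point of the sketch is property (2) above: for the hot population the sweep does one relaxed load and moves on, with no atomic read-modify-write on the descriptor at all.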
On Mon, Apr 14, 2014 at 12:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> 1) Throttles incrementation of usage_count temporally. It becomes
> impossible to increment usage_count for any given buffer more
> frequently than every 3 seconds, while decrementing usage_count is
> totally unaffected.

Hm, that's expensive. How about a heuristic based on the number of buffer allocations and the size of the buffer pool?

On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
>> What about: 9. Don't wait on locked buffer in the clock sweep.
>
> I don't think we do that? Or are you referring to locked buffer headers?

Right -- exactly. I posted a patch for this a while back. It's quite trivial: implement a trylock variant of the buffer header lock macro and further guard the check with a non-locking test (which TAS() already does generally, but the idea is to avoid the cache line lock in likely cases of contention). I believe this to be unambiguously better: even if it's self-healing or unlikely, there is no good reason to jump into a spinlock fray or even request a contended cache line while holding a critical lock.

merlin
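For readers who haven't seen the patch, the trylock idea looks roughly like this (a sketch with invented names, not the posted patch itself): a non-locking read of the lock word first, so that when the header is already locked the sweep skips the buffer without ever requesting the cache line in exclusive mode; only if the plain read looks free do we attempt the actual test-and-set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer header: just the spinlock word. */
typedef struct BufHdr
{
    atomic_bool locked;
} BufHdr;

/* Trylock with a non-locking pre-test ("test-and-test-and-set").
 * If the relaxed load already shows the lock held, give up at once:
 * no atomic write, no exclusive cache-line request. */
static bool
buf_hdr_trylock(BufHdr *hdr)
{
    if (atomic_load_explicit(&hdr->locked, memory_order_relaxed))
        return false;                        /* contended: skip buffer */

    /* TAS: returns the previous value; false means we acquired it. */
    return !atomic_exchange(&hdr->locked, true);
}

static void
buf_hdr_unlock(BufHdr *hdr)
{
    atomic_store(&hdr->locked, false);
}
```

A clock sweep built on this simply advances to the next buffer whenever `buf_hdr_trylock` fails, rather than spinning on a header someone else holds.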