Re: Clock sweep not caching enough B-Tree leaf pages?
From: Merlin Moncure
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: CAHyXU0zTai=AR_utJO0KpcGF=RJQhr-EzYzcAfLL_kjgRqBXcw@mail.gmail.com
In response to: Re: Clock sweep not caching enough B-Tree leaf pages? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Clock sweep not caching enough B-Tree leaf pages? (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Tue, Apr 15, 2014 at 11:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that the basic problem here is that usage counts increase when
> buffers are referenced, but they decrease when buffers are evicted, and
> those two things are not in any necessary way connected to each other.
> In particular, if no eviction is happening, reference counts will
> converge to the maximum value. I've read a few papers about algorithms
> that attempt to segregate the list of buffers into "hot" and "cold"
> lists, and an important property of such algorithms is that they
> mustn't be allowed to make everything hot. It's easy to be too
> simplistic here: an algorithm that requires that no more than half the
> list be hot will fall over badly on a workload where the working set
> exceeds the available cache and the really hot portion of the working
> set is 60% of the available cache. So you need a more sophisticated
> algorithm than that. But that core property that not all buffers can
> be hot must somehow be preserved, and our algorithm doesn't.

A while back you sketched out an idea that did something like that: hotly accessed buffers became 'perma-pinned', such that they no longer participated in the clock sweep for eviction, and a side-line process did a two-stage eviction (IIRC) from the super-hot stack in order to mitigate locking. This idea had a couple of nice properties:

1) Very hot buffers no longer get refcounted, reducing spinlock contention (which has been documented in real-world workloads).

2) The eviction loop shrinks. Although you still have to check the 'very hot' flag, that's an unlocked check (again, IIRC) and no further processing is done.

The downside of this approach was complexity, and the difficulty of testing for edge cases. I would like to point out, though, that while I/O efficiency gains are nice, I think contention issues are the bigger fish to fry.
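To make the shape of that idea concrete, here is a minimal sketch (hypothetical code, not the actual patch or PostgreSQL's buffer manager): a per-buffer "very hot" flag that the sweep tests with a plain unlocked read, so perma-pinned buffers are skipped without touching usage_count or the header spinlock. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer descriptor: a "very hot" flag set by
 * some separate promotion mechanism, plus the usual clock-sweep count. */
typedef struct BufDesc
{
    atomic_bool very_hot;     /* perma-pinned: sweep skips this buffer */
    atomic_int  usage_count;  /* ordinary clock-sweep usage count */
} BufDesc;

/* Clock-sweep victim test. Very hot buffers are rejected by an
 * unlocked flag read before any shared-cacheline write happens. */
static bool
sweep_can_evict(BufDesc *buf)
{
    /* Unlocked check: very hot buffers don't participate in the sweep. */
    if (atomic_load_explicit(&buf->very_hot, memory_order_relaxed))
        return false;

    /* Ordinary clock sweep: decrement usage_count toward zero. */
    int uc = atomic_load(&buf->usage_count);

    while (uc > 0)
    {
        /* On failure, uc is reloaded and we retry. */
        if (atomic_compare_exchange_weak(&buf->usage_count, &uc, uc - 1))
            return false;       /* decremented; not yet a victim */
    }
    return true;                /* usage_count was zero: evictable */
}
```

The point of the sketch is property (2) above: for the hot population the sweep does one relaxed load and moves on, with no atomic read-modify-write on the descriptor at all.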
On Mon, Apr 14, 2014 at 12:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> 1) Throttles incrementation of usage_count temporally. It becomes
> impossible to increment usage_count for any given buffer more
> frequently than every 3 seconds, while decrementing usage_count is
> totally unaffected.

Hm, that's expensive. How about a heuristic based on the number of buffer allocations and the size of the buffer pool?

On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
>> What about: 9. Don't wait on locked buffer in the clock sweep.
>
> I don't think we do that? Or are you referring to locked buffer headers?

Right -- exactly. I posted a patch for this a while back. It's quite trivial: implement a trylock variant of the buffer header lock macro and further guard the check with a non-locking test (which TAS() already does generally, but the idea is to avoid the cache line lock in likely cases of contention). I believe this to be unambiguously better: even if it's self-healing or unlikely, there is no good reason to jump into a spinlock fray or even request a contended cache line while holding a critical lock.

merlin
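For readers who haven't seen the patch, the trylock idea looks roughly like this (a sketch with invented names, not the posted patch itself): a non-locking read of the lock word first, so that when the header is already locked the sweep skips the buffer without ever requesting the cache line in exclusive mode; only if the plain read looks free do we attempt the actual test-and-set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer header: just the spinlock word. */
typedef struct BufHdr
{
    atomic_bool locked;
} BufHdr;

/* Trylock with a non-locking pre-test ("test-and-test-and-set").
 * If the relaxed load already shows the lock held, give up at once:
 * no atomic write, no exclusive cache-line request. */
static bool
buf_hdr_trylock(BufHdr *hdr)
{
    if (atomic_load_explicit(&hdr->locked, memory_order_relaxed))
        return false;                        /* contended: skip buffer */

    /* TAS: returns the previous value; false means we acquired it. */
    return !atomic_exchange(&hdr->locked, true);
}

static void
buf_hdr_unlock(BufHdr *hdr)
{
    atomic_store(&hdr->locked, false);
}
```

A clock sweep built on this simply advances to the next buffer whenever `buf_hdr_trylock` fails, rather than spinning on a header someone else holds.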