Re: our buffer replacement strategy is kind of lame

From: Jim Nasby
Subject: Re: our buffer replacement strategy is kind of lame
Msg-id: 34851930-E7F3-4EF9-BE14-1B5ABAE375F3@nasby.net
In reply to: Re: our buffer replacement strategy is kind of lame (Greg Stark <stark@mit.edu>)
List: pgsql-hackers

On Aug 13, 2011, at 3:40 PM, Greg Stark wrote:
> It does kind of seem like your numbers indicate we're missing part of
> the picture though. The idea with the clock sweep algorithm is that
> you keep approximately 1/nth of the buffers with each of the n values.
> If we're allowing nearly all the buffers to reach a reference count of
> 5 then you're right that we've lost any information about which
> buffers have been referenced most recently.

One possible missing piece here is that OS clock sweeps depend on the clock hand to both increment and decrement the
usage count. The hardware sets a bit any time a page is accessed; as the clock sweeps, it increases the usage count if
the bit is set and decreases it if it's clear. I believe someone else in the thread suggested this, and I definitely
think it's worth an experiment. Presumably this would also ease some lock contention issues.
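
A minimal sketch of that variant, to make the proposal concrete. This is not PostgreSQL code: the class and field names are hypothetical, the `referenced` flag stands in for the hardware-set accessed bit, and the cap of 5 is assumed from the usage count discussed in this thread.

```python
from dataclasses import dataclass

MAX_USAGE = 5  # assumed cap, matching the usage count of 5 discussed in the thread


@dataclass
class Buffer:
    page: str
    usage_count: int = 0
    referenced: bool = False  # stands in for the hardware-set accessed bit


class ClockSweep:
    """Hypothetical clock sweep where only the hand changes usage_count."""

    def __init__(self, pages):
        self.buffers = [Buffer(p) for p in pages]
        self.hand = 0

    def access(self, index):
        # An access only sets the bit; it never touches usage_count.
        self.buffers[index].referenced = True

    def evict(self):
        """Advance the hand until a victim is found and return its index."""
        while True:
            index = self.hand
            buf = self.buffers[index]
            self.hand = (self.hand + 1) % len(self.buffers)
            if buf.referenced:
                # Bit set: clear it and bump the count (up to the cap).
                buf.referenced = False
                buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)
            elif buf.usage_count > 0:
                # Bit clear: age the buffer.
                buf.usage_count -= 1
            else:
                # Bit clear and count already zero: this is the victim.
                return index
```

In this sketch an access is a single flag write, which is the property that might ease lock contention; whether that holds in a real buffer manager is exactly what the experiment would need to show.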

There is another piece that might be relevant: many (most?) OSes keep multiple lists of pages. FreeBSD, for example,
maintains these page lists (http://www.freebsd.org/doc/en/articles/vm-design/article.html). A full description follows,
but I think the biggest takeaway is that pages are handled differently once they are no longer active, based on whether
the page is dirty or not.

Active: These pages are actively in use and are not currently under consideration for eviction. This is roughly
equivalent to all of our buffers with a usage count of 5.

When an active page's usage count drops to its minimum value, it will get unmapped from process space and moved to one
of two queues:

Inactive: DIRTY pages that are eligible for eviction once they've been written out.

Cache: CLEAN pages that may be immediately reclaimed

Free: A small set of pages that are basically the tail of the Cache list. The OS *must* maintain some pages on this
list to support memory needed during interrupt handling. The size of this list is typically kept very small, and I'm
not sure if non-interrupt processing will pull from this list.

It's important to note that the OS can move a page from the Inactive or Cache list back into Active very cheaply.
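
The queue transitions described above can be sketched roughly as follows. This is an illustration only, not FreeBSD's implementation: the class and method names are hypothetical, the Free list is left out, and ordered dictionaries stand in for the kernel's queues.

```python
from collections import OrderedDict


class PageQueues:
    """Hypothetical model of the Active, Inactive and Cache lists."""

    def __init__(self):
        self.active = OrderedDict()    # page -> dirty flag
        self.inactive = OrderedDict()  # dirty pages awaiting write-out
        self.cache = OrderedDict()     # clean pages, immediately reclaimable

    def add_active(self, page, dirty=False):
        self.active[page] = dirty

    def deactivate(self, page):
        # Called once a page's usage count has hit its minimum:
        # dirty pages go to Inactive, clean pages go to Cache.
        dirty = self.active.pop(page)
        target = self.inactive if dirty else self.cache
        target[page] = dirty

    def reactivate(self, page):
        # A later access just moves the page between queues; no I/O needed.
        for queue in (self.inactive, self.cache):
            if page in queue:
                self.active[page] = queue.pop(page)
                return True
        return False

    def reclaim(self):
        """Take the oldest clean page from Cache, or None if it is empty."""
        if not self.cache:
            return None
        page, _ = self.cache.popitem(last=False)
        return page
```

Reactivation here is a dictionary removal plus an insert, which is the sense in which pulling a page back into Active is cheap.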

I think there are two interesting points here. First: after a page has been determined to no longer be in active use,
it goes into Inactive or Cache based on whether it's dirty. ISTM that allows for much better scheduling of the flushing
of dirty pages. That said, I'm not sure how much that would help us due to checkpoint requirements.
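
To show what separate scheduling of dirty-page flushing could look like, here is a small sketch. Everything in it is assumed for illustration: `flush_inactive`, `write_page` and `batch_size` are hypothetical names, and checkpoint requirements are ignored entirely.

```python
from collections import deque


def flush_inactive(inactive, cache, write_page, batch_size):
    """Write out up to batch_size of the oldest dirty pages.

    Each page written becomes clean, so it moves from the inactive
    queue to the cache queue, where it can be reclaimed immediately.
    """
    flushed = []
    while inactive and len(flushed) < batch_size:
        page = inactive.popleft()  # oldest dirty page first
        write_page(page)           # caller-supplied write-out routine
        cache.append(page)
        flushed.append(page)
    return flushed
```

Because the dirty pages sit in their own queue, a background writer could call something like this at its own pace without scanning clean or active pages.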

Second: AFAIK only the Active list has a clock sweep. I believe the others are LRU (the mentioned URL refers to them as
queues). I believe this works well because if a page faults it just needs to be removed from whichever queue it is in,
added to the Active queue, and mapped back into process space.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net