Re: Page replacement algorithm in buffer cache

From Merlin Moncure
Subject Re: Page replacement algorithm in buffer cache
Date
Msg-id CAHyXU0ycV+S33vL7wM+78vWDb=x3X=B-JKpYb66eHzQr5n2tQw@mail.gmail.com
In response to Re: Page replacement algorithm in buffer cache  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Mon, Apr 1, 2013 at 3:32 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Apr  1, 2013 at 11:55:07AM -0500, Merlin Moncure wrote:
>> > In fact, BufFreelistLock is really misnamed, because for the most
>> > part, the "free list" as we implement is going to be empty.  What the
>> > BufFreelistLock is really doing is serializing the process of scanning
>> > for a free buffer.  I think THAT is the problem.  If we could arrange
>> > things so as to hold BufFreelistLock only for the amount of time
>> > needed to remove a buffer from a freelist ... we'd probably buy
>> > ourselves quite a bit.
>>
>> right.  I'm imagining a buffer scan loop that looks something like
>> (uncompiled, untested) this.  "TryLockBufHdr" does a simple TAS
>> without spin, returning the lock state (well, true if it acquired the
>> lock).  usage_count is specifically and deliberately adjusted without
>> having a lock on the buffer header (this would require some careful
>> testing and possible changes elsewhere):
>
> TAS does a CPU 'lock' instruction which affects the cpu cache.  Why not
> just read the value with no lock?

check again, that's exactly what it does. Note the old implementation
did a LockBufHdr() before examining refcount.  The key logic is here:

   if (buf->refcount == 0)
   {
     if (buf->usage_count > 0)
     {
       buf->usage_count--;
       trycounter = NBuffers;
     }
     else
     {
       if (TryLockBufHdr(buf))
       {

So we do an unlocked read of refcount and immediately bail if the
buffer is "locked" according to the cpu cache.  Then we check (still
unlocked) usage_count and decrement it:  Our adjustment may be lost,
but so what?  Finally, we attempt one (and only one) cache line lock
(via TAS_SPIN) of the buffer and again bail if we see any problems
there.   Thus, it's impossible to get stuck in a potentially yielding
spin while holding the free list lock.
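
To make the steps above concrete, here is a minimal standalone sketch (not PostgreSQL source, and not a patch). Everything in it is a stand-in I made up for illustration: SketchBuf, pool, NBUFFERS, try_lock_hdr, unlock_hdr, sketch_init and sketch_get_victim. It uses C11 relaxed atomics for refcount and usage_count so the unlocked accesses are well defined in a standalone program; the proposal itself reads and writes the plain header fields. It also assumes the caller serializes the clock hand (as BufFreelistLock would), gives up after a full sweep that claims nothing, and rechecks the header after taking the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NBUFFERS 8              /* hypothetical pool size for this sketch */

typedef struct SketchBuf
{
    atomic_flag hdr_lock;       /* stand-in for the buffer header spinlock */
    atomic_int  refcount;       /* pins; read without the header lock */
    atomic_int  usage_count;    /* clock-sweep counter; adjusted unlocked */
} SketchBuf;

static SketchBuf pool[NBUFFERS];
static int next_victim = 0;     /* clock hand; caller is assumed to serialize */

static void
sketch_init(void)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        atomic_flag_clear(&pool[i].hdr_lock);
        atomic_store(&pool[i].refcount, 0);
        atomic_store(&pool[i].usage_count, 0);
    }
    next_victim = 0;
}

/* One test-and-set attempt, never spins; true if the lock was acquired. */
static bool
try_lock_hdr(SketchBuf *buf)
{
    return !atomic_flag_test_and_set_explicit(&buf->hdr_lock,
                                              memory_order_acquire);
}

static void
unlock_hdr(SketchBuf *buf)
{
    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}

/*
 * Returns the index of a victim buffer with its header lock held,
 * or -1 if a full sweep found nothing claimable.
 */
static int
sketch_get_victim(void)
{
    int trycounter = NBUFFERS;

    for (;;)
    {
        int        idx = next_victim;
        SketchBuf *buf = &pool[idx];

        next_victim = (next_victim + 1) % NBUFFERS;

        /* Unlocked read: skip pinned buffers without touching the lock. */
        if (atomic_load_explicit(&buf->refcount, memory_order_relaxed) == 0)
        {
            int u = atomic_load_explicit(&buf->usage_count,
                                         memory_order_relaxed);

            if (u > 0)
            {
                /* Deliberately not a read-modify-write: the update may be lost. */
                atomic_store_explicit(&buf->usage_count, u - 1,
                                      memory_order_relaxed);
                trycounter = NBUFFERS;
                continue;
            }

            /* Exactly one lock attempt; on failure just move on. */
            if (try_lock_hdr(buf))
            {
                /* Recheck under the lock; the unlocked reads may be stale. */
                if (atomic_load(&buf->refcount) == 0 &&
                    atomic_load(&buf->usage_count) == 0)
                    return idx;
                unlock_hdr(buf);
            }
        }

        /* Pinned, lock busy, or lost a race: count it toward giving up. */
        if (--trycounter == 0)
            return -1;
    }
}
```

The point of the shape is that no path through the loop can wait: every failure (pinned buffer, busy header lock, stale read) just advances the clock hand.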

I dub this: "The Frightened Turtle" strategy of buffer allocation.
It's an idea based on my research trying to solve Vlad's issue of
having server stalls during read-only loads (see here:
http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html)
for a general backgrounder.  The idea may not actually fix his issue,
or there may be other aggravating aspects such as the
always-capricious Linux scheduler, but I'm suspicious.

merlin


