Re: our buffer replacement strategy is kind of lame

From: Robert Haas
Subject: Re: our buffer replacement strategy is kind of lame
Date:
Msg-id: CA+TgmoZsc9HPXqct65OK6vorw7bJFKBx0mYu_EWrO=+crRGuwg@mail.gmail.com
In response to: Re: our buffer replacement strategy is kind of lame  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: our buffer replacement strategy is kind of lame  (Simon Riggs <simon@2ndQuadrant.com>)
Re: our buffer replacement strategy is kind of lame  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Sun, Aug 14, 2011 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, Aug 13, 2011 at 11:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Aug 13, 2011 at 4:40 PM, Greg Stark <stark@mit.edu> wrote:
>>> On Sat, Aug 13, 2011 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> and possibly we ought to put them all in a
>>>> linked list so that the next guy who needs a buffer can just pop one
>>>
>>> The whole point of the clock sweep algorithm is to approximate an LRU
>>> without needing to maintain a linked list. The problem with a linked
>>> list is that you need to serialize access to it so every time you
>>> reference a buffer you need to wait on a lock for the list so you can
>>> move that buffer around in the list.
>>
>> Right, but the same doesn't apply to what I'm talking about.  You just
>> put each guy into the linked list when his reference count goes down
>> to 0.  When you want to allocate a buffer, you pop somebody off.  If
>> his reference count is still 0, you evict him; if it has gone back
>> up, you forget about him and pop the next guy, and keep going until
>> you find a suitable victim.
>>
>> The problem with just running the clock sweep every time you need a
>> victim buffer is that you may have to pass over a large number of
>> unevictable buffers before you find someone you can actually kick out.
>>  That's work that should really be getting done in the background, not
>> at the absolute last moment when we discover, hey, we need a buffer.
>
> I think Greg is right here. Those suggested changes just bring back the LRU.

Since the problem with LRU is that it requires moving each buffer to
the front of the list every time it's touched, and since the idea that
I proposed does not require that, I don't know what you mean by this.
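
Just to make the difference concrete, here's a rough, purely
illustrative sketch (invented types, not our actual buffer headers):

#include <stdatomic.h>
#include <stddef.h>

/* Illustrative stand-in for a buffer header - not the real thing. */
typedef struct DemoBuf
{
    struct DemoBuf *prev, *next;    /* true LRU needs list links... */
    _Atomic int     refcount;       /* ...my idea only needs a counter */
} DemoBuf;

static DemoBuf *lru_head;

/*
 * True LRU: every reference has to move the buffer to the head of the
 * list, which means serializing on a list lock (omitted here).
 */
static void
lru_touch(DemoBuf *buf)
{
    if (buf == lru_head)
        return;
    if (buf->prev)
        buf->prev->next = buf->next;
    if (buf->next)
        buf->next->prev = buf->prev;
    buf->prev = NULL;
    buf->next = lru_head;
    if (lru_head)
        lru_head->prev = buf;
    lru_head = buf;
}

/*
 * What I'm proposing: a reference is just a counter bump.  The list
 * only gets touched when the count drops to zero, or when somebody
 * needs a victim.
 */
static void
proposed_touch(DemoBuf *buf)
{
    atomic_fetch_add(&buf->refcount, 1);
}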

> If you keep a separate list then you need to serialize access to it.

That is true.  However, there are some reasons to think it would be
better than what we're doing right now.  Right now, we acquire the
BufFreelistLock, and then take and release some number of spinlocks.
It seems fairly easy to construct cases where that number is routinely
over 100; even on my laptop, I could construct cases where it was
routinely in the 50-100 range.  A linked list that we can just dequeue things
from could be protected by a single spinlock, which would hopefully be
somewhat faster.  (In the interest of credit where credit is due, this
is pretty much the point Jim Nasby has been making every time this
argument comes up, and I've been dismissing it, but now that I
understand how much work gets done in that clock sweep code, I'm
starting to think he might be right.)
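
Concretely, the allocation-side structure I have in mind is something
like this (names invented, boiled down to the bare idea - the real
thing would live in the existing buffer manager and use slock_t):

#include <stdatomic.h>
#include <stddef.h>

typedef struct CandidateBuf
{
    _Atomic int          refcount;
    struct CandidateBuf *free_next;   /* link in the candidate list */
} CandidateBuf;

/* A single spinlock-protected stack of zero-count candidates. */
static CandidateBuf *candidate_head;
static atomic_flag   candidate_lock = ATOMIC_FLAG_INIT;

static void
candidate_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&candidate_lock))
        ;                             /* spin */
}

static void
candidate_lock_release(void)
{
    atomic_flag_clear(&candidate_lock);
}

/* Called when a buffer's count drops to zero: remember it as a candidate. */
void
candidate_push(CandidateBuf *buf)
{
    candidate_lock_acquire();
    buf->free_next = candidate_head;
    candidate_head = buf;
    candidate_lock_release();
}

/*
 * Called when a backend needs a victim: pop entries until we find one
 * whose count is still zero.  Anything that got referenced again after
 * it was pushed is simply skipped.
 */
CandidateBuf *
candidate_pop(void)
{
    for (;;)
    {
        CandidateBuf *buf;

        candidate_lock_acquire();
        buf = candidate_head;
        if (buf != NULL)
            candidate_head = buf->free_next;
        candidate_lock_release();

        if (buf == NULL)
            return NULL;              /* empty: fall back to the clock sweep */
        if (atomic_load(&buf->refcount) == 0)
            return buf;               /* still evictable */
        /* else: it got referenced again; forget about it and loop */
    }
}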

However, if it turns out that there's still too much contention
over that spinlock, we can probably fix that by maintaining N lists
instead of one, distributing buffers between them in round-robin
fashion, and having each backend choose a list based on its backend ID
modulo N.  The testing I've done so far seems to indicate that
spinlock contention doesn't really become noticeable until you have
32-40 CPUs pounding hard on the same spinlock, so even N = 4 is
probably enough to make the problem go away.  But I don't even think
we're going to have to go that far right at the outset, because
there's another major source of contention in the buffer eviction
path: the buffer mapping locks.
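
If we do get to the point of needing to partition it, the shape would
be roughly this, building on the sketch above (again, names invented;
NUM_CANDIDATE_LISTS is the N I'm talking about):

#define NUM_CANDIDATE_LISTS 4         /* N; raise it if contention shows up */

typedef struct CandidateList
{
    CandidateBuf *head;               /* as in the sketch above */
    atomic_flag   lock;               /* init with ATOMIC_FLAG_INIT at startup */
} CandidateList;

static CandidateList candidate_lists[NUM_CANDIDATE_LISTS];

/* Push side: spread newly-zeroed buffers round-robin across the lists
 * (races on the counter are harmless - it's only a distribution hint). */
static int push_counter;

static CandidateList *
next_push_list(void)
{
    return &candidate_lists[push_counter++ % NUM_CANDIDATE_LISTS];
}

/* Pop side: each backend always uses the list picked by its backend ID. */
static CandidateList *
my_pop_list(int backend_id)
{
    return &candidate_lists[backend_id % NUM_CANDIDATE_LISTS];
}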

> usage_count == 0 and "unevictable" are dynamic states, so if the
> bgwriter sees those states they can change before anyone uses the
> buffer.

Yep.  That's already a problem.  When we pin the buffer to be evicted,
we're not yet holding the relevant buffer mapping locks.  By the time
we do, someone else can have pinned the page.  So there's already
retry logic here.  In theory, you can imagine a backend getting
starved for an arbitrarily long period of time, because it keeps
picking a victim buffer that someone else wrenches away before we
actually manage to kick it out.  I haven't yet seen any evidence that
that's a problem in practice, but if it is, this idea will make it
worse.  It seems hard to know without testing it, though.
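
For reference, the shape of that existing retry dance is roughly the
following (heavily paraphrased - the real logic lives in BufferAlloc
and friends, and does the flush and buffer-mapping locking I'm
eliding here):

#include <stdatomic.h>
#include <stddef.h>

typedef struct VictimBuf
{
    _Atomic int pins;
} VictimBuf;

/* Stand-in for "pick a candidate": clock sweep today, list pop above. */
static VictimBuf *
choose_victim(VictimBuf *pool, int npool)
{
    static int hand;

    return &pool[hand++ % npool];
}

static VictimBuf *
get_victim(VictimBuf *pool, int npool)
{
    for (;;)
    {
        VictimBuf *buf = choose_victim(pool, npool);

        atomic_fetch_add(&buf->pins, 1);     /* take our pin */

        /* ... write it out if dirty, take the buffer mapping locks ... */

        if (atomic_load(&buf->pins) == 1)
            return buf;                      /* still only our pin: evict it */

        /* Someone else pinned it while we weren't looking; drop our pin
         * and retry.  This is the loop in which a backend could, in
         * principle, be starved. */
        atomic_fetch_sub(&buf->pins, 1);
    }
}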

The big problem with this idea is that it pretty much requires
the work you mentioned in one of your other emails - separating the
background writer and checkpoint machinery into two separate processes
- to happen first.  So I'm probably going to have to code that up to
see whether this works.  If you're planning to post that patch soon
I'll wait for it.  Otherwise, I'm going to have to cobble together
something that is at least good enough for testing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

