Re: Clock sweep not caching enough B-Tree leaf pages?

From: Peter Geoghegan
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: CAM3SWZQa2OAVUrfPL-df=we1sMozKBR392SW_NoVuKZEPXhu9w@mail.gmail.com
In response to: Re: Clock sweep not caching enough B-Tree leaf pages?  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Clock sweep not caching enough B-Tree leaf pages?  (Robert Haas <robertmhaas@gmail.com>)
Re: Clock sweep not caching enough B-Tree leaf pages?  (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers
On Tue, Apr 15, 2014 at 9:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> In the past, various hackers have noted problems they've observed with
>> this scheme. A common pathology is to see frantic searching for a
>> victim buffer only to find all buffer usage_count values at 5. It may
>> take multiple revolutions of the clock hand before a victim buffer is
>> found, as usage_count is decremented for each and every buffer.  Also,
>> BufFreelistLock contention is considered a serious bottleneck [1],
>> which is related.
>
> I think that the basic problem here is that usage counts increase when
> buffers are referenced, but they decrease when buffers are evicted,
> and those two things are not in any necessary way connected to each
> other.  In particular, if no eviction is happening, reference counts
> will converge to the maximum value.  I've read a few papers about
> algorithms that attempt to segregate the list of buffers into "hot"
> and "cold" lists, and an important property of such algorithms is that
> they mustn't be allowed to make everything hot.

It's possible that I've misunderstood what you mean here, but do you
really think it's likely that everything will be hot, in the event of
using something like what I've sketched here? I think it's an
important measure against this general problem that buffers really
earn the right to be considered hot, so to speak. With my prototype,
in order for a buffer to become as hard to evict as possible, at a
minimum it must be *continually* pinned for at least 30 seconds.
That's actually a pretty tall order. Although, as I said, I wouldn't
be surprised if it was worth making it possible for buffers to be even
more difficult to evict than that. It should be extremely difficult to
evict a root B-Tree page, and to a lesser extent inner pages, even
under a lot of cache pressure, for example. There are lots of
workloads in which that can happen, and I have a hard time believing
that it's worth it to evict them, given the extraordinary difference in
their utility as compared to a lot of other things. I can imagine a
huge barrier against evicting what is actually a relatively tiny
number of pages being worth it.
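
A minimal sketch of the "earn the right to be hot" throttling described
above. The 3-second interval and the cap of 10 are assumptions, chosen only
because they are consistent with the 30-second and 1.5-second figures in
this message; the names Buffer, touch and seconds_to_max are hypothetical
and are not taken from the patch.

```python
from dataclasses import dataclass

BUMP_INTERVAL = 3.0    # assumed: minimum seconds between usage_count bumps
MAX_USAGE_COUNT = 10   # assumed: cap on usage_count


@dataclass
class Buffer:
    usage_count: int = 0
    last_bump: float = 0.0  # time of the last bump (or of loading the page)


def touch(buf: Buffer, now: float) -> None:
    """Record an access at time `now`.

    usage_count goes up at most once per BUMP_INTERVAL, no matter how
    many accesses arrive in between.
    """
    if buf.usage_count < MAX_USAGE_COUNT and now - buf.last_bump >= BUMP_INTERVAL:
        buf.usage_count += 1
        buf.last_bump = now


def seconds_to_max(access_period: float) -> float:
    """Simulate one access every `access_period` seconds on a freshly
    loaded buffer and return the time at which it reaches the cap."""
    buf = Buffer()
    now = 0.0
    while buf.usage_count < MAX_USAGE_COUNT:
        now += access_period
        touch(buf, now)
    return now


if __name__ == "__main__":
    print(seconds_to_max(0.5))  # accessed twice a second
    print(seconds_to_max(5.0))  # accessed every five seconds
```

Under these assumptions, a buffer that is accessed constantly still needs
30 seconds to become maximally hard to evict, and a burst of accesses
within one interval counts only once.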

I don't want to dismiss what you're saying about heating and cooling
being unrelated, but the conclusion that not everything can be hot
isn't obvious to me. Maybe "heat" should be relative rather than
absolute, and maybe that's actually what you meant. There is surely
some workload where buffer access actually is perfectly uniform, and
what do you do there? What "temperature" are those buffers?

It occurs to me that within the prototype patch, even though
usage_count is incremented in a vastly slower fashion (in a wall time
sense), clock sweep doesn't take advantage of that. I should probably
investigate having clock sweep become more aggressive in decrementing
in response to realizing that it won't get some buffer's usage_count
down to zero on the next revolution either. There are certainly
problems with that, but they might be fixable. Within the patch, in
order for it to be possible for the usage_count to be incremented in
the interim, an average of 1.5 seconds must pass, so if clock sweep
were to anticipate another no-set-to-zero revolution, it seems pretty
likely that it would be exactly right, or if not then close enough,
since it can only really fail to correct for some buffers getting
incremented once more in the interim. Conceptually, it would be like
multiple logical revolutions were merged into one actual one,
sufficient to have the next revolution find a victim buffer.
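
To make the "merged revolutions" idea concrete, here is a rough sketch
under simplifying assumptions: a static set of buffers, no pins, no
locking, and no increments arriving while the sweep runs. sweep_plain is
the usual decrement-by-one sweep; sweep_merged is a hypothetical variant
that, after a full revolution without a victim, subtracts the lowest
remaining usage_count on the next revolution. Neither function is taken
from the patch.

```python
def sweep_plain(counts, hand=0):
    """Classic clock sweep over a list of usage counts (mutated in place).

    Returns (victim_index, buffers_visited).
    """
    n = len(counts)
    visited = 0
    while True:
        visited += 1
        if counts[hand] == 0:
            return hand, visited
        counts[hand] -= 1
        hand = (hand + 1) % n


def sweep_merged(counts, hand=0):
    """Clock sweep that merges several logical revolutions into one.

    After a revolution that produced no victim, the decrement step is
    raised to the lowest usage_count left behind, so the revolution
    after that one can find a victim.
    """
    n = len(counts)
    visited = 0
    step = 1       # amount subtracted per visit in the current revolution
    seen = 0       # buffers visited in the current revolution
    lowest = None  # lowest usage_count left behind in this revolution
    while True:
        visited += 1
        if counts[hand] == 0:
            return hand, visited
        # max() is defensive; in this static model the result is never negative
        counts[hand] = max(0, counts[hand] - step)
        lowest = counts[hand] if lowest is None else min(lowest, counts[hand])
        seen += 1
        if seen == n:
            # Full revolution without a victim: anticipate how many more
            # no-victim revolutions would follow and fold them into one.
            step = max(1, lowest)
            seen, lowest = 0, None
        hand = (hand + 1) % n


if __name__ == "__main__":
    print(sweep_plain([5, 5, 5, 5]))   # (0, 21)
    print(sweep_merged([5, 5, 5, 5]))  # (0, 9)
```

In this static model both functions pick the same victim, because
subtracting the same amount from every buffer preserves their relative
order; the merged variant just visits fewer buffers to get there. A real
implementation would also have to cope with buffers being incremented in
the interim, which is the failure case mentioned above.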

-- 
Peter Geoghegan


