Re: Page replacement algorithm in buffer cache

From Robert Haas
Subject Re: Page replacement algorithm in buffer cache
Msg-id CA+Tgmob37KgvKQCCNiwpNAYPD5fmkyScq4Q9oXCqW+U5P4_L_g@mail.gmail.com
In reply to Re: Page replacement algorithm in buffer cache  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Page replacement algorithm in buffer cache  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 4/2/13 11:54 AM, Robert Haas wrote:
>> But, having said that, I still think the best idea is what Andres
>> proposed, which pretty much matches my own thoughts: the bgwriter
>> needs to populate the free list, so that buffer allocations don't have
>> to wait for linear scans of the buffer array.
>
> I was hoping this one would make it to a full six years of being on the TODO
> list before it came up again, missed it by a few weeks.  The funniest part
> is that Amit even submitted a patch on this theme a few months ago without
> much feedback:
> http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs
> That stalled where a few things have, on a) needing more regression test
> workloads, and b) wondering just what the deal with large shared_buffers
> setting degrading performance was.

Those are impressive results.  I think we should seriously consider
doing something like that for 9.4.  TBH, although more workloads to
test is always better, I don't think this problem is so difficult that
we can't have some confidence in a theoretical analysis.  If I read
the original thread correctly (and I haven't looked at the patch
itself), the proposed patch would actually invalidate buffers before
putting them on the freelist.  That effectively amounts to reducing
shared_buffers, so workloads that are just on the edge of what can fit
in shared_buffers will be harmed, and those that benefit incrementally
from increased shared_buffers will be as well.

What I think we should do instead is collect the buffers that we think
are evictable and stuff them onto the freelist without invalidating
them.  When a backend allocates from the freelist, it can double-check
that the buffer still has usage_count 0.  The odds should be pretty
good.  But even if we sometimes notice that the buffer has been
touched again after being put on the freelist, we haven't expended all
that much extra effort, and that effort happened mostly in the
background.  Consider a scenario where only 10% of the buffers have
usage count 0 (which is not unrealistic).  We scan 5000 buffers and
put 500 on the freelist.  Now suppose that, due to some accident of
the workload, 75% of those buffers get touched again before they're
allocated off the freelist (which I believe to be a pessimistic
estimate for most workloads).  Now, that means that only 125 of those
500 buffers will succeed in satisfying an allocation request.  That's
still a huge win, because it means that each backend only has to
examine an average of 4 buffers before it finds one to allocate.  If
it had needed to do the arena scan itself, it would have had to touch
40 buffers before finding one to allocate.
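
The arithmetic above can be replayed with a small toy model.  This is
only a sketch under stated assumptions: ToyBuffer, ToyFreelist,
toy_alloc and toy_run_example are hypothetical names invented for
illustration, not PostgreSQL structures or functions, and pins,
locking and the actual page replacement are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not PostgreSQL's. */
typedef struct ToyBuffer {
    int usage_count;            /* 0 means "looks evictable" */
} ToyBuffer;

typedef struct ToyFreelist {
    const int *ids;             /* buffer ids queued by the background scan */
    size_t     len;
    size_t     next;            /* next entry to hand out */
} ToyFreelist;

typedef struct ToyResult {
    size_t allocations;         /* requests satisfied from the freelist */
    size_t freelist_probes;     /* freelist entries examined in total */
    size_t arena_size;          /* buffers the background scan visited */
} ToyResult;

/*
 * Pop entries until one still has usage_count 0.  Entries that were
 * touched after being queued are skipped; nothing is invalidated.
 * Returns the buffer id, or -1 once the list is exhausted.
 */
static int toy_alloc(ToyFreelist *fl, const ToyBuffer *bufs, size_t *probes)
{
    while (fl->next < fl->len) {
        int id = fl->ids[fl->next++];
        (*probes)++;
        if (bufs[id].usage_count == 0)
            return id;
    }
    return -1;
}

/* Replays the scenario: 5000 scanned, 500 queued, 75% touched again. */
static ToyResult toy_run_example(void)
{
    enum { NBUFFERS = 5000, QUEUED = 500 };
    static ToyBuffer bufs[NBUFFERS];
    static int ids[QUEUED];
    size_t nqueued = 0;

    /* Background scan: 10% of the buffers are at usage_count 0. */
    for (int i = 0; i < NBUFFERS; i++) {
        bufs[i].usage_count = (i % 10 == 0) ? 0 : 1;
        if (bufs[i].usage_count == 0)
            ids[nqueued++] = i;
    }
    assert(nqueued == (size_t) QUEUED);

    /* Workload accident: 3 of every 4 queued buffers get touched again. */
    for (size_t k = 0; k < nqueued; k++)
        if (k % 4 != 0)
            bufs[ids[k]].usage_count = 1;

    ToyFreelist fl = { ids, nqueued, 0 };
    ToyResult r = { 0, 0, NBUFFERS };
    while (toy_alloc(&fl, bufs, &r.freelist_probes) >= 0)
        r.allocations++;
    return r;
}
```

Calling toy_run_example() drains the list and reports how many
requests it satisfied and how many entries were examined along the
way; dividing freelist_probes (or arena_size) by allocations gives the
per-allocation cost with and without the pre-filled list.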

In real life, I think the gains are apt to be, if anything, larger.
IME, it's common for most or all of the buffer pool to be stuck at
usage count 5.  So you could easily have a situation where the arena
scan has to visit millions of buffers to find one to allocate.  If
that's happening in the background instead of the foreground, it's a
huge win.  Also, note that there's nothing to prevent the arena scan
from happening in parallel with allocations off of the freelist - so
while foreground processes are emptying the freelist, the background
process can be looking for more things to add to it.
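
A rough sketch of that division of labour follows, again with
hypothetical names (ToyPool, toy_bg_sweep, toy_fg_alloc) and an
assumed pool of 1000 buffers with a 64-entry list.  It is
single-threaded: the background and foreground steps are simply
interleaved by the caller, so it shows the bookkeeping only, and a
real implementation would need locking or atomics around the list.
The background step is modelled as a clock sweep that decrements
usage counts as it passes and queues buffers it finds at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: sizes and names are assumptions for illustration. */
enum { TOY_NBUFFERS = 1000, TOY_RING = 64, TOY_MAX_USAGE = 5 };

typedef struct ToyPool {
    int    usage_count[TOY_NBUFFERS];
    int    queued[TOY_NBUFFERS];   /* 1 while the buffer is on the list */
    int    ring[TOY_RING];         /* freelist as a bounded FIFO */
    size_t head, count;            /* oldest entry and number of entries */
    size_t hand;                   /* where the background sweep resumes */
} ToyPool;

/* Start from the worst case: every buffer at the maximum usage count. */
static void toy_pool_init(ToyPool *p)
{
    for (size_t i = 0; i < TOY_NBUFFERS; i++) {
        p->usage_count[i] = TOY_MAX_USAGE;
        p->queued[i] = 0;
    }
    p->head = p->count = p->hand = 0;
}

/*
 * Background step: examine at most `budget` buffers, stopping early if
 * the list is full.  Buffers with a nonzero usage count are decremented;
 * buffers already at 0 are queued without being invalidated.
 * Returns the number of buffers examined.
 */
static size_t toy_bg_sweep(ToyPool *p, size_t budget)
{
    size_t examined = 0;
    while (examined < budget && p->count < TOY_RING) {
        size_t id = p->hand;
        p->hand = (p->hand + 1) % TOY_NBUFFERS;
        examined++;
        if (p->queued[id])
            continue;
        if (p->usage_count[id] > 0) {
            p->usage_count[id]--;
        } else {
            p->ring[(p->head + p->count) % TOY_RING] = (int) id;
            p->count++;
            p->queued[id] = 1;
        }
    }
    return examined;
}

/*
 * Foreground step: pop entries until one is still at usage_count 0.
 * Returns the buffer id, or -1 if the list is empty (a caller would
 * then have to fall back to scanning on its own).
 */
static int toy_fg_alloc(ToyPool *p, size_t *probes)
{
    while (p->count > 0) {
        int id = p->ring[p->head];
        p->head = (p->head + 1) % TOY_RING;
        p->count--;
        p->queued[id] = 0;
        (*probes)++;
        if (p->usage_count[id] == 0) {
            p->usage_count[id] = 1;   /* buffer is reused for a new page */
            return id;
        }
    }
    return -1;
}
```

With every buffer starting at usage count 5, the sweep has to examine
all 1000 buffers five times before anything becomes evictable; in this
model that work is charged to toy_bg_sweep, while toy_fg_alloc only
ever walks the short list.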

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


