Re: Page replacement algorithm in buffer cache

From: Atri Sharma
Subject: Re: Page replacement algorithm in buffer cache
Date:
Msg-id: CAOeZVideZEz3+G_fL1FT-pBcbeRunTJzDFUL3k8HT0krETRpcQ@mail.gmail.com
In response to: Re: Page replacement algorithm in buffer cache  (Ants Aasma <ants@cybertec.at>)
Responses: Re: Page replacement algorithm in buffer cache  (Ants Aasma <ants@cybertec.at>)
List: pgsql-hackers
>
> Moreover, if the buffer happens to miss a decrement due to a data
> race, there's a good chance that the buffer is heavily used and
> wouldn't need to be evicted soon anyway. (if you arrange it to be a
> read-test-inc/dec-store operation then you will never go out of
> bounds) However, clocksweep and usage_count maintenance is not what is
> causing contention because that workload is distributed. The issue is
> pinning and unpinning. There we need an accurate count and there are
> some pages like index roots that get hit very heavily. Things to do
> there would be, in my opinion, to convert to a futex-based spinlock so
> that contention doesn't completely kill performance, and then to try
> to get rid of the contention. Converting to lock-free pinning
> won't help much here as what is killing us here is the cacheline
> bouncing.
>
> One way to get rid of contention is the buffer nailing idea that
> Robert came up with. If some buffer gets so hot that maintaining
> refcount on the buffer header leads to contention, promote that buffer
> to a nailed status, let everyone keep their pin counts locally and
> sometime later revisit the nailing decision and if necessary convert
> pins back to the buffer header.
>
> One other interesting idea I have seen is closeable scalable nonzero
> indication (C-SNZI) from scalable rw-locks [1]. The idea there is to
> use a tree structure to dynamically stripe access to the shared lock
> counter when contention is detected. Downside is that considerable
> amount of shared memory is needed so there needs to be some way to
> limit the resource usage. This is actually somewhat isomorphic to the
> nailing idea.
>
> The issue with the current buffer management algorithm is that it
> seems to scale badly with increasing shared_buffers. I think the
> improvements should concentrate on finding out what is the problem
> there and figuring out how to fix it. A simple idea to test would be
> to just partition shared buffers along with the whole clock sweep
> machinery into smaller ones, like the buffer mapping hash tables
> already are. This should at the very least reduce contention for the
> clock sweep even if it doesn't reduce work done per page miss.
>
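
As a rough illustration of the partitioning idea quoted above, here is a
minimal sketch in C11 in which each partition owns its own clock hand, so
backends sweeping different partitions never touch the same hand counter.
The usage count is decremented with a plain read-test-dec-store sequence,
so a racing update may be lost but the count can never go out of bounds.
All names and sizes here (buf_desc, sweep_partition, sweep_victim,
NBUFFERS, and so on) are hypothetical and are not PostgreSQL's actual
data structures.

```c
#include <assert.h>
#include <stdatomic.h>

#define NBUFFERS    64                  /* hypothetical pool size */
#define NPARTITIONS 4                   /* hypothetical partition count */
#define PART_SIZE   (NBUFFERS / NPARTITIONS)
#define MAX_USAGE   5                   /* upper bound for usage_count */

typedef struct {
    atomic_uint usage_count;            /* approximate, may miss updates */
    atomic_uint refcount;               /* pin count, must be accurate */
} buf_desc;

typedef struct {
    atomic_uint hand;                   /* clock hand private to one partition */
} sweep_partition;

static buf_desc        buffers[NBUFFERS];
static sweep_partition parts[NPARTITIONS];

/*
 * Run the clock sweep inside one partition only.  Returns the index of an
 * unpinned buffer whose usage_count has reached zero, or -1 if every buffer
 * in the partition stayed pinned for the whole bounded sweep.
 */
static int sweep_victim(unsigned part)
{
    assert(part < NPARTITIONS);
    unsigned base = part * PART_SIZE;

    for (unsigned tries = 0; tries < PART_SIZE * (MAX_USAGE + 1); tries++) {
        /* Only this partition's hand is contended, not a global one. */
        unsigned slot = atomic_fetch_add(&parts[part].hand, 1) % PART_SIZE;
        buf_desc *buf = &buffers[base + slot];

        if (atomic_load(&buf->refcount) != 0)
            continue;                   /* pinned, cannot be evicted */

        unsigned uc = atomic_load(&buf->usage_count);
        if (uc == 0)
            return (int) (base + slot); /* cold and unpinned: victim */

        /* read-test-dec-store: racy, but never drops below zero */
        atomic_store(&buf->usage_count, uc - 1);
    }
    return -1;
}
```

Note that this only spreads out contention on the hand; the work done per
page miss is the same as with a single global sweep.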

One way to distribute memory contention in the case of spinlocks could
be to exploit the NUMA architecture. Specifically, we can let the
contending backends spin on local flags instead of on the buffer
header flags directly. Since access to a local cache line is much
cheaper and faster than access to memory that is far away in a NUMA
system, we could potentially reduce the traffic on the contended
cache line and reduce the overall overhead as well.
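
One well-known way to get this "spin on a local flag" behaviour is an
MCS-style queue lock, sketched below in C11. Each waiter brings its own
node and spins only on the flag inside that node; the shared tail pointer
is touched once on acquire and at most once on release. This is only an
illustration under stated assumptions: the names (mcs_lock, mcs_node,
mcs_acquire, mcs_release) are hypothetical, nothing here is PostgreSQL's
spinlock implementation, and a real version would also have to place each
node in memory local to the waiter's NUMA node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;    /* successor in the wait queue */
    atomic_bool locked;                 /* local flag this waiter spins on */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;           /* last waiter, NULL when lock is free */
} mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    assert(lock != NULL && me != NULL);
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);

    /* Append ourselves to the queue with a single swap on the shared word. */
    mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                         /* queue was empty: lock is ours */

    /* Link behind the predecessor, then spin on our own flag only. */
    atomic_store(&prev->next, me);
    while (atomic_load(&me->locked))
        ;                               /* local spin, no shared-line traffic */
}

static void mcs_release(mcs_lock *lock, mcs_node *me)
{
    assert(lock != NULL && me != NULL);
    mcs_node *succ = atomic_load(&me->next);

    if (succ == NULL) {
        /* No visible successor: try to swing tail back to NULL. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;                     /* nobody was waiting */

        /* Someone swapped in after us; wait until they finish linking. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    /* Hand the lock over by clearing the successor's local flag. */
    atomic_store(&succ->locked, false);
}
```

The sketch uses the default sequentially consistent atomics for
simplicity; weaker orderings are possible but are left out here. The node
is typically a per-backend or stack-allocated object, which is what keeps
the spinning local.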

Regards.

Atri


--
Regards,

Atri
l'apprenant


