Re: Page replacement algorithm in buffer cache

From Jeff Janes
Subject Re: Page replacement algorithm in buffer cache
Date
Msg-id CAMkU=1zVSyNRR_AQh4j_w6h37+qyvAz4fY8A+QP8e0dsuBg7Fw@mail.gmail.com
In reply to Re: Page replacement algorithm in buffer cache  (Ants Aasma <ants@cybertec.at>)
Responses Re: Page replacement algorithm in buffer cache  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Friday, March 22, 2013, Ants Aasma wrote:
> On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> > well if you do a non-locking test first you could at least avoid some
> > cases (and, if you get the answer wrong, so what?) by jumping to the
> > next buffer immediately.  if the non locking test comes good, only
> > then do you do a hardware TAS.
> >
> > you could in fact go further and dispense with all locking in front of
> > usage_count, on the premise that it's only advisory and not a real
> > refcount.  so you only then lock if/when it's time to select a
> > candidate buffer, and only then when you did a non locking test first.
> >  this would of course require some amusing adjustments to various
> > logical checks (usage_count <= 0, heh).
>
> Moreover, if the buffer happens to miss a decrement due to a data
> race, there's a good chance that the buffer is heavily used and
> wouldn't need to be evicted soon anyway. (if you arrange it to be a
> read-test-inc/dec-store operation then you will never go out of
> bounds) However, clocksweep and usage_count maintenance is not what is
> causing contention because that workload is distributed. The issue is
> pinning and unpinning.

That is one of multiple issues.  Contention on the BufFreelistLock is another.  I agree that usage_count maintenance is unlikely to become a bottleneck unless one or both of those are fixed first (and maybe not even then).
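
For readers following the thread, here is a minimal sketch in C11 of the two ideas quoted above: a non-locking test before the hardware TAS, and an advisory usage_count maintained as a read-test-inc/dec-store without any lock. It is not PostgreSQL code. SketchBuffer, MAX_USAGE_COUNT, usage_bump, usage_decay, try_claim_victim and clock_sweep are all hypothetical names invented for illustration, and the clock hand is owned by the caller rather than shared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_USAGE_COUNT 5   /* hypothetical cap, chosen for illustration */

/* Hypothetical, simplified buffer header; NOT PostgreSQL's BufferDesc. */
typedef struct {
    atomic_bool locked;       /* stands in for the header spinlock (TAS target) */
    atomic_int  usage_count;  /* advisory only, updated without the lock */
    atomic_int  refcount;     /* pin count */
} SketchBuffer;

static void sketch_init(SketchBuffer *b) {
    atomic_init(&b->locked, false);
    atomic_init(&b->usage_count, 0);
    atomic_init(&b->refcount, 0);
}

/* Racy read-test-inc-store: a concurrent update may be lost, but the
 * stored value is always in [0, MAX_USAGE_COUNT]. */
static void usage_bump(SketchBuffer *b) {
    int u = atomic_load_explicit(&b->usage_count, memory_order_relaxed);
    if (u < MAX_USAGE_COUNT)
        atomic_store_explicit(&b->usage_count, u + 1, memory_order_relaxed);
}

/* Racy read-test-dec-store: same reasoning, never goes below zero. */
static void usage_decay(SketchBuffer *b) {
    int u = atomic_load_explicit(&b->usage_count, memory_order_relaxed);
    if (u > 0)
        atomic_store_explicit(&b->usage_count, u - 1, memory_order_relaxed);
}

static bool looks_evictable(SketchBuffer *b) {
    return atomic_load_explicit(&b->refcount, memory_order_relaxed) == 0 &&
           atomic_load_explicit(&b->usage_count, memory_order_relaxed) == 0;
}

static void sketch_unlock(SketchBuffer *b) {
    atomic_store_explicit(&b->locked, false, memory_order_release);
}

/* Returns true, with the header lock held, if the buffer was claimed. */
static bool try_claim_victim(SketchBuffer *b) {
    /* Non-locking test first: a wrong answer only costs a skipped buffer. */
    if (!looks_evictable(b))
        return false;
    /* Only now do the hardware TAS; if it is already held, move on. */
    if (atomic_exchange_explicit(&b->locked, true, memory_order_acquire))
        return false;
    /* Re-check while holding the lock (usage_count stays advisory). */
    if (!looks_evictable(b)) {
        sketch_unlock(b);
        return false;
    }
    return true;
}

/* Clock sweep with a caller-owned hand; returns the victim index or -1. */
static ptrdiff_t clock_sweep(SketchBuffer *bufs, size_t n,
                             size_t *hand, size_t max_steps) {
    assert(n > 0);
    for (size_t i = 0; i < max_steps; i++) {
        size_t idx = *hand;
        *hand = (*hand + 1) % n;
        if (try_claim_victim(&bufs[idx]))
            return (ptrdiff_t) idx;
        usage_decay(&bufs[idx]);
    }
    return -1;
}
```

Note that this only illustrates the usage_count side of the discussion. It does nothing about pinning/unpinning or BufFreelistLock, which are the contention points argued above to matter more.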

...

 
> The issue with the current buffer management algorithm is that it
> seems to scale badly with increasing shared_buffers.

I do not think that this is the case.  As far as I have seen, neither of the SELECT-only contention points (pinning/unpinning of index root blocks when all data is in shared_buffers, and BufFreelistLock when not all data is in shared_buffers) is made worse by increasing shared_buffers.  They do scale badly with the number of concurrent processes, though.

The reports of write-heavy workloads not scaling well with shared_buffers do not seem to be driven by the buffer management algorithm, or at least not the freelist part of it.  They mostly seem to center on the kernel and the IO controllers.

 Cheers,

Jeff
