Re: Page replacement algorithm in buffer cache

From	Jeff Janes
Subject	Re: Page replacement algorithm in buffer cache
Date	March 31, 2013, 18:27:11
Msg-id	CAMkU=1zVSyNRR_AQh4j_w6h37+qyvAz4fY8A+QP8e0dsuBg7Fw@mail.gmail.com
In reply to	Re: Page replacement algorithm in buffer cache (Ants Aasma <ants@cybertec.at>)
Responses	Re: Page replacement algorithm in buffer cache
List	pgsql-hackers

On Friday, March 22, 2013, Ants Aasma wrote:

On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> well if you do a non-locking test first you could at least avoid some
> cases (and, if you get the answer wrong, so what?) by jumping to the
> next buffer immediately. if the non locking test comes good, only
> then do you do a hardware TAS.
>
> you could in fact go further and dispense with all locking in front of
> usage_count, on the premise that it's only advisory and not a real
> refcount. so you only then lock if/when it's time to select a
> candidate buffer, and only then when you did a non locking test first.
> this would of course require some amusing adjustments to various
> logical checks (usage_count <= 0, heh).
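
For readers following along, the "non-locking test first, then a hardware TAS" idea might be sketched roughly as below. This is only an illustration under assumptions: `DemoBuffer`, its fields, and `demo_try_claim_victim` are hypothetical names, not PostgreSQL's actual `BufferDesc` or buffer-header spinlock code, and C11 relaxed atomics stand in for the unlocked peek.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer header; not PostgreSQL's BufferDesc. */
typedef struct DemoBuffer
{
    atomic_int locked;       /* 0 = free, 1 = held; stands in for the header spinlock */
    atomic_int usage_count;  /* advisory popularity counter */
    atomic_int refcount;     /* number of pins */
} DemoBuffer;

/*
 * Try to claim buf as an eviction candidate.
 * Returns true with the lock held, or false if the caller should move on
 * to the next buffer.
 */
static bool
demo_try_claim_victim(DemoBuffer *buf)
{
    /*
     * Step 1: unlocked peek. The values may be stale; a wrong answer only
     * means we skip a buffer we could have taken, or go on to step 2 and
     * get rejected there.
     */
    if (atomic_load_explicit(&buf->locked, memory_order_relaxed) != 0 ||
        atomic_load_explicit(&buf->refcount, memory_order_relaxed) != 0 ||
        atomic_load_explicit(&buf->usage_count, memory_order_relaxed) > 0)
        return false;

    /* Step 2: the real hardware test-and-set, attempted only on likely wins. */
    if (atomic_exchange_explicit(&buf->locked, 1, memory_order_acquire) != 0)
        return false;

    /* Step 3: recheck with the lock held, since step 1 was only a hint. */
    if (atomic_load_explicit(&buf->refcount, memory_order_relaxed) != 0 ||
        atomic_load_explicit(&buf->usage_count, memory_order_relaxed) > 0)
    {
        atomic_store_explicit(&buf->locked, 0, memory_order_release);
        return false;
    }

    return true;
}
```

The point of step 1 is that a plain load does not need exclusive ownership of the cache line, so scanning past busy buffers generates less coherence traffic than attempting a TAS on each one.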

Moreover, if the buffer happens to miss a decrement due to a data
race, there's a good chance that the buffer is heavily used and
wouldn't need to be evicted soon anyway. (if you arrange it to be a
read-test-inc/dec-store operation then you will never go out of
bounds) However, clocksweep and usage_count maintenance is not what is
causing contention because that workload is distributed. The issue is
pinning and unpinning.
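
A minimal sketch of the "read-test-inc/dec-store" arrangement mentioned above, assuming a counter that is only advisory. The names `demo_usage_inc`, `demo_usage_dec`, and `DEMO_MAX_USAGE_COUNT` are hypothetical, and the cap value is illustrative. Relaxed C11 atomics are used so the unlocked accesses are well defined in C; nothing here is claimed to be how PostgreSQL implements it.

```c
#include <stdatomic.h>

#define DEMO_MAX_USAGE_COUNT 5  /* illustrative cap, an assumption of this sketch */

/*
 * Unlocked increment: read once, test the value we read, store read + 1.
 * Two racing callers may overwrite each other's update (a lost increment),
 * but every stored value is derived from a value already in
 * [0, DEMO_MAX_USAGE_COUNT] and checked against the cap, so the counter
 * can never exceed it.
 */
static void
demo_usage_inc(atomic_int *usage_count)
{
    int seen = atomic_load_explicit(usage_count, memory_order_relaxed);

    if (seen < DEMO_MAX_USAGE_COUNT)
        atomic_store_explicit(usage_count, seen + 1, memory_order_relaxed);
}

/*
 * Unlocked decrement, same pattern: a racing decrement may be lost, but the
 * stored value is never below zero.
 */
static void
demo_usage_dec(atomic_int *usage_count)
{
    int seen = atomic_load_explicit(usage_count, memory_order_relaxed);

    if (seen > 0)
        atomic_store_explicit(usage_count, seen - 1, memory_order_relaxed);
}
```

By contrast, testing the shared variable and then decrementing it in a separate read-modify-write could let two racing decrements both pass the test and drive the counter below zero, which is why the value is read once into a local before the test.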

That is one of several issues. Contention on the BufFreelistLock is another. I agree that usage_count maintenance is unlikely to become a bottleneck unless one or both of those are fixed first (and maybe not even then).

...

The issue with the current buffer management algorithm is that it
seems to scale badly with increasing shared_buffers.

I do not think that this is the case. Neither of the SELECT-only contention points (pinning/unpinning of index root blocks when all data is in shared_buffers, and BufFreelistLock when not all data is in shared_buffers) is made worse by increasing shared_buffers, as far as I have seen. They do scale badly with the number of concurrent processes, though.

The reports of write-heavy workloads not scaling well with shared_buffers do not seem to be driven by the buffer management algorithm, or at least not the freelist part of it. They mostly seem to center on the kernel and the IO controllers.

Cheers,

Jeff
