ReadRecentBuffer() doesn't scale well

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	ReadRecentBuffer() doesn't scale well
Дата	27 июня 2023 г. 05:05:46
Msg-id	20230627020546.t6z4tntmj7wmjrfh@awork3.anarazel.de обсуждение исходный текст
Ответы	Re: ReadRecentBuffer() doesn't scale well
Список	pgsql-hackers

Дерево обсуждения

Hi,

As mentioned nearby [1], Thomas brought up [2] the idea of using
ReadRecentBuffer() _bt_getroot().  I couldn't resist and prototyped it.

Unfortunately it scaled way worse at first. This is not an inherent issue, but
due to an implementation choice in ReadRecentBuffer().  Whereas the normal
BufferAlloc() path uses PinBuffer(), ReadRecentBuffer() first does
LockBufHdr(), checks if the buffer ID is the same and then uses
PinBuffer_Locked().

The problem with that is that PinBuffer() takes care to not hold the buffer
header spinlock, it uses compare_exchange to atomically acquire the pin, while
guaranteing nobody holds the lock.  When holding the buffer header spinlock,
there obviously is the risk of being scheduled out (or even just not have
exclusive access to the cacheline).

ReadRecentBuffer() scales worse even if LockBufHdr() is immediately followed
by PinBuffer_Locked(), so it's really just holding the lock that is the issue.


The fairly obvious solution to this is to just use PinBuffer() and just unpin
the buffer if its identity was changed concurrently. There could be an
unlocked pre-check as well.  However, there's the following comment in
ReadRecentBuffer():
             * It's now safe to pin the buffer.  We can't pin first and ask
             * questions later, because it might confuse code paths like
             * InvalidateBuffer() if we pinned a random non-matching buffer.
             */

But I'm not sure I buy that - there's plenty other things that can briefly
acquire a buffer pin (e.g. checkpointer, reclaiming the buffer for other
contents, etc).



Another difference between using PinBuffer() and PinBuffer_locked() is that
the latter does not adjust a buffer's usagecount.

Leaving the scalability issue aside, isn't it somewhat odd that optimizing a
codepath to use ReadRecentBuffer() instead of ReadBuffer() leads to not
increasing usagecount anymore?


FWIW, once that's fixed, using ReadRecentBuffer() for _bt_getroot(), caching
the root page's buffer id in RelationData, seems a noticeable win. About 7% in
a concurrent, read-only pgbench that utilizes batches of 10. And it should be
easy to get much bigger wins, e.g. with a index nested loop with a relatively
small index on the inner side.


Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20230627013458.axge7iylw7llyvww%40awork3.anarazel.de
[2] https://twitter.com/MengTangmu/status/1673439083518115840

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Andres Freund
Дата: 27 июня 2023 г., 04:34:58
Сообщение: False sharing for PgBackendStatus, made worse by in-core query_id handling

Следующее

От: jian he
Дата: 27 июня 2023 г., 05:35:53
Сообщение: Re: Do we want a hashset type?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

ReadRecentBuffer() doesn't scale well

Предыдущее

Следующее