ReadRecentBuffer() doesn't scale well

Поиск
Список
Период
Сортировка
От Andres Freund
Тема ReadRecentBuffer() doesn't scale well
Дата
Msg-id 20230627020546.t6z4tntmj7wmjrfh@awork3.anarazel.de
обсуждение исходный текст
Ответы Re: ReadRecentBuffer() doesn't scale well
Список pgsql-hackers
Hi,

As mentioned nearby [1], Thomas brought up [2] the idea of using
ReadRecentBuffer() _bt_getroot().  I couldn't resist and prototyped it.

Unfortunately it scaled way worse at first. This is not an inherent issue, but
due to an implementation choice in ReadRecentBuffer().  Whereas the normal
BufferAlloc() path uses PinBuffer(), ReadRecentBuffer() first does
LockBufHdr(), checks if the buffer ID is the same and then uses
PinBuffer_Locked().

The problem with that is that PinBuffer() takes care to not hold the buffer
header spinlock, it uses compare_exchange to atomically acquire the pin, while
guaranteing nobody holds the lock.  When holding the buffer header spinlock,
there obviously is the risk of being scheduled out (or even just not have
exclusive access to the cacheline).

ReadRecentBuffer() scales worse even if LockBufHdr() is immediately followed
by PinBuffer_Locked(), so it's really just holding the lock that is the issue.


The fairly obvious solution to this is to just use PinBuffer() and just unpin
the buffer if its identity was changed concurrently. There could be an
unlocked pre-check as well.  However, there's the following comment in
ReadRecentBuffer():
             * It's now safe to pin the buffer.  We can't pin first and ask
             * questions later, because it might confuse code paths like
             * InvalidateBuffer() if we pinned a random non-matching buffer.
             */

But I'm not sure I buy that - there's plenty other things that can briefly
acquire a buffer pin (e.g. checkpointer, reclaiming the buffer for other
contents, etc).



Another difference between using PinBuffer() and PinBuffer_locked() is that
the latter does not adjust a buffer's usagecount.

Leaving the scalability issue aside, isn't it somewhat odd that optimizing a
codepath to use ReadRecentBuffer() instead of ReadBuffer() leads to not
increasing usagecount anymore?


FWIW, once that's fixed, using ReadRecentBuffer() for _bt_getroot(), caching
the root page's buffer id in RelationData, seems a noticeable win. About 7% in
a concurrent, read-only pgbench that utilizes batches of 10. And it should be
easy to get much bigger wins, e.g. with a index nested loop with a relatively
small index on the inner side.


Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20230627013458.axge7iylw7llyvww%40awork3.anarazel.de
[2] https://twitter.com/MengTangmu/status/1673439083518115840



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: False sharing for PgBackendStatus, made worse by in-core query_id handling
Следующее
От: jian he
Дата:
Сообщение: Re: Do we want a hashset type?