Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock
From: Greg Burd
Subject: Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock
Date:
Msg-id: 70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
In reply to: Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock (Thomas Munro <thomas.munro@gmail.com>)
Replies: Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock
List: pgsql-hackers
On Aug 17 2025, at 12:57 am, Thomas Munro <thomas.munro@gmail.com> wrote:

> On Sun, Aug 17, 2025 at 4:34 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Or if you don't like those odds, maybe it'd be OK to keep % but use it
>> rarely and without the CAS that can fail.
>
> ... or if we wanted to try harder to avoid %, could we relegate it to
> the unlikely CLOCK-went-all-the-way-around-again-due-to-unlucky-scheduling
> case, but use subtraction for the expected periodic overshoot?
>
>     if (hand >= NBuffers)
>     {
>         hand = hand < NBuffers * 2 ? hand - NBuffers : hand % NBuffers;
>         /* Base value advanced by backend that overshoots by one tick. */
>         if (hand == 0)
>             pg_atomic_fetch_add_u64(&StrategyControl->ticks_base, NBuffers);
>     }

Hi Thomas,

Thanks for all the ideas; I have tried out a few of them along with a number
of others. I've done a lot of measurement and had a few off-channel
discussions about this, and I think the best way forward is to focus on
removing the freelist and, for now, to leave the lock and the clock-sweep
algorithm mostly alone. So, the attached patch set keeps the first two
patches from the last set but drops the rest.

But wait, there's more...

As a *bonus* I've added a new third patch with some proposed changes to spark
discussion. As I researched experiences in the field at scale, a few other
buffer management issues came to light. The one I try to address in this new
patch 0003 shows up with very large shared_buffers (NBuffers) and very large
active datasets, where most buffer usage counts sit at or near the maximum
(BM_MAX_USAGE_COUNT, currently 5). In that state the clock-sweep algorithm
can need up to NBuffers * 5 "ticks" before it identifies a buffer to evict
(for example, with shared_buffers = 128GB that is ~16.8 million 8kB buffers,
so a worst case of ~84 million ticks). This also pollutes the completePasses
value used to tell the bgwriter where to start working.

So, in this patch I add per-backend buffer usage tracking and proactive
pressure management. Each tick of the hand can now decrement usage by a
calculated amount, not just 1, based on /hand-wavy-first-attempt at magic/
(see the sketch in the P.S. below).

The thing I'm sure this doesn't help with, and may in fact hurt, is keeping
frequently accessed buffers in the buffer pool. I imagine a two-tier approach
where some small subset of buffers that are reused frequently enough are not
even considered by the clock-sweep algorithm.

Regardless, I feel the first two patches in this set address the intention of
this thread. I added patch 0003 just to start a conversation; please chime in
if any of this interests you. Maybe this new patch should take on a life of
its own in a new thread? If anyone thinks the approach has some merit, I'll
do that.

I look forward to thoughts on these ideas, and hopefully to finding someone
willing to help me get the first two over the line.

best.

-greg
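P.S. To make the 0003 idea slightly more concrete, here's a rough sketch of
the kind of scaled decrement I have in mind. To be clear, this is not the
code from the patch: the function name, the per-backend "visited" and
"saturated" counters, and the 50% threshold are all invented for
illustration, and the real thing would hook into the clock-sweep loop in
freelist.c.

    #include <stdint.h>

    #define BM_MAX_USAGE_COUNT 5    /* mirrors src/include/storage/buf_internals.h */

    /*
     * Sketch only: pick how much to decrement a buffer's usage count per
     * clock-sweep tick, based on how saturated the pool has looked to this
     * backend.  "visited" and "saturated" stand in for hypothetical
     * per-backend counters bumped as the hand passes over buffers.
     */
    static inline uint32_t
    pressure_decrement(uint64_t visited, uint64_t saturated)
    {
        double frac;

        if (visited == 0)
            return 1;               /* no history yet: behave like today */

        frac = (double) saturated / (double) visited;

        if (frac < 0.5)
            return 1;               /* pool looks healthy: classic CLOCK */

        /*
         * Ramp linearly from 1 (at 50% saturated) up to BM_MAX_USAGE_COUNT
         * (at 100%), so a fully saturated pool needs roughly NBuffers ticks
         * per eviction instead of NBuffers * 5.
         */
        return (uint32_t) (1.0 + (frac - 0.5) * 2.0 * (BM_MAX_USAGE_COUNT - 1));
    }

The decrement would still be clamped to the buffer's current usage count, of
course, and how (or whether) the counters should decay over time is exactly
the hand-wavy part I'd like feedback on.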
Attachments