Re: Adding basic NUMA awareness
From:        Andres Freund
Subject:     Re: Adding basic NUMA awareness
Date:
Msg-id:      mntwceou3ouc4usvktwutlbt6p3bqrzy73dw5nockzodhkud4g@7bchfsl3qpth
In reply to: Re: Adding basic NUMA awareness (Tomas Vondra <tomas@vondra.me>)
Responses:   Re: Adding basic NUMA awareness
List:        pgsql-hackers
Hi,

On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> The patch does a much simpler thing - treat the weight as a "budget",
> i.e. number of buffers to allocate before proceeding to the "next"
> partition. So it allocates 55 buffers from P1, then 45 buffers from P2,
> and then goes back to P1 in a round-robin way. The advantage is it can
> do away without a PRNG.

I think that's a good plan.

A few comments about the clock sweep patch:

- It'd be easier to review if BgBufferSync() weren't basically re-indented
  wholesale. Maybe you could instead move the relevant code to a helper
  function that's called by BgBufferSync() for each clock? (The first sketch
  after this list shows the shape I mean.)

- I think choosing a clock sweep partition on every tick would likely show up
  in workloads that do a lot of buffer replacement, particularly if buffers
  in the workload often have a high usagecount (and thus more ticks are
  used). Given that your balancing approach "sticks" with a partition for a
  while, could we perhaps only choose the partition after exhausting that
  budget? (See the second sketch after this list.)

- I don't really understand what

  > +    /*
  > +     * Buffers that should have been allocated in this partition (but might
  > +     * have been redirected to keep allocations balanced).
  > +     */
  > +    pg_atomic_uint32 numRequestedAllocs;

  is intended for. Adding yet another atomic increment for every clock sweep
  tick seems rather expensive...

- I wonder if the relatively low balancing budgets will be good enough. It's
  not hard to imagine that this frequent "partition choosing" will hurt in
  workloads with heavy buffer access. But it's probably the right approach
  until we've measured it being a problem.

- It'd be interesting to do some very simple evaluation, like a single
  pg_prewarm() of a relation that's close to the size of shared buffers, and
  verify that we don't end up evicting newly read-in buffers. I think your
  approach should work, but it's worth verifying.
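To make the first point concrete, here's a rough sketch of the shape I have
in mind, assuming hypothetical stand-in names (ClockSweep, NumClockSweeps,
BgBufferSyncOneClock) and a simplified signature; this is not the actual
patch's code:

    /* One clock-sweep partition; stand-in for the real per-clock state. */
    typedef struct ClockSweep
    {
        int         nextVictimBuffer;
    } ClockSweep;

    extern ClockSweep *ClockSweeps;
    extern int  NumClockSweeps;

    static void
    BgBufferSyncOneClock(ClockSweep *sweep)
    {
        /*
         * The existing body of BgBufferSync() would move here unchanged:
         * estimate the recent allocation rate, advance the cleaning scan
         * ahead of the strategy point, write out dirty buffers -- but
         * scoped to a single clock.
         */
        (void) sweep;           /* placeholder for the real per-clock logic */
    }

    void
    BgBufferSync(void)
    {
        for (int i = 0; i < NumClockSweeps; i++)
            BgBufferSyncOneClock(&ClockSweeps[i]);
    }

That way the per-partition patch adds a loop and a function boundary instead
of re-indenting a few hundred lines, which would make the diff much easier
to read.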
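And a minimal, single-threaded sketch of the budget-based round-robin choice
from the quoted paragraph, re-choosing the partition only once the budget is
exhausted; all names here are hypothetical, and the real code would need
atomics (or per-backend budgets) rather than plain ints:

    #define NUM_SWEEPS 2

    typedef struct
    {
        int     weight;         /* allocations per round, e.g. 55 vs. 45 */
        int     budget;         /* allocations left in the current round */
    } SweepPartition;

    static SweepPartition sweeps[NUM_SWEEPS] = {{55, 55}, {45, 45}};
    static int  current_sweep = 0;

    /*
     * Pick the partition for the next buffer allocation.  The common path
     * is a single decrement; a new partition is chosen round-robin only
     * when the current budget runs out, not on every clock-sweep tick.
     */
    static int
    ChooseSweepPartition(void)
    {
        if (sweeps[current_sweep].budget == 0)
        {
            current_sweep = (current_sweep + 1) % NUM_SWEEPS;
            sweeps[current_sweep].budget = sweeps[current_sweep].weight;
        }
        sweeps[current_sweep].budget--;
        return current_sweep;
    }

With weights 55 and 45 this produces exactly the "55 from P1, 45 from P2,
back to P1" pattern described above, at the cost of one branch and one
decrement per tick.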
I wonder if we could make some of this into tests somehow. It's pretty easy
to break this kind of thing and not notice, as everything just continues to
work, just a tad slower.

Greetings,

Andres Freund