Re: Adding basic NUMA awareness
From:        Andres Freund
Subject:     Re: Adding basic NUMA awareness
Date:
Msg-id:      mntwceou3ouc4usvktwutlbt6p3bqrzy73dw5nockzodhkud4g@7bchfsl3qpth
In reply to: Re: Adding basic NUMA awareness (Tomas Vondra <tomas@vondra.me>)
Responses:   Re: Adding basic NUMA awareness
List:        pgsql-hackers
Hi,

On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> The patch does a much simpler thing - treat the weight as a "budget",
> i.e. number of buffers to allocate before proceeding to the "next"
> partition. So it allocates 55 buffers from P1, then 45 buffers from P2,
> and then goes back to P1 in a round-robin way. The advantage is it can
> do away without a PRNG.

I think that's a good plan.

A few comments about the clock sweep patch:

- It'd be easier to review if BgBufferSync() weren't basically re-indented
  wholesale. Maybe you could instead move the relevant code to a helper
  function that's called by BgBufferSync() for each clock? (The first sketch
  after this list shows the shape I mean.)

- I think choosing a clock sweep partition on every tick would likely show up
  in workloads that do a lot of buffer replacement, particularly if buffers
  in the workload often have a high usagecount (and thus more ticks are
  used). Given that your balancing approach "sticks" with a partition for a
  while, could we perhaps only choose the partition after exhausting that
  budget? (See the second sketch after this list.)

- I don't really understand what

  > +    /*
  > +     * Buffers that should have been allocated in this partition (but might
  > +     * have been redirected to keep allocations balanced).
  > +     */
  > +    pg_atomic_uint32 numRequestedAllocs;

  is intended for. Adding yet another atomic increment for every clock sweep
  tick seems rather expensive...

- I wonder if the relatively low balancing budgets will be good enough. It's
  not hard to imagine that this frequent "partition choosing" will hurt in
  workloads with heavy buffer access. But it's probably the right approach
  until we've measured it being a problem.

- It'd be interesting to do some very simple evaluation, like a single
  pg_prewarm() of a relation that's close to the size of shared buffers, and
  verify that we don't end up evicting newly read-in buffers. I think your
  approach should work, but it's worth verifying.
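To make the first point concrete, here's a rough sketch of the shape I have
in mind, assuming hypothetical stand-in names (ClockSweep, NumClockSweeps,
BgBufferSyncOneClock) and a simplified signature; this is not the actual
patch's code:

    /* One clock-sweep partition; stand-in for the real per-clock state. */
    typedef struct ClockSweep
    {
        int         nextVictimBuffer;
    } ClockSweep;

    extern ClockSweep *ClockSweeps;
    extern int  NumClockSweeps;

    static void
    BgBufferSyncOneClock(ClockSweep *sweep)
    {
        /*
         * The existing body of BgBufferSync() would move here unchanged:
         * estimate the recent allocation rate, advance the cleaning scan
         * ahead of the strategy point, write out dirty buffers -- but
         * scoped to a single clock.
         */
        (void) sweep;           /* placeholder for the real per-clock logic */
    }

    void
    BgBufferSync(void)
    {
        for (int i = 0; i < NumClockSweeps; i++)
            BgBufferSyncOneClock(&ClockSweeps[i]);
    }

That way the per-partition patch adds a loop and a function boundary instead
of re-indenting a few hundred lines, which would make the diff much easier
to read.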
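And a minimal, single-threaded sketch of the budget-based round-robin choice
from the quoted paragraph, re-choosing the partition only once the budget is
exhausted; all names here are hypothetical, and the real code would need
atomics (or per-backend budgets) rather than plain ints:

    #define NUM_SWEEPS 2

    typedef struct
    {
        int     weight;         /* allocations per round, e.g. 55 vs. 45 */
        int     budget;         /* allocations left in the current round */
    } SweepPartition;

    static SweepPartition sweeps[NUM_SWEEPS] = {{55, 55}, {45, 45}};
    static int  current_sweep = 0;

    /*
     * Pick the partition for the next buffer allocation.  The common path
     * is a single decrement; a new partition is chosen round-robin only
     * when the current budget runs out, not on every clock-sweep tick.
     */
    static int
    ChooseSweepPartition(void)
    {
        if (sweeps[current_sweep].budget == 0)
        {
            current_sweep = (current_sweep + 1) % NUM_SWEEPS;
            sweeps[current_sweep].budget = sweeps[current_sweep].weight;
        }
        sweeps[current_sweep].budget--;
        return current_sweep;
    }

With weights 55 and 45 this produces exactly the "55 from P1, 45 from P2,
back to P1" pattern described above, at the cost of one branch and one
decrement per tick.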
I wonder if we could make some of this into tests somehow. It's pretty easy
to break this kind of thing and not notice, as everything just continues to
work, just a tad slower.

Greetings,

Andres Freund