Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations

Hi,

On 2023-02-17 17:26:20 +1300, David Rowley wrote:
> I didn't hear it mentioned explicitly here, but I suspect it's faster
> when increasing the initial size due to the memory context caching
> code that reuses aset MemoryContexts (see context_freelists[] in
> aset.c). Since we reset the context before caching it, then it'll
> remain fast when we can reuse a context, provided we don't need to do
> a malloc for an additional block beyond the initial block that's kept
> in the cache.

I'm not so sure this is the case. Which is one of the reasons I'd really like
to see a) memory context stats for the executor context, b) a CPU profile of
the problem, and c) a reproducer.
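
FWIW, even just something along these lines, run in the affected backend,
would tell us a lot (MemoryContextStats() dumps the context tree to stderr;
pointing it at the executor context directly is only meant as an
illustration):

    /* e.g. temporarily added near the code in question, or called from gdb */
    MemoryContextStats(estate->es_query_cxt);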

Jonah, did you just increase the initial size, or did you potentially also
increase the maximum block size?

And did you increase ALLOCSET_DEFAULT_INITSIZE everywhere, or just pass a
larger block size in CreateExecutorState()?  If the latter, the context
freelist wouldn't even come into play.
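
To make that concrete (going by memory, so treat the details as approximate):
CreateExecutorState() currently uses the stock parameters, and only contexts
created with exactly those parameters get recycled via context_freelists[]:

    /* roughly what CreateExecutorState() in execUtils.c does today */
    qcontext = AllocSetContextCreate(CurrentMemoryContext,
                                     "ExecutorState",
                                     ALLOCSET_DEFAULT_SIZES);

    /* vs. a local bump of the initial block size; with non-default sizes,
     * aset.c skips context_freelists[] entirely (the initBlockSize below
     * is made up, purely for illustration) */
    qcontext = AllocSetContextCreate(CurrentMemoryContext,
                                     "ExecutorState",
                                     ALLOCSET_DEFAULT_MINSIZE,
                                     1024 * 1024,
                                     ALLOCSET_DEFAULT_MAXSIZE);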


An 8MB max block size is pretty darn small if you have a workload that ends up
with gigabytes worth of blocks.

And the problem could also just be that the default initial block size takes
too long to ramp up to a reasonable block size. I think it's 20 blocks to get
from ALLOCSET_DEFAULT_INITSIZE to ALLOCSET_DEFAULT_MAXSIZE.  Even if you
allocate a good bit more than 8MB, having to additionally go through 20
smaller blocks is going to be noticeable until the total number of blocks
gets a good bit higher.
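
Roughly, the ramp-up works like this (paraphrased, not the exact aset.c code):
each newly malloc'd block doubles nextBlockSize until maxBlockSize is reached,
so the early allocations are all served by comparatively small blocks:

    /* paraphrased from AllocSetAlloc(): pick the size for a new block */
    blksize = set->nextBlockSize;
    set->nextBlockSize <<= 1;
    if (set->nextBlockSize > set->maxBlockSize)
        set->nextBlockSize = set->maxBlockSize;
    /* block = malloc(blksize), then it gets linked into the context */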


> Maybe we should think of a more general-purpose way of doing this
> caching which just keeps a global-to-the-process dclist of blocks
> laying around.  We could see if we have any free blocks both when
> creating the context and also when we need to allocate another block.

Not so sure about that. I suspect the problem could just as well be the
maximum block size, leading to too many blocks being allocated. Perhaps we
should scale that to a certain fraction of work_mem, by default?
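
Very roughly something like this (hypothetical, the fraction is made up, and
non-default sizes would of course bypass the context freelist):

    /* hypothetical: scale the executor context's max block size to work_mem */
    Size        max_block_size = ALLOCSET_DEFAULT_MAXSIZE;

    /* work_mem is in kB; allow blocks up to, say, a quarter of it */
    while (max_block_size * 4 < (Size) work_mem * 1024)
        max_block_size <<= 1;

    qcontext = AllocSetContextCreate(CurrentMemoryContext,
                                     "ExecutorState",
                                     ALLOCSET_DEFAULT_MINSIZE,
                                     ALLOCSET_DEFAULT_INITSIZE,
                                     max_block_size);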

Either way, I don't think we should go too deep without some data; it's too
likely we'd miss the actual problem.



> I see no reason why this couldn't be shared among the other context
> types rather than keeping this cache stuff specific to aset.c.  slab.c
> might need to be pickier if the size isn't exactly what it needs, but
> generation.c should be able to make use of it the same as aset.c
> could.  I'm unsure what we'd need in the way of size classing for
> this, but I suspect we'd need to pay attention to that rather than do
> things like hand over 16MBs of memory to some context that only wants
> a 1KB initial block.

Possible. I can see something like a generic "free block" allocator being
useful, potentially allocating the underlying memory with larger mmap()s than
we need for individual blocks.
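
Sketching what I mean (names and layout entirely made up, just to make the
idea concrete): a small process-wide cache of free blocks that aset.c,
generation.c and slab.c could all draw from, with the backing memory
potentially carved out of one larger mmap()ed region:

    /* hypothetical process-wide free block cache */
    typedef struct CachedBlock
    {
        struct CachedBlock *next;
        Size        size;
    } CachedBlock;

    static CachedBlock *block_cache = NULL;

    static void *
    cached_block_get(Size size)
    {
        for (CachedBlock **prev = &block_cache; *prev; prev = &(*prev)->next)
        {
            if ((*prev)->size == size)
            {
                CachedBlock *blk = *prev;

                *prev = blk->next;
                return blk;         /* hand back the cached block's memory */
            }
        }
        /* miss: fall back to malloc(), or carve from a bigger mmap()ed chunk */
        return malloc(size);
    }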


Random note:

I wonder if we should have a bitmap (in an int) in front of aset's
freelist. In a lot of cases we incur plenty of cache misses, just to find the
freelist bucket empty.
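
Something like this, roughly (field name made up):

    /* a bitmap in the AllocSet header, one bit per freelist bucket */
    uint32      freelist_bitmap;    /* bit fidx set => freelist[fidx] non-empty */

    /* when pushing a chunk onto freelist[fidx] in AllocSetFree(): */
    set->freelist_bitmap |= ((uint32) 1) << fidx;

    /* in AllocSetAlloc(), check the bitmap before touching the bucket: */
    if ((set->freelist_bitmap & (((uint32) 1) << fidx)) != 0)
        chunk = set->freelist[fidx];    /* known non-empty */

    /* and clear the bit when the bucket's last chunk is popped */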

Greetings,

Andres Freund


