Re: Draft for basic NUMA observability
От | Andres Freund |
---|---|
Тема | Re: Draft for basic NUMA observability |
Дата | |
Msg-id | q4f4v2xnf7anfl5ucjm5dbqtnaspsq5z2ajndag35ed75avt5e@eqpqm4a5ap7q обсуждение исходный текст |
Ответ на | Re: Draft for basic NUMA observability (Tomas Vondra <tomas@vondra.me>) |
Список | pgsql-hackers |
Hi, On 2025-04-06 13:51:34 +0200, Tomas Vondra wrote: > On 4/6/25 00:29, Andres Freund wrote: > >> + > >> + if (firstNumaTouch) > >> + elog(DEBUG1, "NUMA: page-faulting the buffercache for proper NUMA readouts"); > > > > Over the patchseries the related code is duplicated. Seems like it'd be good > > to put it into pg_numa instead? This seems like the thing that's good to > > abstract away in one central spot. > > > > Abstract away which part, exactly? I thought about moving some of the > code to port/pg_numa.c, but it didn't seem worth it. The easiest would be to just have a single function that does this for the whole shared memory allocation, without having to integrate it with the per-allocation or per-buffer loop. > > > >> + /* > >> + * If we have multiple OS pages per buffer, fill those in too. We > >> + * always want at least one OS page, even if there are multiple > >> + * buffers per page. > >> + * > >> + * Altough we could query just once per each OS page, we do it > >> + * repeatably for each Buffer and hit the same address as > >> + * move_pages(2) requires page aligment. This also simplifies > >> + * retrieval code later on. Also NBuffers starts from 1. > >> + */ > >> + for (j = 0; j < Max(1, pages_per_buffer); j++) > >> + { > >> + char *buffptr = (char *) BufferGetBlock(i + 1); > >> + > >> + fctx->record[idx].bufferid = bufferid; > >> + fctx->record[idx].numa_page = j; > >> + > >> + os_page_ptrs[idx] > >> + = (char *) TYPEALIGN(os_page_size, > >> + buffptr + (os_page_size * j)); > > > > FWIW, this bit here is the most expensive part of the function itself, as the > > compiler has no choice than to implement it as an actual division, as > > os_page_size is runtime variable. > > > > It'd be fine to leave it like that, the call to numa_move_pages() is way more > > expensive. But it shouldn't be too hard to do that alignment once, rather than > > having to do it over and over. > > > > Division? It's entirely possible I'm missing something obvious, but I > don't see any divisions in this code. Oops. The division was only added in a subsequent patch, not the quoted code. At the time it was: + /* calculate ID of the OS memory page */ + fctx->record[idx].numa_page + = ((char *) os_page_ptrs[idx] - startptr) / os_page_size; Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: