Re: pgsql: Introduce pg_shmem_allocations_numa view
From | Tomas Vondra |
---|---|
Subject | Re: pgsql: Introduce pg_shmem_allocations_numa view |
Date | |
Msg-id | 6c9f9f7e-947b-4fc3-bdb6-b0696d7492e5@vondra.me |
In response to | Re: pgsql: Introduce pg_shmem_allocations_numa view (Christoph Berg <myon@debian.org>) |
List | pgsql-hackers |
On 6/23/25 21:57, Christoph Berg wrote:
> Re: Andres Freund
>> How confident are we that this isn't actually because we passed a bogus
>> address to the kernel or such? With this patch, are *any* pages recognized as
>> valid on the machines that triggered the error?
>
> See upthread - the first 35 pages were ok, then a lot of -14.
>
>> I wonder if we ought to report the failures as a separate "numa node"
>> (e.g. NULL as node id) instead ...
>
> Did that now, using N+1 (== 1 here) for errors in this Debian i386
> environment (chroot on an amd64 host):
>
> select * from pg_shmem_allocations_numa \crosstabview
>
>  name                             │    0     │    1
> ──────────────────────────────────┼──────────┼──────────
>  multixact_offset                 │    69632 │    65536
>  subtransaction                   │   139264 │   131072
>  notify                           │   139264 │        0
>  Shared Memory Stats              │   188416 │   131072
>  serializable                     │   188416 │    86016
>  PROCLOCK hash                    │     4096 │        0
>  FinishedSerializableTransactions │     4096 │        0
>  XLOG Ctl                         │  2117632 │  2097152
>  Shared MultiXact State           │     4096 │        0
>  Proc Header                      │     4096 │        0
>  Archiver Data                    │     4096 │        0
> .... more 0s in the last column ...
>  AioHandleData                    │  1429504 │        0
>  Buffer Blocks                    │ 67117056 │ 67108864
>  Buffer IO Condition Variables    │   266240 │        0
>  Proc Array                       │     4096 │        0
> .... more 0s
> (73 rows)
>
>
> There is something fishy with pg_buffercache. If I restart PG, I'm
> getting "Bad address" (errno 14), this time as return value of
> move_pages().
>
> postgres =# select * from pg_buffercache_numa;
> DEBUG:  00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION:  pg_buffercache_numa_pages, pg_buffercache_pages.c:383
> 2025-06-23 19:41:41.315 UTC [1331894] ERROR:  failed NUMA pages inquiry: Bad address
> 2025-06-23 19:41:41.315 UTC [1331894] STATEMENT:  select * from pg_buffercache_numa;
> ERROR:  XX000: failed NUMA pages inquiry: Bad address
> LOCATION:  pg_buffercache_numa_pages, pg_buffercache_pages.c:394
>
> Repeated calls are fine.
>

Huh. So it's only the first call that does this?

Can you maybe print the addresses passed to pg_numa_query_pages? I wonder
if there's some bug in how we fill that array. Not sure why would it
happen only on 32-bit systems, though.

I'll create a 32-bit VM so that I can try reproducing this.

regards

-- 
Tomas Vondra
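For readers following along, here is a rough sketch of the kind of debugging aid being asked for: fill an array with one address per OS page, then print each address next to its per-page status. It is not PostgreSQL's code. The helper names (os_page_count, fill_page_ptrs, dump_page_status) are made up for illustration, and the status array is assumed to follow the Linux move_pages() convention of a node number on success or a negative errno (so -14 is -EFAULT, "Bad address"). With the default 8192-byte block size, 16384 buffers cover 32768 pages of 4096 bytes, which matches the DEBUG line quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of OS pages needed to cover nbytes (hypothetical helper). */
static size_t
os_page_count(size_t nbytes, size_t os_page_size)
{
    assert(os_page_size > 0);
    return (nbytes + os_page_size - 1) / os_page_size;
}

/* Store the address of the i-th OS page, counted from start, in ptrs[i]. */
static void
fill_page_ptrs(char *start, size_t os_page_size, size_t count, void **ptrs)
{
    for (size_t i = 0; i < count; i++)
        ptrs[i] = start + i * os_page_size;
}

/*
 * Print every address with its status. A status >= 0 is assumed to be a
 * NUMA node number, a negative status a negated errno (move_pages()
 * convention). Returns how many pages reported a negative status.
 */
static size_t
dump_page_status(FILE *out, void *const *ptrs, const int *status, size_t count)
{
    size_t failed = 0;

    for (size_t i = 0; i < count; i++)
    {
        if (status[i] < 0)
            failed++;
        fprintf(out, "page %zu addr=%p status=%d\n", i, ptrs[i], status[i]);
    }
    return failed;
}
```

Printing the index alongside the address makes it easy to see whether the first failing entry lines up with a boundary in how the array was filled, or whether the addresses themselves look wrong.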