Re: Draft for basic NUMA observability
From | Patrick Stählin |
---|---|
Subject | Re: Draft for basic NUMA observability |
Date | |
Msg-id | 3d8bccef-1395-40ec-bc3d-cccd1882227a@packi.ch |
In reply to | Re: Draft for basic NUMA observability (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
List | pgsql-hackers |
Hi Jakub,

On 7/24/25 10:01 AM, Jakub Wartak wrote:
> On Tue, Jul 22, 2025 at 11:30 AM Patrick Stählin <me@packi.ch> wrote:
>>
>> Hi!
>>
>> On 4/7/25 11:27 PM, Tomas Vondra wrote:
>>>
>>> I've pushed all three parts of v29, with some additional corrections
>>> (picked lower OIDs, bumped catversion, fixed commit messages).
>>
>> While building the PG18 beta1/2 packages I noticed that in our build
>> containers the selftest for pg_buffercache_numa and numa failed. It
>> seems that libnuma was available and pg_numa_init/numa_available returns
>> no errors, we still fail in pg_numa_query_pages/move_pages with EPERM
>> yielding the following error when accessing
>> pg_buffercache_numa/pg_shmem_allocations_numa:
>>
>> ERROR: failed NUMA pages inquiry: Operation not permitted
>>
>> The man-page of move_pages led me to believe that this is because of
>> the missing capability CAP_SYS_NICE on the process but I couldn't prove
>> that theory with the attached patch.
>> The patch did make the tests pass but also disabled NUMA permanently on
>> a vanilla Debian VM and that is certainly not wanted. It may well be
>> that my understanding of checking capabilities and how they work is
>> incomplete. I also think that adding a new dependency for the reason of
>> just checking the capability is probably a bit of an overkill, maybe we
>> can check if we can access move_pages once without an error before
>> treating it as one?
>>
>> I'd be happy to debug this further but I have limited access to our
>> build-infra, I should be able to sneak in commands during the build though.
>
>
> Hi Patrick,
>
> So is it because the container was started without CAP_SYS_NICE so
> even root -> postgres is not having this cap? In my book container
> would be rather small and certainly single container wouldn't be
> spanning multiple CPU sockets, so I would just disable libnuma, anyway
> if I do on regular VM:
> [...]

This is just for the build environment, but it runs the selftests and those fail.
The containers this runs in for production are a totally different setup, and there the NUMA calls actually work. Disabling it may be an option, but it would be nice to detect at runtime that we can't access it.

> Can you provide exact details about this container technology?

We use podman to set everything up.

> Can you provide /usr/sbin/capsh --print just before starting PG there?
> Maybe this is more cgroup/cpuset somehow related too?

Here is the output; it seems that cap_sys_nice is missing from the bounding set:

+ /usr/sbin/capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_setfcap
Ambient set =
Current IAB: !cap_dac_read_search,!cap_linux_immutable,!cap_net_broadcast,!cap_net_admin,!cap_net_raw,!cap_ipc_lock,!cap_ipc_owner,!cap_sys_module,!cap_sys_rawio,!cap_sys_ptrace,!cap_sys_pacct,!cap_sys_admin,!cap_sys_boot,!cap_sys_nice,!cap_sys_resource,!cap_sys_time,!cap_sys_tty_config,!cap_mknod,!cap_lease,!cap_audit_write,!cap_audit_control,!cap_mac_override,!cap_mac_admin,!cap_syslog,!cap_wake_alarm,!cap_block_suspend,!cap_audit_read,!cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0 (no-new-privs=0)
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=2000(buildkite-agent) euid=2000(buildkite-agent)
gid=2000(buildkite-agent)
groups=2000(buildkite-agent)
Guessed mode: HYBRID (4)

> Anyway, there is a simpler way to make the tests pass if that's what
> you are after. We do have
> contrib/pg_buffercache/sql/pg_buffercache_numa.sql which is expected
> to match outputs in pg_buffercache_numa.out OR (!)
> pg_buffercache_numa_1.out.
> We could just handle this edge case by
> adding pg_buffercache_numa_2.out too probably (which would just
> contain semi-valid scenario for "ERROR: failed NUMA pages inquiry:
> Operation not permitted")

Ah, I didn't know that was a possibility. Until this sees more usage than just querying the state, this may be a nice workaround. If it becomes more widespread, we probably need something more robust for the detection.

I already patch out the tests in our build environment, so for me it's "solved", but that is certainly not a proper solution.

Just FYI, I'll be on PTO, so I won't have access to the build environment for the next two weeks.

Thanks,
Patrick