Discussion: failed NUMA pages inquiry status: Operation not permitted
> src/test/regress/expected/numa.out | 13 +++
> src/test/regress/expected/numa_1.out | 5 +

numa_1.out is catching this error:

ERROR: libnuma initialization failed or NUMA is not supported on this platform

This is what I'm getting when running PG18 in docker on Debian trixie
(libnuma 2.0.19).

However, on older distributions, the error is different:

postgres =# select * from pg_shmem_allocations_numa;
ERROR: XX000: failed NUMA pages inquiry status: Operation not permitted
LOCATION: pg_get_shmem_allocations_numa, shmem.c:691

This makes the numa regression tests fail in Docker on Debian bookworm
(libnuma 2.0.16) and older and all of the Ubuntu LTS releases.

The attached patch makes it accept these errors, but perhaps it would
be better to detect it in pg_numa_available().

Christoph
On 10/16/25 13:38, Christoph Berg wrote:
>> src/test/regress/expected/numa.out | 13 +++
>> src/test/regress/expected/numa_1.out | 5 +
>
> numa_1.out is catching this error:
>
> ERROR: libnuma initialization failed or NUMA is not supported on this platform
>
> This is what I'm getting when running PG18 in docker on Debian trixie
> (libnuma 2.0.19).
>
> However, on older distributions, the error is different:
>
> postgres =# select * from pg_shmem_allocations_numa;
> ERROR: XX000: failed NUMA pages inquiry status: Operation not permitted
> LOCATION: pg_get_shmem_allocations_numa, shmem.c:691
>
> This makes the numa regression tests fail in Docker on Debian bookworm
> (libnuma 2.0.16) and older and all of the Ubuntu LTS releases.
>

It's probably more about the kernel version. What kernels are used by
these systems?

> The attached patch makes it accept these errors, but perhaps it would
> be better to detect it in pg_numa_available().
>

Not sure how that would work. It seems this is some sort of permission
check in numa_move_pages, that's not what pg_numa_available does. Also,
it may depend on the page queried (e.g. whether it's exclusive or
shared by multiple processes).

thanks

--
Tomas Vondra
Re: Tomas Vondra
> It's probably more about the kernel version. What kernels are used by
> these systems?

It's the very same kernel, just different docker containers on the
same system. I did not investigate yet where the problem is coming
from, different libnuma versions seemed like the best bet.

Same (differing) results on both these systems:

Linux turing 6.16.7+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.16.7-1 (2025-09-11) x86_64 GNU/Linux
Linux jenkins 6.1.0-39-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.148-1 (2025-08-26) x86_64 GNU/Linux

> Not sure how would that work. It seems this is some sort of permission
> check in numa_move_pages, that's not what pg_numa_available does. Also,
> it may depend on the page queried (e.g. whether it's exclusive or
> shared by multiple processes).

It's probably the lack of some process capability in that environment.
Maybe there is a way to query that, but I don't know much about that
yet.

Christoph
Re: To Tomas Vondra
> It's the very same kernel, just different docker containers on the
> same system. I did not investigate yet where the problem is coming
> from, different libnuma versions seemed like the best bet.

numactl shows the problem already:

Host system:

$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0
nodebind: 0
membind: 0
preferred:

debian:trixie-slim container:

$ numactl --show
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
No NUMA support available on this system.

debian:bookworm-slim container:

$ numactl --show
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0
nodebind: 0
membind: 0
preferred:

Running with sudo does not change the result.

So maybe all that's needed is a get_mempolicy() call in
pg_numa_available() ?

Christoph
Re: To Tomas Vondra
> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?

Or perhaps give up on pg_numa_available, and just have two _1.out and
_2.out that just contain the two different error messages, without
trying to catch the problem.

Christoph
On 10/16/25 16:54, Christoph Berg wrote:
> Re: Tomas Vondra
>> It's probably more about the kernel version. What kernels are used by
>> these systems?
>
> It's the very same kernel, just different docker containers on the
> same system. I did not investigate yet where the problem is coming
> from, different libnuma versions seemed like the best bet.
>
> Same (differing) results on both these systems:
> Linux turing 6.16.7+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.16.7-1 (2025-09-11) x86_64 GNU/Linux
> Linux jenkins 6.1.0-39-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.148-1 (2025-08-26) x86_64 GNU/Linux
>

Hmmm. Those seem like relatively recent kernels.

>> Not sure how would that work. It seems this is some sort of permission
>> check in numa_move_pages, that's not what pg_numa_available does. Also,
>> it may depend on the page queried (e.g. whether it's exclusive or
>> shared by multiple processes).
>
> It's probably the lack of some process capability in that environment.
> Maybe there is a way to query that, but I don't know much about that
> yet.
>

The move_pages() manpage mentions PTRACE_MODE_READ_REALCREDS (see man
ptrace), so maybe that's it.

--
Tomas Vondra
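The move_pages() behaviour discussed here can be probed outside PostgreSQL. The following is only a minimal sketch, assuming Linux and the raw syscall interface (so it does not depend on the installed libnuma version). `query_page_node` is a hypothetical helper name, not a PostgreSQL or libnuma function. It asks the kernel which node backs one page of the calling process and hands back the errno if the call is refused.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): ask the kernel which NUMA node
 * backs the page containing "addr" in the calling process.
 *
 * With nodes == NULL, move_pages() moves nothing and only fills in the
 * status array. Returns 0 and stores the status in *node (a negative value
 * is a per-page -errno, such as -ENOENT for a page that is not present), or
 * returns the errno of the failed call, e.g. EPERM.
 */
static int
query_page_node(void *addr, int *node)
{
    uintptr_t   page_size = (uintptr_t) sysconf(_SC_PAGESIZE);
    void       *pages[1];
    int         status[1] = {0};

    assert(node != NULL);

    /* move_pages() expects page-aligned addresses, so round down. */
    pages[0] = (void *) ((uintptr_t) addr & ~(page_size - 1));

#ifdef SYS_move_pages
    /* Arguments: pid 0 (self), count, pages, nodes (NULL = query), status, flags */
    if (syscall(SYS_move_pages, 0L, 1UL, pages, NULL, status, 0L) < 0)
        return errno;
    *node = status[0];
    return 0;
#else
    return ENOSYS;              /* platform headers do not define move_pages */
#endif
}
```

Calling this on a buffer that has been written to should return 0 on an unrestricted host. In an environment like the bookworm container described above, one would presumably see EPERM instead, matching the "Operation not permitted" message, though that has not been verified here.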
> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?

numactl 2.0.19 --show does this:

    if (numa_available() < 0) {
        show_physcpubind();
        printf("No NUMA support available on this system.\n");
        exit(1);
    }

    int numa_available(void)
    {
        if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && (errno == ENOSYS || errno == EPERM))
            return -1;
        return 0;
    }

pg_numa_available is already calling numa_available.

But numactl 2.0.16 has this:

    int numa_available(void)
    {
        if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && errno == ENOSYS)
            return -1;
        return 0;
    }

... which is not catching the "Operation not permitted" (EPERM) error I am seeing.

So maybe PG should implement numa_available itself like that. (Or
accept the output difference so the regression tests are passing.)

Christoph
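The idea of PG doing the check itself might look roughly like the sketch below. It is an illustration under assumptions, not a proposed patch: `numa_probe_result` and `numa_available_compat` are hypothetical names, and the raw `get_mempolicy` syscall is used so that the result does not depend on which libnuma version is installed. The decision rule mirrors the 2.0.19 code quoted in the message above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Same rule as libnuma 2.0.19: a failed probe with ENOSYS or EPERM means
 * "NUMA not usable here" (-1); anything else counts as available (0).
 */
static int
numa_probe_result(long rc, int err)
{
    if (rc < 0 && (err == ENOSYS || err == EPERM))
        return -1;
    return 0;
}

/*
 * Hypothetical stand-in for numa_available(). Stores the errno of the
 * probe in *probe_errno (0 on success) so a caller could report it.
 */
static int
numa_available_compat(int *probe_errno)
{
    long        rc = -1;

    assert(probe_errno != NULL);

    errno = ENOSYS;             /* used if the syscall number is unknown */
#ifdef SYS_get_mempolicy
    errno = 0;
    /* Equivalent of get_mempolicy(NULL, NULL, 0, 0, 0) */
    rc = syscall(SYS_get_mempolicy, NULL, NULL, 0UL, NULL, 0UL);
#endif
    *probe_errno = (rc < 0) ? errno : 0;

    return numa_probe_result(rc, *probe_errno);
}
```

Keeping the decision in a separate pure function makes the ENOSYS/EPERM rule easy to test without a NUMA machine. Whether a check like this belongs in pg_numa_available() or whether the regression tests should simply accept both error messages is the open question in the thread.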