Hi,
we are using the Zalando Postgres Operator, and I changed the huge pages
setting on our Kubernetes nodes from a previously unset value to 1536
(I say unset, but I'm fairly sure that before changing it to 1536 I saw
an initial value of 1024 with 670 in use).
Postgres stopped working after I set it to 1536 and restarted the
node. I was scratching my head as to why, because I had seen huge pages
there before and didn't connect the two at all.
I found core dumps and this is the output:
Core was generated by `/usr/lib/postgresql/14/bin/postgres -D
/home/postgres/pgdata/pgroot/data --conf'.
Program terminated with signal SIGBUS, Bus error.
warning: Section `.reg-xstate/999' in core file too small.
#0 0x0000558ea5345148 in PGSharedMemoryCreate ()
(gdb) bt
#0 0x0000558ea5345148 in PGSharedMemoryCreate ()
#1 0x0000558ea53c157f in CreateSharedMemoryAndSemaphores ()
#2 0x0000558ea5357240 in PostmasterMain ()
#3 0x0000558ea506777a in main ()
This gave me the first indication that the problem is related to the
huge pages setting on the node itself.
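
For anyone who wants to check the same thing on their own nodes, here is
a minimal sketch of how the huge page counters can be read. It assumes a
Linux node where /proc/meminfo is readable; the function name
parse_hugepages is made up for illustration and is not part of Postgres
or the operator.

```python
import re


def parse_hugepages(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (in kB)
    from text in the format of /proc/meminfo.

    parse_hugepages is a hypothetical helper name used only here.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        # Matches lines such as "HugePages_Total:    1536" or
        # "Hugepagesize:       2048 kB"; other lines are skipped.
        m = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields
```

On the node this would be called as
parse_hugepages(open("/proc/meminfo").read()). Comparing HugePages_Total
with HugePages_Free before and after the change should show whether the
pool was already in use.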
I would go into more detail, but honestly I believe this might be easy
to find. I also assume Postgres shouldn't crash with a bus error here
but should return an error message indicating the issue.
I'm aware that huge pages, like other otherwise ordinary features such
as swap, are not commonly used inside Kubernetes, but FYI, Kubernetes
1.28 will have huge pages support:
https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/
Thanks,
Sigi