Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
From | Andres Freund |
---|---|
Subject | Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes |
Date | |
Msg-id | 20230122002704.yoskrrfkbgi7xcfs@awork3.anarazel.de |
In reply to | Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes (Andres Freund <andres@anarazel.de>) |
List | pgsql-bugs |
Hi,

On 2023-01-21 15:29:22 -0800, Andres Freund wrote:
> On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> > On 1/20/23 23:48, PG Bug reporting form wrote:
> > > In these cases, the initdb phase will attempt to allocate huge pages that
> > > are available in the OS, but it will be denied access by Kubernetes and
> > > fail.
> >
> > Well, so how exactly this fails? Does that mean Kubernetes broke mmap()
> > with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> > not available, or what? Because that's the only explanation I can see,
> > looking at the code.
>
> Yea, that's what I was wondering about as well.
>
>
> > Or it just does not realize there are no hugepages, returns something
> > and then crashes with SIGBUS later when trying to access it?
>
> I assume that that's the case. There's references to bus errors in a bunch of
> the linked issues. E.g.
> https://github.com/CrunchyData/postgres-operator/issues/413
>
> selecting default max_connections ... sh: line 1: 60 Bus error (core dumped) "/usr/pgsql-10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
>
> It's possible that the problem would go away if we used MAP_POPULATE for the
> allocation.
> I'd guess that this is annoying cgroups stuff :(

Ah, the fun:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html

  The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per control group and enforces the limit during page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time implies that, the application will get SIGBUS signal if it tries to fault in HugeTLB pages beyond its limit. Therefore the application needs to know exactly how many HugeTLB pages it uses before hand, and the sysadmin needs to make sure that there are enough available on the machine for all the users to avoid processes getting SIGBUS.

but there's also

  Reservation accounting

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.failcnt

  The HugeTLB controller allows to limit the HugeTLB reservations per control group and enforces the controller limit at reservation time and at the fault of HugeTLB memory for which no reservation exists. Since reservation limits are enforced at reservation time (on mmap or shget), reservation limits never causes the application to get SIGBUS signal if the memory was reserved before hand. For MAP_NORESERVE allocations, the reservation limit behaves the same as the fault limit, enforcing memory usage at fault time and causing the application to receive a SIGBUS if it’s crossing its limit.

  Reservation limits are superior to page fault limits described above, since reservation limits are enforced at reservation time (on mmap or shget), and never causes the application to get SIGBUS signal if the memory was reserved before hand. This allows for easier fallback to alternatives such as non-HugeTLB memory for example. In the case of page fault accounting, it’s very hard to avoid processes getting SIGBUS since the sysadmin needs precisely know the HugeTLB usage of all the tasks in the system and make sure there is enough pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommited systems is practically impossible with page fault accounting.

So the problem is that the wrong type of cgroup limits is used. I don't know if
that's a kubernetes or a postgres-operator issue.

Greetings,

Andres Freund
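
To check which kind of limit a container is actually running under, something along these lines might help. It is only a rough sketch, not anything from PostgreSQL or Kubernetes: it scans a cgroup-v1 hugetlb directory for the `limit_in_bytes` and `rsvd.limit_in_bytes` files named in the kernel documentation quoted above and flags page sizes that have a page-fault limit but no reservation limit, which is the configuration where mmap() can succeed and the first access then raises SIGBUS. The function names, the `UNLIMITED_THRESHOLD` heuristic for "no limit configured", and the directory passed in are all assumptions made for illustration.

```python
import re
from pathlib import Path

# File names follow the cgroup-v1 HugeTLB controller documentation:
#   hugetlb.<hugepagesize>.limit_in_bytes       (page fault limit)
#   hugetlb.<hugepagesize>.rsvd.limit_in_bytes  (reservation limit)
_LIMIT_RE = re.compile(
    r"^hugetlb\.(?P<size>[^.]+)\.(?P<rsvd>rsvd\.)?limit_in_bytes$"
)

# Assumption: an unconfigured limit shows up as a very large number,
# so anything at or above this value is treated as "no limit set".
UNLIMITED_THRESHOLD = 1 << 60


def hugetlb_limits(cgroup_dir):
    """Return {page_size: {"fault": int or None, "reservation": int or None}}."""
    limits = {}
    for path in sorted(Path(cgroup_dir).iterdir()):
        match = _LIMIT_RE.match(path.name)
        if match is None:
            continue  # usage, max_usage, failcnt and unrelated files
        kind = "reservation" if match.group("rsvd") else "fault"
        entry = limits.setdefault(
            match.group("size"), {"fault": None, "reservation": None}
        )
        entry[kind] = int(path.read_text().strip())
    return limits


def is_limited(value):
    """True if a limit file exists and holds a finite-looking value."""
    return value is not None and value < UNLIMITED_THRESHOLD


def fault_only_sizes(limits):
    """Page sizes limited at page fault time but not at reservation time."""
    return [
        size
        for size, entry in sorted(limits.items())
        if is_limited(entry["fault"]) and not is_limited(entry["reservation"])
    ]
```

Calling `fault_only_sizes(hugetlb_limits(some_dir))` on the container's hugetlb cgroup directory (its location depends on how the cgroup hierarchy is mounted, so the path is left to the caller) returns the page sizes for which a fallback to non-huge pages cannot be relied on, because the limit would only be enforced when the memory is touched.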