Kubernetes, cgroups v2 and OOM killer - how to avoid?
From | Ancoron Luciferis |
---|---|
Subject | Kubernetes, cgroups v2 and OOM killer - how to avoid? |
Date | |
Msg-id | b5a262b6-f33f-4ace-b4c3-0c28ad0369b7@googlemail.com |
Responses |
Re: Kubernetes, cgroups v2 and OOM killer - how to avoid?
Re: Kubernetes, cgroups v2 and OOM killer - how to avoid? |
List | pgsql-general |
Hi,

I've been investigating this topic every now and then, but to this day I have not arrived at a setup that consistently leads to a PostgreSQL backend process receiving an allocation error instead of being killed externally by the OOM killer.

Why is this a problem for me? Because while applications are accessing their DBs (multiple services having their own DBs, some high-frequency), the whole server goes into recovery and kills all backends/connections.

While my applications are written to tolerate that, it also means that during that time, especially for the high-frequency apps, events pile up, which leads to a burst as soon as connectivity is restored. This in turn leads to peaks in resource usage in other places (event store, in-memory buffers from apps, ...), which sometimes triggers a series of further OOM killer events, all because some analytics query went overboard.

Ideally, I'd find a configuration that terminates only one backend but leaves the others working.

I am wondering whether there is any way to receive a real ENOMEM inside a cgroup as soon as I try to allocate beyond its memory.max, instead of relying on the OOM killer.

I know the recommendation is to have vm.overcommit_memory set to 2, but that affects all workloads on the host, including critical infra like the kubelet, CNI, CSI, monitoring, ...

I have already gone through and tested the obvious:

https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

And yes, I know that Linux cgroups v2 memory.max is not an actual hard limit:

https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

Any help is greatly appreciated!

Cheers,

Ancoron
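For anyone wanting to check the same settings on their own pods, here is a minimal Python sketch that reads the cgroup v2 memory interface files and the host overcommit mode mentioned above. It only observes the current state and does not change OOM behaviour. The default path /sys/fs/cgroup is an assumption (it is usually the container's own cgroup when a cgroup namespace is in use), and the function names are made up for illustration.

```python
from pathlib import Path


def parse_flat_keyed(text):
    """Parse a cgroup v2 'flat keyed' file such as memory.events."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            result[parts[0]] = int(parts[1])
    return result


def read_limit(text):
    """memory.max holds either a byte count or the literal string 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def cgroup_memory_report(cgroup_dir="/sys/fs/cgroup"):
    """Collect limit, usage and OOM counters for one cgroup directory.

    cgroup_dir is an assumption: these files exist only in non-root
    cgroups, so the path may need adjusting on the host itself.
    """
    base = Path(cgroup_dir)
    events = parse_flat_keyed((base / "memory.events").read_text())
    return {
        "max": read_limit((base / "memory.max").read_text()),
        "current": int((base / "memory.current").read_text()),
        "oom": events.get("oom", 0),
        "oom_kill": events.get("oom_kill", 0),
    }


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the host-wide vm.overcommit_memory setting (0, 1 or 2)."""
    return int(Path(path).read_text())


if __name__ == "__main__":
    try:
        print("vm.overcommit_memory:", overcommit_mode())
        print("cgroup memory:", cgroup_memory_report())
    except OSError as exc:
        # Files are missing outside Linux or in the root cgroup.
        print("could not read memory settings:", exc)
```

A rising oom_kill counter in memory.events confirms that a process in that cgroup was killed by the OOM killer rather than receiving an allocation failure.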