Re: Disabling memory overcommit deemed dangerous
From | David Geier |
---|---|
Subject | Re: Disabling memory overcommit deemed dangerous |
Date | |
Msg-id | 22227ab4-3322-42ba-84fd-d43201ccc614@gmail.com |
In reply to | Re: Disabling memory overcommit deemed dangerous (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi Tom!

On 02.09.2025 20:10, Tom Lane wrote:
> David Geier <geidav.pg@gmail.com> writes:
>
> If you are aware of such places, please submit patches to fix them,
> because they are bugs with or without overcommit. Overcommit does
> *not* prevent the kernel from returning ENOMEM, so this seems like
> an extremely specious argument for not telling people to disable
> overcommit.

Yes, but to the best of my knowledge only for really wild allocation
requests. I haven't come across any ENOMEM in my testing while
overcommit was enabled.

I agree that we want these places fixed regardless. I'll submit a patch
for the strdup() calls, but there's a bigger problem here: we don't
really have the means to test the changes we make. For example, the bug
in [2] requires, according to the discussion, some more involved
refactoring of the cleanup code. How do we make sure these changes are
actually correct? We could build some infrastructure for OOM testing,
but that feels like wasted effort, because even if we fixed all the
problems of category (1), we're still not in good shape because of (2)
and (3).

>
>> 2. On Linux, running OOM during stack expansion triggers SIGSEGV.
>
> Again, allowing overcommit is hardly a cure.

It's not, but neither is disallowing overcommit.

>
>> 3. Other processes running on the same system are mostly not safe
>> against failing memory allocations.
>
> The overcommit recommendation is only meant for machines that are
> more or less dedicated to Postgres, so I'm not sure how much this
> matters. Also, we've seen comparable problems on some platforms
> after running the kernel out of file descriptors. The bottom line
> is that you need a reasonable amount of headroom in your system
> provisioning.

That's rarely the case in a production environment. Typically there are
backups, monitoring, virus scanners, etc. running on the same host,
which are usually not resilient against failure (e.g. they don't
automatically restart / retry). The same goes for e.g. the login
problem mentioned.

Say a DBA runs into an OOM, checks the documentation and applies the
overcommit change. Now he has a false sense of safety and will be
surprised that his setup suddenly has new, unexpected points of
failure.

>
> We have very substantial field experience showing that leaving memory
> overcommit enabled also makes the system unreliable, if it approaches
> OOM conditions. I don't think removing that advice is an improvement.

Completely agreed. Leaving overcommit enabled is also bad. There is no
safe way of running PostgreSQL in the presence of OOMs. Therefore, it
depends on what's more important: having some chance that PostgreSQL
stays up but risking that other programs die, or always having
PostgreSQL die but keeping the other programs up.

I think it would be good to make the tradeoffs of both settings more
explicit in the documentation and to stress that the most important
thing is to configure PostgreSQL such that OOMs are very unlikely to
happen in the first place. If you agree, I can draft a patch.

--
David Geier
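
For illustration only, a minimal standalone C sketch of the kind of NULL
check such a strdup() fix amounts to. strdup_or_die() is a hypothetical
helper, not the actual patch; real backend code would report the failure
through PostgreSQL's own error machinery (an out-of-memory error) rather
than exiting the process.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Hypothetical helper: duplicate a string and fail loudly instead of
     * letting a NULL result from strdup() propagate unchecked.
     */
    static char *
    strdup_or_die(const char *s)
    {
        char *copy = strdup(s);

        if (copy == NULL)
        {
            fprintf(stderr, "out of memory duplicating \"%s\"\n", s);
            exit(1);
        }
        return copy;
    }

    int
    main(void)
    {
        char *name = strdup_or_die("example");

        printf("%s\n", name);
        free(name);
        return 0;
    }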