Re: BUG #18675: Postgres is not realasing memory causing OOM
From | Tomas Vondra |
---|---|
Subject | Re: BUG #18675: Postgres is not realasing memory causing OOM |
Date | |
Msg-id | 9ce91971-2771-4b3c-aad4-b3d5b952d5d1@vondra.me |
In reply to | Re: BUG #18675: Postgres is not realasing memory causing OOM (Maciej Jaros <eccenux@gmail.com>) |
Responses | Re: BUG #18675: Postgres is not realasing memory causing OOM |
List | pgsql-bugs |
On 10/29/24 13:26, Maciej Jaros wrote:
> Thanks, Tomas. That helped me understand the problem and the comments a
> bit more. To answer some questions about our setup and possible causes
> of the issues:
>
>   * We are not using any non-standard extensions. We use PL/pgSQL in
>     some maintenance scripts, but that extension is built-in, so I guess
>     you didn't mean that.
>   * This is mostly a default configuration (aside from memory and CPU
>     adjustments). More specifically all of the JIT options are in their
>     default states (none are set).

OK. Yesterday you posted this:

^ name                    ^ value   ^
| jit                     | on      |
| jit_above_cost          | 100000  |
| jit_debugging_support   | off     |
| jit_dump_bitcode        | off     |
| jit_expressions         | on      |
| jit_inline_above_cost   | 500000  |
| jit_optimize_above_cost | 500000  |
| jit_profiling_support   | off     |
| jit_provider            | llvmjit |
| jit_tuple_deforming     | on      |

Which means the JIT is enabled.

>   * For context about connections, we have 10 app servers (Tomcat, JDK
>     17, Hibernate 5) connecting to the database in question, each with
>     around 20 active connections in practice (pooling, though some
>     schedules might add a bit). There is also a data warehouse with
>     pooling and should not exceed 20 connections. So, in practice, we
>     have around 250 connections, not the 500 we have in
>     |max_connections| setting. Also most of the connections are idle most
>     of the time. So at least our |max_connections| is quite
>     conservative, I think.

Seems like that. You haven't shared any information about how much
memory is used by individual backends, but it might be interesting to
look at that, and check if the memory usage is high for some subset of
backends (say, those for the warehouse).

>   * We terminate all queries running longer than 30 minutes. Typical
>     queries are below 1 second, and Java responses are mostly limited to
>     20 seconds. Additionally, most queries have a limit of 25 (25 items
>     per page).

That just supports my speculation this is not the in-query memory leak
where we allocate memory in a memory context, because that'd be freed at
the end of a query. I'm not sure about what happens to memory allocated
by LLVM if a query gets interrupted because of a timeout. How often do
queries hit the 30-minute limit?

>   * The application is in use from 8 am to 6-7 pm, and it is mostly idle
>     at night. There is some maintenance at night (including vacuum and
>     vacuumlo). RAM availability stays flat at night, and I would expect
>     it to drop at some point around 7-8 pm. RAM usage on separate app
>     servers does drop after hours.
>
> So, yes, that RAM usage is strange, and that’s why I reported it. It
> doesn’t seem like this is a problem unique to us. I found questions
> about memory usage on Stack Overflow, like this one, for example:
> How to limit the memory available for PostgreSQL server
> <https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgresql-server>.
> There is a comment there that seems to describe what could be a bug
> (looks like a bug to me). Maybe not a bug-bug, but definitely an
> unwanted behavior:
>
>     Note that even if postgres logically releases memory it has
>     allocated, it may not be returned to operating system depending on
>     the |malloc()|/|free()| implementation of your execution
>     environment. That may result in multiple PostgreSQL processes
>     getting over the limit due use of hash aggregation as described
>     above and the memory is never released to OS even though PostgreSQL
>     isn't actually using it either. This happens because technically
>     |malloc()| may use |brk()| behind the scenes and releasing memory
>     back to OS is only possible only in some special cases.
>

That's not a bug, that's what glibc does for everything in user space.
Yes, it can interfere with overcommit if you have the limit set too low
(but that's not your case), and most of the time it's not an issue
thanks to virtual memory etc.
It also should not lead to indefinite growth, the memory should be
reused for future allocations.

For the "memory limit", it's true we don't have a way to do that, but
it's also not clear it'd actually help in any way. If you have a memory
leak in the JIT code, that's completely outside our control - we don't
even know how much memory LLVM allocated etc. so this would not be
covered by the limit.

> So, that comment led me to suggest adding some kind of process. I called
> it a garbage collector, but maybe David is right; maybe that’s not
> accurate. Anyway, that process, in my view, could try to actually
> release memory to the system to prevent the OOM killer from doing its
> bidding. Is that possible? I don’t know, don't know inner workings of
> PG. I also don’t understand why calling |free| would not release memory.
> I’m also not sure if that description of malloc/free is accurate, but it
> does seem to align with what I’m seeing.
>

There are different ways to define garbage collection, but memory
contexts could be seen as doing that. Of course, that's only "our" side,
it has no impact on what happens in glibc. That's a different layer, we
have no visibility into that.

Anyway, people have already suggested you try disabling JIT by setting

    jit = off

and see if that fixes the issue. If yes, that significantly narrows the
area where the bug could be.


regards

-- 
Tomas Vondra
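
For the suggestion above to look at how much memory individual backends use, a minimal sketch of one way to collect the numbers on a Linux host. It is an illustration only, not something taken from this thread: the helper names (`parse_status`, `backend_memory`, `rank_by_anon`) are hypothetical, and it assumes the script runs on the database server with permission to read `/proc/<pid>/status` for the postgres processes. `RssAnon` is the private anonymous part of the resident set (where a leak in a backend would likely show up), while `RssShmem` includes the pages of shared buffers the backend has touched.

```python
import os
import re

# Memory fields of interest in /proc/<pid>/status (Linux); values are in kB.
FIELDS = ("VmRSS", "RssAnon", "RssShmem")


def parse_status(text):
    """Return {field: kB} for the memory fields found in a status file."""
    usage = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if match and match.group(1) in FIELDS:
            usage[match.group(1)] = int(match.group(2))
    return usage


def backend_memory(pids, proc_root="/proc"):
    """Read memory usage for each PID, skipping processes that are gone."""
    result = {}
    for pid in pids:
        try:
            with open(os.path.join(proc_root, str(pid), "status")) as fh:
                result[pid] = parse_status(fh.read())
        except OSError:
            # The backend exited or is not readable; ignore it.
            continue
    return result


def rank_by_anon(usage):
    """Sort PIDs by private anonymous memory (RssAnon), largest first."""
    return sorted(usage, key=lambda pid: usage[pid].get("RssAnon", 0), reverse=True)
```

The PIDs can be taken from the `pid` column of `pg_stat_activity`, which also shows the client address and application name, so backends with unusually high `RssAnon` can be matched to the app servers or the warehouse.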