Re: BUG #18675: Postgres is not realasing memory causing OOM

From	Tomas Vondra
Subject	Re: BUG #18675: Postgres is not realasing memory causing OOM
Date	October 29, 2024, 19:16:31
Msg-id	9ce91971-2771-4b3c-aad4-b3d5b952d5d1@vondra.me
In reply to	Re: BUG #18675: Postgres is not realasing memory causing OOM (Maciej Jaros <eccenux@gmail.com>)
Responses	Re: BUG #18675: Postgres is not realasing memory causing OOM
List	pgsql-bugs


On 10/29/24 13:26, Maciej Jaros wrote:
> Thanks, Tomas. That helped me understand the problem and the comments a
> bit more. To answer some questions about our setup and possible causes
> of the issues:
> 
>   * We are not using any non-standard extensions. We use PL/pgSQL in
>     some maintenance scripts, but that extension is built-in, so I guess
>     you didn't mean that.
>   * This is mostly a default configuration (aside from memory and CPU
>     adjustments). More specifically all of the JIT options are in their
>     default states (none are set).

OK. Yesterday you posted this:

^                    name ^   value ^
|                     jit |      on |
|          jit_above_cost |  100000 |
|   jit_debugging_support |     off |
|        jit_dump_bitcode |     off |
|         jit_expressions |      on |
|   jit_inline_above_cost |  500000 |
| jit_optimize_above_cost |  500000 |
|   jit_profiling_support |     off |
|            jit_provider | llvmjit |
|     jit_tuple_deforming |      on |

Which means the JIT is enabled.
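
As a rough illustration of what those settings imply, here is a minimal sketch of the cost-threshold logic as the PostgreSQL documentation describes it. The function name and the stage labels are made up for this example, and it ignores everything except the settings in the table above.

```python
# Hypothetical helper: which JIT stages would a plan of a given cost trigger?
# Assumes the documented semantics: JIT runs only if jit = on and the plan
# cost exceeds jit_above_cost; a negative threshold disables that stage.

POSTED = {
    "jit": "on",
    "jit_above_cost": "100000",
    "jit_inline_above_cost": "500000",
    "jit_optimize_above_cost": "500000",
    "jit_expressions": "on",
    "jit_tuple_deforming": "on",
}


def jit_stages(settings, plan_cost):
    """Return the set of JIT stages (illustrative labels) for a plan cost."""
    if settings["jit"] != "on":
        return set()
    above = float(settings["jit_above_cost"])
    if above < 0 or plan_cost <= above:
        return set()

    stages = {"compile"}
    inline = float(settings["jit_inline_above_cost"])
    if inline >= 0 and plan_cost > inline:
        stages.add("inline")
    optimize = float(settings["jit_optimize_above_cost"])
    if optimize >= 0 and plan_cost > optimize:
        stages.add("optimize")
    if settings["jit_expressions"] == "on":
        stages.add("expressions")
    if settings["jit_tuple_deforming"] == "on":
        stages.add("tuple_deforming")
    return stages
```

With the posted values, a typical sub-second query with a small plan cost would not be compiled at all, while anything costed above 100000 would go through LLVM.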

>   * For context about connections, we have 10 app servers (Tomcat, JDK
>     17, Hibernate 5) connecting to the database in question, each with
>     around 20 active connections in practice (pooling, though some
>     schedules might add a bit). There is also a data warehouse with
>     pooling and should not exceed 20 connections. So, in practice, we
>     have around 250 connections, not the 500 we have in
>     |max_connections| setting. Also most of the connections are idle most
>     of the time. So at least our |max_connections| is quite
>     conservative, I think.

Seems like that. You haven't shared any information about how much
memory is used by individual backends, but it might be interesting to
look at that, and check if the memory usage is high for some subset of
backends (say, those for the warehouse).
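
One possible way to collect that, sketched under the assumption of a Linux host where the backends show up with the process name "postgres": read /proc/<pid>/status for each process and compare RssAnon (private memory) with RssShmem (mostly shared buffers). The helper names are hypothetical, and the RssAnon/RssShmem fields need a reasonably recent kernel.

```python
import os


def parse_status_kb(text, fields=("VmRSS", "RssAnon", "RssShmem")):
    """Extract selected kB-valued fields from /proc/<pid>/status content."""
    result = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            parts = rest.split()
            if parts:
                result[name] = int(parts[0])  # value is reported in kB
    return result


def process_name(text):
    """Return the value of the Name: line, or an empty string if missing."""
    for line in text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return ""


def backend_memory(proc_root="/proc", name_prefix="postgres"):
    """Return [(pid, {field: kB})] for processes whose name matches."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        if process_name(text).startswith(name_prefix):
            found.append((int(entry), parse_status_kb(text)))
    # Largest private (anonymous) memory first.
    found.sort(key=lambda item: item[1].get("RssAnon", 0), reverse=True)
    return found
```

Calling backend_memory() as a user allowed to read those files and printing the first few entries should show whether a handful of backends account for most of the private memory. Matching the PIDs against pg_stat_activity would then tell which client they belong to.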

>   * We terminate all queries running longer than 30 minutes. Typical
>     queries are below 1 second, and Java responses are mostly limited to
>     20 seconds. Additionally, most queries have a limit of 25 (25 items
>     per page).

That just supports my speculation this is not the in-query memory leak
where we allocate memory in a memory context, because that'd be freed at
the end of a query. I'm not sure about what happens to memory allocated
by LLVM if a query gets interrupted because of a timeout. How often do
queries hit the 30-minute limit?

>   * The application is in use from 8 am to 6-7 pm, and it is mostly idle
>     at night. There is some maintenance at night (including vacuum and
>     vacuumlo). RAM availability stays flat at night, and I would expect
>     it to drop at some point around 7-8 pm. RAM usage on separate app
>     servers does drop after hours.
> 
> So, yes, that RAM usage is strange, and that’s why I reported it. It
> doesn’t seem like this is a problem unique to us. I found questions
> about memory usage on Stack Overflow, like this one, for example:
> How to limit the memory available for PostgreSQL server
> <https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgresql-server>.
> There is a comment there that seems to
> describe what could be a bug (looks like a bug to me). Maybe not a bug-
> bug, but definitely an unwanted behavior:
> 
>     Note that even if postgres logically releases memory it has
>     allocated, it may not be returned to operating system depending on
>     the |malloc()|/|free()| implementation of your execution
>     environment. That may result in multiple PostgreSQL processes
>     getting over the limit due use of hash aggregation as described
>     above and the memory is never released to OS even though PostgreSQL
>     isn't actually using it either. This happens because technically
>     |malloc()| may use |brk()| behind the scenes and releasing memory
>     back to OS is only possible only in some special cases.
> 

That's not a bug, that's what glibc does for everything in user space.
Yes, it can interfere with overcommit if you have the limit set too low
(but that's not your case), and most of the time it's not an issue
thanks to virtual memory etc. It also should not lead to indefinite
growth, the memory should be reused for future allocations.
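
To make the "freed but not returned, yet reused" behavior concrete, here is a deliberately simplified toy model of a brk()-style heap. It is not how glibc is implemented (the real allocator splits and merges chunks, has a trim threshold, and serves large requests with mmap), and the class and method names are invented for this sketch.

```python
class ToyHeap:
    """Toy brk()-style heap: only free space at the top can be given back."""

    def __init__(self):
        # Chunks in address order; each entry is [size, in_use].
        self.chunks = []

    def malloc(self, size):
        # First fit: reuse a free chunk that is large enough (no splitting).
        for handle, chunk in enumerate(self.chunks):
            if not chunk[1] and chunk[0] >= size:
                chunk[1] = True
                return handle
        # Nothing reusable, so grow the heap at the top.
        self.chunks.append([size, True])
        return len(self.chunks) - 1

    def free(self, handle):
        self.chunks[handle][1] = False
        # Shrink the heap only while the topmost chunk is free.
        while self.chunks and not self.chunks[-1][1]:
            self.chunks.pop()

    def heap_size(self):
        """Memory the OS sees as belonging to the process."""
        return sum(size for size, _ in self.chunks)

    def in_use(self):
        """Memory the program currently holds."""
        return sum(size for size, used in self.chunks if used)
```

In this model, allocating 1000 units, then 10 units, then freeing the first allocation leaves heap_size() at 1010 while in_use() is 10, because the small chunk on top pins the heap. A later malloc(800) reuses the hole instead of growing the heap, which is why this pattern alone should not produce unbounded growth.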

For the "memory limit", it's true we don't have a way to do that, but
it's also not clear it'd actually help in any way. If you have a memory
leak in the JIT code, that's completely outside our control - we don't
even know how much memory LLVM allocated etc. so this would not be
covered by the limit.

> So, that comment led me to suggest adding some kind of process. I called
> it a garbage collector, but maybe David is right; maybe that’s not
> accurate. Anyway, that process, in my view, could try to actually
> release memory to the system to prevent the OOM killer from doing its
> bidding. Is that possible? I don’t know, don't know inner workings of
> PG. I also don’t understand why calling |free| would not release memory.
> I’m also not sure if that description of malloc/free is accurate, but it
> does seem to align with what I’m seeing.
> 

There are different ways to define garbage collection, but memory
contexts could be seen as doing that. Of course, that's only "our" side,
it has no impact on what happens in glibc. That's a different layer, we
have no visibility into that.

Anyway, people have already suggested you try disabling JIT by setting

jit = off

and see if that fixes the issue. If yes, that significantly narrows the
area where the bug could be.


regards

-- 
Tomas Vondra
