Re: memory leak in trigger handling (since PG12)

From: Jakub Wartak
Subject: Re: memory leak in trigger handling (since PG12)
Msg-id: CAKZiRmxadz_2UT02yVGvzJ3hp=v27K8kNuHxahV1O9gJR17j5w@mail.gmail.com
In reply to: Re: memory leak in trigger handling (since PG12)  (Andres Freund <andres@anarazel.de>)
Replies: Re: memory leak in trigger handling (since PG12)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers

Hi, just two cents:

On Tue, May 23, 2023 at 8:01 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-05-23 13:28:30 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Could it help to have a mode where the executor shutdown hook checks how much
> > > memory is allocated in ExecutorState and warns if its too much?
> >
> > It'd be very hard to set a limit for what's "too much", since the amount
> > of stuff created initially will depend on the plan size.
>
> I was thinking of some limit that should really never be reached outside of a
> leak or work_mem based allocations, say 2GB or so.

RE: instrumentation subthread:
if it helps, the technique below works reasonably well on stock
end-user binaries (provided debug symbols are installed), so maybe we
don't need to add that much infrastructure to see the hot code path:

# RDI holds palloc's size argument (arg0) on x86_64
perf probe -x /path/to/postgres 'palloc' 'size=%di:u64'
# cannot run longer than this: huge overhead (~3s = ~2GB)
perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p <pid> sleep 3

it produces:
    50.27%  (563d0e380670) size=24
            |
            ---palloc
               bms_copy
               ExecUpdateLockMode
               ExecBRUpdateTriggers
               ExecUpdate
[..]

    49.73%  (563d0e380670) size=16
            |
            ---palloc
               bms_copy
               RelationGetIndexAttrBitmap
               ExecUpdateLockMode
               ExecBRUpdateTriggers
               ExecUpdate
[..]
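
The percentages above count palloc() calls, not bytes. Weighting each
group by its size argument gives a rough idea of how the allocated
bytes split between the two call chains. A minimal Python sketch,
assuming each percentage is the share of sampled calls and that the
report has exactly the layout shown above; REPORT is pasted from this
mail, and summarize() and byte_share() are hypothetical helper names,
not part of perf:

```python
import re
from collections import defaultdict

# Report text pasted from the output above (elided frames kept as "[..]").
REPORT = """\
    50.27%  (563d0e380670) size=24
            |
            ---palloc
               bms_copy
               ExecUpdateLockMode
               ExecBRUpdateTriggers
               ExecUpdate
[..]

    49.73%  (563d0e380670) size=16
            |
            ---palloc
               bms_copy
               RelationGetIndexAttrBitmap
               ExecUpdateLockMode
               ExecBRUpdateTriggers
               ExecUpdate
[..]
"""

# Group header, e.g. "50.27%  (563d0e380670) size=24"
HEADER = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+\([0-9a-f]+\)\s+size=(\d+)")


def summarize(report):
    """Map (size, call chain) -> percentage of sampled palloc calls."""
    totals = defaultdict(float)
    current = None  # (percent, size) of the group being read
    chain = []
    for line in report.splitlines() + [None]:  # None = end-of-input sentinel
        match = HEADER.match(line) if line is not None else None
        if match or line is None:
            # A new header (or end of input) closes the previous group.
            if current is not None:
                totals[(current[1], tuple(chain))] += current[0]
            if match:
                current, chain = (float(match.group(1)), int(match.group(2))), []
            continue
        frame = line.strip().lstrip("|-").strip()  # drop tree decoration
        if current is not None and frame and frame != "[..]":
            chain.append(frame)
    return dict(totals)


def byte_share(groups):
    """Weight each group's call share by its allocation size, in percent."""
    weighted = {key: pct * key[0] for key, pct in groups.items()}
    total = sum(weighted.values())
    return {key: 100.0 * w / total for key, w in weighted.items()}


if __name__ == "__main__":
    for (size, chain), share in byte_share(summarize(REPORT)).items():
        print(f"size={size:3d}  {share:5.1f}% of bytes  {' <- '.join(chain)}")
```

For the two groups above, the size=24 chain accounts for about 60% of
the requested bytes, although both chains are called about equally
often.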

Now we know that those small palloc() calls are the culprits, but
Tomas and I did not know that at the time. The problem is that
palloc()'s own arguments do not tell us which MemoryContext the
allocation goes to. That is a problem for perf because, on RHEL8, I
was not able to create a uprobe that could read a global variable
(CurrentMemoryContext) at that time.

What was even more frustrating when diagnosing that case on the
customer's system was that such OOMs were crashing other PostgreSQL
clusters on the same OS. Even knowing the exact guilty statement, it
was impossible to limit the RSS memory usage of that particular
backend. So what you are proposing makes a lot of sense.

It also got me thinking about implementing a memory safety net GUC,
debug_query_limit_backend_memory=X MB, that would inject
setrlimit(RLIMIT_DATA) from an external extension via hook(s) and
unset it later. The man page states that RLIMIT_DATA applies to mmap()
only since Linux 4.7, so it is future-proof but won't work on e.g.
RHEL7. Maybe that's still good enough? Or maybe try to hack palloc()
a little, but that probably has too much overhead, right? (just
thinking out loud).

-Jakub Wartak.


