Обсуждение: pg_walinspect memory leaks

Поиск
Список
Период
Сортировка

pg_walinspect memory leaks

От
Peter Geoghegan
Дата:
It looks like pg_walinspect's GetWALRecordsInfo() routine doesn't take
sufficient care with memory management. It should avoid memory leaks
of the kind that lead to OOMs whenever
pg_get_wal_records_info_till_end_of_wal() has to return very many
tuples. Right now it isn't that hard to make that happen, even on a
system where memory is plentiful. I wasn't expecting that, because all
of these functions use a tuplestore.

More concretely, it looks like GetWALRecordInfo() calls
CStringGetTextDatum/cstring_to_text in a way that accumulates way too
much memory in ExprContext. This could be avoided by using a separate
memory context that is reset periodically, or something else along the
same lines.

-- 
Peter Geoghegan



Re: pg_walinspect memory leaks

От
Andres Freund
Дата:
Hi,

On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote:
> More concretely, it looks like GetWALRecordInfo() calls
> CStringGetTextDatum/cstring_to_text in a way that accumulates way too
> much memory in ExprContext.

Additionally, we leak two stringinfos for each record.


> This could be avoided by using a separate memory context that is reset
> periodically, or something else along the same lines.

Everything other than a per-row memory context that's reset each time seems
hard to manage in this case.

Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily
dilligent about cleaning up O(1) memory, after not caring about O(N) memory...

Greetings,

Andres Freund



Re: pg_walinspect memory leaks

От
Bharath Rupireddy
Дата:
On Tue, Feb 14, 2023 at 6:25 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote:
> > More concretely, it looks like GetWALRecordInfo() calls
> > CStringGetTextDatum/cstring_to_text in a way that accumulates way too
> > much memory in ExprContext.
>
> Additionally, we leak two stringinfos for each record.
>
>
> > This could be avoided by using a separate memory context that is reset
> > periodically, or something else along the same lines.
>
> Everything other than a per-row memory context that's reset each time seems
> hard to manage in this case.
>
> Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily
> dilligent about cleaning up O(1) memory, after not caring about O(N) memory...

Thanks for reporting. I'll get back to you on this soon.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: pg_walinspect memory leaks

От
Bharath Rupireddy
Дата:
On Tue, Feb 14, 2023 at 4:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Feb 14, 2023 at 6:25 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote:
> > > More concretely, it looks like GetWALRecordInfo() calls
> > > CStringGetTextDatum/cstring_to_text in a way that accumulates way too
> > > much memory in ExprContext.
> >
> > Additionally, we leak two stringinfos for each record.
> >
> >
> > > This could be avoided by using a separate memory context that is reset
> > > periodically, or something else along the same lines.
> >
> > Everything other than a per-row memory context that's reset each time seems
> > hard to manage in this case.
> >
> > Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily
> > dilligent about cleaning up O(1) memory, after not caring about O(N) memory...
>
> Thanks for reporting. I'll get back to you on this soon.

The memory usage goes up with many WAL records in GetWALRecordsInfo().
The affected functions are pg_get_wal_records_info() and
pg_get_wal_records_info_till_end_of_wal(). I think the best way to fix
this is to use a temporary memory context (like the jsonfuncs.c),
reset it after every tuple is put into the tuple store. This fix keeps
the memory under limits. I'm attaching the patches here. For HEAD, I'd
want to be a bit defensive and use the temporary memory context for
pg_get_wal_fpi_info() too.

And, the fix also needs to be back-patched to PG15.

[1]
HEAD:
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
COMMAND
1105979 ubuntu    20   0   28.5g  28.4g 150492 R  80.7  93.0   1:47.12
postgres

PATCHED:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
COMMAND
  13149 ubuntu    20   0  173244 156872 150688 R 79.0   0.5   1:25.09
postgres

postgres=# select count(*) from
pg_get_wal_records_info_till_end_of_wal('0/1000000');
  count
----------
 35285649
(1 row)

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          13149
(1 row)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: pg_walinspect memory leaks

От
Michael Paquier
Дата:
On Thu, Feb 16, 2023 at 06:00:00PM +0530, Bharath Rupireddy wrote:
> The memory usage goes up with many WAL records in GetWALRecordsInfo().
> The affected functions are pg_get_wal_records_info() and
> pg_get_wal_records_info_till_end_of_wal(). I think the best way to fix
> this is to use a temporary memory context (like the jsonfuncs.c),
> reset it after every tuple is put into the tuple store. This fix keeps
> the memory under limits. I'm attaching the patches here.

What you are doing here looks OK, at quick glance.  That's common
across the code, see also dblink or file_fdw.

> For HEAD, I'd
> want to be a bit defensive and use the temporary memory context for
> pg_get_wal_fpi_info() too.

If there is a burst of FPWs across the range you are scanning, the
problem could be equally worse.  Sorry for missing that.
--
Michael

Вложения

Re: pg_walinspect memory leaks

От
Jeff Davis
Дата:
On Thu, 2023-02-16 at 18:00 +0530, Bharath Rupireddy wrote:
> I'm attaching the patches here. For HEAD, I'd
> want to be a bit defensive and use the temporary memory context for
> pg_get_wal_fpi_info() too.

I don't see why we shouldn't backpatch that, too?

Also, it seems like we should do the same thing for the loop in
GetXLogSummaryStats(). Maybe just for the outer loop is fine (the inner
loop is only 16 elements); though again, there's not an obvious
downside to fixing that, too.


--
Jeff Davis
PostgreSQL Contributor Team - AWS





Re: pg_walinspect memory leaks

От
Bharath Rupireddy
Дата:
On Sat, Feb 18, 2023 at 5:07 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2023-02-16 at 18:00 +0530, Bharath Rupireddy wrote:
> > I'm attaching the patches here. For HEAD, I'd
> > want to be a bit defensive and use the temporary memory context for
> > pg_get_wal_fpi_info() too.
>
> I don't see why we shouldn't backpatch that, too?

pg_get_wal_fpi_info() is added in v16, so backpatching isn't necessary.

> Also, it seems like we should do the same thing for the loop in
> GetXLogSummaryStats(). Maybe just for the outer loop is fine (the inner
> loop is only 16 elements); though again, there's not an obvious
> downside to fixing that, too.

Firstly, WAL record traversing loop in GetWalStats() really doesn't
leak memory, because it just increments some counters and doesn't
palloc any memory. Similarly, the loops in GetXLogSummaryStats() too
don't palloc any memory, so no memory leak. I've seen no memory growth
during execution of pg_get_wal_stats_till_end_of_wal() for 35million
WAL records, see [1] PID 543967 (during the execution of the stats
function, the memory usage remained constant). Therefore, I feel that
the fix isn't required for GetWalStats().

[1]
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
COMMAND
 543967 ubuntu    20   0  168668 152056 149988 R  99.7   0.5   1:33.72
postgres
 412271 ubuntu    20   0 1101852 252724  42904 S   1.3   0.8   2:18.36
node
 412208 ubuntu    20   0  965000 112488  36012 S   0.3   0.4   0:23.46
node
 477193 ubuntu    20   0 5837096  34172   9420 S   0.3   0.1   0:00.93
cpptools-srv

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: pg_walinspect memory leaks

От
Jeff Davis
Дата:
On Mon, 2023-02-20 at 15:17 +0530, Bharath Rupireddy wrote:

> Similarly, the loops in GetXLogSummaryStats() too
> don't palloc any memory, so no memory leak.

Break on palloc in gdb in that loop and you'll see a palloc in
CStringGetTextDatum(name). In general, you should expect *GetDatum() to
palloc unless you're sure that it's pass-by-value. Even
Float8GetDatum() has code to account for pass-by-ref float8s.

There are also a couple calls to psprintf() in the stats_per_record
path.

>  I've seen no memory growth
> during execution of pg_get_wal_stats_till_end_of_wal() for 35million
> WAL records, see [1] PID 543967 (during the execution of the stats
> function, the memory usage remained constant). Therefore, I feel that
> the fix isn't required for GetWalStats().

That is true because the loops in GetXLogSummaryStats() are based on
constants. It does at most RM_MAX_ID * MAX_XLINFO_TYPES calls to
FillXLogStatsRow() regardless of the number of WAL records.
It's not a significant amount of memory, at least today. But, since
we're already using the temp context pattern, we might as well use it
here for clarity so that we don't have to guess about whether the
amount of memory is significant or not.

Committed to 16 with the changes to GetXLogSummaryStats() as well.
Committed unmodified version of your 15 backport. Thank you!


--
Jeff Davis
PostgreSQL Contributor Team - AWS





Re: pg_walinspect memory leaks

От
Michael Paquier
Дата:
On Mon, Feb 20, 2023 at 11:34:03AM -0800, Jeff Davis wrote:
> Committed to 16 with the changes to GetXLogSummaryStats() as well.
> Committed unmodified version of your 15 backport. Thank you!

Thanks for taking care of the FPI code path, Jeff!
--
Michael

Вложения