Discussion: Re: Should pg 11 use a lot more memory building an spgist index?

Re: Should pg 11 use a lot more memory building an spgist index?

From
Tom Lane
Date:
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> How about modifying SysScanDescData to have a memory context member,
>> which is created by systable_beginscan and destroyed by endscan?

> I think it would still have problems, in that it would affect in which
> context index_getnext delivers its output.  We could reasonably make
> these sorts of changes in places where the entire index_beginscan/
> index_getnext/index_endscan call structure is in one place, but that
> doesn't apply for the systable functions.

Actually, after looking more closely, index_getnext doesn't return a
palloc'd object anyway, or at least not one that the caller is responsible
for managing.  So maybe that could work.
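As a concrete sketch, that might look about like the following on the
systable side (scan_cxt being a hypothetical new member of
SysScanDescData, not anything in a posted patch):

    /* in systable_beginscan: */
    sysscan->scan_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                              "systable scan",
                                              ALLOCSET_SMALL_SIZES);
    oldcxt = MemoryContextSwitchTo(sysscan->scan_cxt);
    sysscan->iscan = index_beginscan(heapRelation, irel, snapshot,
                                     nkeys, 0);
    MemoryContextSwitchTo(oldcxt);

    /* ... and in systable_endscan: */
    MemoryContextDelete(sysscan->scan_cxt);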

I was confused about why the memory leak in Bruno's example is so much
larger in HEAD than v11; spgbeginscan does allocate more stuff than
before, but only if numberOfOrderBys > 0, which surely doesn't apply for
the exclusion-check code path.  Eventually I realized that the difference
is because commit 2a6368343 made SpGistScanOpaqueData a good deal larger
than it had been (10080 vs 6264 bytes, on my x86_64 box), so it was just
the failure to pfree that struct that accounted for the bigger leak.

There's another issue with 2a6368343, though: it added a couple of
fmgr_info_copy calls to spgbeginscan.  I don't understand why it'd be a
good idea to do that rather than using the FmgrInfos in the index's
relcache entry directly.  Making a copy every time risks a memory leak,
because spgendscan has no way to free any fn_extra data that gets
allocated by the called function; and by the same token it's inefficient,
because the function has no way to cache fn_extra data across queries.

If we consider only the leak aspect, this coding presents a reason why
we should try to change things as Alvaro suggests.  However, because of
the poor-caching aspect, I think this code is pretty broken anyway,
and once we fix it the issue is much less clear-cut.

(Looking around, it seems like we're very schizophrenic about whether
to copy index relcache support function FmgrInfos or just use them
directly.  But nbtree and hash seem to always do the latter, so I think
it's probably reasonable to standardize on that.)
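To make the contrast concrete, the two styles look about like this
(the second form assumes SpGistScanOpaqueData's member is changed
from FmgrInfo to FmgrInfo *):

    /* what 2a6368343 does: copy per scan; any fn_extra allocated
     * under the copy is never freed by spgendscan */
    fmgr_info_copy(&so->innerConsistentFn,
                   index_getprocinfo(rel, 1, SPGIST_INNER_CONSISTENT_PROC),
                   CurrentMemoryContext);

    /* nbtree/hash style: use the relcache entry's FmgrInfo directly,
     * so fn_extra caching survives across queries */
    so->innerConsistentFn =
        index_getprocinfo(rel, 1, SPGIST_INNER_CONSISTENT_PROC);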

Anyway, the attached proposed patch for HEAD makes Bruno's problem go
away, and it also fixes an unrelated leak introduced by 2a6368343
because it reallocates so->orderByTypes each time through spgrescan.
I think we should apply this, and suitable subsets in the back branches,
to fix the immediate problem.  Then we can work at leisure on a more
invasive HEAD-only patch, if anyone is excited about that.

            regards, tom lane

diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index a63fde2..c883ae9 100644
*** a/src/backend/access/spgist/spgscan.c
--- b/src/backend/access/spgist/spgscan.c
*************** spgbeginscan(Relation rel, int keysz, in
*** 294,303 ****
      /* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
      so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);

      if (scan->numberOfOrderBys > 0)
      {
!         so->zeroDistances = palloc(sizeof(double) * scan->numberOfOrderBys);
!         so->infDistances = palloc(sizeof(double) * scan->numberOfOrderBys);

          for (i = 0; i < scan->numberOfOrderBys; i++)
          {
--- 294,311 ----
      /* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
      so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);

+     /* Allocate various arrays needed for order-by scans */
      if (scan->numberOfOrderBys > 0)
      {
!         /* This will be filled in spgrescan, but allocate the space here */
!         so->orderByTypes = (Oid *)
!             palloc(sizeof(Oid) * scan->numberOfOrderBys);
!
!         /* These arrays have constant contents, so we can fill them now */
!         so->zeroDistances = (double *)
!             palloc(sizeof(double) * scan->numberOfOrderBys);
!         so->infDistances = (double *)
!             palloc(sizeof(double) * scan->numberOfOrderBys);

          for (i = 0; i < scan->numberOfOrderBys; i++)
          {
*************** spgbeginscan(Relation rel, int keysz, in
*** 305,313 ****
              so->infDistances[i] = get_float8_infinity();
          }

!         scan->xs_orderbyvals = palloc0(sizeof(Datum) * scan->numberOfOrderBys);
!         scan->xs_orderbynulls = palloc(sizeof(bool) * scan->numberOfOrderBys);
!         memset(scan->xs_orderbynulls, true, sizeof(bool) * scan->numberOfOrderBys);
      }

      fmgr_info_copy(&so->innerConsistentFn,
--- 313,324 ----
              so->infDistances[i] = get_float8_infinity();
          }

!         scan->xs_orderbyvals = (Datum *)
!             palloc0(sizeof(Datum) * scan->numberOfOrderBys);
!         scan->xs_orderbynulls = (bool *)
!             palloc(sizeof(bool) * scan->numberOfOrderBys);
!         memset(scan->xs_orderbynulls, true,
!                sizeof(bool) * scan->numberOfOrderBys);
      }

      fmgr_info_copy(&so->innerConsistentFn,
*************** spgrescan(IndexScanDesc scan, ScanKey sc
*** 336,341 ****
--- 347,353 ----
          memmove(scan->keyData, scankey,
                  scan->numberOfKeys * sizeof(ScanKeyData));

+     /* initialize order-by data if needed */
      if (orderbys && scan->numberOfOrderBys > 0)
      {
          int            i;
*************** spgrescan(IndexScanDesc scan, ScanKey sc
*** 343,350 ****
          memmove(scan->orderByData, orderbys,
                  scan->numberOfOrderBys * sizeof(ScanKeyData));

-         so->orderByTypes = (Oid *) palloc(sizeof(Oid) * scan->numberOfOrderBys);
-
          for (i = 0; i < scan->numberOfOrderBys; i++)
          {
              ScanKey        skey = &scan->orderByData[i];
--- 355,360 ----
*************** spgendscan(IndexScanDesc scan)
*** 380,390 ****
--- 390,411 ----
      MemoryContextDelete(so->tempCxt);
      MemoryContextDelete(so->traversalCxt);

+     if (so->keyData)
+         pfree(so->keyData);
+
+     if (so->state.deadTupleStorage)
+         pfree(so->state.deadTupleStorage);
+
      if (scan->numberOfOrderBys > 0)
      {
+         pfree(so->orderByTypes);
          pfree(so->zeroDistances);
          pfree(so->infDistances);
+         pfree(scan->xs_orderbyvals);
+         pfree(scan->xs_orderbynulls);
      }
+
+     pfree(so);
  }

  /*

Re: Should pg 11 use a lot more memory building an spgist index?

From
Amit Langote
Date:
On 2018/10/30 4:48, Tom Lane wrote:
> I was confused about why the memory leak in Bruno's example is so much
> larger in HEAD than v11; spgbeginscan does allocate more stuff than
> before, but only if numberOfOrderBys > 0, which surely doesn't apply for
> the exclusion-check code path.  Eventually I realized that the difference
> is because commit 2a6368343 made SpGistScanOpaqueData a good deal larger
> than it had been (10080 vs 6264 bytes, on my x86_64 box), so it was just
> the failure to pfree that struct that accounted for the bigger leak.
> 
> There's another issue with 2a6368343, though: it added a couple of
> fmgr_info_copy calls to spgbeginscan.  I don't understand why it'd be a
> good idea to do that rather than using the FmgrInfos in the index's
> relcache entry directly.  Making a copy every time risks a memory leak,
> because spgendscan has no way to free any fn_extra data that gets
> allocated by the called function; and by the same token it's inefficient,
> because the function has no way to cache fn_extra data across queries.
> 
> If we consider only the leak aspect, this coding presents a reason why
> we should try to change things as Alvaro suggests.  However, because of
> the poor-caching aspect, I think this code is pretty broken anyway,
> and once we fix it the issue is much less clear-cut.
> 
> (Looking around, it seems like we're very schizophrenic about whether
> to copy index relcache support function FmgrInfos or just use them
> directly.  But nbtree and hash seem to always do the latter, so I think
> it's probably reasonable to standardize on that.)

Agreed about trying to avoid fmgr_info_copy(), at least in this case.

By the way, am I right that the reason to avoid copying in this instance
is that the currently active context could be a long-lived one, as when
CREATE INDEX, ALTER TABLE ... ADD CONSTRAINT, etc. call the copying code
for every tuple?  There are other instances of fmgr_info_copy throughout
the index AM implementations, including in the helper function
ScanKeyEntryInitializeWithInfo() called from them, but the copies
made in those cases don't appear very leak-prone.

> Anyway, the attached proposed patch for HEAD makes Bruno's problem go
> away, and it also fixes an unrelated leak introduced by 2a6368343
> because it reallocates so->orderByTypes each time through spgrescan.
> I think we should apply this, and suitable subsets in the back branches,
> to fix the immediate problem.  Then we can work at leisure on a more
> invasive HEAD-only patch, if anyone is excited about that.

Makes sense to fix it like this for back-patching.

Thanks,
Amit



Re: Should pg 11 use a lot more memory building an spgist index?

From
Amit Langote
Date:
On 2018/10/30 4:48, Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> How about modifying SysScanDescData to have a memory context member,
>>> which is created by systable_beginscan and destroyed by endscan?
> 
>> I think it would still have problems, in that it would affect in which
>> context index_getnext delivers its output.  We could reasonably make
>> these sorts of changes in places where the entire index_beginscan/
>> index_getnext/index_endscan call structure is in one place, but that
>> doesn't apply for the systable functions.
> 
> Actually, after looking more closely, index_getnext doesn't return a
> palloc'd object anyway, or at least not one that the caller is responsible
> for managing.  So maybe that could work.

Instead of SysScanDescData, could we add one to IndexScanDescData?  It's
somewhat clear that catalog scans don't leak much, but user index scans
can, as seen in this case.

In this particular case, I see that it's IndexCheckExclusion() that
invokes check_exclusion_or_unique_constraint() repeatedly for each heap
tuple after finishing building the index on the heap.  The latter performs
index_beginscan()/index_endscan() for every heap tuple, but doesn't
release the memory allocated for the IndexScanDesc and its members.

I've tried to fix that with the attached patches.

0001 adds the ability for callers of index_beginscan to specify a
memory context.  index_beginscan_internal switches to that context before
calling ambeginscan and stores it in a new field xs_scan_cxt of
IndexScanDescData, though maybe storing it is unnecessary.

0002 builds on that and adds the ability for callers of
check_exclusion_or_unique_constraint to specify a memory context.  Using
that, it fixes the leak by teaching IndexCheckExclusion and
ExecInsertIndexTuples to pass a memory context that's reset before calling
check_exclusion_or_unique_constraint() for the next tuple.
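Roughly, the caller-side pattern 0002 aims for is the following sketch
(the extra context argument exists only in the proposed patch, and the
other arguments are elided):

    /* in IndexCheckExclusion, per the proposed 0002 */
    MemoryContext check_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                                    "exclusion check",
                                                    ALLOCSET_DEFAULT_SIZES);

    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* drop whatever the previous tuple's check left behind */
        MemoryContextReset(check_cxt);
        check_exclusion_or_unique_constraint(..., check_cxt);
    }

    MemoryContextDelete(check_cxt);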

The following example shows that the patch works.

With HEAD:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- this consumes about 1GB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- this too
alter table foo add exclude using spgist (a with &&);

Patched:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- memory consumption doesn't grow above 37MB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- ditto
alter table foo add exclude using spgist (a with &&);

Thoughts?

Thanks,
Amit

Attachments

Re: Should pg 11 use a lot more memory building an spgist index?

From
Amit Langote
Date:
On Tue, Oct 30, 2018 at 7:11 PM Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> On 2018/10/30 4:48, Tom Lane wrote:
> > I wrote:
> >> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >>> How about modifying SysScanDescData to have a memory context member,
> >>> which is created by systable_beginscan and destroyed by endscan?
> >
> >> I think it would still have problems, in that it would affect in which
> >> context index_getnext delivers its output.  We could reasonably make
> >> these sorts of changes in places where the entire index_beginscan/
> >> index_getnext/index_endscan call structure is in one place, but that
> >> doesn't apply for the systable functions.
> >
> > Actually, after looking more closely, index_getnext doesn't return a
> > palloc'd object anyway, or at least not one that the caller is responsible
> > for managing.  So maybe that could work.
>
> Instead of SysScanDescData, could we add one to IndexScanDescData?  It's
> somewhat clear that catalog scans don't leak much, but user index scans
> can, as seen in this case.
>
> In this particular case, I see that it's IndexCheckExclusion() that
> invokes check_exclusion_or_unique_constraint() repeatedly for each heap
> tuple after finishing building the index on the heap.  The latter performs
> index_beginscan()/index_endscan() for every heap tuple, but doesn't
> release the memory allocated for the IndexScanDesc and its members.
>
> I've tried to fix that with the attached patches.
>
> 0001 adds the ability for callers of index_beginscan to specify a
> memory context.  index_beginscan_internal switches to that context before
> calling ambeginscan and stores it in a new field xs_scan_cxt of
> IndexScanDescData, though maybe storing it is unnecessary.
>
> 0002 builds on that and adds the ability for callers of
> check_exclusion_or_unique_constraint to specify a memory context.  Using
> that, it fixes the leak by teaching IndexCheckExclusion and
> ExecInsertIndexTuples to pass a memory context that's reset before calling
> check_exclusion_or_unique_constraint() for the next tuple.
>
> The following example shows that the patch works.
>
> With HEAD:
>
> create table foo (a int4range);
> alter table foo add exclude using spgist (a with &&);
> -- this consumes about 1GB
> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
> alter table foo drop constraint foo_a_excl;
> -- this too
> alter table foo add exclude using spgist (a with &&);
>
> Patched:
>
> create table foo (a int4range);
> alter table foo add exclude using spgist (a with &&);
> -- memory consumption doesn't grow above 37MB
> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
> alter table foo drop constraint foo_a_excl;
> -- ditto
> alter table foo add exclude using spgist (a with &&);

Sorry, I forgot to try the example with your patch.  It likely has
more or less the same behavior as mine, although I didn't realize
that when I started writing my patch.

Thanks,
Amit


Re: Should pg 11 use a lot more memory building an spgist index?

From
Amit Langote
Date:
On 2018/10/30 21:27, Amit Langote wrote:
> On Tue, Oct 30, 2018 at 7:11 PM Amit Langote
>> I've tried to fix that with the attached patches.
>>
>> 0001 adds the ability for callers of index_beginscan to specify a
>> memory context.  index_beginscan_internal switches to that context before
>> calling ambeginscan and stores it in a new field xs_scan_cxt of
>> IndexScanDescData, though maybe storing it is unnecessary.
>>
>> 0002 builds on that and adds the ability for callers of
>> check_exclusion_or_unique_constraint to specify a memory context.  Using
>> that, it fixes the leak by teaching IndexCheckExclusion and
>> ExecInsertIndexTuples to pass a memory context that's reset before calling
>> check_exclusion_or_unique_constraint() for the next tuple.
>>
>> The following example shows that the patch works.
>>
>> With HEAD:
>>
>> create table foo (a int4range);
>> alter table foo add exclude using spgist (a with &&);
>> -- this consumes about 1GB
>> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
>> alter table foo drop constraint foo_a_excl;
>> -- this too
>> alter table foo add exclude using spgist (a with &&);
>>
>> Patched:
>>
>> create table foo (a int4range);
>> alter table foo add exclude using spgist (a with &&);
>> -- memory consumption doesn't grow above 37MB
>> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
>> alter table foo drop constraint foo_a_excl;
>> -- ditto
>> alter table foo add exclude using spgist (a with &&);
> 
> Sorry, I forgot to try the example with your patch.  It likely has
> more or less the same behavior as mine, although I didn't realize
> that when I started writing my patch.

Yep, I checked that fix-spgist-memory-leaks-1.patch posted upthread gives
almost the same numbers, i.e., the same maximum amount of memory consumed.

Maybe we don't need to complicate the interface of index_beginscan with a
new memory context argument, as my patch does, if having amendscan
implementations simply follow their contract would suffice.

Thanks,
Amit



Re: Should pg 11 use a lot more memory building an spgist index?

From
Tom Lane
Date:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> Maybe we don't need to complicate the interface of index_beginscan with a
> new memory context argument, as my patch does, if having amendscan
> implementations simply follow their contract would suffice.

Yeah, I'm not enamored of changing the API of index_beginscan for this;
the existing convention that it should allocate in CurrentMemoryContext
seems perfectly suitable.  And changing it would create a lot of code
churn, not only for us but for externally-maintained AMs.

What is at stake is changing the API of amendscan, to specify that it
need not release memory because the current context is expected to be
destroyed or reset shortly after ending the scan.  Then, for the small
number of call sites where that wouldn't be true, it's on those callers
to set up a suitable context and switch into it.  Note this is actually
forwards compatible, in that an AM that's still following the convention
of releasing stuff manually would not be broken.
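For those call sites, the caller-side pattern might look about like
this sketch (variable names illustrative; only existing entry points
are assumed):

    MemoryContext scan_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                                   "short-lived index scan",
                                                   ALLOCSET_DEFAULT_SIZES);
    MemoryContext old_cxt = MemoryContextSwitchTo(scan_cxt);

    scan = index_beginscan(heapRel, indexRel, snapshot, nkeys, 0);
    /* ... index_rescan() and index_getnext() as usual ... */
    index_endscan(scan);

    MemoryContextSwitchTo(old_cxt);
    /* reclaim whatever amendscan chose not to free */
    MemoryContextDelete(scan_cxt);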

            regards, tom lane


Re: Should pg 11 use a lot more memory building an spgist index?

From
Bruno Wolff III
Date:
I see that a fix got committed.  Thanks!
I'll double-check it after the point release comes out (which looks like it
will be next week) and let you know if there is still a problem.