aggregation memory leak and fix

Поиск
Список
Период
Сортировка
От Erik Riedel
Тема aggregation memory leak and fix
Дата
Msg-id wqwfdpu00gNtImTPUm@andrew.cmu.edu
обсуждение исходный текст
Ответы Re: [HACKERS] aggregation memory leak and fix  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] aggregation memory leak and fix  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] aggregation memory leak and fix  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] aggregation memory leak and fix  (Bruce Momjian <maillist@candle.pha.pa.us>)
Список pgsql-hackers
Platform:  Alpha, Digital UNIX 4.0D
Software:  PostgreSQL 6.4.2 and 6.5 snaphot (11 March 1999)

I have a table as follows:

Table    = lineitem
+------------------------+----------------------------------+-------+
|              Field     |              Type                | Length|
+------------------------+----------------------------------+-------+
| l_orderkey             | int4 not null                    |     4 |
| l_partkey              | int4 not null                    |     4 |
| l_suppkey              | int4 not null                    |     4 |
| l_linenumber           | int4 not null                    |     4 |
| l_quantity             | float4 not null                  |     4 |
| l_extendedprice        | float4 not null                  |     4 |
| l_discount             | float4 not null                  |     4 |
| l_tax                  | float4 not null                  |     4 |
| l_returnflag           | char() not null                  |     1 |
| l_linestatus           | char() not null                  |     1 |
| l_shipdate             | date                             |     4 |
| l_commitdate           | date                             |     4 |
| l_receiptdate          | date                             |     4 |
| l_shipinstruct         | char() not null                  |    25 |
| l_shipmode             | char() not null                  |    10 |
| l_comment              | char() not null                  |    44 |
+------------------------+----------------------------------+-------+
Index:    lineitem_index_

that ends up having on the order of 500,000 rows (about 100 MB on disk).  

I then run an aggregation query as:

--
-- Query 1
--
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, 
sum(l_extendedprice) as sum_base_price, 
sum(l_extendedprice*(1-l_discount)) as sum_disc_price, 
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, 
avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, 
avg(l_discount) as avg_disc, count(*) as count_order 
from lineitem 
where l_shipdate <= ('1998-12-01'::datetime - interval '90 day')::date 
group by l_returnflag, l_linestatus 
order by l_returnflag, l_linestatus;


when I run this against 6.4.2, the postgres process grows to upwards of
1 GB of memory (at which point something overflows and it dumps core) -
I watch it grow through 200 MB, 400 MB, 800 MB, dies somewhere near 1 GB
of allocated memory).

If I take out a few of the "sum" expressions it gets better, removing
sum_disk_price and sum_charge causes it to be only 600 MB and the query
actually (eventually) completes.  Takes about 10 minutes on my 500 MHz
machine with 256 MB core and 4 GB of swap.

The problem seems to be the memory allocation mechanism.  Looking at a
call trace, it is doing some kind of "sub query" plan for each row in
the database.  That means it does ExecEval and postquel_function and
postquel_execute and all their friends for each row in the database. 
Allocating a couple hundred bytes for each one.

The problem is that none of these allocations are freed - they seem to
depend on the AllocSet to free them at the end of the transaction.  This
means it isn't a "true" leak, because the bytes are all freed at the
(very) end of the transaction, but it does mean that the process grows
to unreasonable size in the meantime.  There is no need for this,
because the individual expression results are aggregated as it goes
along, so the intermediate nodes can be freed.

I spent half a day last week chasing down the offending palloc() calls
and execution stacks sufficiently that I think I found the right places
to put pfree() calls.

As a result, I have changes in the files:

src/backend/executor/execUtils.c
src/backend/executor/nodeResult.c 
src/backend/executor/nodeAgg.c 
src/backend/executor/execMain.c 

patches to these files are attached at the end of this message.  These
files are based on the 6.5.0 snapshot downloaded from ftp.postgreql.org
on 11 March 1999.

Apologies for sending patches to a non-released version.  If anyone has
problems applying the patches, I can send the full files (I wanted to
avoid sending a 100K shell archive to the list).  If anyone cares about
reproducing my exact problem with the above table, I can provide the 100
MB pg_dump file for download as well.

Secondary Issue:  the reason I did not use the 6.4.2 code to make my
changes is because the AllocSet calls in that one were particularly
egregious - they only had the skeleton of the allocsets code that exists
in the 6.5 snapshots, so they were calling malloc() for all of the 8 and
16 byte allocations that the above query causes.

Using the fixed code reduces the maximum memory requirement on the above
query to about 210 MB, and reduces the runtime to (an acceptable) 1.5
minutes - a factor of more than 6x improvement on my 256 MB machine.

Now the biggest part of the execution time is in the sort before the
aggregation (which isn't strictly needed, but that is an optimization
for another day).

Open Issue: there is still a small "leak" that I couldn't eliminate, I
think I chased it down to the constvalue allocated in
execQual::ExecTargetList(), but I couldn't figure out where to properly
free it.  8 bytes leaked was much better than 750 bytes, so I stopped
banging my head on that particular item.

Secondary Open Issue: what I did have to do to get down to 210 MB of
core was reduce the minimum allocation size in AllocSet to 8 bytes from
16 bytes.  That reduces the 8 byte leak above to a true 8 byte, rather
than a 16 byte leak.  Otherwise, I think the size was 280 MB (still a
big improvement on 1000+ MB).  I only changed this in my code and I am
not including a changed mcxt.c for that.

I hope my changes are understandable/reasonable.  Enjoy.

Erik Riedel
Carnegie Mellon University
www.cs.cmu.edu/~riedel

--------------[aggregation_memory_patch.sh]-----------------------

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#    execMain.c.diff
#    execUtils.c.diff
#    nodeAgg.c.diff
#    nodeResult.c.diff
# This archive created: Fri Mar 19 15:47:17 1999
export PATH; PATH=/bin:/usr/bin:$PATH
if test -f 'execMain.c.diff'
thenecho shar: "will not over-write existing file 'execMain.c.diff'"
else
cat << \SHAR_EOF > 'execMain.c.diff'
583c
.
398a

.
396a/* XXX - clean up some more from ExecutorStart() - er1p */if (NULL == estate->es_snapshot) {  /* nothing to free
*/}else {  if (estate->es_snapshot->xcnt > 0) {     pfree(estate->es_snapshot->xip);  }  pfree(estate->es_snapshot);}
 
if (NULL == estate->es_param_exec_vals) {  /* nothing to free */} else {  pfree(estate->es_param_exec_vals);
estate->es_param_exec_vals= NULL;}
 

.
SHAR_EOF
fi
if test -f 'execUtils.c.diff'
thenecho shar: "will not over-write existing file 'execUtils.c.diff'"
else
cat << \SHAR_EOF > 'execUtils.c.diff'
368a
}

/* ----------------*        ExecFreeExprContext* ----------------*/
void
ExecFreeExprContext(CommonState *commonstate)
{ExprContext *econtext;
/* ---------------- *    get expression context.  if NULL then this node has *    none so we just return. *
----------------*/econtext = commonstate->cs_ExprContext;if (econtext == NULL)    return;
 
/* ---------------- *    clean up memory used. * ---------------- */pfree(econtext);commonstate->cs_ExprContext =
NULL;
}

/* ----------------*        ExecFreeTypeInfo* ----------------*/
void
ExecFreeTypeInfo(CommonState *commonstate)
{TupleDesc tupDesc;
tupDesc = commonstate->cs_ResultTupleSlot->ttc_tupleDescriptor;if (tupDesc == NULL)    return;
/* ---------------- *    clean up memory used. * ----------------
*/FreeTupleDesc(tupDesc);commonstate->cs_ResultTupleSlot->ttc_tupleDescriptor= NULL;
 
.
274a

.
SHAR_EOF
fi
if test -f 'nodeAgg.c.diff'
thenecho shar: "will not over-write existing file 'nodeAgg.c.diff'"
else
cat << \SHAR_EOF > 'nodeAgg.c.diff'
376a                    pfree(oldVal); /* XXX - new, let's free the old datum - er1p */
.
374a                    oldVal = value1[aggno]; /* XXX - save so we can free later - er1p */
.
112aDatum  oldVal = (Datum) NULL;  /* XXX - so that we can save and free on
each iteration - er1p */
.
SHAR_EOF
fi
if test -f 'nodeResult.c.diff'
thenecho shar: "will not over-write existing file 'nodeResult.c.diff'"
else
cat << \SHAR_EOF > 'nodeResult.c.diff'
278apfree(resstate); node->resstate = NULL; /* XXX - new for us - er1p */
.
265aExecFreeExprContext(&resstate->cstate); /* XXX - new for us - er1p */ExecFreeTypeInfo(&resstate->cstate); /* XXX -
newfor us - er1p */
 
.
SHAR_EOF
fi
exit 0
#    End of shell archive



В списке pgsql-hackers по дате отправления:

Предыдущее
От: jwieck@debis.com (Jan Wieck)
Дата:
Сообщение: Re: [HACKERS] Re: Developers Globe (FINAL)
Следующее
От: jwieck@debis.com (Jan Wieck)
Дата:
Сообщение: Re: [HACKERS] Re: Developers Globe (FINAL)