Обсуждение: Memory leaks in record_out and record_send
I looked into the problem complained of here: http://archives.postgresql.org/pgsql-general/2012-11/msg00279.php which turns out to have nothing to do with joins and everything to do with the fact that record_out() leaks memory like mad. It leaks both the strings returned by the per-column output functions and any column values that have to be detoasted. You can easily reproduce this with an example like create table leak (f1 int, f2 text); insert into leak select x, 'foo' from generate_series(1,1000000) x; select leak from leak; The attached patch against HEAD fixes this, as well as a similar leakage in record_send(). The added code is lifted directly from printtup() so it's not adding any new assumptions to the system. I wonder though if we ought to think about running output functions in a short-lived memory context instead of the executor's main context. We've considered that before, I think, and it's always been the path of least resistance to fix the output functions instead --- but there will always be another leak I'm afraid. OTOH I can't see trying to back-patch a solution like that. If we want to fix this in the back branches (and note the complaint linked above is against 8.3), I think we have to do it as attached. Thoughts? regards, tom lane diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c index 13e574d4e82b4412f7ab32e02398d1671d381097..d4ed7d0ca06ac1a0dee6b4963989a5b41713fb3b 100644 *** a/src/backend/utils/adt/rowtypes.c --- b/src/backend/utils/adt/rowtypes.c *************** typedef struct ColumnIOData *** 32,37 **** --- 32,38 ---- Oid column_type; Oid typiofunc; Oid typioparam; + bool typisvarlena; FmgrInfo proc; } ColumnIOData; *************** record_out(PG_FUNCTION_ARGS) *** 364,369 **** --- 365,371 ---- { ColumnIOData *column_info = &my_extra->columns[i]; Oid column_type = tupdesc->attrs[i]->atttypid; + Datum attr; char *value; char *tmp; bool nq; *************** record_out(PG_FUNCTION_ARGS) *** 387,403 **** */ if (column_info->column_type != column_type) { - bool typIsVarlena; - getTypeOutputInfo(column_type, &column_info->typiofunc, ! &typIsVarlena); fmgr_info_cxt(column_info->typiofunc, &column_info->proc, fcinfo->flinfo->fn_mcxt); column_info->column_type = column_type; } ! value = OutputFunctionCall(&column_info->proc, values[i]); /* Detect whether we need double quotes for this value */ nq = (value[0] == '\0'); /* force quotes for empty string */ --- 389,412 ---- */ if (column_info->column_type != column_type) { getTypeOutputInfo(column_type, &column_info->typiofunc, ! &column_info->typisvarlena); fmgr_info_cxt(column_info->typiofunc, &column_info->proc, fcinfo->flinfo->fn_mcxt); column_info->column_type = column_type; } ! /* ! * If we have a toasted datum, forcibly detoast it here to avoid ! * memory leakage inside the type's output routine. ! */ ! if (column_info->typisvarlena) ! attr = PointerGetDatum(PG_DETOAST_DATUM(values[i])); ! else ! attr = values[i]; ! ! value = OutputFunctionCall(&column_info->proc, attr); /* Detect whether we need double quotes for this value */ nq = (value[0] == '\0'); /* force quotes for empty string */ *************** record_out(PG_FUNCTION_ARGS) *** 416,432 **** /* And emit the string */ if (nq) ! appendStringInfoChar(&buf, '"'); for (tmp = value; *tmp; tmp++) { char ch = *tmp; if (ch == '"' || ch == '\\') ! appendStringInfoChar(&buf, ch); ! appendStringInfoChar(&buf, ch); } if (nq) ! appendStringInfoChar(&buf, '"'); } appendStringInfoChar(&buf, ')'); --- 425,447 ---- /* And emit the string */ if (nq) ! appendStringInfoCharMacro(&buf, '"'); for (tmp = value; *tmp; tmp++) { char ch = *tmp; if (ch == '"' || ch == '\\') ! appendStringInfoCharMacro(&buf, ch); ! appendStringInfoCharMacro(&buf, ch); } if (nq) ! appendStringInfoCharMacro(&buf, '"'); ! ! pfree(value); ! ! /* Clean up detoasted copy, if any */ ! if (DatumGetPointer(attr) != DatumGetPointer(values[i])) ! pfree(DatumGetPointer(attr)); } appendStringInfoChar(&buf, ')'); *************** record_send(PG_FUNCTION_ARGS) *** 714,719 **** --- 729,735 ---- { ColumnIOData *column_info = &my_extra->columns[i]; Oid column_type = tupdesc->attrs[i]->atttypid; + Datum attr; bytea *outputbytes; /* Ignore dropped columns in datatype */ *************** record_send(PG_FUNCTION_ARGS) *** 734,756 **** */ if (column_info->column_type != column_type) { - bool typIsVarlena; - getTypeBinaryOutputInfo(column_type, &column_info->typiofunc, ! &typIsVarlena); fmgr_info_cxt(column_info->typiofunc, &column_info->proc, fcinfo->flinfo->fn_mcxt); column_info->column_type = column_type; } ! outputbytes = SendFunctionCall(&column_info->proc, values[i]); /* We assume the result will not have been toasted */ pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4); pq_sendbytes(&buf, VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ); pfree(outputbytes); } pfree(values); --- 750,784 ---- */ if (column_info->column_type != column_type) { getTypeBinaryOutputInfo(column_type, &column_info->typiofunc, ! &column_info->typisvarlena); fmgr_info_cxt(column_info->typiofunc, &column_info->proc, fcinfo->flinfo->fn_mcxt); column_info->column_type = column_type; } ! /* ! * If we have a toasted datum, forcibly detoast it here to avoid ! * memory leakage inside the type's output routine. ! */ ! if (column_info->typisvarlena) ! attr = PointerGetDatum(PG_DETOAST_DATUM(values[i])); ! else ! attr = values[i]; ! ! outputbytes = SendFunctionCall(&column_info->proc, attr); /* We assume the result will not have been toasted */ pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4); pq_sendbytes(&buf, VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ); + pfree(outputbytes); + + /* Clean up detoasted copy, if any */ + if (DatumGetPointer(attr) != DatumGetPointer(values[i])) + pfree(DatumGetPointer(attr)); } pfree(values);
Tom Lane <tgl@sss.pgh.pa.us> writes: > OTOH I can't see trying to back-patch a solution like that. If we want > to fix this in the back branches (and note the complaint linked above is > against 8.3), I think we have to do it as attached. > > Thoughts? I've been using textin(record_out(NEW)) in generic partitioning triggers, and you can find examples of that trick in the wiki, so I think we have users of that in the field. Please indeed do consider backpatching! I don't have an opinion on the opportunity to use a shorter memory context, I feel that would need some more involved analytics than my brainpower of the moment allows me to consider. Thanks, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Thoughts?
> I've been using textin(record_out(NEW)) in generic partitioning
> triggers, and you can find examples of that trick in the wiki, so I
> think we have users of that in the field.
I think explicit calls like that actually wouldn't be a problem,
since they'd be run in a per-tuple context anyway.  The cases that
are problematic are hard-coded I/O function calls.  I'm worried
about the ones like, say, plpgsql's built-in conversion operations.
We could probably fix printtup's usage with some confidence, but
there are a lot of other ones.
        regards, tom lane
			
		Tom Lane <tgl@sss.pgh.pa.us> writes: > I think explicit calls like that actually wouldn't be a problem, > since they'd be run in a per-tuple context anyway. The cases that > are problematic are hard-coded I/O function calls. I'm worried > about the ones like, say, plpgsql's built-in conversion operations. > We could probably fix printtup's usage with some confidence, but > there are a lot of other ones. That's a good reason to get them into a shorter memory context, but which? per transaction maybe? shorter? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I think explicit calls like that actually wouldn't be a problem,
>> since they'd be run in a per-tuple context anyway.  The cases that
>> are problematic are hard-coded I/O function calls.  I'm worried
>> about the ones like, say, plpgsql's built-in conversion operations.
>> We could probably fix printtup's usage with some confidence, but
>> there are a lot of other ones.
> That's a good reason to get them into a shorter memory context, but
> which? per transaction maybe? shorter?
It would have to be per-tuple to do any good.  The existing behavior
is per-query and causes problems if lots of rows are output.  In plpgsql
it would be a function-call-lifespan leak.
        regards, tom lane
			
		On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wonder though if we ought to think about running output functions in > a short-lived memory context instead of the executor's main context. > We've considered that before, I think, and it's always been the path > of least resistance to fix the output functions instead --- but there > will always be another leak I'm afraid. Such is the lot of people who code in C. I worry that the number of memory contexts we're kicking around already is imposing a significant distributed overhead on the system that is hard to measure but nevertheless real, and that this will add to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder though if we ought to think about running output functions in
>> a short-lived memory context instead of the executor's main context.
>> We've considered that before, I think, and it's always been the path
>> of least resistance to fix the output functions instead --- but there
>> will always be another leak I'm afraid.
> Such is the lot of people who code in C.  I worry that the number of
> memory contexts we're kicking around already is imposing a significant
> distributed overhead on the system that is hard to measure but
> nevertheless real, and that this will add to it.
Yeah, perhaps.  I'd like to think that a MemoryContextReset is cheaper
than a bunch of retail pfree's, but it's hard to prove anything without
actually coding and testing it --- and on modern machines, effects like
cache locality could swamp pure instruction-count gains anyway.
Anyway, I committed the narrow fix for the moment.
        regards, tom lane
			
		On Tue, Nov 13, 2012 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I wonder though if we ought to think about running output functions in >>> a short-lived memory context instead of the executor's main context. >>> We've considered that before, I think, and it's always been the path >>> of least resistance to fix the output functions instead --- but there >>> will always be another leak I'm afraid. > >> Such is the lot of people who code in C. I worry that the number of >> memory contexts we're kicking around already is imposing a significant >> distributed overhead on the system that is hard to measure but >> nevertheless real, and that this will add to it. > > Yeah, perhaps. I'd like to think that a MemoryContextReset is cheaper > than a bunch of retail pfree's, but it's hard to prove anything without > actually coding and testing it --- and on modern machines, effects like > cache locality could swamp pure instruction-count gains anyway. Yeah. The thing that concerns me is that I think we have a pretty decent number of memory contexts where the expected number of allocations is very small ... and we have the context *just in case* we do more than that in certain instances. I've seen profiles where the setup/teardown costs of memory contexts are significant ... which doesn't mean that those examples would perform better with fewer memory contexts, but it's enough to make me pause for thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> Yeah.  The thing that concerns me is that I think we have a pretty
> decent number of memory contexts where the expected number of
> allocations is very small ... and we have the context *just in case*
> we do more than that in certain instances.  I've seen profiles where
> the setup/teardown costs of memory contexts are significant ... which
> doesn't mean that those examples would perform better with fewer
> memory contexts, but it's enough to make me pause for thought.
So, for my 2c, I'm on the other side of this, personally.  We have
memory contexts for more-or-less exactly this issue.  It's one of the
great things about PG- it's resiliant and very unlikely to have large or
bad memory leaks in general, much of which can, imv, be attributed to
our use of memory contexts.
Thanks,
    Stephen
			
		On Tue, Nov 13, 2012 at 05:50:08PM -0500, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: > > Yeah. The thing that concerns me is that I think we have a pretty > > decent number of memory contexts where the expected number of > > allocations is very small ... and we have the context *just in case* > > we do more than that in certain instances. I've seen profiles where > > the setup/teardown costs of memory contexts are significant ... which > > doesn't mean that those examples would perform better with fewer > > memory contexts, but it's enough to make me pause for thought. > > So, for my 2c, I'm on the other side of this, personally. We have > memory contexts for more-or-less exactly this issue. It's one of the > great things about PG- it's resiliant and very unlikely to have large or > bad memory leaks in general, much of which can, imv, be attributed to > our use of memory contexts. If the problem is that we create memory context overhead which is not necessary in many cases, perhaps we can reduce the overhead somehow. IIRC we have a seperate function for resetting a context and freeing it entirely. If there was a quick test we could do such that resetting a context did nothing unless at least (say) 16k had been allocated, that might reduce the cost for many very small allocations. Ofcourse, unless someone comes up with a way to measure the cost this is all handwaving, but it might a nice project for someone interested in learning to hack postgres. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer