Thread: more on out-of-memory

more on out-of-memory

From
Alvaro Herrera
Date:
So, the reason I started the thread about postmaster dying on OOM is
that somebody asked me on IM what could have caused a backend to die
with this backtrace:

libc.so.1`_ndoprnt+0x14()
libc.so.1`fprintf+0x11d()
AllocSetStats+0x15d()
MemoryContextStatsInternal+0x1c()
MemoryContextStats+0xb()
AllocSetAlloc+0x1c0()
MemoryContextAllocZeroAligned+0x57()
makeTypeNameFromNameList+0x20()
SystemTypeName+0x40()
base_yyparse+0xcd42()
raw_parser+0x29()
pg_parse_query+0x23()
exec_simple_query+0x6d()
PostgresMain+0xf6a()
BackendRun+0x254()
BackendStartup+0xf8()
ServerLoop+0x116()
PostmasterMain+0xd98()
main+0x18a()
0x4e08ec()

Postmaster only logged this one with

2009-04-06 16:33:48 EDT::@:[13741]: LOG:  server process (PID 12146) was terminated by signal 11

and there's no indication of any activity from that process in the log
at all.

Several other processes seem to be exiting or terminating transactions
with errno "Not enough space".

His question was: is it possible that we're handing a NULL pointer to a
%s on fprintf?  The involved code looks like this:
    fprintf(stderr,
        "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
        set->header.name, totalspace, nblocks, freespace, nchunks,
        totalspace - freespace);

And since this is being called from AllocSetAlloc, which is always
handed a complete memory context (and not something that has only been
partially set), I think the answer is that it's not possible, and that
the bug must be on libc which is perhaps not handling out-of-memory very
cleanly in its fprintf implementation.

Am I all wet?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: more on out-of-memory

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> So, the reason I started the thread about postmaster dying on OOM is
> that somebody asked me on IM what could have caused a backend to die
> with this backtrace:

[ of course you realize this is a backend, not the postmaster ]

> His question was: is it possible that we're handing a NULL pointer to a
> %s on fprintf?  The involved code looks like this:
> ...
> And since this is being called from AllocSetAlloc, which is always
> handed a complete memory context (and not something that has only been
> partially set), I think the answer is that it's not possible, and that
> the bug must be on libc which is perhaps not handling out-of-memory very
> cleanly in its fprintf implementation.

Another theory is that the name pointer got clobbered by some sort of
memory-stomping bug.  (We don't know from the available evidence that
it was NULL --- it could have been any garbage value that pointed
outside backend memory.)  However, given that the context clearly
indicates being out-of-memory overall, your theory seems a bit more
probable.

The really odd thing is that the stack trace is so short; it seems
to have failed *very* early in query parsing, which is hard to credit
unless this person is in the habit of sending megabytes-long queries.
I guess if the system as a whole were under really severe memory
pressure, a backend could hit OOM without having eaten much itself.

What platform is this, and which PG version?
        regards, tom lane


Re: more on out-of-memory

From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> His question was: is it possible that we're handing a NULL pointer to a
> %s on fprintf?  The involved code looks like this:
> 
>         fprintf(stderr,
>             "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
>             set->header.name, totalspace, nblocks, freespace, nchunks,
>             totalspace - freespace);

Note that glibc prints "(null)" if you pass NULL for %s. Others don't.
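
To illustrate the portability point: passing NULL for %s is undefined behavior in standard C, so code that wants the same output everywhere has to substitute a printable string itself. The following is only a minimal sketch under assumptions. `safe_name` and `format_stats` are hypothetical helpers, not PostgreSQL functions, and the stats line is formatted into a caller-supplied buffer rather than written to stderr. It also guards only against NULL; a clobbered non-NULL pointer would still crash.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of PostgreSQL): return a printable
 * placeholder when the name pointer is NULL, so the result does not
 * depend on how a given libc treats NULL for %s.
 */
static const char *
safe_name(const char *name)
{
    return name != NULL ? name : "(null)";
}

/*
 * Hypothetical stand-in for the stats line quoted above: same format
 * string, but written into buf with snprintf.  Returns the number of
 * characters that the full line requires, as snprintf does.
 */
static int
format_stats(char *buf, size_t buflen, const char *name,
             unsigned long totalspace, long nblocks,
             unsigned long freespace, long nchunks)
{
    return snprintf(buf, buflen,
                    "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
                    safe_name(name), totalspace, nblocks, freespace, nchunks,
                    totalspace - freespace);
}
```

Calling `format_stats(buf, sizeof buf, NULL, ...)` with this guard produces a line beginning with "(null): " on any conforming libc, instead of relying on the glibc-specific behavior.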

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com