Re: Backends dying due to memory exhaustion--I'm stonkered

From Doug McNaught
Subject Re: Backends dying due to memory exhaustion--I'm stonkered
Date
Msg-id m3g0i5sugy.fsf@belphigor.mcnaught.org
In reply to Backends dying due to memory exhaustion--I'm stonkered  (Doug McNaught <doug@wireboard.com>)
Responses Re: Backends dying due to memory exhaustion--I'm stonkered
List pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Doug McNaught <doug@wireboard.com> writes:
> > The problem I'm having is that the backends will crash randomly, after
> > the database has been up for a few days, with:
> > FATAL 1:  Memory exhausted in AllocSetAlloc()
>
> > The system has plenty of memory and swap, and under normal
> > circumstances the backends take up 10-15 megabytes.  If it's a
> > runaway situation of some kind, it happens very fast, as I've even
> > taken snapshots of the process table at 1 minute intervals, and they
> > show no abnormality right up to the time of the crash.
>
> Hmm.  That puts a damper on the idea that it's a memory leak --- doesn't
> eliminate the theory entirely, however.  The other likely theory is that
> you've got a variable-size column value someplace whose size word has
> been corrupted, so that it claims to be umpteen megabytes long.  Any
> attempt to copy such a value out of the tuple it's in will result in
> an instant "out of memory" complaint.

Hmm, very interesting.  Does VARCHAR count as a variable-size column?
One funny thing is that the nightly VACUUM doesn't always fail--the
system will run smoothly for one to three days on average before a
crash.
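
To make the corrupted-size-word theory concrete, here is a rough sketch of why such a value fails instantly instead of growing slowly like a leak. The `VarValue` layout, the `copy_value` helper, and the 64 MB `ALLOC_LIMIT` are all made up for illustration; none of them are PostgreSQL's real structures or limits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: pretend the allocator refuses any request above 64 MB. */
#define ALLOC_LIMIT ((size_t) 64 * 1024 * 1024)

/* Hypothetical layout: a 4-byte length word followed by the payload.
 * Illustration only, not PostgreSQL's actual on-disk format. */
typedef struct {
    uint32_t len;     /* total size in bytes, including this word */
    char     data[];  /* payload */
} VarValue;

static void *limited_alloc(size_t n)
{
    return (n > ALLOC_LIMIT) ? NULL : malloc(n);
}

/* Copy a value out of its tuple, trusting the length word.
 * Returns NULL when the allocation is refused, which is the point
 * where a backend would report that memory is exhausted. */
VarValue *copy_value(const VarValue *src)
{
    assert(src != NULL);
    VarValue *dst = limited_alloc(src->len);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, src->len);
    return dst;
}
```

With a sane length word the copy succeeds. If the same word is overwritten with something like 0x7FFFFFFF, the very first attempt to copy the value fails, with no gradual growth beforehand. That would fit the process-table snapshots looking normal right up to the crash.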

> Is there any consistency about which table is being touched when the
> failure occurs?  It's not hard to isolate and delete a damaged tuple
> once you know which table it's in, but if you've got a lot of tables
> the initial search can be tedious.

I'll check into this.  Having just looked over my error logs, I see
some suspects but nothing jumps out at me.  Unfortunately, OpenACS has
a boatload of tables, and there are 8 different instances, each with
its own database.

> One way to get more info is to tweak the code to abort() just before
> it would normally report the out-of-memory error.  Then you will get
> a coredump and can learn something from the backtrace (don't forget
> to compile with -g).

That's a thought, and I will try it.  I'm currently (as of yesterday's
crash) running with -d 2 and output sent to a logfile.  Is this
debuglevel high enough to tell me which table contains the bad tuple,
if that's indeed the problem?
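
For reference, the abort() tweak Tom suggests would look roughly like this. This is only a sketch of the idea: `alloc_or_abort` is a hypothetical stand-in, not the actual AllocSetAlloc() code, and the real change would go wherever the backend currently reports the error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the backend's allocator entry point. */
void *alloc_or_abort(size_t size)
{
    void *p = malloc(size);

    /* malloc(0) may legitimately return NULL, so only treat a
     * non-zero request as a real failure. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory requesting %zu bytes\n", size);
        abort();  /* raises SIGABRT; leaves a core file if core dumps are enabled */
    }
    return p;
}
```

The build needs -g as Tom says, and core dumps have to be enabled in the environment that starts the postmaster (for example with `ulimit -c unlimited`). After a crash, loading the executable and the core file into gdb and running `bt` should show which code path asked for the memory. Where the core file lands and what it is called varies by system.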

If I can't nail it down that way, how hard would it be to write a C
program to scan all the tuples in a database looking for bogus size
fields?
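
Something along these lines is what I have in mind for the core check. It is a sketch under heavy assumptions: it treats the data as back-to-back values that each start with a 4-byte length word, which is a made-up layout. Real heap pages have page headers, item pointers, and tuple headers, so an actual scanner would have to follow the layout defined in the server source. `find_bogus_length` and `max_len` are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk a buffer of back-to-back values, each starting with a 4-byte
 * length word that counts itself plus the payload (hypothetical layout).
 * Returns the byte offset of the first value whose length word is
 * implausible, or -1 if every value checks out. */
long find_bogus_length(const unsigned char *buf, size_t buflen, uint32_t max_len)
{
    assert(buf != NULL);
    size_t off = 0;

    while (off < buflen) {
        uint32_t len;
        size_t remaining = buflen - off;

        if (remaining < sizeof len)
            return (long) off;                 /* truncated length word */

        memcpy(&len, buf + off, sizeof len);   /* avoids unaligned reads */

        /* Too small to hold its own header, larger than any value we
         * expect, or running past the end of the buffer. */
        if (len < sizeof len || len > max_len || len > remaining)
            return (long) off;

        off += len;
    }
    return -1;
}
```

The `max_len` cutoff is a judgment call: it should sit comfortably above the largest legitimate value in the table, so that only a length claiming "umpteen megabytes" trips it.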

-Doug
