Discussion: backend dies suddenly after a lot of error messages


backend dies suddenly after a lot of error messages

From: Mirko Kaffka
Date:
Hi,

We have problems with backend processes that close the channel because of
palloc() failures. When an INSERT statement fails, the backend reports an
error (e.g. `Cannot insert a duplicate key into a unique index') and
allocates a few bytes more memory. The next SQL statement that fails
causes the backend to allocate more memory again, etc. until we have no
more virtual memory left. Is this a bug?
We are using postgres 6.4.2 on FreeBSD 2.2.8.

It can also be reproduced with psql:

toy=> create table mytable (i integer unique);
NOTICE:  CREATE TABLE/UNIQUE will create implicit index mytable_i_key for
table mytable
CREATE
toy=> \q

~ $ # now do a lot of inserts that cause error messages:
~ $ while true; do echo "INSERT INTO mytable VALUES (1);"; done | psql toy
INSERT INTO mytable VALUES (1);
ERROR:  Cannot insert a duplicate key into a unique index
...quite a lot of these messages
INSERT INTO mytable VALUES (1);
ERROR:  Cannot insert a duplicate key into a unique index
INSERT INTO mytable VALUES (1);

pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally before or
while processing the request.
We have lost the connection to the backend, so further processing is
impossible.  Terminating.
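
The same loop can also be driven from a short libpq program if you want
it bounded (just a sketch; the iteration count is arbitrary and the
conninfo string may need adjusting for your installation):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=toy");
    int     i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    for (i = 0; i < 100000; i++)
    {
        /* every one of these fails with the duplicate key error */
        PGresult *res = PQexec(conn, "INSERT INTO mytable VALUES (1);");

        PQclear(res);
        if (PQstatus(conn) == CONNECTION_BAD)
        {
            fprintf(stderr, "backend died after %d failed inserts\n", i);
            break;
        }
    }
    PQfinish(conn);
    return 0;
}

Watching the backend with ps while this runs shows its size growing
with every failed INSERT.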

Hmm, why does the backend allocate more and more memory with each failed
INSERT?
Any clues?

Thanks,
Mirko




Re: [GENERAL] backend dies suddenly after a lot of error messages

From: Thomas Reinke
Date:
A bug report on this was filed against the 6.3 release as well.
Don't know the status of it, however :(

Mirko Kaffka wrote:
>
> Hi,
>
> We have problems with backend processes that close the channel because of
> palloc() failures. When an INSERT statement fails, the backend reports an
> error (e.g. `Cannot insert a duplicate key into a unique index') and
> allocates a few bytes more memory. The next SQL statement that fails
> causes the backend to allocate more memory again, etc. until we have no
> more virtual memory left. Is this a bug?
> We are using postgres 6.4.2 on FreeBSD 2.2.8.
>
> It can also be reproduced with psql:
>
> toy=> create table mytable (i integer unique);
> NOTICE:  CREATE TABLE/UNIQUE will create implicit index mytable_i_key for
> table mytable
> CREATE
> toy=> \q
>
> ~ $ # now do a lot of inserts that cause error messages:
> ~ $ while true; do echo "INSERT INTO mytable VALUES (1);"; done | psql toy
> INSERT INTO mytable VALUES (1);
> ERROR:  Cannot insert a duplicate key into a unique index
> ...quite a lot of these messages
> INSERT INTO mytable VALUES (1);
> ERROR:  Cannot insert a duplicate key into a unique index
> INSERT INTO mytable VALUES (1);
>
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally before or
> while processing the request.
> We have lost the connection to the backend, so further processing is
> impossible.  Terminating.
>
> Hmm, why does the backend allocate more and more memory with each failed
> INSERT?
> Any clues?
>
> Thanks,
> Mirko

--
------------------------------------------------------------
Thomas Reinke                            Tel: (416) 460-7021
Director of Technology                   Fax: (416) 598-2319
E-Soft Inc.                         http://www.e-softinc.com

Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Tom Lane
Date:
Mirko Kaffka <mirko@interface-business.de> writes:
> We have problems with backend processes that close the channel because of
> palloc() failures. When an INSERT statement fails, the backend reports an
> error (e.g. `Cannot insert a duplicate key into a unique index') and
> allocates a few bytes more memory. The next SQL statement that fails
> causes the backend to allocate more memory again, etc. until we have no
> more virtual memory left. Is this a bug?

Yeah, I'd say so --- all the memory used should get freed at transaction
end, but evidently it isn't happening.
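
To make the intended contract concrete, here is a self-contained toy
(not Postgres source; every name in it is invented) that mimics the
palloc discipline: allocations accumulate in a per-transaction pool
that is supposed to be released wholesale at transaction end, on the
abort path just as much as on the commit path:

#include <stdio.h>
#include <stdlib.h>

typedef struct Chunk
{
    struct Chunk *next;
} Chunk;

static Chunk *txn_pool = NULL;      /* everything "palloc'd" this txn */

static void *
toy_palloc(size_t n)
{
    Chunk *c = malloc(sizeof(Chunk) + n);

    c->next = txn_pool;             /* remember it for bulk release */
    txn_pool = c;
    return c + 1;                   /* caller's usable space */
}

static void
release_txn_pool(void)              /* what the commit path does */
{
    while (txn_pool != NULL)
    {
        Chunk *c = txn_pool;

        txn_pool = c->next;
        free(c);
    }
}

int
main(void)
{
    int i;

    for (i = 0; i < 100000; i++)
    {
        toy_palloc(4096);           /* work done before the error */

        /*
         * A statement fails here.  A buggy abort path that never
         * calls release_txn_pool() leaks ~4K per failure; the fixed
         * one releases the pool just as commit does.
         */
        release_txn_pool();
    }
    puts("memory stays flat when abort releases the pool");
    return 0;
}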

> We are using postgres 6.4.2 on FreeBSD 2.2.8.

I still see it with 6.5-current sources.  Will take a look.

            regards, tom lane

Re: [HACKERS] backend dies suddenly after a lot of error messages

From: jwieck@debis.com (Jan Wieck)
Date:
>
> Mirko Kaffka <mirko@interface-business.de> writes:
> > We have problems with backend processes that close the channel because of
> > palloc() failures. When an INSERT statement fails, the backend reports an
> > error (e.g. `Cannot insert a duplicate key into a unique index') and
> > allocates a few bytes more memory. The next SQL statement that fails
> > causes the backend to allocate more memory again, etc. until we have no
> > more virtual memory left. Is this a bug?
>
> Yeah, I'd say so --- all the memory used should get freed at transaction
> end, but evidently it isn't happening.
>
> > We are using postgres 6.4.2 on FreeBSD 2.2.8.
>
> I still see it with 6.5-current sources.  Will take a look.

    I remember having taken care of some, but haven't found all the
    places.  I think there's still something in tcop where the
    querytree list is malloc()'d.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Tom Lane
Date:
jwieck@debis.com (Jan Wieck) writes:
>> Yeah, I'd say so --- all the memory used should get freed at transaction
>> end, but evidently it isn't happening.

>     I remember having taken care of some, but haven't found all the
>     places.  I think there's still something in tcop where the
>     querytree list is malloc()'d.

I saw that yesterday --- for no particularly good reason, postgres.c
wants to deal with the query list as an array rather than a list;
it goes to great lengths to convert the lists it's given into an array,
which it has to be able to resize, etc etc.  I was thinking of ripping
all that out and just using a palloc'd list.  At the time I didn't have
any justification for it except code beautification, which isn't a good
enough reason to be changing code late in beta... but a memory leak
is...
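
For what it's worth, the list-based replacement would look roughly like
this (a sketch only -- List, foreach, and lfirst are the backend's
existing list primitives, but the variable names here are invented):

List   *querytree_list;     /* handed to us already palloc'd */
List   *item;

foreach(item, querytree_list)
{
    Query *querytree = (Query *) lfirst(item);

    /* ... plan and execute this querytree ... */
}

Since the list cells are palloc'd, an elog(ERROR) abort reclaims them
along with everything else; there is nothing to resize and nothing
malloc'd left to leak.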

However, the leakage being complained of seems to be several kilobytes
per failed command, which is much more than that one malloc usage can
be blamed for.  Any other thoughts?  I was wondering if maybe a whole
palloc context somewhere is getting lost; not sure where to look though.
One thing I did find was that leakage occurs very early.  You can feed
the system commands that will fail in parsing, like say
    garbage;
and the memory usage still rises with each one.
        regards, tom lane


Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Tom Lane
Date:
>> Yeah, I'd say so --- all the memory used should get freed at transaction
>> end, but evidently it isn't happening.
>> 
>> I still see it with 6.5-current sources.  Will take a look.

Ah-ha, I think I see it: AtCommit_Memory releases memory in the blank
portal (by doing EndPortalAllocMode()).  AtAbort_Memory forgets to do so.
Will commit this fix momentarily.
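
In outline, the fix just makes the abort path mirror what the commit
path already does (a sketch of the idea, not the literal patch):

static void
AtAbort_Memory()
{
    Portal portal = GetPortalByName(NULL);     /* the blank portal */

    /* switch into the blank portal's memory context ... */
    MemoryContextSwitchTo((MemoryContext) PortalGetHeapMemory(portal));

    /*
     * ... and pop its allocation mode, releasing everything palloc'd
     * during the failed transaction.  This EndPortalAllocMode() call
     * is what was missing relative to AtCommit_Memory.
     */
    EndPortalAllocMode();

    /* return to the top context for the next command */
    MemoryContextSwitchTo(TopMemoryContext);
}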

>     I remember having taken care of some, but haven't found all the
>     places.  I think there's still something in tcop where the
>     querytree list is malloc()'d.

That is a relatively minor leak, compared to leaking *all* memory
allocated in the failed transaction, which is what it was doing until
now :-(.  But I think I will fix it anyway ... the code is awfully
ugly, and it is still a leak.
        regards, tom lane


Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Don Baccus
Date:
At 08:33 PM 5/12/99 -0400, Tom Lane wrote:

>That is a relatively minor leak, compared to leaking *all* memory
>allocated in the failed transaction, which is what it was doing until
>now :-(.  But I think I will fix it anyway ... the code is awfully
>ugly, and it is still a leak.

I'm a lurker, a compiler writer who has just begun using
Postgres as the database engine behind a bird population
tracking project I'm putting up on the web on my own
time, on a linux box running AOLServer and, for now at
least, postgres.

In my researching postgres vs. paying Oracle (which didn't
seem too bad until I learned about their extra fees for
web sites and multiple-CPU boxes) vs. mySql etc., the single
biggest complaint I've run across when talking to people
running web sites backed by Postgres has been that the
back end starts dying after weeks ... days ... hours
depending on the type of site.

On questioning folks, it seemed pretty clear that in 
some of these cases significant memory leaking was
causing the system to run out of memory.

And last week I managed to generate long sequences
of SQL that would eat available memory in about
15 minutes.  I've been lurking around a couple of
these postgres lists trying to figure out whether
or not it was a known problem before making noise
about it.

So, imagine my pleasure at seeing this short thread
on the problem and, even better, the solution!

Well, if not the (only) leak, at least one very,
very serious memory leak.  Just how many kb were
being leaked for each failed transaction?

I think you may've just slammed a stake through the
heart of a very significant bug that has been causing,
for a lot of people, seemingly unexplainable flaky
back-end behavior... this fix alone may do a lot to
erase the impression some have that postgres is not
reliable enough to support a web site based on a
large database with lots of transactions.




- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, and other goodies at
  http://donb.photo.net


Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> I think you may've just slammed a stake through the 
> heart of a very significant bug

Thanks for the compliment :-).  You might actually be right;
this bug could go a long way towards explaining why some people
find Postgres very reliable and others don't.  The first group's
apps don't tend to provoke any SQL errors, and/or don't try to
continue running with the same backend after an error.
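
In the meantime, an application stuck on 6.4.x can adopt that second
behavior deliberately.  A rough libpq sketch (the helper name is
invented):

#include <libpq-fe.h>

/*
 * Run one statement; if it draws an error, reset the connection so
 * the (leaky) old backend is replaced by a fresh process.
 */
static void
exec_or_reset(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);

    if (res == NULL || PQresultStatus(res) == PGRES_FATAL_ERROR)
        PQreset(conn);          /* drop old backend, start a new one */
    if (res != NULL)
        PQclear(res);
}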

> And last week I managed to generate long sequences
> of SQL that would eat available memory in about
> 15 minutes.  I've been lurking around a couple of
> these postgres lists trying to figure out whether
> or not it was a known problem before making noise
> about it.

We're aware of a number of memory-leak type problems, although
most of them are just temporary leakage situations (the memory
will eventually be freed, if you have enough memory to complete
the transaction...).  I'm hoping that we can make a serious dent
in that class of problem for release 6.6.

I believe that all the Postgres developers have a bedrock commitment
to making the system as stable and bulletproof as we can.  But it
takes time to root out the subtler bugs.  I got lucky tonight ;-)

> Well, if not the (only) leak, at least one very,
> very serious memory leak.  Just how many kb were
> being leaked for each failed transaction?

I was measuring about 4K per cycle for a trivial parsing error,
like feeding "garbage;" to the backend repeatedly.  It could be
a *lot* more depending on how much work got done before the error
was detected.  Worst case you might lose megabytes...
        regards, tom lane


Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Don Baccus
Date:
At 09:39 PM 5/12/99 -0400, Tom Lane wrote:

>Thanks for the compliment :-).  You might actually be right;
>this bug could go a long way towards explaining why some people
>find Postgres very reliable and others don't.  The first group's
>apps don't tend to provoke any SQL errors, and/or don't try to
>continue running with the same backend after an error.

AOLServer, in particular, will keep a backend alive 
forever unless the site goes idle for (typically)
some minutes.  In this way, no overhead for backend
start-up is suffered by a busy site.  AOLServer manages
the threads associated with particular http connections,
while the (typically) tcl scripts servicing the connections
ask for, use, and release database handles (the tcl
interpreter runs inside the server).  Each handle
is a connection to the db backend, and these connections
get passed around by the server to various threads as
they're released by tcl "ns_db releasehandle" calls.

So ... ANY permanent memory leak by the backend will tear things
down eventually.  "Randomly", from the sysadmin's point of view.

Don't feel bad, I know of one very busy Oracle site
that kicks things down once every 24 hrs in the
dead of night for fear of cumulative leaks or, well,
any of a number of imaginable db problems :)

>We're aware of a number of memory-leak type problems, although
>most of them are just temporary leakage situations (the memory
>will eventually be freed, if you have enough memory to complete
>the transaction...).

Relatively harmless in the environment I'm describing...

>  I'm hoping that we can make a serious dent
>in that class of problem for release 6.6.

Still worth getting rid of, though!

>I believe that all the Postgres developers have a bedrock commitment
>to making the system as stable and bulletproof as we can.

Yes, I've gathered that in my reading of this group over the
last three days, and in my reading of older posts.

And y'all have fixed that other horrible bug from the
web service POV: table-level locking.  Ugh.  I'd given up
on using postgres for my project until I learned that 6.5
doesn't suffer from this limitation.

>I was measuring about 4K per cycle for a trivial parsing error,
>like feeding "garbage;" to the backend repeatedly.  It could be
>a *lot* more depending on how much work got done before the error
>was detected.  Worst case you might lose megabytes...

Memory's cheap, but not THAT cheap :)

OK, I'll go back to lurking again.  Keep up the good work,
folks.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, and other goodies at
  http://donb.photo.net


Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Bruce Momjian
Date:
> jwieck@debis.com (Jan Wieck) writes:
> >> Yeah, I'd say so --- all the memory used should get freed at transaction
> >> end, but evidently it isn't happening.
> 
> >     I remember having taken care of some, but haven't found all the
> >     places.  I think there's still something in tcop where the
> >     querytree list is malloc()'d.
> 
> I saw that yesterday --- for no particularly good reason, postgres.c
> wants to deal with the query list as an array rather than a list;
> it goes to great lengths to convert the lists it's given into an array,
> which it has to be able to resize, etc etc.  I was thinking of ripping
> all that out and just using a palloc'd list.  At the time I didn't have
> any justification for it except code beautification, which isn't a good
> enough reason to be changing code late in beta... but a memory leak
> is...

I also thought the array usage was very strange, and I could not figure
out why they used it.  I figured that as I learned more about the
backend, I would understand their wisdom, but at this point, I think it
was just sloppy code.

> However, the leakage being complained of seems to be several kilobytes
> per failed command, which is much more than that one malloc usage can
> be blamed for.  Any other thoughts?  I was wondering if maybe a whole
> palloc context somewhere is getting lost; not sure where to look though.
> One thing I did find was that leakage occurs very early.  You can feed
> the system commands that will fail in parsing, like say
>     garbage;
> and the memory usage still rises with each one.

Gee, garbage doesn't get very far into the parser, does it?  It never
makes it out of the grammar.  Could it be the 8k query buffer?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] backend dies suddenly after a lot of error messages

From: Bruce Momjian
Date:
> I was measuring about 4K per cycle for a trivial parsing error,
> like feeding "garbage;" to the backend repeatedly.  It could be
> a *lot* more depending on how much work got done before the error
> was detected.  Worst case you might lose megabytes...

The strange thing is that we don't usually hear about crashes/leaks
very much.  We just started hearing about them more in the past week
or so.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [GENERAL] backend dies suddenly after a lot of error messages

From: Bruce Momjian
Date:
> > We have problems with backend processes that close the channel because of
> > palloc() failures. When an INSERT statement fails, the backend reports an
> > error (e.g. `Cannot insert a duplicate key into a unique index') and
> > allocates a few bytes more memory. The next SQL statement that fails
> > causes the backend to allocate more memory again, etc. until we have no
> > more virtual memory left. Is this a bug?
> > We are using postgres 6.4.2 on FreeBSD 2.2.8.
> >
> > It can also be reproduced with psql:
> >
> > toy=> create table mytable (i integer unique);
> > NOTICE:  CREATE TABLE/UNIQUE will create implicit index mytable_i_key for
> > table mytable
> > CREATE
> > toy=> \q
> >
> > ~ $ # now do a lot of inserts that cause error messages:
> > ~ $ while true; do echo "INSERT INTO mytable VALUES (1);"; done | psql toy
> > INSERT INTO mytable VALUES (1);
> > ERROR:  Cannot insert a duplicate key into a unique index
> > ...quite a lot of these messages
> > INSERT INTO mytable VALUES (1);
> > ERROR:  Cannot insert a duplicate key into a unique index
> > INSERT INTO mytable VALUES (1);
> >
> > pqReadData() -- backend closed the channel unexpectedly.
> >         This probably means the backend terminated abnormally before or
> > while processing the request.
> > We have lost the connection to the backend, so further processing is
> > impossible.  Terminating.
> >
> > Hmm, why does the backend allocate more and more memory with each failed
> > INSERT?
> > Any clues?

There was a bug in pre-6.5 versions that caused elog failures not to
release their memory.  There is still a small leak per elog, but it is
only a few bytes.  You should find this fixed in 6.5.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026