Обсуждение: High memory usage

Поиск
Список
Период
Сортировка

High memory usage

От
"Rainer Mager"
Дата:
Hi all,

    We're seeing one usage of a database use significant more memory than
we're used to and I'm hoping for some hints at what Postgres allocates
memory for. BTW, we're running PG 7.1 on Linux.

    First of all we see some of the PG processes use as much as 80 MB
(according to top and ps). This seems to be allocated during a query
(SELECT). Unfortunately, we haven't been able to discover exactly what query
is doing this. Anyway, the first problem is this high memory allocation.
Considering the database is only about 80 MB total (according to disk usage
of the entire data directory), allocating 80 MB seems excessive.
    The second significant problem is that this memory allocation does not go
down once the SELECT is completed. According to ps the process in IDLE yet
the memory usage remains the same. If it was only one process then  that
would be fine but we see this happening with as many as 3 processes
simultaneously and the box only has 256 MB RAM (although we plan to up that
to 512 soon).

    So, if anyone has any general idea as to when PG allocates memory (for
example, I assume and hope, the PG process wouldn't allocate enough memory
to have the entire result set in memory) I'd appreciate the info. Any other
clues are also welcome.
    We have other, near identical, instances of this database which do not
show this behavior. The particular queries being done will differ between
them and one particular table is about 10 times larger (150,000 rows instead
of 15,000) in this db as compared to the others but I don't see either of
those factors causing this large a difference. Well, I don't know, how much
memory do semi-complex queries take?


Thanks,

--Rainer


Re: High memory usage

От
Tom Lane
Дата:
"Rainer Mager" <rmager@vgkk.com> writes:
>     First of all we see some of the PG processes use as much as 80 MB
> (according to top and ps). This seems to be allocated during a query
> (SELECT). Unfortunately, we haven't been able to discover exactly what query
> is doing this.

If you can catch the thing while the memory usage is going up, you could
attach to the backend with gdb and examine debug_query_string to see
what the current query is.  (Note: I think you need 7.1.1 or later to
have this variable available.  You should be on 7.1.2 in any case, just
on general principles.)  Otherwise, consider turning on query logging
and see if you can correlate log entries with the memory consumption.

> The second significant problem is that this memory allocation does not go
> down once the SELECT is completed. According to ps the process in IDLE yet
> the memory usage remains the same.

Very few Unix programs of any description will release memory back to
the OS before process exit.  Postgres is no different.  Start a fresh
backend if you need to cut down the memory usage.

There's not a lot more we can say about it until you can identify the
query that sucks up all the space.  7.1 in general is a lot better about
memory consumption than prior releases, but it sounds like you may have
found a leak that's not plugged yet.  I would like to know about it when
you identify the problem query.

            regards, tom lane

RE: High memory usage

От
"Rainer Mager"
Дата:
I've continued to work on this problem with limited results so far. I have
lowered the connection re-use setting in our connection pool but this
doesn't seem to have helped. I believe that one particular query ends up
allocating around 80MB of memory and if more than 2 or 3 of the connections
in the pool do this simultaneously then the total memory usage becomes too
high (number of connections * 80 MB). I haven't been able to determine the
exact query yet, I'm still working on it.


I have also discovered another problem with this system and running out of
memory. This one, I believe is a bug in the JDBC driver. Here's the
situation:

The Java program processes a large number (in this case, around 60,000) of
database actions (each consists of a few simple queries and a larger insert
or update), all in a single Java thread. For each of these 60,000 actions
there is a PreparedStatement created. In the PostgreSQL JDBC driver's
implementation of PreparedStatement the following exists during the object
instantiation:


        // We use ThreadLocal for SimpleDateFormat's because they are not
that
        // thread safe, so each calling thread has its own object.
        private ThreadLocal tl_df   = new ThreadLocal(); // setDate()
SimpleDateFormat
        private ThreadLocal tl_tsdf = new ThreadLocal(); // setTimestamp()
SimpleDateFormat


So, you can see that every single PreparedStatement allocates 2 ThreadLocal
objects. The interesting part is that each of these objects persists for the
entire time the thread persists (in our case, until all 60,000 actions are
completed). This is stated in the Sun Javadocs for ThreadLocal:

    Each thread holds an implicit reference to its copy of a
    ThreadLocal as long as the thread is alive and the
    ThreadLocal object is accessible; after a thread goes
    away, all of its copies of ThreadLocal variables are
    subject to garbage collection (unless other references
    to these copies exist).

So, this means that ANY APPLICATION that uses PreparedStatements in a thread
that 1) either does a lot of PreparedStatements or 2) never dies (i.e., a
main thread) will ALWAYS eventually have an out of memory error. Simply put,
this is a MEMORY LEAK. I imagine that the leak is very small, the
ThreadLocal object only contains one member variable, maybe 64 bytes or less
(depending on the VM implementation). So, our 60,000 PreparedStatements of 2
ThreadLocals each times 64 bytes (my wild guess) is 7.5MB.

The good news is that ThreadLocal is ONLY used in PreparedStatemnt and not
in any other parts of the JDBC driver.

I'll work on a patch but if someone has already done this I would be
grateful.



--Rainer

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Thursday, June 14, 2001 10:03 AM
> To: Rainer Mager
> Cc: PostgreSQL Admin
> Subject: Re: [ADMIN] High memory usage
>
> If you can catch the thing while the memory usage is going up, you could
> attach to the backend with gdb and examine debug_query_string to see
> what the current query is.  (Note: I think you need 7.1.1 or later to
> have this variable available.  You should be on 7.1.2 in any case, just
> on general principles.)  Otherwise, consider turning on query logging
> and see if you can correlate log entries with the memory consumption.


RE: High memory usage [PATCH]

От
"Rainer Mager"
Дата:
Attached is my patch to the official 7.1.2 PreparedStatement.java class.
This seems to work quite well for me in a test case. To try to reproduce the
seen problem I will need to test all night. I'll report tomorrow.

BTW, this is my first attempt at making a unified diff so I might have done
something wrong. If this diff doesn't apply please tell me.

--Rainer

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Rainer Mager
> Sent: Wednesday, June 20, 2001 9:08 AM
> To: Tom Lane
> Cc: PostgreSQL Admin
> Subject: RE: [ADMIN] High memory usage
>
>
> I'll work on a patch but if someone has already done this I would be
> grateful.

Вложения

RE: High memory usage

От
"Rainer Mager"
Дата:
Hi,

    Here is a query that demonstrates the problem. Running this takes about
60MB until it is done at which time it is freed (I was wrong when I said
otherwise earlier). Interestingly, the same amount of memory is used when
doing an EXPLAIN on this query. Also it happens to return 0 rows. Please
excuse the weird characters in the middle this is a Japanese (UTF8)
database. Also please excuse Outlook breaking the query, it is just one long
line.


SELECT DISTINCT product.product_id FROM product, pr_prop_str alias_table_0,
pr_prop_str alias_table_1, pr_prop_str alias_table_2, pr_prop_str
alias_table_3, pr_prop_str alias_table_4, pr_prop_str alias_table_5,
pr_prop_str alias_table_6, pr_prop_str alias_table_7, pr_prop_str
alias_table_8 WHERE product.product_id = alias_table_0.product_id AND
product.product_id = alias_table_1.product_id AND product.product_id =
alias_table_2.product_id AND product.product_id = alias_table_3.product_id
AND product.product_id = alias_table_4.product_id AND product.product_id =
alias_table_5.product_id AND product.product_id = alias_table_6.product_id
AND product.product_id = alias_table_7.product_id AND product.product_id =
alias_table_8.product_id AND ( alias_table_0.pr_property_id = 147 AND
alias_table_0.str = '3E362cb' ) AND ( alias_table_1.pr_property_id = 18 AND
alias_table_1.str > '000999999' ) AND ( alias_table_2.pr_property_id = 18
AND alias_table_2.str < '004999999' ) AND ( alias_table_3.pr_property_id =
51 AND alias_table_3.str = '蛬Oウ縷C~O縷Cウ縷C~I縷Cォ' ) AND (
alias_table_4.pr_property_id = 115 AND alias_table_4.str = '1' ) AND (
alias_table_5.pr_property_id = 68 AND alias_table_5.str = '05' ) AND (
alias_table_6.pr_property_id = 113 AND alias_table_6.str < '030001' ) AND
( alias_table_7.pr_property_id = 57 AND alias_table_7.str < '19980101' ) AND
( alias_table_8.pr_property_id = 158 AND alias_table_8.str = '1' );


    So, why is so much memory used and are there any settings I can change to
reduce that (possibly at the expense of performance)? The problem is that
sometimes this query will be running 4 or 5 times simultaneously and using
all of the box's memory.

Thanks,

--Rainer


> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Thursday, June 14, 2001 10:03 AM
> To: Rainer Mager
> Cc: PostgreSQL Admin
> Subject: Re: [ADMIN] High memory usage
>
>
> There's not a lot more we can say about it until you can identify the
> query that sucks up all the space.  7.1 in general is a lot better about
> memory consumption than prior releases, but it sounds like you may have
> found a leak that's not plugged yet.  I would like to know about it when
> you identify the problem query.
>
>             regards, tom lane


RE: High memory usage

От
"Rainer Mager"
Дата:
More info on this...

First, this type of query seems to use exponential amounts of memory. I
tried a join with 10 pieces (the one below with 9 used ~60MB) and it went up
immediately to ~ 390MB usage and has been sitting there at 100% CPU usage
for more than 10 minutes. This is on a database with almost no data in it.
Similarly, a query with only 8 joins is very quick and has much lower memory
usage.


Best regards,

--Rainer


> -----Original Message-----
> From: Rainer Mager [mailto:rmager@vgkk.com]
> Sent: Wednesday, June 20, 2001 4:19 PM
> To: Tom Lane
> Cc: PostgreSQL Admin
> Subject: RE: [ADMIN] High memory usage
>
>
> Hi,
>
>     Here is a query that demonstrates the problem. Running this
> takes about 60MB until it is done at which time it is freed (I
> was wrong when I said otherwise earlier). Interestingly, the same
> amount of memory is used when doing an EXPLAIN on this query.
> Also it happens to return 0 rows. Please excuse the weird
> characters in the middle this is a Japanese (UTF8) database. Also
> please excuse Outlook breaking the query, it is just one long line.
>
>
> SELECT DISTINCT product.product_id FROM product, pr_prop_str
> alias_table_0, pr_prop_str alias_table_1, pr_prop_str
> alias_table_2, pr_prop_str alias_table_3, pr_prop_str
> alias_table_4, pr_prop_str alias_table_5, pr_prop_str
> alias_table_6, pr_prop_str alias_table_7, pr_prop_str
> alias_table_8 WHERE product.product_id = alias_table_0.product_id
> AND product.product_id = alias_table_1.product_id AND
> product.product_id = alias_table_2.product_id AND
> product.product_id = alias_table_3.product_id AND
> product.product_id = alias_table_4.product_id AND
> product.product_id = alias_table_5.product_id AND
> product.product_id = alias_table_6.product_id AND
> product.product_id = alias_table_7.product_id AND
> product.product_id = alias_table_8.product_id AND (
> alias_table_0.pr_property_id = 147 AND alias_table_0.str =
> '3E362cb' ) AND ( alias_table_1.pr_property_id = 18 AND
> alias_table_1.str > '000999999' ) AND (
> alias_table_2.pr_property_id = 18 AND alias_table_2.str <
> '004999999' ) AND ( alias_table_3.pr_property_id = 51 AND
> alias_table_3.str = '蛬Oウ縷C~O縷Cウ縷C~I縷Cォ' ) AND (
> alias_table_4.pr_property_id = 115 AND alias_table_4.str = '1' )
> AND ( alias_table_5.pr_property_id = 68 AND alias_table_5.str =
> '05' ) AND ( alias_table_6.pr_property_id = 113 AND
> alias_table_6.str < '030001' ) AND ( alias_table_7.pr_property_id
> = 57 AND alias_table_7.str < '19980101' ) AND (
> alias_table_8.pr_property_id = 158 AND alias_table_8.str = '1' );


Re: High memory usage

От
"Ross J. Reedstrom"
Дата:
Ranier -
Can you explain in words what this query is supposed to be doing?

I'm guessing, from the DISTINCT, and the use of multiple occurances of
the same table, that the result you want can be gotten at in some other
way, that lets the backend be smarter about how it does it. Since it _is_
releasing the memory, i.e., it's not a new leak, I'm guessing that Tom
just got a whole lot less interested ;-) But helping you use PostgreSQL
better is part of what the community does, as well.

Hmm, you mention that _planning_ this query sucks up the memory, as well.
My guess is it's an interaction of the optimizer with the plan for this
query, which might have many, nearly identical cost plans, since 8 of
the 9 tables are actually the same table.

Ross

SELECT DISTINCT product.product_id
FROM     product,
    pr_prop_str alias_table_0,
    pr_prop_str alias_table_1,
    pr_prop_str alias_table_2,
    pr_prop_str alias_table_3,
    pr_prop_str alias_table_4,
    pr_prop_str alias_table_5,
    pr_prop_str alias_table_6,
    pr_prop_str alias_table_7,
    pr_prop_str alias_table_8
WHERE         product.product_id = alias_table_0.product_id
    AND product.product_id = alias_table_1.product_id
    AND product.product_id = alias_table_2.product_id
    AND product.product_id = alias_table_3.product_id
    AND product.product_id = alias_table_4.product_id
    AND product.product_id = alias_table_5.product_id
    AND product.product_id = alias_table_6.product_id
    AND product.product_id = alias_table_7.product_id
    AND product.product_id = alias_table_8.product_id
    AND ( alias_table_0.pr_property_id = 147
        AND alias_table_0.str = '3E362cb' )
    AND ( alias_table_1.pr_property_id = 18
        AND alias_table_1.str > '000999999' )
    AND ( alias_table_2.pr_property_id = 18
        AND alias_table_2.str < '004999999' )
    AND ( alias_table_3.pr_property_id = 51
        AND alias_table_3.str = '?$Bi_O?$B%&e_C~O?$Be_C?$B%&e_C~I?$Be_C?$B%)' )
    AND ( alias_table_4.pr_property_id = 115
        AND alias_table_4.str = '1' )
    AND ( alias_table_5.pr_property_id = 68
        AND alias_table_5.str = '05' )
    AND ( alias_table_6.pr_property_id = 113
        AND alias_table_6.str < '030001' )
    AND ( alias_table_7.pr_property_id = 57
        AND alias_table_7.str < '19980101' )
    AND ( alias_table_8.pr_property_id = 158
        AND alias_table_8.str = '1' );

On Wed, Jun 20, 2001 at 04:18:52PM +0900, Rainer Mager wrote:
> Hi,
>
>     Here is a query that demonstrates the problem. Running this takes about
> 60MB until it is done at which time it is freed (I was wrong when I said
> otherwise earlier). Interestingly, the same amount of memory is used when
> doing an EXPLAIN on this query. Also it happens to return 0 rows. Please
> excuse the weird characters in the middle this is a Japanese (UTF8)
> database. Also please excuse Outlook breaking the query, it is just one long
> line.
>
>

RE: High memory usage

От
"Rainer Mager"
Дата:
Thanks for the reply.

I can try to explain the query but it is being generated semi-automatically
so it is not hard coded for a particular purpose. Before going into the
explanation, though, I  have a litte bit of new information. First, it is
ONLY the join condition that matters, the other parameters do not make a
difference in terms of memory usage. That is, the following, simplified
query, uses the same amount of memory. Also having or removing the DISTINCT
makes no difference. Also, for some VERY odd reason, adding a 10th
constraint caused the EXPLAIN to take significantly LESS memory but the
actual query still took much MORE memory.

> SELECT DISTINCT product.product_id
> FROM product,
>     pr_prop_str alias_table_0,
>     pr_prop_str alias_table_1,
>     pr_prop_str alias_table_2,
>     pr_prop_str alias_table_3,
>     pr_prop_str alias_table_4,
>     pr_prop_str alias_table_5,
>     pr_prop_str alias_table_6,
>     pr_prop_str alias_table_7,
>     pr_prop_str alias_table_8
> WHERE   product.product_id = alias_table_0.product_id
>     AND product.product_id = alias_table_1.product_id
>     AND product.product_id = alias_table_2.product_id
>     AND product.product_id = alias_table_3.product_id
>     AND product.product_id = alias_table_4.product_id
>     AND product.product_id = alias_table_5.product_id
>     AND product.product_id = alias_table_6.product_id
>     AND product.product_id = alias_table_7.product_id
>     AND product.product_id = alias_table_8.product_id;

Obviously this query isn't particularly interesting by itself but it does,
perhaps, simplify the problem. If you create a table called 'product' with
'product_id' and a table called 'pr_prop_str' with 'product_id',  then you
can test the above query. For me, even with minimal data in these tables the
query still took ~60MB. As for an explanation of the full query:

What is happening is that a 'product' is being searched for that fulfills a
number of criteria that are specified in the pr_prop_str (product properties
strings) table. So we join all the product IDs to make sure the product has
all of the required properties. Then we add in the particular property
conditions. Each property has an ID (for example, the first pr_property_id
is 147, that might coorespond to a model number or something like that) that
we use  in conjunction with the particular requirement (in this example, the
model number must be '3E362cb').

I hope that makes sense.


--Rainer



Re: High memory usage

От
Tom Lane
Дата:
"Rainer Mager" <rmager@vgkk.com> writes:
> First, this type of query seems to use exponential amounts of memory.

Hmm.  What does EXPLAIN show as the plan for the query?  How long does
it take to do the EXPLAIN?  If the EXPLAIN alone takes a lot of time/
space, then you need to reduce the size of the planner's search space
with an explicit JOIN --- see
http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/explicit-joins.html

            regards, tom lane

Re: High memory usage [PATCH]

От
Bruce Momjian
Дата:
Patch format looks good to me.  I will wait to see what others say about
its content.


> Attached is my patch to the official 7.1.2 PreparedStatement.java class.
> This seems to work quite well for me in a test case. To try to reproduce the
> seen problem I will need to test all night. I'll report tomorrow.
>
> BTW, this is my first attempt at making a unified diff so I might have done
> something wrong. If this diff doesn't apply please tell me.
>
> --Rainer
>
> > -----Original Message-----
> > From: pgsql-admin-owner@postgresql.org
> > [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Rainer Mager
> > Sent: Wednesday, June 20, 2001 9:08 AM
> > To: Tom Lane
> > Cc: PostgreSQL Admin
> > Subject: RE: [ADMIN] High memory usage
> >
> >
> > I'll work on a patch but if someone has already done this I would be
> > grateful.

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

RE: High memory usage

От
"Rainer Mager"
Дата:
This looks very promising. I'll have to look over it and try a few things
out but I think it puts me on the right track.

Thanks!

--Rainer


> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, June 21, 2001 10:56 AM
> To: Rainer Mager
> Cc: PostgreSQL Admin
> Subject: Re: [ADMIN] High memory usage
>
>
> "Rainer Mager" <rmager@vgkk.com> writes:
> > First, this type of query seems to use exponential amounts of memory.
>
> Hmm.  What does EXPLAIN show as the plan for the query?  How long does
> it take to do the EXPLAIN?  If the EXPLAIN alone takes a lot of time/
> space, then you need to reduce the size of the planner's search space
> with an explicit JOIN --- see
> http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/explic
> it-joins.html
>
>             regards, tom lane


Re: High memory usage

От
Tom Lane
Дата:
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> My guess is it's an interaction of the optimizer with the plan for this
> query, which might have many, nearly identical cost plans, since 8 of
> the 9 tables are actually the same table.

Yes, I suspect the same.  A large fraction of the possible subplans
would have exactly identical costs, which would keep the planner from
discarding any of them (normally, it drops clearly-inferior subplans
instantly, which does a great deal to limit exponential search
behavior).  It doesn't help any that the WHERE conditions are all
so similar, either.

I have a strong suspicion that the database schema is poorly thought
out, but lacking any concrete info, it's hard to offer suggestions
for improvement.

            regards, tom lane

RE: High memory usage

От
Steven Lane
Дата:
>Date: Wed, 20 Jun 2001 23:06:56 -0500
>To: "Rainer Mager" <rmager@vgkk.com>
>From: Steven Lane <stevelcmc@mindspring.com>
>Subject: RE: High memory usage
>Cc: pgsql-admin@postgresql.orgl
>Bcc:
>X-Attachments:
>
>>Hi,
>>
>>    Hmm, I'm not sure I understand how I could use EXISTS in my query.
>>Can you
>>give me an example?
>>
>>Thanks,
>>
>>--Rainer
>>
>
>>> I hope this is not a massively stupid response, but creating a new
>>> self-join condition for each additional criterion seems a rather expensive
>>> approach. Can this be done more quickly using multiple EXISTS
>>> conditions to
>>> check for each of the specified criteria?
>>>
>>> -- sgl
>>>
>>>
>
>Something like
>
>SELECT DISTINCT product.product_id FROM product p
>WHERE
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 147 AND prs.str = '3E362cb' )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 18 AND prs.str BETWEEN
>'000999999'  AND '004999999' )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 51 AND prs.str =
>'$Bi_(BO$B%&e_(BC~O$Be_(BC$B%&e_(BC~I$Be_(BC$B%)(B' )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 115 AND prs.str =1 )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 68 AND prs.str =5 )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 113 AND prs.str < '030001' )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 57 AND prs.str < '19980101' )
>AND
>EXISTS ( SELECT pr_prop_str.pr_property_id FROM pr_prop_str prs
>                  WHERE p.product_id = prs.product_id
>                  AND prs.pr_property_id = 158 AND prs.str=1 )
>
>
>was kind of what I had in mind.
>