Discussion: How to monitor resources on Linux.


How to monitor resources on Linux.

From: John R Allgood
Date:
Hello All

    I have some questions about memory resources and Linux. We are
currently running a Dell PowerEdge 2950 with dual-core Opterons and 8GB
RAM. Postgres version is 7.4.17 on RHEL4. Could someone explain to me
how best to monitor the memory resources on this platform? Top shows
high memory usage; nearly all of it is being used. ipcs -m shows the
following output. If I am reading this correctly, each of the postgres
entries represents a postmaster with its number of connections. If I
calculate from the first entry, it comes to around 3.4GB of RAM being
used; is this correct? We have started running into memory issues and I
think we have exhausted all the memory on the system. I think the best
approach would be to add more memory unless someone can suggest other
options.  We have a 2-node cluster running about 10 separate postmasters
divided evenly between the nodes. Each postmaster serves a separate
division of our company, so if we have a problem with one database not
everyone is down.

0x0052ea91 163845     postgres  600        133947392  26
0x00530db9 196614     postgres  600        34529280   24
0x00530201 229383     postgres  600        34529280   21
0x005305e9 262152     postgres  600        4915200    3
0x005311a1 294921     postgres  600        34529280   28
0x0052fe19 327690     postgres  600        4915200    4

Thanks

John Allgood - Systems Admin
Turbo Logistics


Re: How to monitor resources on Linux.

From: "Medi Montaseri"
Date:
3.4GB per process seems unrealistic. Here is a simple way to isolate, or at
least narrow, the scope of the problem at hand.

Bring the server up and go to the run level at which you run PG, but with PG
stopped; now measure your memory consumption. This is your baseline.
Now start PG but make no connections; with it just idle, measure your memory
consumption again.
Then bang on your PG (or wait for a busy time) and measure your memory
consumption a third time.

Tools available on Linux include ps(1), vmstat(1), top(1), ipcs(1), and proc(5).
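
For example, a rough sketch of capturing those three measurements from a
shell (the "service postgresql" init-script name is an assumption; substitute
however PG is started on your system):

    # 1. Baseline: PG stopped
    service postgresql stop        # assumed init-script name
    free -m > mem-baseline.txt

    # 2. PG started but idle, no client connections yet
    service postgresql start
    sleep 10
    free -m > mem-idle.txt

    # 3. PG under load: run your workload or wait for a busy period, then
    free -m > mem-busy.txt

    # Compare the snapshots side by side
    diff mem-baseline.txt mem-busy.txt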

Medi

On 8/28/07, John R Allgood <jallgood@the-allgoods.net> wrote:
> Hello All
>
>     I have some questions about memory resources and Linux. We are
> currently running a Dell PowerEdge 2950 with dual-core Opterons and 8GB
> RAM. Postgres version is 7.4.17 on RHEL4. Could someone explain to me
> how best to monitor the memory resources on this platform? Top shows
> high memory usage; nearly all of it is being used. ipcs -m shows the
> following output. If I am reading this correctly, each of the postgres
> entries represents a postmaster with its number of connections. If I
> calculate from the first entry, it comes to around 3.4GB of RAM being
> used; is this correct? We have started running into memory issues and I
> think we have exhausted all the memory on the system. I think the best
> approach would be to add more memory unless someone can suggest other
> options.  We have a 2-node cluster running about 10 separate postmasters
> divided evenly between the nodes. Each postmaster serves a separate
> division of our company, so if we have a problem with one database not
> everyone is down.
>
> 0x0052ea91 163845     postgres  600        133947392  26
> 0x00530db9 196614     postgres  600        34529280   24
> 0x00530201 229383     postgres  600        34529280   21
> 0x005305e9 262152     postgres  600        4915200    3
> 0x005311a1 294921     postgres  600        34529280   28
> 0x0052fe19 327690     postgres  600        4915200    4
>
> Thanks
>
> John Allgood - Systems Admin
> Turbo Logistics



Re: How to monitor resources on Linux.

From: Tom Lane
Date:
John R Allgood <jallgood@the-allgoods.net> writes:
>     I have some questions about memory resources and Linux. We are
> currently running a Dell PowerEdge 2950 with dual-core Opterons and 8GB
> RAM. Postgres version is 7.4.17 on RHEL4. Could someone explain to me
> how best to monitor the memory resources on this platform? Top shows
> high memory usage; nearly all of it is being used.

That's meaningless: what you have to look at is the breakdown of *how*
it is being used.  The normal state of affairs is that there is no
"free" memory to speak of, because the kernel will keep around cached
disk pages as long as it can, so as to save a read if they are
referenced again.  You're only in memory trouble when the percentage
used for disk buffers gets real small.
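
For instance, with the two-line format of free(1) on a system of that
vintage, the figure worth watching is the "free" column of the
"-/+ buffers/cache:" line, which counts reclaimable buffers and cache as
available (a rough one-liner, assuming that output format):

    # free + buffers + cached = memory actually available to programs;
    # free(1) reports it as the 4th field of the "-/+ buffers/cache:" line
    free -m | awk '/buffers\/cache/ { print $4 " MB effectively free" }'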

> ipcs -m shows the
> following output. If I am reading this correctly, each of the postgres
> entries represents a postmaster with its number of connections. If I
> calculate from the first entry, it comes to around 3.4GB of RAM being
> used; is this correct?

That's *completely* wrong.  It's shared memory, so by definition there
is one copy, not one per process.

One thing you have to watch out for is that "top" tends to report some
or all shared memory as part of the address space of each attached
process; so adding up the process sizes shown by top gives a
ridiculously inflated estimate.  However, it's tough to tell exactly how
much is being double-counted :-(.  I tend to look at top's aggregate
numbers, which are pretty real, and ignore the per-process ones.
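
A rough way to total the shared memory once, rather than once per attached
process, is to sum the ipcs listing directly (assuming its fifth column is
the segment size in bytes, as in the listing below):

    # Sum every postgres-owned shared memory segment exactly once
    ipcs -m | awk '/postgres/ { total += $5 }
        END { printf "%.1f MB shared in total\n", total/1024/1024 }'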

> We have started running into memory issues

How do you know that?

Another good tool is to watch "vmstat 1" output.  If you see a lot of
swapin/swapout traffic, then maybe you do indeed need more RAM.
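
The columns to watch there are si and so (pages swapped in and out per
second); occasional blips are normal, but sustained nonzero values mean the
working set no longer fits in RAM.  For example:

    # One sample per second, five samples; scan the si/so columns
    vmstat 1 5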

> We have a 2-node cluster running about 10 separate postmasters divided
> evenly between the nodes.

I was wondering why so many postgres-owned shmem segments.  Is it
intentional that you've given them radically different amounts of
memory?  Some of these guys are scraping along with just a minimal
number of buffers ...

> 0x0052ea91 163845     postgres  600        133947392  26
> 0x00530db9 196614     postgres  600        34529280   24
> 0x00530201 229383     postgres  600        34529280   21
> 0x005305e9 262152     postgres  600        4915200    3
> 0x005311a1 294921     postgres  600        34529280   28
> 0x0052fe19 327690     postgres  600        4915200    4

            regards, tom lane

Re: How to monitor resources on Linux.

From: John R Allgood
Date:
Hey Tom

    Thanks for responding. This issue came around because of a situation
yesterday with processes being killed off by the kernel.  I believe my
co-worker Geof Myers sent a post yesterday and the response was to
adjust vm.overcommit_memory=2. Several times throughout the day we see
memory usage peak and then go back down. We have multiple postmasters
running, one for each of our divisions, so that if we have a problem
with a database it only affects that one. That makes it difficult to
tune a system with this many postmasters running. Each database is tuned
according to need. We allow anywhere between 5-50 max connections. So
what I am asking is: exactly what am I looking at with ipcs -m, free,
and top?

Thanks

Tom Lane wrote:
> John R Allgood <jallgood@the-allgoods.net> writes:
>
>>     I have some questions about memory resources and Linux. We are
>> currently running a Dell PowerEdge 2950 with dual-core Opterons and 8GB
>> RAM. Postgres version is 7.4.17 on RHEL4. Could someone explain to me
>> how best to monitor the memory resources on this platform? Top shows
>> high memory usage; nearly all of it is being used.
>>
>
> That's meaningless: what you have to look at is the breakdown of *how*
> it is being used.  The normal state of affairs is that there is no
> "free" memory to speak of, because the kernel will keep around cached
> disk pages as long as it can, so as to save a read if they are
> referenced again.  You're only in memory trouble when the percentage
> used for disk buffers gets real small.
>
>
>> ipcs -m shows the
>> following output. If I am reading this correctly, each of the postgres
>> entries represents a postmaster with its number of connections. If I
>> calculate from the first entry, it comes to around 3.4GB of RAM being
>> used; is this correct?
>>
>
> That's *completely* wrong.  It's shared memory, so by definition there
> is one copy, not one per process.
>
> One thing you have to watch out for is that "top" tends to report some
> or all shared memory as part of the address space of each attached
> process; so adding up the process sizes shown by top gives a
> ridiculously inflated estimate.  However, it's tough to tell exactly how
> much is being double-counted :-(.  I tend to look at top's aggregate
> numbers, which are pretty real, and ignore the per-process ones.
>
>
>> We have started running into memory issues
>>
>
> How do you know that?
>
> Another good tool is to watch "vmstat 1" output.  If you see a lot of
> swapin/swapout traffic, then maybe you do indeed need more RAM.
>
>
>> We have a 2-node cluster running about 10 separate postmasters divided
>> evenly between the nodes.
>>
>
> I was wondering why so many postgres-owned shmem segments.  Is it
> intentional that you've given them radically different amounts of
> memory?  Some of these guys are scraping along with just a minimal
> number of buffers ...
>
>
>> 0x0052ea91 163845     postgres  600        133947392  26
>> 0x00530db9 196614     postgres  600        34529280   24
>> 0x00530201 229383     postgres  600        34529280   21
>> 0x005305e9 262152     postgres  600        4915200    3
>> 0x005311a1 294921     postgres  600        34529280   28
>> 0x0052fe19 327690     postgres  600        4915200    4
>>
>
>             regards, tom lane
>


Re: How to monitor resources on Linux.

From: Alvaro Herrera
Date:
John R Allgood wrote:
> Hey Tom
>
>    Thanks for responding. This issue came around because of a situation
> yesterday with processes being killed off by the kernel.  I believe my
> co-worker Geof Myers sent a post yesterday and the response was to adjust
> vm.overcommit_memory=2. Several times throughout the day we see memory
> usage peak and then go back down. We have multiple postmasters running,
> one for each of our divisions, so that if we have a problem with a
> database it only affects that one. That makes it difficult to tune a
> system with this many postmasters running. Each database is tuned
> according to need. We allow anywhere between 5-50 max connections. So
> what I am asking is: exactly what am I looking at with ipcs -m, free,
> and top?

Either work_mem or maintenance_work_mem set too high can cause excessive
memory usage.  What do you have these set to?

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"World domination is proceeding according to plan"        (Andrew Morton)

Re: How to monitor resources on Linux.

From: John R Allgood
Date:
We are using the defaults for these values. Keep in mind we are allowing
between 5-50 max connections per postmaster.  Here is an example of our
largest database: it is 7.9GB, we allow 50 max connections, and shared
buffers are set to 16000 (about 125MB). This is our master database and it
has a lot of activity compared to the other databases. We run VACUUM at
midday, VACUUM FULL at night, and VACUUM ANALYZE on weekends.

Thanks

Alvaro Herrera wrote:
> John R Allgood wrote:
>> Hey Tom
>>
>>    Thanks for responding. This issue came around because of a situation
>> yesterday with processes being killed off by the kernel.  I believe my
>> co-worker Geof Myers sent a post yesterday and the response was to adjust
>> vm.overcommit_memory=2. Several times throughout the day we see memory
>> usage peak and then go back down. We have multiple postmasters running,
>> one for each of our divisions, so that if we have a problem with a
>> database it only affects that one. That makes it difficult to tune a
>> system with this many postmasters running. Each database is tuned
>> according to need. We allow anywhere between 5-50 max connections. So
>> what I am asking is: exactly what am I looking at with ipcs -m, free,
>> and top?
>
> Either work_mem or maintenance_work_mem set too high can cause excessive
> memory usage.  What do you have these set to?

Re: How to monitor resources on Linux.

From: Andrew Sullivan
Date:
On Tue, Aug 28, 2007 at 03:40:03PM -0400, John R Allgood wrote:
> lot of activity as compared to the other databases. We run VACUUM at
> midday VACUUM FULL at night, VACUUM ANALYZE on weekends.

If you are running VACUUM often enough, then you should _never_ need
VACUUM FULL.  And weekly VACUUM ANALYSE is probably too infrequent.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
        --H.W. Fowler

Re: How to monitor resources on Linux.

From: John R Allgood
Date:
We were running vacuum and vacuum full daily, without the vacuum analyze on
weekends. After about 2 weeks the master database would slow down. How often
do you run VACUUM, or are you using the autovacuum daemon?

Andrew Sullivan wrote:
> On Tue, Aug 28, 2007 at 03:40:03PM -0400, John R Allgood wrote:
>> lot of activity as compared to the other databases. We run VACUUM at
>> midday VACUUM FULL at night, VACUUM ANALYZE on weekends.
>
> If you are running VACUUM often enough, then you should _never_ need
> VACUUM FULL.  And weekly VACUUM ANALYSE is probably too infrequent.
>
> A

Re: How to monitor resources on Linux.

From: "Scott Marlowe"
Date:
On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> On Tue, Aug 28, 2007 at 03:40:03PM -0400, John R Allgood wrote:
> > lot of activity as compared to the other databases. We run VACUUM at
> > midday VACUUM FULL at night, VACUUM ANALYZE on weekends.
>
> If you are running VACUUM often enough, then you should _never_ need
> VACUUM FULL.  And weekly VACUUM ANALYSE is probably too infrequent.

I would go so far as to say that vacuum fulls should never need to be
scheduled.  They should only be run when the DBA has looked at the DB
and determined that "something bad has happened" and needs to run it.
And even then, reindexdb is usually a better choice.

Also, by 7.4 autovacuum existed, even if it isn't perfect yet.  It's
still better than weekly analyze.

As for the ipcs output, I'm pretty sure it's in bytes.

133947392 bytes is about 128MB, which lines up with the ~125MB the OP
mentioned later that he has shared memory set to.
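
As a quick sanity check on that arithmetic (8kB per buffer is the stock
build-time block size for 7.4):

    echo $((16000 * 8192))          # 131072000 bytes = 125MB of buffers
    echo $((133947392 / 1048576))   # ~127MB segment: buffers plus overhead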

You said: "we see memory usage peak and then it will go down"

What do you mean by this?  What does free say before, during, and after?

Here's free on my db server right now:

             total       used       free     shared    buffers     cached
Mem:       2072460    2043440      29020          0      42980    1891160
-/+ buffers/cache:     109300    1963160
Swap:      2097144        536    2096608

Note that I'm showing 29Meg free.  But I've got 42Meg buffers and 1.8Gig cached.

My memory's not used up.
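
You can back that out from the numbers above:

    # "used" minus buffers minus cached = memory programs actually hold,
    # which is the 109300 on the "-/+ buffers/cache:" line
    echo $((2043440 - 42980 - 1891160))   # prints 109300 (kB)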

So, we're all just trying to be sure that you really are running out of memory.

Re: How to monitor resources on Linux.

From: Andrew Sullivan
Date:
On Tue, Aug 28, 2007 at 04:14:09PM -0400, John R Allgood wrote:
> We were running vacuum and vacuum full daily, without the vacuum analyze
> on weekends. After about 2 weeks the master database would slow down.

That doesn't surprise me. If you have enough writes, the regular
vacuum isn't running often enough.  The goal is to vacuum "just
enough".  The vacuum delay stuff in more recent releases is valuable
here.

> How often do you run VACUUM, or are you using the autovacuum daemon?

We have a complicated set of scripts that vacuum some tables very
often, some other tables less often, yet other tables rarely, and
some tables only once a week.  Autovacuum is currently in final
testing, though, I believe (though it's not my department any more,
so liberal salting of my words is needed).

--
Andrew Sullivan  | ajs@crankycanuck.ca
Everything that happens in the world happens at some place.
        --Jane Jacobs

Re: How to monitor resources on Linux.

From: Andrew Sullivan
Date:
On Tue, Aug 28, 2007 at 03:24:43PM -0500, Scott Marlowe wrote:
> Also, by 7.4 autovacuum existed, even if it isn't perfect yet.  It's
> still better than weekly analyze.

I wouldn't use it -- it had serious issues.  But this is another
point: 7.4 has a big whack of performance issues compared to later
releases.  So if upgrading is at all an option, it's worth
considering.  (This is all unrelated to memory use, though.)

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
However important originality may be in some fields, restraint and
adherence to procedure emerge as the more significant virtues in a
great many others.   --Alain de Botton

Re: How to monitor resources on Linux.

From: John R Allgood
Date:
Here is the output from free on one of the nodes. I have seen free mem go as
low as 15 and then go back up. Like I was saying earlier, my concern is why
the kernel started killing my postmasters. Here is the kernel message:
"kernel: oom-killer: gfp_mask=0xd0". This started happening while we were
running our midday backup; after the backup runs, vacuum starts up right
after. We rebooted the servers last night, and today backup and vacuum ran
fine. Below the free output I have added the logging out of
/var/log/messages. Thanks, this is a great list.

             total       used       free     shared    buffers     cached
Mem:          8116       5969       2146          0        144       4318
Low:           821        510        310
High:         7294       5459       1835
-/+ buffers/cache:       1506       6609
Swap:         2000          0       1999



Aug 27 12:24:42 gan-lxc-01 kernel: Swap cache: add 2104, delete 2017, find 829/1136, race 0+0
Aug 27 12:24:42 gan-lxc-01 kernel: 1229 bounce buffer pages
Aug 27 12:24:42 gan-lxc-01 kernel: Free swap:       2047424kB
Aug 27 12:24:42 gan-lxc-01 kernel: 2260992 pages of RAM
Aug 27 12:24:42 gan-lxc-01 kernel: 1867512 pages of HIGHMEM
Aug 27 12:24:42 gan-lxc-01 kernel: 183273 reserved pages
Aug 27 12:24:42 gan-lxc-01 kernel: 942026 pages shared
Aug 27 12:24:42 gan-lxc-01 kernel: 87 pages swap cached
Aug 27 12:24:42 gan-lxc-01 kernel: Out of Memory: Killed process 19383 (postmaster).



Scott Marlowe wrote:
> On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
>> On Tue, Aug 28, 2007 at 03:40:03PM -0400, John R Allgood wrote:
>>> lot of activity as compared to the other databases. We run VACUUM at
>>> midday VACUUM FULL at night, VACUUM ANALYZE on weekends.
>>
>> If you are running VACUUM often enough, then you should _never_ need
>> VACUUM FULL.  And weekly VACUUM ANALYSE is probably too infrequent.
>
> I would go so far as to say that vacuum fulls should never need to be
> scheduled.  They should only be run when the DBA has looked at the DB
> and determined that "something bad has happened" and needs to run it.
> And even then, reindexdb is usually a better choice.
>
> Also, by 7.4 autovacuum existed, even if it isn't perfect yet.  It's
> still better than weekly analyze.
>
> As for the ipcs output, I'm pretty sure it's in bytes.
>
> 133947392 bytes is about 128MB, which lines up with the ~125MB the OP
> mentioned later that he has shared memory set to.
>
> You said: "we see memory usage peak and then it will go down"
>
> What do you mean by this?  What does free say before, during, and after?
>
> Here's free on my db server right now:
>
>              total       used       free     shared    buffers     cached
> Mem:       2072460    2043440      29020          0      42980    1891160
> -/+ buffers/cache:     109300    1963160
> Swap:      2097144        536    2096608
>
> Note that I'm showing 29Meg free.  But I've got 42Meg buffers and 1.8Gig cached.
>
> My memory's not used up.
>
> So, we're all just trying to be sure that you really are running out of memory.

Re: How to monitor resources on Linux.

From: "Scott Marlowe"
Date:
On 8/28/07, John R Allgood <jallgood@the-allgoods.net> wrote:
>
> Here is the output from free on one of the nodes. I have seen free mem go
> as low as 15 and then go back up. Like I was saying earlier, my concern is
> why the kernel started killing my postmasters. Here is the kernel message:
> "kernel: oom-killer: gfp_mask=0xd0". This started happening while we were
> running our midday backup; after the backup runs, vacuum starts up right
> after. We rebooted the servers last night, and today backup and vacuum ran
> fine. Below the free output I have added the logging out of
> /var/log/messages. Thanks, this is a great list.
>
>              total       used       free     shared    buffers     cached
> Mem:          8116       5969       2146          0        144       4318
> Low:           821        510        310
> High:         7294       5459       1835
> -/+ buffers/cache:       1506       6609
> Swap:         2000          0       1999

I'm assuming those numbers are in megabytes.  If so, then they're
pretty healthy: you've got 2 Gig free and 4 Gig cached, with 144 Meg of
buffer mem.  Very reasonable.  Have you got the output of free from when
things were going wrong?

> Aug 27 12:24:42 gan-lxc-01 kernel: Swap cache: add 2104, delete 2017, find 829/1136, race 0+0
> Aug 27 12:24:42 gan-lxc-01 kernel: 1229 bounce buffer pages
> Aug 27 12:24:42 gan-lxc-01 kernel: Free swap:       2047424kB
> Aug 27 12:24:42 gan-lxc-01 kernel: 2260992 pages of RAM
> Aug 27 12:24:42 gan-lxc-01 kernel: 1867512 pages of HIGHMEM
> Aug 27 12:24:42 gan-lxc-01 kernel: 183273 reserved pages
> Aug 27 12:24:42 gan-lxc-01 kernel: 942026 pages shared
> Aug 27 12:24:42 gan-lxc-01 kernel: 87 pages swap cached
> Aug 27 12:24:42 gan-lxc-01 kernel: Out of Memory: Killed process 19383 (postmaster).

Is there any other context to go with this, like something from the
postgres logs at the same time, or maybe top output sorted by memory?

I'm wondering if you're just running a few queries that fire really
big sorts off and that's what's getting you.

Re: How to monitor resources on Linux.

From: Tom Lane
Date:
John R Allgood <jallgood@the-allgoods.net> writes:
> Here is the output from free on one of the nodes.

Hmmm ... I'm not exactly a kernel jock, but I find the presence of these
lines in the output to be mighty suspicious:

> Low:           821        510        310
> High:         7294       5459       1835

In a true 64-bit system you should not have any distinction between low
and high memory (and indeed "free" doesn't print any such thing on my
x86_64 box).  Maybe you are running a 32-bit kernel?  Anyway it seems
likely that the out-of-memory situation was due to oversubscribed lowmem
rather than being out of memory globally, especially since you are
running at zero swap usage.  This is not the trace of a system that's
under any memory pressure overall:

>              total       used       free
> Mem:          8116       5969       2146
> -/+ buffers/cache:       1506       6609
> Swap:         2000          0       1999
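
A quick way to check the 32-bit-kernel theory, and to watch lowmem directly
(the Low/High fields only appear on kernels built with the highmem split):

    uname -m        # i686 here would confirm a 32-bit kernel
    grep -E 'LowTotal|LowFree|HighTotal|HighFree' /proc/meminfo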

Something else to check is whether having swap only a quarter the size
of physical RAM is a good idea or not.  I'm not sure what the latest
configuration theories are for Linux, but the old-school-Unix theory was
always that you should have more swap space than RAM.  When memory
overcommit is disabled, having plenty of swap space available may be
necessary even if you're seemingly not using it --- the kernel needs to
be sure that there would be someplace to put everything if it had to
materialize all the virtual copy-on-write pages that the current process
set is sharing.
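
For example, to see the current policy and the swap actually configured
(with vm.overcommit_memory=2 the kernel limits allocations to swap plus
overcommit_ratio percent of RAM, so undersized swap can cause allocation
failures even while swap sits unused):

    sysctl vm.overcommit_memory vm.overcommit_ratio   # current settings
    swapon -s                                         # configured swap and usage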

            regards, tom lane