Discussion: what server stats to track / monitor?

From:
Alan McKay
Date:

Hey folks,

I'm new to performance monitoring and tuning of PG/Linux (have a fair
bit of experience in Windows, though those skills were last used about
5 years ago)

I finally have Munin set up in my production environment, and my
goodness it tracks a whole whack of stuff by default!

I want to turn off the graphing of unimportant data, to unclutter the
graphs and focus on what's important.

So, from the perspective of both Linux and PG, is there a canonical list
of "here are the most important X things to track"?

On the PG side I currently have 1 graph for # connections, another for
DB size, and another for TPS.  Then there are a few more graphs that
are really cluttered up, each with 8 or 9 things on them.

On the Linux side, I clearly want to track HD usage, CPU, memory.  But
not sure what aspects of each.  There is also a default Munin graph
for IO Stat - not sure what I am looking for there (I know what it
does of course, just not sure what to look for in the numbers)

I know some of this stuff was mentioned at PG Con so now I start going
back through all my notes and the videos.  Already been reviewing.

If there is not already a wiki page for this I'll write one.   I see
this is a good general jump off point :

http://wiki.postgresql.org/wiki/Performance_Optimization

But jumping off from there (and searching on "Performance") does not
come up with anything like what I am talking about.

Is there some good Linux performance monitoring and tuning reading
that you can recommend?

thanks,
-Alan

--
“Mother Nature doesn’t do bailouts.”
         - Glenn Prickett

From:
Joshua Tolley
Date:

On Fri, Jun 12, 2009 at 03:52:19PM -0400, Alan McKay wrote:
> I want to turn off the graphing of unimportant data, to unclutter the
> graphs and focus on what's important.

I'm unfamiliar with Munin, but if you can turn off the graphing (so as to
achieve your desired level of un-cluttered-ness) without disabling the capture
of the data that was being graphed, you'll be better off. Others' opinions may
certainly vary, but in my experience, provided you're not causing a
performance problem simply because you're monitoring so much stuff, you're
best off capturing every statistic reasonably possible. The time will probably
come when you'll find that that statistic, and all the history you've been
capturing for it, becomes useful.

- Josh / eggyknap


From:
Alan McKay
Date:

> I'm unfamiliar with Munin, but if you can turn off the graphing (so as to
> achieve your desired level of un-cluttered-ness) without disabling the capture
> of the data that was being graphed, you'll be better off. Others' opinions may
> certainly vary, but in my experience, provided you're not causing a
> performance problem simply because you're monitoring so much stuff, you're
> best off capturing every statistic reasonably possible. The time will probably
> come when you'll find that that statistic, and all the history you've been
> capturing for it, becomes useful.

Yes, Munin does allow me to turn off graphing without turning off collecting.

Any pointers for good reading material here?   Other tips?


--
“Don't eat anything you've ever seen advertised on TV”
         - Michael Pollan, author of "In Defense of Food"

From:
Alan McKay
Date:

Yes, I'm familiar with Staplr - if anyone from myyearbook.com is
listening in, I'm still hoping for that 0.7 update :-)   I plan to run
both for the immediate term at least.

But this only concerns collecting - my biggest concern is how to
read/interpret the data!  Pointers to good reading material would be
greatly appreciated.

On Fri, Jun 12, 2009 at 4:40 PM, Rauan Maemirov wrote:
> Hi Alan. For simple needs you can use Staplr, it's very easy to configure.
> There's also one - zabbix, pretty much.


--
“Don't eat anything you've ever seen advertised on TV”
         - Michael Pollan, author of "In Defense of Food"

From:
Rauan Maemirov
Date:

Hi Alan. For simple needs you can use Staplr, it's very easy to configure.
There's also one - zabbix, pretty much.

2009/6/13 Alan McKay:
> Hey folks,
>
> I'm new to performance monitoring and tuning of PG/Linux (have a fair
> bit of experience in Windows, though those skills were last used about
> 5 years ago)
>
> I finally have Munin set up in my production environment, and my
> goodness it tracks a whole whack of stuff by default!
>
> I want to turn off the graphing of unimportant data, to unclutter the
> graphs and focus on what's important.
>
> So, from the perspective of both Linux and PG, is there a canonical list
> of "here are the most important X things to track"?
>
> On the PG side I currently have 1 graph for # connections, another for
> DB size, and another for TPS.  Then there are a few more graphs that
> are really cluttered up, each with 8 or 9 things on them.
>
> On the Linux side, I clearly want to track HD usage, CPU, memory.  But
> not sure what aspects of each.  There is also a default Munin graph
> for IO Stat - not sure what I am looking for there (I know what it
> does of course, just not sure what to look for in the numbers)
>
> I know some of this stuff was mentioned at PG Con so now I start going
> back through all my notes and the videos.  Already been reviewing.
>
> If there is not already a wiki page for this I'll write one.   I see
> this is a good general jump off point :
>
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> But jumping off from there (and searching on "Performance") does not
> come up with anything like what I am talking about.
>
> Is there some good Linux performance monitoring and tuning reading
> that you can recommend?
>
> thanks,
> -Alan
>
> --
> “Mother Nature doesn’t do bailouts.”
>         - Glenn Prickett

From:
Joshua Tolley
Date:

On Fri, Jun 12, 2009 at 04:40:12PM -0400, Alan McKay wrote:
> Any pointers for good reading material here?   Other tips?

The manuals and/or source code for your software? Stories, case studies, and
reports from others in similar situations who have gone through problems?
Monitoring's job is to avert crises by letting you know things are going south
before they die completely. So you probably want to figure out ways in which
your setup is most likely to die, and make sure the critical points in that
equation are well-monitored, and you understand the monitoring. Provided you
stick with it long enough, you'll inevitably encounter a breakdown of some
kind or other, which will help you refine your idea of which points are
critical.

Apart from that, I find it's helpful to read about statistics and formal
testing, so you have some idea how confident you can be that the monitors are
accurate, that your decisions are justified, etc. But that's not everyone's
cup of tea...

- Josh / eggyknap

From:
Greg Smith
Date:

On Fri, 12 Jun 2009, Alan McKay wrote:

> So, from the perspective of both Linux and PG, is there a canonical list
> of "here are the most important X things to track"?

Not really, which is why you haven't gotten such a list from anyone here.
Exactly what's important to track does vary a bit based on expected
workload, and most of the people who have been through this enough to give
you a good answer are too busy to write one (you've been in my "I should
respond to that" queue for two weeks before I found time to write).

> Is there some good Linux performance monitoring and tuning reading
> that you can recommend?

The only good intro to this I've ever seen, from the perspective of
monitoring things that would be useful to a database administrator, is the
coverage of monitoring in "Performance Tuning for Linux Servers" by
Johnson/Huizenga/Pulavarty.  Their tuning advice wasn't so useful, but
most OS tuning suggestions aren't either.

The more useful way to ask the question you'd like an answer to is "when
my server starts to perform badly, what does that correlate with?"  Find
out what you need to investigate to figure that out, and you can determine
what you should have been monitoring all along.  That is unfortunately
workload dependent; the stuff that tends to go wrong in a web app is very
different from what happens to a problematic data warehouse for example.

The basic important OS level stuff to watch is:

-Total memory in use
-All the CPU% numbers
-Disk read/write MB/s at all levels of granularity you can collect (total
across the system, filesystem, array, individual disk).  You'll only want
to track the total until there's a problem, at which point it's nice to
have more data to drill down into.
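
A minimal sketch of how the disk read/write MB/s figure could be collected by hand on Linux, assuming two snapshots of /proc/diskstats taken a known number of seconds apart. The function names are made up for illustration; the field positions (sectors read and sectors written, in 512-byte units) follow the kernel's documented /proc/diskstats layout.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def read_sectors(diskstats_text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    totals = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue  # skip blank lines and short-format lines
        # parts[2] is the device name, parts[5] sectors read, parts[9] sectors written
        totals[parts[2]] = (int(parts[5]), int(parts[9]))
    return totals


def throughput_mb_s(before, after, seconds):
    """Per-device (read MB/s, write MB/s) between two samples taken `seconds` apart."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue  # device appeared between samples
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / 1e6 / seconds,
            (w1 - w0) * SECTOR_BYTES / 1e6 / seconds,
        )
    return rates
```

Summing the per-device rates gives the system-wide total; keeping the per-device numbers around is what makes the later drill-down possible.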

There are a bunch more disk and memory stats available, but I rarely find
them of any use.  The one Linux-specific bit I do like to monitor is the
line labeled "Writeback" in /proc/meminfo, because that's the best
indicator of how much write caching is being done at the OS level.  A large
value there is a warning sign of many problems in an area Linux often has
problems with.
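
As a rough illustration of pulling the "Writeback" line out of /proc/meminfo for graphing, here is a small sketch. The helper names are hypothetical, and it assumes the usual "Label:   value kB" layout of that file.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo label to its integer value.

    Most values are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[label.strip()] = int(fields[0])
    return values


def writeback_kb(path="/proc/meminfo"):
    """Current Writeback figure in kB, or None if the line is missing."""
    with open(path) as f:
        return parse_meminfo(f.read()).get("Writeback")
```

Calling writeback_kb() on a schedule and storing the result is enough to build the historical record; what counts as "too high" depends on the workload and is not something this sketch tries to decide.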

On the database side, you want to periodically check the important
pg_stat_* views to get an idea how much activity is happening (and where
it's happening), as well as looking for excessive dead tuples and bad
index utilization (which manifests as things like too many sequential
scans):

-pg_stat_user_indexes
-pg_stat_user_tables
-pg_statio_user_indexes
-pg_statio_user_tables
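
One possible way to turn pg_stat_user_tables into a "tables worth a look" list is sketched below. The thresholds (20% dead tuples, more than half of scans sequential, at least 100 scans) are arbitrary assumptions for illustration, not recommended values, and flag_tables is a made-up helper. The rows are expected to be dicts holding the columns named in the query, fetched with whatever database driver you use.

```python
# Columns below exist in pg_stat_user_tables; running the query is left to your driver.
TABLE_STATS_SQL = """
SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
"""


def flag_tables(rows, dead_ratio=0.2, seq_ratio=0.5, min_scans=100):
    """Return {relname: [reasons]} for tables exceeding the (assumed) thresholds."""
    flagged = {}
    for row in rows:
        reasons = []
        live = row["n_live_tup"] or 0
        dead = row["n_dead_tup"] or 0
        if live + dead > 0 and dead / (live + dead) > dead_ratio:
            reasons.append("dead tuples")
        seq = row["seq_scan"] or 0
        idx = row["idx_scan"] or 0  # NULL when the table has no indexes
        if seq + idx >= min_scans and seq / (seq + idx) > seq_ratio:
            reasons.append("sequential scans")
        if reasons:
            flagged[row["relname"]] = reasons
    return flagged
```

Because these counters are cumulative, graphing the difference between successive samples says more about current behavior than the raw totals do.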

If your system is write-intensive at all, you should watch
pg_stat_bgwriter too to keep an eye on when that goes badly.

At a higher level, it's a good idea to graph the size of the tables and
indexes most important to your application over time.

It can be handy to track things derived from pg_stat_activity too, like
total connections and how old the oldest transaction is.  pg_locks can be
handy to track stats on too, something like these two counts over time:

select (select count(*) from pg_locks where granted) as granted,
       (select count(*) from pg_locks where not granted) as ungranted;
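
For the pg_stat_activity side (total connections and the age of the oldest transaction), a sketch along these lines might do. It assumes you fetch the xact_start column with your driver of choice and pass in a "now" value of the same timezone awareness; summarize_activity is a hypothetical helper.

```python
from datetime import datetime, timedelta

# xact_start is NULL for connections with no open transaction.
ACTIVITY_SQL = "SELECT xact_start FROM pg_stat_activity"


def summarize_activity(xact_starts, now):
    """xact_starts: one entry per connection, None when no transaction is open."""
    open_starts = [t for t in xact_starts if t is not None]
    # With no open transactions, report an age of zero instead of failing.
    oldest = max((now - t for t in open_starts), default=timedelta(0))
    return {"connections": len(xact_starts), "oldest_xact_age": oldest}
```

Both numbers are cheap to sample every minute or so, and a steadily growing oldest-transaction age is the kind of trend that is easy to miss without a graph.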

That's the basic set I find myself looking at regularly enough that I wish
I always had a historical record of them from the system.  Never bothered
to work this into a more formal article because a) the workload specific
stuff makes it complicated to explain for everyone, b) the wide variation
in and variety of monitoring tools out there, and c) wanting to cover the
material right which takes a while to do on a topic this big.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Alan McKay
Date:

Thanks Greg!

On Fri, Jun 26, 2009 at 11:27 PM, Greg Smith wrote:



--
“Don't eat anything you've ever seen advertised on TV”
         - Michael Pollan, author of "In Defense of Food"