Discussion: odd iostat graph
Hey folks,

Earlier in the week I wrote a Munin plugin that takes the "await" and
"average queue length" fields from "iostat -x" and graphs them.

This seems rather odd to me:

http://picasaweb.google.ca/alan.mckay/Work#5380253477470243954

That is Qlen. And await looks similar:

http://picasaweb.google.ca/alan.mckay/Work#5380254090296723426

This is on an IBM 3650 with the 2 main "internal" drives set up in a
mirrored config, and sdb is the 6 other drives set up in a RAID5 with a
global hot spare (4 drives in array + 1 to make it RAID5 + global hot
spare).

We aren't seeing any performance problems on this per se. But that just
seems like a really odd graph to me. Can anyone explain it? In
particular, how regular it is?

cheers,
-Alan

--
"Don't eat anything you've ever seen advertised on TV"
   - Michael Pollan, author of "In Defense of Food"
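[Editor's note: a minimal sketch of the kind of plugin Alan describes —
pulling "await" and "avgqu-sz" out of `iostat -x` output. The sample text
and field values below are illustrative, not taken from Alan's system, and
real `iostat` column sets vary by sysstat version.]

```python
# Sketch: extract "await" and "avgqu-sz" per device from iostat -x
# style output by locating the columns in the header row.
# SAMPLE is hypothetical output for illustration only.

SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s   avgqu-sz   await  %util
sda               0.02     3.10    0.45   12.30       0.85    4.20   1.50
sdb               0.10    25.40    2.10   80.00       6.40   18.70  35.00
"""

def parse_iostat(text):
    lines = [l for l in text.strip().splitlines() if l.strip()]
    header = lines[0].split()
    qi, ai = header.index("avgqu-sz"), header.index("await")
    stats = {}
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the device name; numeric columns line up with
        # the header because each is a single whitespace-separated token
        stats[fields[0]] = {"avgqu-sz": float(fields[qi]),
                            "await": float(fields[ai])}
    return stats

if __name__ == "__main__":
    stats = parse_iostat(SAMPLE)
    print(stats["sdb"]["await"])  # prints 18.7
```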
On 9/11/09 9:58 AM, "Alan McKay" <alan.mckay@gmail.com> wrote:

> We aren't seeing any performance problems on this per se. But that
> just seems like a really odd graph to me. Can anyone explain it? In
> particular, how regular it is?

My guess is this is checkpoint related. Find out when your checkpoints
are happening. The drops are most likely due to the sync() on all
outstanding writes at the end of each checkpoint. The rise is probably
small writes not yet on disk in the OS buffer cache.

If this is due to checkpoints, I would expect a burst of write volume to
disk at the same time as the drop. You can change your logging settings
to output the time of each checkpoint and some stats about them.
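[Editor's note: the logging settings mentioned above might look like the
fragment below. Setting names are as of the PostgreSQL 8.x era this thread
dates from; `checkpoint_segments` was later replaced by `max_wal_size`.]

```ini
# postgresql.conf -- log each checkpoint and some stats about it
log_checkpoints = on             # available in 8.3 and later

# Defaults that control checkpoint frequency
checkpoint_timeout = 5min        # time-based checkpoint trigger
checkpoint_segments = 3          # WAL-volume-based trigger (pre-9.5)
```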
> My guess is this is checkpoint related.

I'll assume "checkpoint" is a PG term that I'm not yet familiar with -
will query my DBA :-)

If this is OS buffer cache, wouldn't that be cached an awfully long
time? i.e. we're in big trouble if we get a bad crash?
Alan McKay <alan.mckay@gmail.com> wrote:
>> My guess is this is checkpoint related.
>
> I'll assume "checkpoint" is a PG term that I'm not yet familiar with -
> will query my DBA :-)

A checkpoint flushes all dirty PostgreSQL buffers to the OS and then
tells the OS to write them to disk. The exact details of how that's
done and the timings involved vary with PostgreSQL version and
configuration.

> If this is OS buffer cache, wouldn't that be cached an awfully long
> time? i.e. we're in big trouble if we get a bad crash?

Before the commit of a database transaction completes, the changes
involved in it are written to a write-ahead log (WAL). A checkpoint is
also recorded in the WAL. On recovery from a crash, PostgreSQL replays
all activity from committed transactions after the last checkpoint, so
nothing from a committed transaction is lost.

-Kevin
On Fri, 11 Sep 2009, Alan McKay wrote:

> We aren't seeing any performance problems on this per se. But that
> just seems like a really odd graph to me. Can anyone explain it? In
> particular, how regular it is?

What's the scale on the bottom there? The label says "by week" but the
way your message is written makes me think it's actually a much smaller
time frame. If those valleys are around five minutes apart, those are
the checkpoints finishing; the shape of the graph is right for it to be
those.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
> What's the scale on the bottom there? The label says "by week" but the
> way your message is written makes me think it's actually a much
> smaller time frame. If those valleys are around five minutes apart,
> those are the checkpoints finishing; the shape of the graph is right
> for it to be those.

No, that's about 6 days of stats. The numbers 04, 05, 06 are Sept 04,
05, 06 ... This is the part I found oddest - that this takes place over
such a huge timeframe!

Munin takes a snapshot every 5 minutes, and this graph shows it
averaged over that timeframe.
On Fri, 11 Sep 2009, Alan McKay wrote:

> Munin takes a snapshot every 5 minutes, and this graph shows it
> averaged over that timeframe.

The default postgresql.conf puts a checkpoint every 5 minutes as well.
It's not going to be as exact as Munin's timing, though; they'll be just
a little longer than that. I wonder if you're seeing the checkpoint
pattern anyway.

If the checkpoint period is fairly stable (which it could be in your
case), is slightly longer than the monitoring one, and each checkpoint
has the same basic shape (also usually true), each monitoring sample is
going to trace out the usual checkpoint pattern by sampling a slightly
later point from successive ones. The shape of the curve you're seeing
seems way too close to the standard checkpoint one to be just a
coincidence.

In any case, 5 minutes is unfortunately for you a really bad sampling
period for a PG database because of its similarity to the checkpoint
timing. You have to take measurements at least once a minute to see the
checkpoints happening usefully at all. I think you're stuck generating
an activity graph some other way with finer resolution to get to the
bottom of this.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD