Discussion: Need to find out which process is hitting hda
I'm using CentOS 5 as the OS, so there's no fancy dtrace to look at which process is causing my disks to thrash.

I have 4 disks in the box (all IDE, 7200 rpm):

1 OS disk [hda]
2 raided (1) disks [hdb/hdc]
1 pg_xlog disk [hdd] (also used as an alternate tablespace for temp/in-transit files via select, insert into tmp table, delete from tmp table, insert into footable select * from tmp table)

The problem I now see from both atop and iostat (iostat -dx 10):

Device:  rrqm/s  wrqm/s     r/s    w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
hda       98.60   14.69  121.98  15.08  1775.02  2908.29     34.17     47.53  551.67   7.29  99.95
hdb        0.70    4.20   16.48   2.30   304.50    51.95     18.98      0.21   10.94   8.45  15.86
hdc        0.00    3.40   12.49   2.00   223.78    43.16     18.43      0.07    5.04   4.42   6.40
hdd        0.00   56.94    0.50   3.70    53.55   485.91    128.57      0.02    5.48   3.95   1.66
md0        0.00    0.00   29.57  11.89   526.67    95.10     15.00      0.00    0.00   0.00   0.00

The number of writes and reads on hda is much greater than expected and I'm not sure who/what is causing it.

Thanks for any clues.

On Dec 13, 2007 5:06 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I'm using centos 5 as the OS so, there's no fancy dtrace to look at
> which processes is causing my disks to thrash.
>
> I have 4 disks in the box. (all ide, 7200rpm)
>
> 1 OS disk [hda]
> 2 raided (1) disks [hdb/hdc]
> 1 pg_xlog disk (and also used as an alternate tablespace for [hdd]
> temp/in-transit files via select, insert into tmp table. delete from tmp
> table, insert into footable select * from tmp table)
>
> the number of writes and reads on hda is much greater than expected and
> I'm not sure who/what is causing it.

Logging? Just guessing. Or swapping. What does free say?

On Dec 13, 2007 6:06 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I'm using centos 5 as the OS so, there's no fancy dtrace to look at
> which processes is causing my disks to thrash.
>
> I have 4 disks in the box. (all ide, 7200rpm)
>
> 1 OS disk [hda]
> 2 raided (1) disks [hdb/hdc]
> 1 pg_xlog disk (and also used as an alternate tablespace for [hdd]
> temp/in-transit files via select, insert into tmp table. delete from tmp
> table, insert into footable select * from tmp table)
>
> Problem now I see from both atop and iostat (iostat -dx 10):
>
> Device:  rrqm/s  wrqm/s     r/s    w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
> hda       98.60   14.69  121.98  15.08  1775.02  2908.29     34.17     47.53  551.67   7.29  99.95
> hdb        0.70    4.20   16.48   2.30   304.50    51.95     18.98      0.21   10.94   8.45  15.86
> hdc        0.00    3.40   12.49   2.00   223.78    43.16     18.43      0.07    5.04   4.42   6.40
> hdd        0.00   56.94    0.50   3.70    53.55   485.91    128.57      0.02    5.48   3.95   1.66
> md0        0.00    0.00   29.57  11.89   526.67    95.10     15.00      0.00    0.00   0.00   0.00
>
> the number of writes and reads on hda is much greater than expected and
> I'm not sure who/what is causing it.

There are a few things I can think of that can cause postgres to generate I/O on a drive other than the data drive:

* logging (eliminate this by moving logs temporarily)
* swapping (swap is high and changing, other ways)
* dumps, copy statement (check cron)
* procedures, especially the external ones (perl, etc) that write to disk

My seat-of-the-pants guess is that you are looking at swap. Of course, a runaway program other than postgres can be the cause.

merlin

"Merlin Moncure" <mmoncure@gmail.com> writes:
> there are a few things that I can think of that can can cause postgres
> to cause i/o on a drive other than the data drive:
> * logging (eliminate this by moving logs temporarily)
> * swapping (swap is high and changing, other ways)
> * dumps, copy statement (check cron)
> * procedures, especially the external ones (perl, etc) that write to disk
> my seat-of-the-pants guess is that you are looking at swap.

vmstat would confirm or disprove that particular guess, since it tracks swap I/O separately.

regards, tom lane

On Thu, 13 Dec 2007, Ow Mun Heng wrote:
> I'm using centos 5 as the OS so, there's no fancy dtrace to look at
> which processes is causing my disks to thrash.

Does plain old top show you anything interesting? If you hit 'c' after starting it you'll get more information about the postgres processes in particular.

> 1 OS disk [hda]
>
>          rrqm/s  wrqm/s     r/s    w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
> hda       98.60   14.69  121.98  15.08  1775.02  2908.29     34.17     47.53  551.67   7.29  99.95

The funny thing here is that both the writes and reads are very high compared to the other disks. That rules out most of what I go looking for when there's run-away activity. Many common causes do almost all reads (i.e. some filesystem crawler like updatedb running) or almost all writes (loggers gone wild!). Swapping might do both, so consider mine a second vote to correlate this with vmstat output.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

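The read/write heuristic in the message above can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: the function names and the 0.9 cutoff are hypothetical, and the sample data is the iostat output from the original post. It labels each device by its share of sectors read versus written.

```python
# Column order as printed by `iostat -dx` in the original post.
IOSTAT_COLUMNS = [
    "device", "rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
    "avgrq-sz", "avgqu-sz", "await", "svctm", "%util",
]

# Sample copied from the thread.
IOSTAT_SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
hda 98.60 14.69 121.98 15.08 1775.02 2908.29 34.17 47.53 551.67 7.29 99.95
hdb 0.70 4.20 16.48 2.30 304.50 51.95 18.98 0.21 10.94 8.45 15.86
hdc 0.00 3.40 12.49 2.00 223.78 43.16 18.43 0.07 5.04 4.42 6.40
hdd 0.00 56.94 0.50 3.70 53.55 485.91 128.57 0.02 5.48 3.95 1.66
md0 0.00 0.00 29.57 11.89 526.67 95.10 15.00 0.00 0.00 0.00 0.00
"""


def parse_iostat(text):
    """Return {device: {column: float}} for every data row in the text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything with the wrong width.
        if len(fields) != len(IOSTAT_COLUMNS) or fields[0].startswith("Device"):
            continue
        stats[fields[0]] = dict(zip(IOSTAT_COLUMNS[1:], map(float, fields[1:])))
    return stats


def classify_mix(row, threshold=0.9):
    """Label a device by its share of sectors read.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    total = row["rsec/s"] + row["wsec/s"]
    if total == 0:
        return "idle"
    read_share = row["rsec/s"] / total
    if read_share >= threshold:
        return "mostly reads"
    if read_share <= 1 - threshold:
        return "mostly writes"
    return "mixed"


if __name__ == "__main__":
    for device, row in parse_iostat(IOSTAT_SAMPLE).items():
        print(device, classify_mix(row), "util", row["%util"])
```

With the numbers from the thread, hda comes out as "mixed" at nearly 100% utilisation, which is the pattern that points away from a pure crawler or a pure logger.
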
On Fri, 2007-12-14 at 01:54 -0500, Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > there are a few things that I can think of that can can cause postgres
> > to cause i/o on a drive other than the data drive:
> > * logging (eliminate this by moving logs temporarily)

I'll have to try this.

> > * swapping (swap is high and changing, other ways)
> > * dumps, copy statement (check cron)

Not doing any of these.

> > * procedures, especially the external ones (perl, etc) that write to disk

Nope. The only perl running is just pulling data from the master DB into this little box.

> > my seat-of-the-pants guess is that you are looking at swap.
>
> vmstat would confirm or disprove that particular guess, since it tracks
> swap I/O separately.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  6 300132   5684   4324 315888  420   32  1024   644 1309  485 35 11  0 54  0
 0  6 299820   6768   4328 313004  588   76  3048   576 1263  588 36 12  0 52  0
 0  6 299428   5424   4340 313700  480   36  2376   104 1291  438 24  9  0 67  0
 2  6 298836   5108   4268 313788  800    0  2312   216 1428  625 30 10  0 60  0
 2  6 298316   5692   4192 313044  876    0  1652  1608 1488  656 33 11  0 56  0
 2  6 298004   6256   4140 312184  560    4  1740  1572 1445  601 42 11  0 47  0

I kept looking at the io columns and didn't even think of the swap partition. It's true that it's moving quite erratically but I won't say that it's really thrashing.

             total       used       free     shared    buffers     cached
Mem:           503        498          4          0          3        287
-/+ buffers/cache:        207        295
Swap:         2527        328       2199

(YEP, I know I'm RAM starved on this machine)

Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
>> vmstat would confirm or disprove that particular guess, since it tracks
>> swap I/O separately.

> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  2  6 300132   5684   4324 315888  420   32  1024   644 1309  485 35 11  0 54  0
>  0  6 299820   6768   4328 313004  588   76  3048   576 1263  588 36 12  0 52  0
>  0  6 299428   5424   4340 313700  480   36  2376   104 1291  438 24  9  0 67  0
>  2  6 298836   5108   4268 313788  800    0  2312   216 1428  625 30 10  0 60  0
>  2  6 298316   5692   4192 313044  876    0  1652  1608 1488  656 33 11  0 56  0
>  2  6 298004   6256   4140 312184  560    4  1740  1572 1445  601 42 11  0 47  0

> I kept looking at the io columns and didn't even think of the swap
> partition. It's true that it's moving quite erratically but I won't say
> that it's really thrashing.

Hmmm ... my experience is that the si/so columns should show *zero* under normal load. What you're showing here is swap as a sizable percentage of total I/O load, and with the CPU spending the majority of its time in I/O wait, that's clearly where you need to focus your attention.

> (YEP, I know I'm RAM starved on this machine)

Yeah, that's what it looks like. Head down to your local CompUSA and get some RAM at fire-sale prices ...

regards, tom lane

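The reading of the vmstat output above can be checked with a little arithmetic. The Python sketch below is an illustration added for this write-up, not something from the thread: the function names are hypothetical, and comparing si directly against bi assumes both are reported in roughly the same units (KiB/s versus 1024-byte blocks/s, as the Linux vmstat man page describes them). It averages the swap and I/O-wait columns over the six samples posted earlier.

```python
# vmstat sample copied from the thread.
VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  6 300132   5684   4324 315888  420   32  1024   644 1309  485 35 11  0 54  0
 0  6 299820   6768   4328 313004  588   76  3048   576 1263  588 36 12  0 52  0
 0  6 299428   5424   4340 313700  480   36  2376   104 1291  438 24  9  0 67  0
 2  6 298836   5108   4268 313788  800    0  2312   216 1428  625 30 10  0 60  0
 2  6 298316   5692   4192 313044  876    0  1652  1608 1488  656 33 11  0 56  0
 2  6 298004   6256   4140 312184  560    4  1740  1572 1445  601 42 11  0 47  0
"""


def parse_vmstat(text):
    """Return one {column: int} dict per vmstat data row."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["r", "b"]:
            # The second header line carries the column names.
            columns = fields
        elif columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def summarize(rows):
    """Average swap-in/out and I/O wait, plus swap-in as a share of blocks read.

    Assumption: si (KiB/s) and bi (blocks/s) are treated as comparable units.
    """
    n = len(rows)
    total = {name: sum(r[name] for r in rows) for name in ("si", "so", "bi", "bo", "wa")}
    return {
        "avg_si": total["si"] / n,
        "avg_so": total["so"] / n,
        "avg_wa": total["wa"] / n,
        "swap_in_share": total["si"] / total["bi"] if total["bi"] else 0.0,
    }


if __name__ == "__main__":
    summary = summarize(parse_vmstat(VMSTAT_SAMPLE))
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

On these six samples, swap-in is far from zero in every row and I/O wait averages above 50%, which matches the diagnosis given in the reply.
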
On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I kept looking at the io columns and didn't even think of the swap
> partition. It's true that it's moving quite erratically but I won't say
> that it's really thrashing.
>
>              total       used       free     shared    buffers     cached
> Mem:           503        498          4          0          3        287
> -/+ buffers/cache:        207        295
> Swap:         2527        328       2199
>
> (YEP, I know I'm RAM starved on this machine)

Good lord, my laptop has more memory than that. :)

What Tom said, buy some more RAM. Also, look at turning down the swappiness setting.

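For reference, the swappiness setting mentioned above lives in /proc/sys/vm/swappiness on Linux. The following is a minimal Python sketch for inspecting and changing it; the helper names are hypothetical, the 0 to 100 range check is an assumption about kernels of that era, and nobody in the thread recommends a particular value. Writing the real file needs root and does not persist across reboots.

```python
# Kernel tunable that biases how readily process memory is swapped out.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness value as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Write a new swappiness value (requires root for the real /proc file).

    The 0..100 range check is an assumption made for this sketch.
    """
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    with open(path, "w") as f:
        f.write(f"{value}\n")


if __name__ == "__main__":
    print("current vm.swappiness:", read_swappiness())
```

Lowering the value only changes how eagerly the kernel swaps; on a box with about 500 MB of RAM it is unlikely to replace the advice to add memory.
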
On Sun, 16 Dec 2007 17:55:55 -0600 "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
> On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > I kept looking at the io columns and didn't even think of the swap
> > partition. It's true that it's moving quite erratically but I won't
> > say that it's really thrashing.
> >
> >              total       used       free     shared    buffers     cached
> > Mem:           503        498          4          0          3        287
> > -/+ buffers/cache:        207        295
> > Swap:         2527        328       2199
> >
> > (YEP, I know I'm RAM starved on this machine)
>
> Good lord, my laptop has more memory than that. :)

My phone has more memory than that :P

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'

On Dec 16, 2007 6:11 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Sun, 16 Dec 2007 17:55:55 -0600
> "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
>
> > On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > > I kept looking at the io columns and didn't even think of the swap
> > > partition. It's true that it's moving quite erratically but I won't
> > > say that it's really thrashing.
> > >
> > >              total       used       free     shared    buffers     cached
> > > Mem:           503        498          4          0          3        287
> > > -/+ buffers/cache:        207        295
> > > Swap:         2527        328       2199
> > >
> > > (YEP, I know I'm RAM starved on this machine)
> >
> > Good lord, my laptop has more memory than that. :)
>
> My phone has more memory than that :P

Now that you mention it, my phone does indeed have more memory than my laptop as well. Sheesh. Technology doesn't march forward, it drag races forward.

On Sun, 2007-12-16 at 16:11 -0800, Joshua D. Drake wrote:
> On Sun, 16 Dec 2007 17:55:55 -0600
> "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
>
> > On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > > I kept looking at the io columns and didn't even think of the swap
> > > partition. It's true that it's moving quite erratically but I won't
> > > say that it's really thrashing.
> > >
> > >              total       used       free     shared    buffers     cached
> > > Mem:           503        498          4          0          3        287
> > > -/+ buffers/cache:        207        295
> > > Swap:         2527        328       2199
> > >
> > > (YEP, I know I'm RAM starved on this machine)
> >
> > Good lord, my laptop has more memory than that. :)
>
> My phone has more memory than that :P

What can I say :-p budgets are tight