Discussion: vacuuming slow
Can concurrent updates/deletes slow down vacuum while it is in progress? I mean to ask whether vacuum would have to redo or stall its work because of the updates/deletes. Is it even possible that it goes into a long loop while such updates occur?

The reason for my question is that I'm seeing vacuuming take several hours on a big table (~1 million rows) that is frequently being updated (1000 updates/min). Vacuums are run around twice a day and each one takes ~3 hrs. There are various other processes on the box using the database, but the I/O load isn't very high. When vacuum runs, it causes various I/O tasks to run very slowly.

To speed this up I changed vacuum_mem from 8MB to 1GB. This did help; I'm still collecting data to see how much it has improved things, but I suspect this isn't the end of it.

The other thing I do not understand is as follows: I looked at pg_stat_all_tables and obtained n_tup_ins, n_tup_upd, n_tup_del etc. on a production machine so I could reproduce the long vacuum times on my local system. But when I simulated inserts/updates/deletes to arrive at the same stats for the table, vacuum ran surprisingly fast (production took 1.5 hrs, my local system 3 mins). On my local system, though, I had turned off the other application processes. I don't find that result reasonable, since those processes are not I/O intensive.

Thus, I'm trying to either (1) fix my application, if that is unavoidable, so I don't do as many updates/min, or (2) understand vacuuming better and fix something I've missed.

Thanks all,
Joe
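[For reference, the per-table activity counters mentioned above can be read with a query along these lines; 'big_table' is a placeholder for the table in question:]

```sql
-- Read the cumulative tuple-activity counters for one table.
-- Each update or delete leaves a dead tuple behind for vacuum to reclaim.
SELECT relname,
       n_tup_ins,   -- rows inserted since the stats were last reset
       n_tup_upd,   -- rows updated
       n_tup_del    -- rows deleted
FROM pg_stat_all_tables
WHERE relname = 'big_table';
```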
Joe Maldonado <jmaldonado@webehosting.biz> writes:
> Can concurrent updates/deletes slow down vacuum when it is progressing? I
> mean to ask if vacuum would have to redo or stall its work because of the
> updates/deletes. Is it even possible that it goes into a long loop while
> such updates occur?

vacuum has to take an exclusive lock at the page level on any page it is trying to remove tuples from. The code is such that any ordinary operations on the same page will block the vacuum; vacuum doesn't get the lock until no one else is interested in the page. Given sufficiently heavy concurrent activity on the table, I suppose it could take quite a while for vacuum to finish.

> The reason for my question is that I'm seeing vacuuming take several hours
> on a big table (~1 million rows) that is frequently being updated (1000
> updates/min). They are run around 2 times a day and each time it takes ~3
> hrs. There are various other processes on the box using the database, but
> the I/O load isn't very high. When vacuum runs, it causes various I/O
> tasks to run very slowly.

However, if the scenario I described were your problem, the vacuum would be spending most of its time just blocked waiting for page locks; it therefore wouldn't have much effect on I/O. What I suspect is that your machine has no I/O bandwidth to spare and the extra demands of the vacuum are just saturating your disk. You might look at iostat or vmstat output to see what's going on. Also check with ps or top to see if the vacuuming backend spends most of its time in "S" (sleep) or "D" (disk I/O) state.

If you are using PG 8.0 you could experiment with vacuum_cost_delay and associated parameters. This would slow down vacuum even more in terms of elapsed time to finish, but it should reduce the impact on concurrent processing.

			regards, tom lane
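[The cost-based delay settings mentioned above can be experimented with per session in 8.0; the values below are illustrative starting points, not recommendations, and 'big_table' is a placeholder name:]

```sql
-- Session-level experiment with cost-based vacuum delay (PostgreSQL 8.0).
SET vacuum_cost_delay = 10;   -- sleep 10 ms each time the cost limit is reached (0 = disabled)
SET vacuum_cost_limit = 200;  -- accumulated page-cost threshold that triggers the sleep
VACUUM ANALYZE big_table;     -- this vacuum now yields the disk periodically
```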
Tom Lane wrote:
> Joe Maldonado <jmaldonado@webehosting.biz> writes:
>> Can concurrent updates/deletes slow down vacuum when it is progressing? I
>> mean to ask if vacuum would have to redo or stall its work because of the
>> updates/deletes. Is it even possible that it goes into a long loop while
>> such updates occur?
>
> vacuum has to take an exclusive lock at the page level on any page it is
> trying to remove tuples from. The code is such that any ordinary
> operations on the same page will block the vacuum; vacuum doesn't get
> the lock until no one else is interested in the page. Given
> sufficiently heavy concurrent activity on the table, I suppose it could
> take quite a while for vacuum to finish.
>
>> The reason for my question is that I'm seeing vacuuming take several hours
>> on a big table (~1 million rows) that is frequently being updated (1000
>> updates/min). They are run around 2 times a day and each time it takes ~3
>> hrs. There are various other processes on the box using the database, but
>> the I/O load isn't very high. When vacuum runs, it causes various I/O
>> tasks to run very slowly.
>
> However, if the scenario I described were your problem, the vacuum would
> be spending most of its time just blocked waiting for page locks; it
> therefore wouldn't have much effect on I/O. What I suspect is that your
> machine has no I/O bandwidth to spare and the extra demands of the
> vacuum are just saturating your disk. You might look at iostat or
> vmstat output to see what's going on. Also check with ps or top to
> see if the vacuuming backend spends most of its time in "S" (sleep)
> or "D" (disk IO) state.

Most of the time is spent in the D state. Also, the rest of the system isn't doing much I/O.
iostat shows that without vacuum (on SCSI RAID1), our application is not using much I/O on average (although there is bursty I/O):

Device:   rrqm/s wrqm/s  r/s   w/s rsec/s  wsec/s rkB/s  wkB/s avgrq-sz avgqu-sz   await svctm  %util
/dev/sda    0.00   1.60 0.00  1.20   0.00   22.40  0.00  11.20    18.67     1.97  164.17 42.50   5.10

and with vacuum:

Device:   rrqm/s wrqm/s  r/s   w/s rsec/s  wsec/s rkB/s  wkB/s avgrq-sz avgqu-sz   await svctm  %util
/dev/sda    0.20  96.20 0.30 90.70   4.00 1468.80  2.00 734.40    16.18  3050.21 3244.55 10.99 100.00

An await time of ~3 seconds is sure to slow down other queries from our application and the system in general. But our tables and the nature of our I/O updates are bursty. I have two questions:

1. Does checkpointing write the pages that have been vacuumed, or does vacuum not affect checkpointing I/O?

2. Since vacuum in 7.4 is capable of disrupting disk latency so much (although disk bandwidth is probably only 40% used), given that it is in the D state most of the time, is rewriting the application to use temp tables, TRUNCATE, etc. so as to avoid vacuum a good idea?

Thanks.

> If you are using PG 8.0 you could experiment with vacuum_cost_delay and
> associated parameters. This would slow down vacuum even more in terms
> of elapsed time to finish, but it should reduce the impact on concurrent
> processing.
>
>			regards, tom lane

OK thanks
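[The temp-table/TRUNCATE idea in question 2 would look roughly like the sketch below. All names and the filter condition are hypothetical; the point of the pattern is that TRUNCATE discards the old heap file outright, leaving no dead tuples for vacuum to scan. Note it takes an exclusive lock on the table for the duration:]

```sql
-- Hypothetical rewrite pattern: rebuild the table instead of updating in place.
BEGIN;
CREATE TEMP TABLE scratch AS
    SELECT * FROM big_table
    WHERE still_needed;            -- placeholder table name and condition
TRUNCATE big_table;                -- reclaims all space at once; no vacuum needed
INSERT INTO big_table SELECT * FROM scratch;
COMMIT;
```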
Joe Maldonado <jmaldonado@webehosting.biz> writes:
> iostat shows that without vacuum (on SCSI RAID1), our application
> is not using much I/O on average (although there is bursty I/O)
>
> Device:   rrqm/s wrqm/s  r/s   w/s rsec/s  wsec/s rkB/s  wkB/s avgrq-sz avgqu-sz   await svctm  %util
> /dev/sda    0.00   1.60 0.00  1.20   0.00   22.40  0.00  11.20    18.67     1.97  164.17 42.50   5.10
>
> and with vacuum
>
> Device:   rrqm/s wrqm/s  r/s   w/s rsec/s  wsec/s rkB/s  wkB/s avgrq-sz avgqu-sz   await svctm  %util
> /dev/sda    0.20  96.20 0.30 90.70   4.00 1468.80  2.00 734.40    16.18  3050.21 3244.55 10.99 100.00

OK, so sure enough vacuum is saturating your disk.

> 1. Does checkpointing write the pages that have been vacuumed, or does
> vacuum not affect checkpointing I/O?

The checkpointer would be sharing in the burden, yes. Vacuum isn't all that much different from normal updates in how the changes get written out --- it just tends to make a lot more updates than typical queries do.

I notice that the write volume vastly exceeds the read volume. If that's consistently true, it suggests that you might get some win by doing vacuums more often --- that is, you are leaving too much work to be done by any one vacuum scan.

> 2. Since vacuum in 7.4 is capable of disrupting disk latency so much
> (although disk bandwidth is probably only 40% used), given that it is
> in the D state most of the time, is rewriting the application to use
> temp tables, TRUNCATE, etc. so as to avoid vacuum a good idea?

Short of buying faster disks or moving to 8.0, I think that's your next option. Note that 8.0's vacuum-cost-delay feature doesn't reduce the I/O volume at all, it just spreads out the impact. So the vacuum will take longer than it does now, but hopefully it won't kill the performance of concurrent queries. Combining that with launching vacuums more frequently is probably the best hope of solving this without better hardware or massive application changes.

			regards, tom lane
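[To put numbers on the "vacuum more often" suggestion: at ~1000 updates/min, the table accrues roughly 60,000 dead tuples per hour, so a twice-daily vacuum faces on the order of 720,000 dead tuples per scan, while an hourly one handles about a twelfth of that. A narrow, frequent pass might look like this; 'big_table' is a placeholder name:]

```sql
-- Run frequently (e.g. hourly from cron) against only the hot table,
-- so each pass reclaims a small number of dead tuples instead of
-- letting half a day's worth accumulate for one long scan.
VACUUM ANALYZE big_table;
```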