Обсуждение: vacuuming slow

Поиск
Список
Период
Сортировка

vacuuming slow

От
Joe Maldonado
Дата:
Can concurrent updates/deletes slow down vacuum when it is progressing ? I
mean to ask if vacuum would have to redo or stall its work because of the
updates/deletes. Is it even possible that it goes into a long loop while
such updates occur ?

The reason for my question is that I'm seeing vacuuming take several hours
on a big table (~1million rows) that is frequently being updated (1000
updates/min). They are run around 2 times a day and each time it takes ~3
hrs. There are various other processes on the box using the database, but
the I/O load isn't very high. When vacuum runs, it causes various I/O
tasks to run very slowly.

To speed this up I have changed vacuum_mem from 8MB to 1Gb. This did help.
I'm still collecting data to see how much it has improved it by. But I
suspect this isn't the end.

The other thing I do not understand is as follows: I looked at
pg_stat_all_tables and obtained n_tup_ins, upd, del etc on a production
machine to reproduce the long vacuum times on my local system. But when I
simulated inserts/updates/deletes to get at the same stats for the table,
vacuum ran surprisingly fast (production took 1.5hrs, local system 3mins).
On my local system though I had turned off other application processes.
But I do not find my results reasonable since these processes are not I/O
intensive.

Thus, I'm trying to either (1) fix my application if it is inevitable, so
I don't do as many updates/min or (2) understand vacuuming better and fix
something I've missed.

Thanks all,

Joe

Re: vacuuming slow

От
Tom Lane
Дата:
Joe Maldonado <jmaldonado@webehosting.biz> writes:
> Can concurrent updates/deletes slow down vacuum when it is progressing ? I
> mean to ask if vacuum would have to redo or stall its work because of the
> updates/deletes. Is it even possible that it goes into a long loop while
> such updates occur ?

vacuum has to take an exclusive lock at the page level on any page it is
trying to remove tuples from.  The code is such that any ordinary
operations on the same page will block the vacuum; vacuum doesn't get
the lock until no one else is interested in the page.  Given
sufficiently heavy concurrent activity on the table, I suppose it could
take quite a while for vacuum to finish.

> The reason for my question is that I'm seeing vacuuming take several hours
> on a big table (~1million rows) that is frequently being updated (1000
> updates/min). They are run around 2 times a day and each time it takes ~3
> hrs. There are various other processes on the box using the database, but
> the I/O load isn't very high. When vacuum runs, it causes various I/O
> tasks to run very slowly.

However, if the scenario I described were your problem, the vacuum would
be spending most of its time just blocked waiting for page locks; it
therefore wouldn't have much effect on I/O.  What I suspect is that your
machine has no I/O bandwidth to spare and the extra demands of the
vacuum are just saturating your disk.  You might look at iostat or
vmstat output to see what's going on.  Also check with ps or top to
see if the vacuuming backend spends most of its time in "S" (sleep)
or "D" (disk IO) state.

If you are using PG 8.0 you could experiment with vacuum_cost_delay and
associated parameters.  This would slow down vacuum even more in terms
of elapsed time to finish, but it should reduce the impact on concurrent
processing.

            regards, tom lane

Re: vacuuming slow

От
Joe Maldonado
Дата:
Tom Lane wrote:

>Joe Maldonado <jmaldonado@webehosting.biz> writes:
>
>
>>Can concurrent updates/deletes slow down vacuum when it is progressing ? I
>>mean to ask if vacuum would have to redo or stall its work because of the
>>updates/deletes. Is it even possible that it goes into a long loop while
>>such updates occur ?
>>
>>
>
>vacuum has to take an exclusive lock at the page level on any page it is
>trying to remove tuples from.  The code is such that any ordinary
>operations on the same page will block the vacuum; vacuum doesn't get
>the lock until no one else is interested in the page.  Given
>sufficiently heavy concurrent activity on the table, I suppose it could
>take quite a while for vacuum to finish.
>
>
>
>>The reason for my question is that I'm seeing vacuuming take several hours
>>on a big table (~1million rows) that is frequently being updated (1000
>>updates/min). They are run around 2 times a day and each time it takes ~3
>>hrs. There are various other processes on the box using the database, but
>>the I/O load isn't very high. When vacuum runs, it causes various I/O
>>tasks to run very slowly.
>>
>>
>
>However, if the scenario I described were your problem, the vacuum would
>be spending most of its time just blocked waiting for page locks; it
>therefore wouldn't have much effect on I/O.  What I suspect is that your
>machine has no I/O bandwidth to spare and the extra demands of the
>vacuum are just saturating your disk.  You might look at iostat or
>vmstat output to see what's going on.  Also check with ps or top to
>see if the vacuuming backend spends most of its time in "S" (sleep)
>or "D" (disk IO) state.
>
>
>
most of the time is spent in D state. Also, the rest of the system isn't
doing
much I/O. iostat shows that without vacuum (on SCSI Raid1), our application
is not using much I/O on an avg (although there is bursty I/O)

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz
/dev/sda     0.00   1.60  0.00  1.20    0.00   22.40     0.00    11.20
   18.67     1.97

await  svctm  %util
164.17  42.50   5.10

and with vacuum

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz
/dev/sda     0.20  96.20  0.30 90.70    4.00 1468.80     2.00   734.40
   16.18  3050.21

await  svctm  %util
3244.55  10.99 100.00

A await time of ~3seconds is sure to slow down other queries from our
application and the system in general. But our tables and nature of
our I/O updates are bursty.

I have 2 questions :

1. Does Checkpointing write the pages that have been vacuumed or does
vacuum not affect Checkpointing I/O ?

2. Since vacuum in 7. 4 capable of disrupting disk latency (although
disk bandwidth is prob only 40% used) so much, given that it is in D
state most of the time, is rewriting the application to use temp
tables, truncate etc so as to avoid vacuum a good idea ?

Thanks.

>If you are using PG 8.0 you could experiment with vacuum_cost_delay and
>associated parameters.  This would slow down vacuum even more in terms
>of elapsed time to finish, but it should reduce the impact on concurrent
>processing.
>
>            regards, tom lane
>
>
OK thanks


Re: vacuuming slow

От
Tom Lane
Дата:
Joe Maldonado <jmaldonado@webehosting.biz> writes:
> iostat shows that without vacuum (on SCSI Raid1), our application
> is not using much I/O on an avg (although there is bursty I/O)

> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz
> /dev/sda     0.00   1.60  0.00  1.20    0.00   22.40     0.00    11.20
>    18.67     1.97

> await  svctm  %util
> 164.17  42.50   5.10

> and with vacuum

> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz
> /dev/sda     0.20  96.20  0.30 90.70    4.00 1468.80     2.00   734.40
>    16.18  3050.21

> await  svctm  %util
> 3244.55  10.99 100.00

OK, so sure enough vacuum is saturating your disk.

> 1. Does Checkpointing write the pages that have been vacuumed or does
> vacuum not affect Checkpointing I/O ?

The checkpointer would be sharing in the burden, yes.  Vacuum isn't all
that much different from normal updates in how the changes get written
out --- it just tends to make a lot more updates than typical queries
do.

I notice that the write volume vastly exceeds the read volume.  If
that's consistently true, it suggests that you might get some win
by doing vacuums more often --- that is, you are leaving too much
work to be done by any one vacuum scan.

> 2. Since vacuum in 7. 4 capable of disrupting disk latency (although
> disk bandwidth is prob only 40% used) so much, given that it is in D
> state most of the time, is rewriting the application to use temp
> tables, truncate etc so as to avoid vacuum a good idea ?

Short of buying faster disks or moving to 8.0, I think that's your next
option.

Note that 8.0's vacuum-cost-delay feature doesn't reduce the I/O volume
at all, it just spreads out the impact.  So the vacuum will take longer
than it does now, but hopefully it won't kill the performance of
concurrent queries.  Combining that with launching vacuums more
frequently is probably the best hope at solving this without better
hardware or massive application changes.

            regards, tom lane