Re: [GENERAL] Autovacuum Improvements

From	Gregory Stark
Subject	Re: [GENERAL] Autovacuum Improvements
Date	22 January 2007, 13:52:38
Msg-id	877ivfeypy.fsf@stark.xeocode.com
In reply to	Re: [GENERAL] Autovacuum Improvements (Bruce Momjian <bruce@momjian.us>)
Responses	Re: [GENERAL] Autovacuum Improvements (5 replies)
List	pgsql-hackers

"Bruce Momjian" <bruce@momjian.us> writes:

> I agree index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.
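
A minimal sketch of that kind of experiment, not the original test program: it times a full sequential read of a file against a read of a random subset of its blocks visited in ascending order. The 8 kB block size and the names `pick_blocks`, `timed_read` and `compare` are assumptions made for illustration. For the timings to mean anything the file has to be much larger than RAM (or the OS cache has to be dropped between runs), otherwise both passes are served from memory.

```python
import os
import random
import time

BLOCK_SIZE = 8192  # assumed block size (PostgreSQL's default page size)


def pick_blocks(total_blocks, fraction, seed=0):
    """Pick a random subset of block numbers, returned in ascending order."""
    rng = random.Random(seed)
    count = round(total_blocks * fraction)
    return sorted(rng.sample(range(total_blocks), count))


def timed_read(path, blocks):
    """Read the given blocks in order; return (elapsed seconds, bytes read)."""
    nbytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        for block in blocks:
            f.seek(block * BLOCK_SIZE)
            nbytes += len(f.read(BLOCK_SIZE))
    return time.perf_counter() - start, nbytes


def compare(path, fraction, seed=0):
    """Time a full read, then a read of `fraction` of the blocks."""
    total = os.path.getsize(path) // BLOCK_SIZE
    full = timed_read(path, range(total))
    partial = timed_read(path, pick_blocks(total, fraction, seed))
    return full, partial
```

Running `compare(path, f)` for a range of fractions `f` and looking for the point where the partial read stops being faster than the full read is one way to look for a breakeven like the 7% above.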

That is, if you assume that only 5% of the table will be scanned and you
arrange to do it sequentially, then you should expect the i/o to be only
marginally faster than just reading the entire table. Vacuum does do some cpu
work, and it wouldn't have to consult the clog as often, so it would still be
somewhat faster.

The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving linux being confused by the seeks and
turning off read-ahead but I could never prove them.

In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.

Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.
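
The arithmetic above can be put into a toy model. This is only a sketch under loud assumptions: index cleanup is taken as 15% of the total (Bruce's figure), the heap i/o saved is assumed to fall off linearly from 100% at zero dirty pages to nothing at the measured 7% breakeven, and reading the DSM is charged a flat 0.025%. The linear falloff and the function name `io_saved` are not from any measurement; the real curve could look quite different.

```python
def io_saved(dirty_fraction, breakeven=0.07, index_share=0.15,
             dsm_overhead=0.00025):
    """Estimate the fraction of vacuum i/o removed by a DSM.

    Assumes (hypothetically) that heap savings shrink linearly as the
    dirty fraction approaches the breakeven point, and that index
    cleanup is never reduced.
    """
    heap_share = 1.0 - index_share
    # Fraction of the heap i/o that is avoided; zero at or past breakeven.
    heap_avoided = 1.0 - min(1.0, dirty_fraction / breakeven)
    return max(0.0, heap_share * heap_avoided - dsm_overhead)


for dirty in (0.0, 0.01, 0.05, 0.10):
    print(f"{dirty:.0%} dirty: about {io_saved(dirty):.1%} of i/o removed")
```

Under these assumptions a single-row update lands just under 85%, and anything at or past the breakeven saves nothing, which matches the shape of the argument even if the numbers in between are guesses.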

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
