Re: [GENERAL] Autovacuum Improvements

Поиск
Список
Период
Сортировка
От Gregory Stark
Тема Re: [GENERAL] Autovacuum Improvements
Дата
Msg-id 877ivfeypy.fsf@stark.xeocode.com
обсуждение исходный текст
Ответ на Re: [GENERAL] Autovacuum Improvements  (Bruce Momjian <bruce@momjian.us>)
Ответы Re: [GENERAL] Autovacuum Improvements  (Bruce Momjian <bruce@momjian.us>)
Re: [GENERAL] Autovacuum Improvements  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: [GENERAL] Autovacuum Improvements  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
Re: [GENERAL] Autovacuum Improvements  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
Re: [GENERAL] Autovacuum Improvements  (Martijn van Oosterhout <kleptog@svana.org>)
Список pgsql-hackers
"Bruce Momjian" <bruce@momjian.us> writes:

> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.

That is, if you assum that only 5% of the table will be scanned and you
arrange to do it sequentially then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.

The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving linux being confused by the seeks and
turning off read-ahead but I could never prove them.

In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.

Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Simon Riggs"
Дата:
Сообщение: Re: [GENERAL] Autovacuum Improvements
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] Autovacuum Improvements