Re: [HACKERS] Autovacuum Improvements

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: [HACKERS] Autovacuum Improvements
Дата
Msg-id 200701221827.l0MIRJp00294@momjian.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Autovacuum Improvements  (Gregory Stark <stark@enterprisedb.com>)
Ответы Re: [HACKERS] Autovacuum Improvements  (Gregory Stark <stark@enterprisedb.com>)
Re: [HACKERS] Autovacuum Improvements  ("Simon Riggs" <simon@2ndquadrant.com>)
Re: [HACKERS] Autovacuum Improvements  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: [HACKERS] Autovacuum Improvements  (Richard Huxton <dev@archonet.com>)
Список pgsql-general
Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to relaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
>
> > I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> > out how small, and it seems about 15% of the total table, which means if
> > we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> > 80%, assuming 5% of the table is scanned.
>
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.
>
> That is, if you assum that only 5% of the table will be scanned and you
> arrange to do it sequentially then you should expect the i/o to be marginally
> faster than just reading the entire table. Vacuum does do some cpu work and
> wouldn't have to consult the clog as often, so it would still be somewhat
> faster.
>
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.
> I also had some theories involving linux being confused by the seeks and
> turning off read-ahead but I could never prove them.
>
> In short, to see big benefits you would have to have a much smaller percentage
> of the table being read. That shouldn't be taken to mean that the DSM is a
> loser. There are plenty of use cases where tables can be extremely large and
> have only very small percentages that are busy. The big advantage of the DSM
> is that it takes the size of the table out of the equation and replaces it
> with the size of the busy portion of the table. So updating a single record in
> a terabyte table has the same costs as updating a single record in a kilobyte
> table.
>
> Sadly that's not quite true due to indexes, and due to the size of the bitmap
> itself. But going back to your numbers it does mean that if you update a
> single row out of a terabyte table then we'll be removing about 85% of the i/o
> (minus the i/o needed to read the DSM, about .025%). If you update about 1%
> then you would be removing substantially less, and once you get to about 10%
> then you're back where you started.
>
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

В списке pgsql-general по дате отправления:

Предыдущее
От: Gregory Stark
Дата:
Сообщение: Re: [HACKERS] Autovacuum Improvements
Следующее
От: Alexander Presber
Дата:
Сообщение: printf-like format strings