Faster and more frequent VACUUM (was PostgreSQL clustering VS MySQL clustering)

From: Simon Riggs
Subject: Faster and more frequent VACUUM (was PostgreSQL clustering VS MySQL clustering)
Date:
Msg-id: 1106556097.31592.139.camel@localhost.localdomain
In reply to: Re: PostgreSQL clustering VS MySQL clustering  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On Sun, 2005-01-23 at 15:40 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Changing the idea slightly might be better: if a row update would cause
> > a block split, then if there is more than one row version then we vacuum
> > the whole block first, then re-attempt the update.
>
> "Block split"?  I think you are confusing tables with indexes.

Terminologically loose, as ever. :(
I meant both tables and indexes and was referring to the part of the
algorithm that is entered when we have a block-full situation.

> Chasing down prior versions of the same row is not very practical
> anyway, since there is no direct way to find them.
>
> One possibility is, if you tried to insert a row on a given page but
> there's not room, to look through the other rows on the same page to see
> if any are deletable (xmax below the GlobalXmin event horizon).  This
> strikes me as a fairly expensive operation though, especially when you
> take into account the need to get rid of their index entries first.

That's what I was suggesting: vacuum the whole page, not just those rows.

Doing it immediately greatly increases the chance that the index blocks
would be in cache also.
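The on-the-spot idea can be modelled with a toy example: only when an insert finds the page full do we scan the whole page for tuples whose xmax is below the global horizon and reclaim them, then retry. This is a minimal Python sketch of the idea under discussion, not PostgreSQL internals; the Page class and its transaction-id handling are invented for illustration, and index cleanup (the expensive part Tom notes) is ignored.

```python
# Toy model of "vacuum the whole page on block-full". A tuple version
# carries (xmin, xmax); it is dead and reclaimable once its xmax is
# below the oldest xid any backend can still see (the "GlobalXmin
# event horizon" mentioned above).

GLOBAL_XMIN = 100  # oldest xid still visible to any running transaction

class Page:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tuples = []          # (xmin, xmax) pairs; xmax=None means live

    def free_slots(self):
        return self.capacity - len(self.tuples)

    def vacuum(self, global_xmin):
        # Reclaim every dead tuple on the page, not just one row's
        # prior versions. In the real system the index entries would
        # have to be removed first, which is the expensive part.
        before = len(self.tuples)
        self.tuples = [(xmin, xmax) for (xmin, xmax) in self.tuples
                       if xmax is None or xmax >= global_xmin]
        return before - len(self.tuples)

    def insert(self, xmin):
        # On-the-spot maintenance: pay the vacuum cost only when the
        # insert would otherwise fail for lack of space.
        if self.free_slots() == 0:
            self.vacuum(GLOBAL_XMIN)
        if self.free_slots() == 0:
            return False          # still full; caller must find another page
        self.tuples.append((xmin, None))
        return True
```

The win depends on dead tuples actually being present on the pages that fill up; when the check finds nothing to reclaim, it is pure overhead on the insert path.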

> Moreover, the check would often be unproductive.
> The real issue with any such scheme is that you are putting maintenance
> costs into the critical paths of foreground processes that are executing
> user queries.  I think that one of the primary advantages of the
> Postgres storage design is that we keep that work outside the critical
> path and delegate it to maintenance processes that can run in the
> background.  We shouldn't lightly toss away that advantage.

Completely agree. ...which is why I was trying to find a place for such
an operation in front of another expensive operation which is also
currently on the critical path. That way there might be benefit rather
than just additional overhead.

> There was some discussion in Toronto this week about storing bitmaps
> that would tell VACUUM whether or not there was any need to visit
> individual pages of each table.  Getting rid of useless scans through
> not-recently-changed areas of large tables would make for a significant
> reduction in the cost of VACUUM.

ISTM there are two issues here, which are only somewhat related:
- speed of VACUUM on large tables
- ability to run VACUUM very frequently on very frequently updated
tables

The needs-maintenance bitmap idea hits both, whilst the on-the-spot idea
only hits the second one, even if it does that somewhat better. Gut feel
says we would implement only one idea...so...

On balance that indicates the needs-maintenance bitmap is the better
idea, and one for which we already have existing code.
A few questions...
- wouldn't we need a bitmap per relation?
- wouldn't all the extra bitmaps need to be cached in shared_buffers,
which could use up a good proportion of buffer cache space?
- maybe we should use a smaller block size and a separate cache for it
- how would we update the bitmap without creating a new LWLock that
needs to be acquired for every block write, and so reducing scalability?
- would this be implemented as an option for each table, so that we
could avoid the implementation overhead? (Or perhaps don't have a bitmap
if the table is less than 16 blocks?)
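The bookkeeping behind the bitmap idea can be sketched briefly: one bit per heap block of a relation, set whenever a block is dirtied by an update or delete, and consumed by VACUUM so it visits only blocks that can contain dead tuples. This is a toy Python sketch of the concept only; the class and function names are invented, and the locking and shared-memory questions raised above are deliberately ignored.

```python
# One "needs maintenance" bitmap per relation: one bit per heap block.
# Writers set the bit for any block they dirty; VACUUM scans only the
# set bits and clears them, skipping not-recently-changed areas of
# large tables entirely.

class NeedsVacuumMap:
    def __init__(self, nblocks):
        self.bits = bytearray((nblocks + 7) // 8)

    def mark_dirty(self, blkno):
        self.bits[blkno // 8] |= 1 << (blkno % 8)

    def clear(self, blkno):
        self.bits[blkno // 8] &= ~(1 << (blkno % 8))

    def dirty_blocks(self):
        return [b for b in range(len(self.bits) * 8)
                if self.bits[b // 8] & (1 << (b % 8))]

def vacuum_relation(vmap, vacuum_block):
    # Visit only the marked blocks, not the whole relation.
    for blkno in vmap.dirty_blocks():
        vacuum_block(blkno)
        vmap.clear(blkno)
```

On the caching question: at one bit per 8 KB block, a 1 GB table needs only a 16 KB bitmap, so the space cost in shared memory may be smaller than it first appears.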

--
Best Regards, Simon Riggs

