Re: estimating the need for VACUUM FULL and REINDEX

Поиск
Список
Период
Сортировка
От Bill Moran
Тема Re: estimating the need for VACUUM FULL and REINDEX
Дата
Msg-id 20070508084611.dd545b9d.wmoran@collaborativefusion.com
обсуждение исходный текст
Ответ на estimating the need for VACUUM FULL and REINDEX  (Guillaume Cottenceau <gc@mnc.ch>)
Список pgsql-performance
In response to Guillaume Cottenceau <gc@mnc.ch>:

> I'm trying to come up with a way to estimate the need for a
> VACUUM FULL and/or a REINDEX on some tables.

You shouldn't vacuum full unless you have a good reason.  Vacuum full
causes index bloat.

> According to documentation[1], VACUUM FULL's only benefit is
> returning unused disk space to the operating system; am I correct
> in assuming there's also the benefit of optimizing the
> performance of scans, because rows are physically compacted on
> the disk?

In my experience, the smaller the overall database size, the less shared
memory it requires.  Keeping it vacuumed will reduce the amount of space
taken up in memory, which means it's more likely that the data you need
at any particular time is in memory.

Look up a thread with my name on it a lot related to reindexing.  I did
some experiments with indexes and reindexing and the only advantage I found
was that the space requirement for the indexes is reduced by reindexing.
I was not able to find any performance difference in newly created indexes
vs. indexes that were starting to bloat.

> With that in mind, I've tried to estimate how much benefit would
> be brought by running VACUUM FULL, with the output of VACUUM
> VERBOSE. However, it seems that for example the "removable rows"
> reported by each VACUUM VERBOSE run is actually reused by VACUUM,
> so is not what I'm looking for.

I'm not sure what you mean by that last sentence.

There are only two circumstances (I can think of) for running vacuum
full:
1) You've just made some major change to the database (such as adding
   an obscene # of records, making massive changes to a large
   percentage of the existing data, or issuing a lot of "alter table")
   and want to get the FSM back down to a manageable size.
2) You are desperately hurting for disk space, and need a holdover
   until you can get bigger drives.

Reindexing pretty much falls into the same 2 scenerios.  I do recommend
that you reindex after any vacuum full.

However, a much better approach is to either schedule frequent vacuums
(without the full) or configure/enable autovacuum appropriately for your
setup.

> Then according to documentation[2], REINDEX has some benefit when
> all but a few index keys on a page have been deleted, because the
> page remains allocated (thus, I assume it improves index scan
> performance, am I correct?). However, again I'm unable to
> estimate the expected benefit. With a slightly modified version
> of a query found in documentation[3] to see the pages used by a
> relation[4], I'm able to see that the index data from a given
> table...
>
>             relname         | relpages | reltuples
>     ------------------------+----------+-----------
>      idx_sessions_owner_key |       38 |      2166
>      pk_sessions            |       25 |      2166
>
> ...is duly optimized after a REINDEX:
>
>             relname         | relpages | reltuples
>     ------------------------+----------+-----------
>      idx_sessions_owner_key |       13 |      2166
>      pk_sessions            |        7 |      2166
>
> but what I'd need is really these 38-13 and 25-7 figures (or
> estimates) prior to running REINDEX.

Again, my experience shows that reindexing is only worthwhile if you're
really hurting for disk space/memory.

I don't know of any way to tell what size an index would be if it were
completely packed, but it doesn't seem as if this is the best approach
anyway.  Newer versions of PG have the option to create indexes with
empty space already there at creation time (I believe this is called
"fill factor") to allow for future growth.

The only other reason I can see for vacuum full/reindex is if you _can_.
For example, if there is a period that you know the database will be
unused that it sufficiently long that you know these operations can
complete.  Keep in mind that both reindex and vacuum full create performance
problems while they are running.  If you knew, however, that the system
was _never_ being used between 6:00 PM and 8:00 AM, you could run them
over night.  In that case, I would recommend replacing vacuum full with
cluster.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

В списке pgsql-performance по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: estimating the need for VACUUM FULL and REINDEX
Следующее
От: "Steinar H. Gunderson"
Дата:
Сообщение: Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7