Re: How often do I need to reindex tables?

From Bill Moran
Subject Re: How often do I need to reindex tables?
Date
Msg-id 20070228140858.492efaf2.wmoran@collaborativefusion.com
In response to Re: How often do I need to reindex tables?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: How often do I need to reindex tables?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
In response to Tom Lane <tgl@sss.pgh.pa.us>:

> Bill Moran <wmoran@collaborativefusion.com> writes:
> > I don't agree.  I think that regular reindexing is mandatory under some
> > workloads.  Example:
> > ...
> > There are some additional indexes that I've snipped from the output that also
> > saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Can you describe the usage pattern of that index?  I'm curious why it
> doesn't maintain reasonably static size.  How often is the underlying
> table vacuumed?

bacula=# \d file
                             Table "public.file"
   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jobid_idx" btree (jobid)

Now, that table stores a record for each file that is backed up (i.e. there's
a unique tuple for each time a file is backed up).  To save space in the
database, the file name and file path are stored in separate tables and
referenced by an ID.
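For illustration, a lookup that resolves the full names of the files captured
by one job might look like the sketch below.  The "name" and "path" text
columns on the filename and path tables are assumptions here — they aren't
shown in the quoted schema — and the jobid value is made up:

```sql
-- Hypothetical sketch: reassemble full file names for a single job.
-- Assumes filename(name) and path(path) text columns, not shown above.
SELECT p.path || fn.name AS full_name
FROM file f
JOIN filename fn ON fn.filenameid = f.filenameid
JOIN path p      ON p.pathid      = f.pathid
WHERE f.jobid = 42;  -- made-up job id
```

Lookups in the other direction — finding the file row for a known
(filenameid, pathid) pair during a backup run — are what the composite
file_fp_idx index serves.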

This particular server has the following characteristics:
bacula=# select count(*) from file;
  count
---------
 8068956
(1 row)

bacula=# select count(*) from filename;
 count
--------
 151954
(1 row)

bacula=# select count(*) from path;
 count
-------
 49642
(1 row)

There are 21 jobs, each ranging in size from 2000 - 5000 files.  Each job
runs twice a day.  So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day.  With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.

Because I know exactly when database activity is occurring on this system,
I have autovacuum disabled, and I manually run a vacuum analyze on this
database twice a day: once at 8:00 AM and again at 4:00 PM.  I had to bump
max_fsm_pages up to 60000 to keep vacuum effective.
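As a sketch of that maintenance routine (the exact commands are an
assumption; the post only describes the schedule), each of the two daily
runs amounts to something like:

```sql
-- Run at 8:00 and 16:00 (e.g. from cron via psql), when no backup
-- jobs are active.  VERBOSE makes vacuum report, at the end, how much
-- free-space-map room it needed — useful for sizing max_fsm_pages.
VACUUM VERBOSE ANALYZE;

-- Corresponding postgresql.conf setting (a restart is required to
-- change it on the 8.x releases current at the time):
--   max_fsm_pages = 60000
```

If the FSM is too small, vacuum can no longer track all the freed space,
and tables and indexes bloat even though vacuum runs on schedule.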

Note that the index under discussion is the only one in this database that
shows significant bloat.  I could probably just reindex that one on a
regular schedule, but since I know when the database is quiescent, there's
no reason I can think of not to reindex the whole thing.
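The two reindexing options mentioned would look like this (note that
REINDEX takes locks that block access to the table being worked on, which
is why the quiescent window matters):

```sql
-- Rebuild only the index that actually bloats:
REINDEX INDEX file_fp_idx;

-- Or, since the database is known to be idle at this point,
-- rebuild every index in it:
REINDEX DATABASE bacula;
```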

Anything else I can provide that would be useful?

--
Bill Moran
Collaborative Fusion Inc.
