Обсуждение: Disk Utilization Increases And Time for Vacuum Increases.

Поиск
Список
Период
Сортировка

Disk Utilization Increases And Time for Vacuum Increases.

От
"Vikram D. Gaitonde"
Дата:
=========================================================
Disk Utilization Increases And Time for Vacuum Increases.
=========================================================

As reported in some threads before the Disk Utilization
of Postgres Database increases as time passes.

I have taken all the steps that were suggested in the
mailing list and still see that the disk utilization
keeps growing.

One action we have consiously not done is "REINDEX" on the
table. We want to avoid that as far as possible.

I have given my detailed setup and test plan and results.
Any suggestions on how we can take care of this disk utilization
would be appreciated.


*********************************************
Setup:
*********************************************
    ======
    schema
    ======
    create table stressdb (
        i integer not null,
        s1 varchar(1024) not null,
        s2 varchar(1024) not null,
        s3 varchar(1024) not null,
        b lo
    );
    alter table stressdb add constraint pk_stressdb primary key (i);
    create index idx_stressdb_1 on stressdb (s1);
    create index idx_stressdb_2 on stressdb (s2);
    create index idx_stressdb_3 on stressdb (s3);
    create index idx_stressdb_4 on stressdb (i, s1, s2, s3);
    create index idx_stressdb_5 on stressdb (s1, s2, s3);
    ======

    =====================
    Postgres version: 7.2
    =====================

    ===============
    postgresql.conf
    ===============
    <--snip-->
    shared_buffers = 8192
    max_fsm_relations = 200
    max_fsm_pages = 1000000
    sort_mem = 32768
    vacuum_mem = 16384
    <--snip-->




*********************************************
Stress Test:
*********************************************
    loop {
       Add 100000 rows
       Delete 100000 rows
       Run Vacuum
       pg_unlink all orphaned items from pg_largeobjects
    }

*********************************************
Results:
*********************************************
    1: Time taken to run vacuum increases in each iteration.

    2: disk utilization increases with each iteration.
    # df -k
                   Filesystem 1K-blocks      Used Available Use% Mounted on
      START STATS: /dev/sda6    1968620    110004   1838616   6% /var
      iteration 1: /dev/sda6    1968620    435436   1513184  23% /var
      iteration 2: /dev/sda6    1968620    547916   1400704  29% /var
      iteration 3: /dev/sda6    1968620    725872   1222748  38% /var
      iteration 4: /dev/sda6    1968620    985488    963132  51% /var


    3: relpages for the indices increases with the first few
       iterations and then stays constant.

       This is determined by running:
          "SELECT RELNAME, RELPAGES, RELTUPLES FROM PG_CLASS"

       NOTE: Also the reltuples always stay at 0 (zero).
             I dont know how to interpret that. What does
             it mean.



      1st iteration
               relname             | relpages | reltuples
    -------------------------------+----------+-----------
     idx_stressdb_4                |     1697 |         0
     idx_stressdb_5                |     1317 |         0
     idx_stressdb_1                |      603 |         0
     idx_stressdb_3                |      593 |         0
     idx_stressdb_2                |      588 |         0
     pk_stressdb                   |      306 |         0

      2nd iteration
                 relname             | relpages | reltuples
    ---------------------------------+----------+-----------
     idx_stressdb_5                |     2639 |         0
     idx_stressdb_4                |     1697 |         0
     idx_stressdb_1                |     1196 |         0
     idx_stressdb_3                |     1180 |         0
     idx_stressdb_2                |     1178 |         0
     pk_stressdb                   |      306 |         0

      ** 5th ** iteration
                 relname             | relpages | reltuples
    ---------------------------------+----------+-----------
     idx_stressdb_5                |     3945 |         0
     idx_stressdb_1                |     1785 |         0
     idx_stressdb_2                |     1779 |         0
     idx_stressdb_3                |     1760 |         0
     idx_stressdb_4                |     1697 |         0
     pk_stressdb                   |      306 |         0











__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Curt Sampson
Дата:
On Fri, 22 Aug 2003, Vikram D. Gaitonde wrote:

> One action we have consiously not done is "REINDEX" on the
> table. We want to avoid that as far as possible.

Why do you want to avoid this? It may be the only option you have, other
than a database dump and reload.

>     3: relpages for the indices increases with the first few
>        iterations and then stays constant.
>
>        This is determined by running:
>           "SELECT RELNAME, RELPAGES, RELTUPLES FROM PG_CLASS"

This seems odd. I would have thought for sure that it was your indexes
growing. Maybe you should try looking at the individual files in the
data directory to see which ones are growing, and then use the filename
to look up the item in pg_class via the relfilenode column to find out
what your problem objects are.

>        NOTE: Also the reltuples always stay at 0 (zero).
>              I dont know how to interpret that. What does
>              it mean.

It means you haven't done an ANALYZE after changing the size of the
database.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.NetBSD.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Jeff Boes
Дата:
>One action we have consiously not done is "REINDEX" on the
>table. We want to avoid that as far as possible.

Why? It's usually a very painless step, unless the table has millions and
millions of rows. We reindex tables with multiple indexes and several million
rows on a weekly basis. The only downside is that the table seems to be quite
"busy" during the process, which only takes a few minutes.

AFAIK, VACUUM doesn't reclaim space taken up by indexes. In fact, the more
deletes you do, the larger the index space gets, and the *slower* the index
performs. A periodic REINDEX cleans up a lot of problems.


Re: Disk Utilization Increases And Time for Vacuum Increases.

От
"scott.marlowe"
Дата:
On Mon, 25 Aug 2003, Jeff Boes wrote:

> >One action we have consiously not done is "REINDEX" on the
> >table. We want to avoid that as far as possible.
>
> Why? It's usually a very painless step, unless the table has millions and
> millions of rows. We reindex tables with multiple indexes and several million
> rows on a weekly basis. The only downside is that the table seems to be quite
> "busy" during the process, which only takes a few minutes.
>
> AFAIK, VACUUM doesn't reclaim space taken up by indexes. In fact, the more
> deletes you do, the larger the index space gets, and the *slower* the index
> performs. A periodic REINDEX cleans up a lot of problems.

Note that in 7.4 the fix for this is in, so if you have a chance to test
it out with your indexes and their growth problem please test it to see if
it works right.

I haven't tested 7.4 beta1 yet very hard, just on my workstation, with
relatively low level stuff.


Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Andrew Sullivan
Дата:
On Fri, Aug 22, 2003 at 12:09:00PM -0700, Vikram D. Gaitonde wrote:
> One action we have consiously not done is "REINDEX" on the
> table. We want to avoid that as far as possible.

I'm afraid that's the problem, though.  The code cannot recover freed
btree pages which can't be recycled for others to use.  There is no
choice but to reindex certain indexes.  You can use pgstattuple and
oid2name to get some idea of what is taking all the space, and
thereby be selective.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Curt Sampson
Дата:
On Tue, 26 Aug 2003, Andrew Sullivan wrote:

> I'm afraid that's the problem, though.  The code cannot recover freed
> btree pages which can't be recycled for others to use.  There is no
> choice but to reindex certain indexes.

While we're at it, anybody got a clue on how to reindex system tables?
Our unit testing framework (yes, we unit test all of our database code)
has to do a lot of schema loads and drops, and the indexes for the
system tables (particularly pg_attribute) tend to grow dramatically,
slowing performance. (The total size of the pg_catalog starts out around
a few megabytes, and doesn't take too long to grow to several hundred
megabytes.)

So far, the only way I've found to fix this is to do a complete dump
and reload of the database, but that's a pain since we have tables with
mutual constraints that can't be reloaded without modifying the dump
file.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.NetBSD.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Bruce Momjian
Дата:
scott.marlowe wrote:
> On Mon, 25 Aug 2003, Jeff Boes wrote:
>
> > >One action we have consiously not done is "REINDEX" on the
> > >table. We want to avoid that as far as possible.
> >
> > Why? It's usually a very painless step, unless the table has millions and
> > millions of rows. We reindex tables with multiple indexes and several million
> > rows on a weekly basis. The only downside is that the table seems to be quite
> > "busy" during the process, which only takes a few minutes.
> >
> > AFAIK, VACUUM doesn't reclaim space taken up by indexes. In fact, the more
> > deletes you do, the larger the index space gets, and the *slower* the index
> > performs. A periodic REINDEX cleans up a lot of problems.
>
> Note that in 7.4 the fix for this is in, so if you have a chance to test
> it out with your indexes and their growth problem please test it to see if
> it works right.
>
> I haven't tested 7.4 beta1 yet very hard, just on my workstation, with
> relatively low level stuff.

I am not sure we have completely dealt with index growth in 7.4.  What
we have new in 7.4 is the ability for non-FULL VACUUM to collect info on
free index pages and reuse them.

However, VACUUM FULL does not shrink the index table unless those pages
are the last pages of the file.  (Could it shift free pages to the end
and then truncate index?)  Also, does VACUUM FULL on an index put the
empty index pages in the FSM map?  It doesn't do that for heap pages
because there are none after vacuum, but there might be free index pages
that we should record.

Interesting I found a reference of doing an auto-reindex as part of
VACUUM FULL:

    #ifdef NOT_USED

        /*
         * reindex in VACUUM is dangerous under WAL. ifdef out until it
         * becomes safe.
         */
        if (reindex)
        {
            vac_close_indexes(nindexes, Irel);
            Irel = (Relation *) NULL;
            activate_indexes_of_a_table(onerel, false);
        }
    #endif   /* NOT_USED */


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Tom Lane
Дата:
Curt Sampson <cjs@cynic.net> writes:
> While we're at it, anybody got a clue on how to reindex system tables?

The procedure given in the REINDEX reference page doesn't work for you?

            regards, tom lane

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Curt Sampson
Дата:
On Wed, 27 Aug 2003, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > While we're at it, anybody got a clue on how to reindex system tables?
> The procedure given in the REINDEX reference page doesn't work for you?

You mean with the standalone backend? All the other developers get
pissed at me when I take down the database cluster. :-) Heck, I get
pissed at me because some other thing I was running in the background
goes and blows up.

I'm not actually sure why there would be a problem (aside from
performance) in removing and rebuilding an index on a system table,
unless there are cached query plans somewhere that would try to use
the old or now-nonexistent index.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.NetBSD.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Tom Lane
Дата:
Curt Sampson <cjs@cynic.net> writes:
> I'm not actually sure why there would be a problem (aside from
> performance) in removing and rebuilding an index on a system table,

All the hard-coded low-level stuff that expects the index to be there
blows up.

This might possibly be made to work for indexes on inessential catalogs
like pg_statistic, but I can't imagine it working for pg_attribute...

            regards, tom lane

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Bruce Momjian
Дата:
Bruce Momjian wrote:
> > > AFAIK, VACUUM doesn't reclaim space taken up by indexes. In fact, the more
> > > deletes you do, the larger the index space gets, and the *slower* the index
> > > performs. A periodic REINDEX cleans up a lot of problems.
> >
> > Note that in 7.4 the fix for this is in, so if you have a chance to test
> > it out with your indexes and their growth problem please test it to see if
> > it works right.
> >
> > I haven't tested 7.4 beta1 yet very hard, just on my workstation, with
> > relatively low level stuff.
>
> I am not sure we have completely dealt with index growth in 7.4.  What
> we have new in 7.4 is the ability for non-FULL VACUUM to collect info on
> free index pages and reuse them.
>
> However, VACUUM FULL does not shrink the index table unless those pages
> are the last pages of the file.  (Could it shift free pages to the end
> and then truncate index?)  Also, does VACUUM FULL on an index put the
> empty index pages in the FSM map?  It doesn't do that for heap pages
> because there are none after vacuum, but there might be free index pages
> that we should record.

I haven't seen anyone comment on this.  I think we need to address this
for 7.4.  (Collecting index free space in FSM is new in 7.4.)  I am
concerned that people who are only running VACUUM FULL at night will
have problems with index growth.  Also, why can't we move totally-empty
index pages to the end and truncate the file?

> Interesting I found a reference of doing an auto-reindex as part of
> VACUUM FULL:
>
>     #ifdef NOT_USED
>
>         /*
>          * reindex in VACUUM is dangerous under WAL. ifdef out until it
>          * becomes safe.
>          */
>         if (reindex)
>         {
>             vac_close_indexes(nindexes, Irel);
>             Irel = (Relation *) NULL;
>             activate_indexes_of_a_table(onerel, false);
>         }
>     #endif   /* NOT_USED */

Will we ever reindex as part of VACUUM FULL?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Bruce Momjian
Дата:
I have looked at the code and it seems VACUUM FULL does collect index
free space into the FSM via index_vacuum_cleanup().  Is that true?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Bruce Momjian wrote:
> > > > AFAIK, VACUUM doesn't reclaim space taken up by indexes. In fact, the more
> > > > deletes you do, the larger the index space gets, and the *slower* the index
> > > > performs. A periodic REINDEX cleans up a lot of problems.
> > >
> > > Note that in 7.4 the fix for this is in, so if you have a chance to test
> > > it out with your indexes and their growth problem please test it to see if
> > > it works right.
> > >
> > > I haven't tested 7.4 beta1 yet very hard, just on my workstation, with
> > > relatively low level stuff.
> >
> > I am not sure we have completely dealt with index growth in 7.4.  What
> > we have new in 7.4 is the ability for non-FULL VACUUM to collect info on
> > free index pages and reuse them.
> >
> > However, VACUUM FULL does not shrink the index table unless those pages
> > are the last pages of the file.  (Could it shift free pages to the end
> > and then truncate index?)  Also, does VACUUM FULL on an index put the
> > empty index pages in the FSM map?  It doesn't do that for heap pages
> > because there are none after vacuum, but there might be free index pages
> > that we should record.
>
> I haven't seen anyone comment on this.  I think we need to address this
> for 7.4.  (Collecting index free space in FSM is new in 7.4.)  I am
> concerned that people who are only running VACUUM FULL at night will
> have problems with index growth.  Also, why can't we move totally-empty
> index pages to the end and truncate the file?
>
> > Interesting I found a reference of doing an auto-reindex as part of
> > VACUUM FULL:
> >
> >     #ifdef NOT_USED
> >
> >         /*
> >          * reindex in VACUUM is dangerous under WAL. ifdef out until it
> >          * becomes safe.
> >          */
> >         if (reindex)
> >         {
> >             vac_close_indexes(nindexes, Irel);
> >             Irel = (Relation *) NULL;
> >             activate_indexes_of_a_table(onerel, false);
> >         }
> >     #endif   /* NOT_USED */
>
> Will we ever reindex as part of VACUUM FULL?
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Disk Utilization Increases And Time for Vacuum Increases.

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have looked at the code and it seems VACUUM FULL does collect index
> free space into the FSM via index_vacuum_cleanup().  Is that true?

Yes.

            regards, tom lane