Discussion: Question on REINDEX
All,

A couple of questions regarding the REINDEX command. Running PostgreSQL 7.4.2 on Solaris.

1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is:

   "In some situations it is worthwhile to rebuild indexes periodically
   with the REINDEX command. (There is also contrib/reindexdb which can
   reindex an entire database.) However, PostgreSQL 7.4 has substantially
   reduced the need for this activity compared to earlier releases."

What are these situations? We have a database with some large tables. Currently we reindex (actually drop/create) nightly, but as the tables have grown this has become prohibitively time-consuming. According to the above comment it may not be necessary at all.

2) If reindexing is necessary, how can it be done in a non-obtrusive way in a production environment? Our database is being updated constantly. REINDEX locks client apps out while in progress, and the same goes for CREATE INDEX when we drop/create. The table can have over 10 million rows, and recreating the indexes seems to take hours. This is too long to lock the client apps out. Is there any other solution?

thanks,
Bill
Bill,

> 1) When is it necessary to run REINDEX or drop/create
> an index? All I could really find in the docs is:

If you need to VACUUM FULL, you need to REINDEX as well. For example, if you drop millions of rows from a table.

> 2) If reindexing is necessary, how can this be done in
> a non-obtrusive way in a production environment? Our
> database is being updated constantly. REINDEX locks
> client apps out while in progress. Same with CREATE
> INDEX when we drop/create. The table can have over
> 10 million rows. Recreating the indexes seems to take
> hours. This is too long to lock the client apps out.
> Is there any other solution?

Better to up your max_fsm_pages and run plain VACUUM regularly and frequently, so that you don't have to REINDEX at all.

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
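A sketch of the routine Josh describes, with a hypothetical table name and an illustrative FSM value (size the real value from VACUUM VERBOSE output, not from this example):

```sql
-- In postgresql.conf (value is illustrative, not a recommendation):
--   max_fsm_pages = 500000

-- Run frequently, e.g. from cron; plain VACUUM takes no exclusive
-- lock, so client apps keep running while it works:
VACUUM ANALYZE my_big_table;

-- Only after dropping millions of rows, pair the two heavy commands:
VACUUM FULL my_big_table;
REINDEX TABLE my_big_table;
```

The point of the design is that frequent cheap VACUUMs keep free space recycled, so the expensive locking pair above becomes an exceptional event rather than a nightly job.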
On Mon, Apr 18, 2005 at 12:21:42 -0700, Bill Chandler <billybobc1210@yahoo.com> wrote:

> Running PostgreSQL 7.4.2 on Solaris.
>
> 1) When is it necessary to run REINDEX or drop/create
> an index? All I could really find in the docs is:
>
> "In some situations it is worthwhile to rebuild
> indexes periodically with the REINDEX command. (There
> is also contrib/reindexdb which can reindex an entire
> database.) However, PostgreSQL 7.4 has substantially
> reduced the need for this activity compared to earlier
> releases."

In pathological cases it is possible to have a lot of empty space on many of your index pages; reindexing packs the same entries into a smaller number of pages. In earlier versions I think it was possible to end up with completely empty pages, and this happened for patterns of use (new values monotonically increasing, oldest values deleted first) that were actually seen in practice.
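The usage pattern Bruno mentions can be sketched as a queue-like table (all names here are hypothetical):

```sql
-- New keys only ever increase; oldest rows are deleted first.
CREATE TABLE queue (id serial PRIMARY KEY, payload text);

-- Producers:
--   INSERT INTO queue (payload) VALUES ('work item');
-- Consumers:
--   DELETE FROM queue WHERE id = (SELECT min(id) FROM queue);

-- The left edge of the queue_pkey btree empties out page by page as
-- consumers drain old ids; before 7.4 those emptied index pages were
-- not reclaimed, so the index grew without bound until a REINDEX.
```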
Josh Berkus <josh@agliodbs.com> writes:

>> 1) When is it necessary to run REINDEX or drop/create
>> an index? All I could really find in the docs is:

> If you need to VACUUM FULL, you need to REINDEX as well. For example,
> if you drop millions of rows from a table.

That's probably a pretty good rule of thumb. It's worth noting that VACUUM FULL tends to actively bloat indexes, not reduce them in size, because it has to create new index entries for the rows it moves before it can delete the old ones. So if a VACUUM FULL moves many rows, you are likely to see the indexes get bigger, not smaller.

> Better to up your max_fsm_pages and do regular VACUUMs regularly and
> frequently so that you don't have to REINDEX at all.

Yes, definitely. Also consider using CLUSTER rather than VACUUM FULL when you need to clean up after massive deletions from a table. It's not any less intrusive in terms of locking, but it's often faster and it avoids the index bloat problem (since it effectively does a REINDEX).

			regards, tom lane
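As a sketch of the CLUSTER alternative Tom describes (table and index names are hypothetical; this is the 7.4-era syntax):

```sql
-- CLUSTER rewrites the table in index order and rebuilds all of its
-- indexes, so there is no index bloat afterwards. The lock is just as
-- exclusive as VACUUM FULL's, but it is often faster after a massive
-- delete because it copies live rows instead of shuffling them in place.
CLUSTER my_big_table_pkey ON my_big_table;

-- The rewrite invalidates the old planner statistics, so re-analyze:
ANALYZE my_big_table;
```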
> Josh Berkus <josh@agliodbs.com> writes:
> >> 1) When is it necessary to run REINDEX or drop/create
> >> an index? All I could really find in the docs is:
>
> > If you need to VACUUM FULL, you need to REINDEX as well.
> > For example, if you drop millions of rows from a table.
>
> That's probably a pretty good rule of thumb. It's worth
> noting that VACUUM FULL tends to actively bloat indexes, not
> reduce them in size, because it has to create new index
> entries for the rows it moves before it can delete the old
> ones. So if a VACUUM FULL moves many rows you are likely to
> see the indexes get bigger not smaller.

Is my current understanding correct:

1) VACUUM defragments each page locally - moves free space to the end of the page.

2) VACUUM FULL defragments the table globally - tries to fill up all partially free pages and deletes all resulting empty pages.

3) Both VACUUM and VACUUM FULL do only local defragmentation of indexes.

4) If you want indexes to become fully defragmented, you need to REINDEX.

If you happen to use triggers for denormalization, like I do, then you have a lot of updates, which means that tables and indexes quickly become cluttered with pages containing mostly dead tuples. If those tables and indexes fill up shared buffers, PostgreSQL slows down, because it has to do a lot more I/O than normal. Regular VACUUM FULL helped, but I needed REINDEX as well; otherwise the indexes grew bigger than the tables themselves!

> > Better to up your max_fsm_pages and do regular VACUUMs regularly and
> > frequently so that you don't have to REINDEX at all.
>
> Yes, definitely. Also consider using CLUSTER rather than
> VACUUM FULL when you need to clean up after massive deletions
> from a table. It's not any less intrusive in terms of
> locking, but it's often faster and it avoids the index bloat
> problem (since it effectively does a REINDEX).

Hmm, thanks for the tip.
BTW, is the output of

    select count(1), sum(relpages)
    from pg_class
    where relkind in ('r', 'i', 't');

a good estimate for max_fsm_relations and max_fsm_pages? Are these parameters used only during VACUUM, or at runtime too?

	Tambet
"Tambet Matiisen" <t.matiisen@aprote.ee> writes:

> Is my current understanding correct:
> 1) VACUUM defragments each page locally - moves free space to the end of
> page.
> 2) VACUUM FULL defragments table globally - tries to fill up all
> partially free pages and deletes all resulting empty pages.

Both versions of VACUUM do within-page defragmentation. Also, both versions will remove entirely-empty pages at the end of a table. The difference is that VACUUM FULL actively attempts to make pages at the end empty, by moving their contents into free space in earlier pages. Plain VACUUM never does cross-page data movement, which is why it doesn't need as strong a lock.

BTW, VACUUM FULL does the data movement back-to-front, and stops as soon as it finds a tuple it cannot move down; which is a reasonable strategy since the goal is merely to make the file shorter. But it's entirely likely that there will be lots of empty space left at the end. For instance, the final state could have one 4K tuple in the last page and up to 4K-1 free bytes in every earlier page.

> 3) Both VACUUM and VACUUM FULL do only local defragment for indexes.
> 4) If you want indexes to become fully defragmented, you need to
> REINDEX.

I don't think "defragment" is a notion that applies to indexes, at least not in the same way as for tables. It's true that there is no cross-page data movement in either case. In the last release or two we've been able to recognize and recycle entirely-empty pages in both btree and hash indexes, but such pages are almost never returned to the OS; they're put on a freelist for re-use within the index instead.

If you allow the table to grow to much more than its "normal" size, ie, you allow many dead tuples to accumulate, then getting back to "normal" size is going to require VACUUM FULL + REINDEX (or you can use CLUSTER or some varieties of ALTER TABLE). This is not the recommended maintenance process, however.
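One way to watch the difference Tom describes, on a scratch table (the table name is hypothetical, and generate_series arrived shortly after 7.4; on 7.4 itself you would populate the table another way):

```sql
CREATE TABLE scratch (id int PRIMARY KEY, pad text);
INSERT INTO scratch
    SELECT g, repeat('x', 100) FROM generate_series(1, 100000) g;

DELETE FROM scratch WHERE id <= 90000;   -- dead space at the front

VACUUM scratch;      -- within-page cleanup only; the live tuples at the
                     -- end are not moved, so the file barely shrinks
SELECT relpages FROM pg_class WHERE relname = 'scratch';

VACUUM FULL scratch; -- moves surviving tuples down and truncates the
                     -- file, but bloats the index while doing so...
REINDEX TABLE scratch;  -- ...hence the paired REINDEX
SELECT relpages FROM pg_class WHERE relname = 'scratch';
```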
Sufficiently frequent plain VACUUMs should generally hold the free space to a tolerable level without requiring any exclusive locking.

> Hmm, thanks for a tip. BTW, is output of
>
> select count(1), sum(relpages) from pg_class where relkind in
> ('r','i','t')
>
> good estimate for max_fsm_relations and max_fsm_pages?

Within that one database, yes --- don't forget you must sum these numbers across all DBs in the cluster. Also you need some slop in the max_fsm_pages setting because of quantization in the space usage. It's probably easier to let VACUUM VERBOSE do the calculation for you.

			regards, tom lane
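Since pg_class is per-database, the cluster-wide figure has to be summed by hand; as a sketch:

```sql
-- Run in each database in the cluster and add up the results:
SELECT count(*)      AS fsm_relations_estimate,
       sum(relpages) AS fsm_pages_estimate
FROM pg_class
WHERE relkind IN ('r', 'i', 't');

-- Then set max_fsm_relations / max_fsm_pages with some headroom on
-- top of the summed figures, or read the free-space-map summary that
-- a database-wide VACUUM VERBOSE prints at the end instead.
```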
Tambet,

> Hmm, thanks for a tip. BTW, is output of
>
> select count(1), sum(relpages) from pg_class where relkind in
> ('r','i','t')

Well, if you do that for all databases in the cluster, it's the number you start with. However, setting max_fsm_pages to that would be assuming that you expected 100% of the rows to be replaced by UPDATEs or DELETEs before you ran VACUUM. I generally run VACUUM a little sooner than that. See the end portion of:

http://www.powerpostgresql.com/PerfList

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:

>> select count(1), sum(relpages) from pg_class where relkind in
>> ('r','i','t')

> Well, if you do that for all databases in the cluster, it's the number you
> start with. However, setting max_fsm_pages to that would be assuming that
> you expected 100% of the rows to be replaced by UPDATEs or DELETEs before
> you ran VACUUM. I generally run VACUUM a little sooner than that.

Not at all. What it says is that you expect 100% of the pages to have useful amounts of free space, which is a *much* weaker criterion. I think you can usually get away with setting max_fsm_pages to less than your actual disk footprint, but I'm not sure how much less. It'd probably depend a lot on your usage pattern --- for instance, insert-only history tables don't need any FSM space.

			regards, tom lane
Tom,

> Not at all. What it says is that you expect 100% of the pages to have
> useful amounts of free space, which is a *much* weaker criterion.

Hmmm. Good point. This seems to be another instance where my rule-of-thumb was based on false logic but nevertheless arrived at correct numbers. I've seldom, if ever, set max_fsm_pages above 50% of the pages in the active database ... and never run out.

Hmmmm .... actually, it seems like, if you are vacuuming regularly, you only *do* need to track pages that have been touched by DELETE or UPDATE. Other pages would have already been vacuumed and not have any useful free space left. Yes?

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:

>> Not at all. What it says is that you expect 100% of the pages to have
>> useful amounts of free space, which is a *much* weaker criterion.

> Hmmmm .... actually, it seems like, if you are vacuuming regularly, you only
> *do* need to track pages that have been touched by DELETE or UPDATE. Other
> pages would have already been vacuumed and not have any useful free space
> left. Yes?

Well, the space has to be remembered until it's reused. On the other hand, there's nothing that says the FSM has to be aware of all the free space available at all times --- the real criterion to avoid bloat is that after a VACUUM, enough space is logged in the FSM to satisfy all the insertions that will happen before the next VACUUM. So you could have situations where free space is temporarily forgotten (for lack of slots in the FSM), but other free space gets used instead, and eventually a later VACUUM re-finds that free space and puts it into the FSM.

I think it's true that the more often you vacuum, the less FSM you need, but this doesn't have much to do with how much free space is actually out there on disk. It's because you only need enough FSM to record the free space you'll need until the next vacuum.

			regards, tom lane
On Tue, Apr 19, 2005 at 10:06:40AM -0400, Tom Lane wrote:

> BTW, VACUUM FULL does the data movement back-to-front, and stops as soon
> as it finds a tuple it cannot move down; which is a reasonable strategy
> since the goal is merely to make the file shorter. But it's entirely
> likely that there will be lots of empty space left at the end. For
> instance the final state could have one 4K tuple in the last page and
> up to 4K-1 free bytes in every earlier page.

Am I right in thinking that vacuum does at least two passes: one front-to-back to find removable tuples, and another back-to-front for the movement? Because if it didn't work this way, it couldn't relabel (change Xmin/Xmax of) tuples in early pages. Or does it do something different?

I know maintenance_work_mem is used for storing TIDs of to-be-moved tuples for index cleanup ... how does it relate to the above?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Crear es tan difícil como ser libre" ("To create is as difficult as to be free") (Elsa Triolet)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> Am I right in thinking that vacuum does at least two passes: one
> front-to-back to find removable tuples, and other back-to-front for
> movement?

VACUUM FULL, yes. Plain VACUUM only does the first one.

> I know maintenance_work_mem is used for storing TIDs of to-be-moved
> tuples for index cleanup ... how does it relate to the above?

TIDs of to-be-deleted tuples, actually. Movable tuples aren't stored; they're just found on-the-fly during the back-to-front pass.

			regards, tom lane