Re: create index concurrently - duplicate index to reduce time without an index

Поиск

Список

Период

Сортировка

От	Alban Hertroys
Тема	Re: create index concurrently - duplicate index to reduce time without an index
Дата	4 июня 2010 г. 09:34:08
Msg-id	A1D7243C-E3CA-47C1-93C8-A7CF1AF3EF5E@solfertje.student.utwente.nl обсуждение исходный текст
Ответ на	Re: create index concurrently - duplicate index to reduce time without an index (<Gareth.Williams@csiro.au>)
Список	pgsql-general

Дерево обсуждения

> Thanks Greg, Alban and others,
>
> This has cleared up a misunderstanding I had about why one should reindex.  Re-reading the documentation
http://www.postgresql.org/docs/8.4/interactive/sql-reindex.htmlit is clear now that reindex or recreating and index
shouldnot normally be needed - certainly not to keep an index up-to-date.  I would have guessed that VACUUM or VACUUM
ANALYSEon the table that the index is associated would have been sufficient to reclaim space for a 'bloated' index
(maybeonly VACUUM FULL would help).  In any case we can leave reindexing or full vacuum for outages where we are
interruptingservice anyway. 

VACUUM FULL actually causes bloat to indexes. It rearranges the data in the tables so that any gaps get used, but while
doingthat it also needs to update the indices related to those tables. 

Normal VACUUM and VACUUM ANALYSE don't have this problem though, they just mark table space that's no longer in use
(transactionthat deleted rows has committed them) as reusable, so that later INSERTs can put their data in there. This
isa bit of a simplification of what's really going on - for the details check the documentation. 

Autovacuum does VACUUM ANALYSE in the background, using multiple threads in recent versions. You can (and seeing your
useof the database you probably should) tune how often it vacuums tables through several configuration parameters. 

Of course, running ANALYSE when you _know_ data in a table has changed significantly means you don't have to wait for
autovacto get around to analysing that table. 

> I was heartened by the responses and tried further testing (if it could not hurt, why not try and see if it could be
faster),but ran into a problem.  A few times when I was trying to drop an index (before or after creating a duplicate
indexwith 'concurrently'), the dropping of the index stalled.  It seems that this was because of existing connection: 
> postgres: rods ICAT 130.102.163.141(58061) idle in transaction
> And new clients block.
>
> Is that expected? Should we look for 'bugs' in out client that seems to be holding a connection?

I'm not exactly sure why that is (I can't look into your database), but my guess is that the index is locked by a
transaction.Apparently the transaction you refer to has uncommitted work that depends on the index at some point. 

Keeping transactions open for a long time is usually a bad idea.

You saw that you can't drop an index in use by a transaction for example, but autovacuum is running into similar issues
-it can't reclaim space until the transaction finishes as the transaction locks things that autovacuum will want to
touch.
That probably means (I'm not sure it works that way, but it seems likely) that that autovacuum thread gets stuck at a
lockand can't continue until the transaction holding the lock frees it. 


> For the application we have, I'm ready to give up on this train of investigation for optimization and just vacuum
analysekey tables regularly and vaccuum and maybe reindex more completely during outages - though we aim for outages to
beinfrequent.  The database holds data representing a virtual filesystem structure with millions of file (and
associatedaccess controls, and information on underlying storage resources and replication).  There is probably not
muchupdate or delete of the main data - at least compared with the total holdings and the new data/files which are
regularlybeing added to the system. 

In practice VACUUM FULL and REINDEX are used to reclaim disk space. That of itself doesn't look much like it'd improve
performance,but using less disk space also means that data gets more tightly packed in your disk cache, for example.
REINDEXcan mean an index that didn't fit into RAM now does. They're both rather intrusive operations though, so it's a
matterof balancing the costs and benefits. Many databases don't need to bother with VACUUM FULL or REINDEX. 

> Ps. Greg, I don't understand the issue with 'concurrently rebuild (of) an index that enforces a constraint or unique
index'. I don't think I care much right at the moment, but I'm generally interested and others might be too.. Would you
expectthe create index to fail or to cause locking or just transient performance degradation? 


I think what Greg was getting at is that there's a dependency tree between indexes and constraints: A primary key is
implementedusing a unique index. You can create a new (unique) index on the same columns concurrently, but you can't
replacethe primary key index with it as you're not allowed to drop the index without dropping the PK constraint. If you
haveany FK constraints pointing to that table, you can't drop the PK constraint without also dropping the FK
constraints.

Quite a bit of trouble to go through to replace one index.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c08c88410157954111193!

В списке pgsql-general по дате отправления:

Предыдущее

От: Pavel Stehule
Дата: 04 июня 2010 г., 07:27:06
Сообщение: Re: handling out parameter

Следующее

От: peeratat tungsungnern
Дата: 04 июня 2010 г., 12:05:25
Сообщение: please help me. I can't pg_dumg DB

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: create index concurrently - duplicate index to reduce time without an index

Предыдущее

Следующее