Обсуждение: unique index corruption

Поиск
Список
Период
Сортировка

unique index corruption

От
pg noob
Дата:

Hi all,

In PostgreSQL 8.4...

I am wondering if autovacuum will periodically rebuild indexes?

If not, how advisable is it to reindex periodically?

We recently had a case of unique index corruption which ended up allowing duplicate
primary key IDs to get inserted and caused widespread data model integrity havoc.

The system ran for several days before it was noticed.

After the problem was discovered we found that a reindex would fail because there were
duplicate ID values.

Our thought is that if we had a scheduled cron job that would periodically do a reindex this
corruption might have been caught sooner by detecting the reindex failure.

If a reindex is something that should be done frequently as part of regular maintenance
why isn't there a mode of autovacuum that does this automatically?  Or maybe there is and
I just don't know about it..?

Thank you.


Re: unique index corruption

От
Merlin Moncure
Дата:
On Wed, Jul 24, 2013 at 1:50 PM, pg noob <pgnube@gmail.com> wrote:
>
> Hi all,
>
> In PostgreSQL 8.4...
>
> I am wondering if autovacuum will periodically rebuild indexes?

it will not. REINDEX requires a heavy lock and for most applications
it would be just plain untenable to be run without some type of
application coordination, especially if the index is large.

> If not, how advisable is it to reindex periodically?

no harm in that at all -- notwithstanding the locking issues.

> We recently had a case of unique index corruption which ended up allowing
> duplicate
> primary key IDs to get inserted and caused widespread data model integrity
> havoc.
>
> The system ran for several days before it was noticed.
>
> After the problem was discovered we found that a reindex would fail because
> there were
> duplicate ID values.
>
> Our thought is that if we had a scheduled cron job that would periodically
> do a reindex this
> corruption might have been caught sooner by detecting the reindex failure.
>
> If a reindex is something that should be done frequently as part of regular
> maintenance
> why isn't there a mode of autovacuum that does this automatically?  Or maybe
> there is and
> I just don't know about it..?


REINDEX to remove corruption is essentially dealing with the symptoms
of a problem and IMNSHO is not a real long term solution. First thing
to figure out is if we have a postgres bug or some other failure (in
particular, hardware).   Unusual circumstances, for example any
unplanned shutdowns around the time corruption started to appear,
might give some clues in terms of figuring where the real problem
lies.  Essentially this is not supposed to happen.

merlin


Re: unique index corruption

От
Sergey Konoplev
Дата:
On Wed, Jul 24, 2013 at 11:50 AM, pg noob <pgnube@gmail.com> wrote:
> In PostgreSQL 8.4...
>
> I am wondering if autovacuum will periodically rebuild indexes?

It doesn't rebuild indexes, it marks empty index pages for reuse.

> If not, how advisable is it to reindex periodically?

Here described the recommendations and explanations for 8.4
http://www.postgresql.org/docs/8.4/static/routine-reindex.html

> After the problem was discovered we found that a reindex would fail because
> there were
> duplicate ID values.
>
> Our thought is that if we had a scheduled cron job that would periodically
> do a reindex this
> corruption might have been caught sooner by detecting the reindex failure.

Periodical reindex is a very dubious technique to monitor database
corruption. Honestly, I have never heard of any standard or
recommended practice of doing it. However, there is a tool that
pretends to do so (https://github.com/tvondra/pg_check), but I do not
now what state it is currently and if it is production ready.

> If a reindex is something that should be done frequently as part of regular
> maintenance
> why isn't there a mode of autovacuum that does this automatically?  Or maybe
> there is and
> I just don't know about it..?

It is not necessary to reindex to be a part of regular maintenance.
The main goal of autovacuum is to effectively reuse space and to
update statistics. If autovacuum is configured properly reindex is not
required.

However, if you have some high/bulk-update/delete operations
autovacuum might not manage with bloat, and in this case you can use
this tool pgcompactor (https://code.google.com/p/pgtoolkit/) in
conjunction with pgstattuple extension or pg_repack
(https://github.com/reorg/pg_repack).

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com