Re: Invalid indexes should not consume update overhead

From: Amit Kapila
Subject: Re: Invalid indexes should not consume update overhead
Date:
Msg-id: CAA4eK1JNmxL=kKB210GfA14q4T4yAC_U6XPo6yhi9VuBhxEPog@mail.gmail.com
In reply to: Invalid indexes should not consume update overhead  (Tomasz Ostrowski <tometzky+pg@ato.waw.pl>)
List: pgsql-bugs
On Wed, Jul 13, 2016 at 4:40 PM, Tomasz Ostrowski
<tometzky+pg@ato.waw.pl> wrote:
> When an index is built concurrently and the build fails, the index is left
> in an invalid state. It's basically unusable for anything, but, according
> to the documentation:
>
>> If a problem arises while scanning the table, such as a deadlock or
>> a uniqueness violation in a unique index, the CREATE INDEX command
>> will fail but leave behind an "invalid" index. This index will be
>> ignored for querying purposes because it might be incomplete; however
>> it *will* *still* *consume* *update* *overhead*. The psql \d command
>> will report such an index as INVALID
>
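
A minimal reproduction of the behaviour the documentation describes (the
table and index names here are made up; the duplicate rows are what make
the unique build fail):

    -- A concurrent unique-index build over duplicate data fails, but the
    -- half-built index is left behind and marked INVALID.
    CREATE TABLE t (id int);
    INSERT INTO t VALUES (1), (1);
    CREATE UNIQUE INDEX CONCURRENTLY t_id_uq ON t (id);
    -- ERROR:  could not create unique index "t_id_uq"
    \d t
    -- Indexes:
    --     "t_id_uq" UNIQUE, btree (id) INVALID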
>
> I think this update overhead is actually wasted - there's no way to make
> use of it, as the only way to make the index usable again is to reindex it
> or to drop and recreate it.
>
> On the other hand, if invalid indexes had no update overhead, they might
> actually be useful. Please consider the following situation (taken from
> the real world):
>
> - I have a very large table with a large number of indexes on a server with
> a large number of CPUs.
> - I need to add a new NOT NULL column and I need to do this in a fairly
> small maintenance window.
> - I tried to simply "alter table tablename add column columnname int not
> null default 0", but it did not finish within 24 hours - it updates those
> tens of indexes in a single process, using 100% of a single CPU -
> unacceptable.
> - I mark all the indexes as invalid (currently I'm just saving and dropping
> them).
> - This time adding the column takes an hour.
> - I vacuum full or cluster the table, as it has now bloated to at least
> 200% - this is also fast, as the indexes are not updated.
> - I'm reindexing (currently recreating) all the indexes, but instead of
> calculating them on a single CPU I use all available CPUs (I can have like
> 40 of them on Amazon AWS RDS cheaply if I need them only for a few hours).
> This also finishes in an hour.
> - The world is saved and everybody celebrates.
>
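
A sketch of that workflow in SQL (all names are illustrative, and each
CREATE INDEX in the last step would run in its own session so that it gets
its own CPU):

    -- Drop the indexes first (after saving their definitions - see the
    -- sketch further down).
    DROP INDEX bigtable_a_idx, bigtable_b_idx;   -- ...and the rest

    -- With no indexes left to maintain, adding the column is bounded by
    -- heap I/O alone.
    ALTER TABLE bigtable ADD COLUMN newcol int NOT NULL DEFAULT 0;

    -- Compact the table while it is still index-free.
    VACUUM FULL bigtable;

    -- Recreate the indexes, one statement per connection, in parallel.
    CREATE INDEX bigtable_a_idx ON bigtable (a);
    CREATE INDEX bigtable_b_idx ON bigtable (b);
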
> Dropping and recreating indexes is not a very safe operation - their
> definitions need to be saved somewhere outside the database, and if
> anything goes wrong they would not be easy to restore. If it were possible
> to just set them invalid (feature request in passing), and if invalid
> indexes added no cost on updates, then this would be much easier and safer.
>
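
One way to keep the definitions inside the database before dropping them
(a sketch; 'bigtable' as in the sketch above):

    -- pg_get_indexdef() returns a complete CREATE INDEX statement for
    -- each index; the saved statements can later be replayed verbatim.
    CREATE TABLE saved_index_defs AS
    SELECT indexrelid::regclass::text AS index_name,
           pg_get_indexdef(indexrelid) AS create_stmt
    FROM pg_index
    WHERE indrelid = 'bigtable'::regclass;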

How can you consider marking an invalid index as valid without
reindexing it?  It is quite possible that the table has been updated
in the meantime.
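
For reference, the only way back today is a full rebuild (REINDEX, or drop
and re-create); invalid indexes can be listed from pg_index (the index name
below is from the earlier sketch):

    -- List indexes currently marked invalid, then rebuild one of them.
    SELECT indexrelid::regclass AS index_name
    FROM pg_index
    WHERE NOT indisvalid;

    REINDEX INDEX t_id_uq;   -- rescans the table and marks the index valid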


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
