Re: Corrupt index stopping autovacuum system wide

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Corrupt index stopping autovacuum system wide
Дата
Msg-id CAH2-Wzk3U+3OTQAAWCkgvYNR59V05BknJxbjyP7AVQzEgb9eyg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Corrupt index stopping autovacuum system wide  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-general
On Wed, Jul 17, 2019 at 11:43 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> This might make things worse operationally, though.  If searches aren't
> failing but vacuum is, we'd break a production system that currently
> works.

If searches aren't failing and VACUUM works, then that's probably down
to dumb luck. The user's luck could change at any time (actually, it's
quite possible that the index is already giving wrong answers without
anybody realizing). That's not always true, of course -- you could
have an OOM condition in VACUUM, where it really does make sense to
retry. But it should be true for the category of errors where we
behave more aggressively than just giving up, such as "failed to
re-find parent key" error Aaron noticed.

> Well, vacuum knows what index is being processed.  Maybe you're thinking
> that autovac can get an out-of-memory condition or something like that;
> perhaps we can limit the above only when an ERRCODE_DATA_CORRUPTED
> condition is reported (and make sure all such conditions do that.  As
> far as I remember we have a patch for this particular error to be
> reported as such.)

I don't think that it would be that hard to identify errors that
nbtree VACUUM could throw that clearly indicate corruption, without
any hope of the problem self-correcting without the DBA running a
REINDEX. There will be a small amount of gray area, perhaps, but
probably not enough to matter.

> > (c) automatically disabling constraint indexes seems less than desirable.
>
> Disabling them for writes, yeah.

I think that it's fair to say that all bets are off once you see the
"failed to re-find parent key" error, or any other such error that
indicates corruption. Admittedly it isn't 100% clear that disabling
constraint enforcement to unblock autovacuum for the whole cluster is
better than any available alternative; it's really hard to reason
about things when we already know that the database has corruption.

I think that it's okay that almost anything can break when somebody
creates an index on a non-immutable expression (including VACUUM),
provided that all the problems only affect the table with the broken
index. OTOH, taking down the entire Postgres cluster as an indirect
consequence of one person's ill-considered CREATE INDEX really sucks.
That distinction seems important to me.

-- 
Peter Geoghegan



В списке pgsql-general по дате отправления:

Предыдущее
От: Perumal Raj
Дата:
Сообщение: Resolved: Looking for Postgres upgrade Metrix
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Corrupt index stopping autovacuum system wide