Still more REINDEX fun

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Still more REINDEX fun
Дата
Msg-id 15577.1303324914@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Still more REINDEX fun  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
What with the recent discussions, I've been looking harder at the
REINDEX code's interactions with HOT, and I've found another problem
altogether.  To wit, IndexBuildHeapScan considers the DELETE_IN_PROGRESS
case to be comparable to RECENTLY_DEAD, but that analogy fails for
HOT-updated tuples.  If we have a DELETE_IN_PROGRESS HOT-updated tuple,
then we must not index it if the DELETE commits (the live tuple at the
end of the chain is the one to index, instead).  But what if the DELETE
rolls back?  This tuple might again become LIVE, in which case we *must*
index it.  In the current code, we assume we don't have to index it.
Then if the other transaction aborts before we reach the end-of-chain
tuple, we'll conclude that one is DEAD, and end up creating no index
entry at all for this HOT chain.

AFAICS the only solution is to wait out the deleting transaction, as we
already do in other cases where the behavior has to depend on the commit
outcome.

I've been able to reproduce a failure based on this in HEAD, but the
exposed bug surface is incredibly narrow.  You need

(1) Somebody doing a REINDEX on a system catalog (CLUSTER or VACUUM FULL
won't cause it, since there are no HOT chains in their output tables).

(2) Somebody else doing DDL that results in a HOT update in the target
catalog, and then rolling back their transaction.  The DDL has to happen
before the REINDEX starts (else its transient RowExclusiveLock would
block against the REINDEX) but the abort has to happen while the REINDEX
is running.

(3) The abort has to happen between the time that REINDEX examines the
HOT-updated tuple and the time it examines the end-of-chain tuple.
Since these are necessarily on the same page, that's a very narrow
window.

The types of DDL that can result in a HOT update in system catalogs are
pretty narrow to start with, since no indexed column can be affected.
Furthermore, the REINDEX has to be processing a non-unique index
(else it would have waited to see if the other transaction committed)
and there can't have been any earlier unique indexes on the catalog
(ditto).

The case I was able to replicate was a failure to index an updated
pg_constraint tuple after an ALTER ADD INHERIT operation, caused by
MergeConstraintsIntoExisting's update of pg_constraint.coninhcount.
I'm not sure how many other cases there are in the system, particularly
in the back branches (several of the possible cases I noted while
looking for something to try seem to be new in 9.1).  It seems possible
that nobody has ever seen this case in the field.

But it gets worse.  The faulty analogy to RECENTLY_DEAD also caused us
to do this:
                   else if (indexInfo->ii_BrokenHotChain)                       indexIt = false;

which means if the REINDEX has previously seen any RECENTLY_DEAD or
DELETE_IN_PROGRESS HOT-updated tuple, we won't index any
DELETE_IN_PROGRESS tuple, HOT or not.  But of course the deleting
transaction could still roll back.  The time windows here are much
larger than for the case described above, and the potential damage isn't
confined to a single index entry.  This case could conceivably explain
some reports that I'd previously written off as filesystem malfeasance,
for instance this recent report:
http://archives.postgresql.org/pgsql-general/2011-04/msg00623.php
It's still true though that you have to be REINDEXing system catalogs to
be at risk, else you shouldn't be seeing any IN_PROGRESS tuples.

Furthermore, yesterday's patch introduced still a third failure mode
here, applying to both the RECENTLY_DEAD and DELETE_IN_PROGRESS cases:
ii_BrokenHotChain doesn't mean there are certainly broken HOT chains,
only that there might be broken HOT chains.  If the higher-level code
knows that it's a REINDEX and therefore doesn't set indcheckxmin, then
MVCC scans could try to use the index afterwards, and fail to find
RECENTLY_DEAD or DELETE_IN_PROGRESS tuples that they should have found,
but that we decided not to index on the strength of having previously
found a potential broken HOT chain.

I think the best fix for these latter issues is to simply delete the
"optimization" of not indexing stuff based on ii_BrokenHotChain having
become true.  I don't think that saves enough work to justify working
through all the corner cases of when it's really safe.

So this stuff needs fixed ...
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: Formatting Curmudgeons WAS: MMAP Buffers
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Formatting Curmudgeons WAS: MMAP Buffers