Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Дата
Msg-id CABOikdPRx-5Q2_EfyQEHKokt1ixjRNicT3BkxUapv7SeNs74=Q@mail.gmail.com
обсуждение исходный текст
Ответ на [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Ответы Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers


On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

Based on my investigation so far and the evidence that I've collected, what probably happens is that after a relcache invalidation arrives at the new transaction, it recomputes the rd_indexattr but based on the old, cached rd_indexlist. So the rd_indexattr does not include the new columns. Later rd_indexlist is updated, but the rd_indexattr remains what it was.


I was kinda puzzled this report did not get any attention, though it's clearly a data corruption issue. Anyways, now I know why this is happening and can reproduce this very easily via debugger and two sessions.

The offending code is RelationGetIndexAttrBitmap(). Here is the sequence of events:

Taking the same table as in the report:

CREATE TABLE cic_tab_accounts (
        aid bigint UNIQUE,
        abalance bigint,
        aid1 bigint
);

1. Session S1 calls DROP INDEX pgb_a_aid1 and completes
2. Session S2 is in process of rebuilding its rd_indexattr bitmap (because of previous relcache invalidation caused by DROP INDEX). To be premise, assume that the session has finished rebuilding its index list. Since DROP INDEX has just happened,
4793     indexoidlist = RelationGetIndexList(relation);

3. S1 then starts CREATE INDEX CONCURRENTLY pgb_a_aid1 ON cic_tab_accounts(aid1)
4. S1 creates catalog entry for the new index, commits phase 1 transaction and builds MVCC snapshot for the second phase and also finishes the initial index build
5. S2 now proceeds. Now notice that S2 had computed the index list based on the old view i.e. just one index.

The following comments in relcahce.c are now significant:

4799     /*
4800      * Copy the rd_pkindex and rd_replidindex value computed by
4801      * RelationGetIndexList before proceeding.  This is needed because a
4802      * relcache flush could occur inside index_open below, resetting the
4803      * fields managed by RelationGetIndexList. (The values we're computing
4804      * will still be valid, assuming that caller has a sufficient lock on
4805      * the relation.)
4806      */

That's what precisely goes wrong.

6. When index_open is called, relcache invalidation generated by the first transaction commit in CIC gets accepted and processed. As part of this invalidation, rd_indexvalid is set to 0, which invalidates rd_indexlist too
7. But S2 is currently using index list obtained at step 2 above (which has only old index). It goes ahead and computed rd_indexattr based on just the old index.
8. While relcahce invalidation processing at step 6 cleared rd_indexattr (along with rd_indexvalid), it is again set at
4906     relation->rd_indexattr = bms_copy(indexattrs);

But what gets set at step 8 is based on the old view. There is no way rd_indexattr will be invalidated until S2 receives another relcache invalidation. This will happen when the CIC proceeds. But until then, every update done by S2 will generate broken HOT chains, leading to data corruption.

I can reproduce this entire scenario using gdb sessions. This also explains why the patch I sent earlier helps to solve the problem. 

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
Следующее
От: David Fetter
Дата:
Сообщение: Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATEand DELETE