Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Поиск
Список
Период
Сортировка
От Michail Nikolaev
Тема Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Дата
Msg-id CANtu0og6P+O10XLm-AsoqOhZYEjr8SEHFETadSJ8ifO01YP1qg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements  (Michail Nikolaev <michail.nikolaev@gmail.com>)
Ответы Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Список pgsql-hackers
Hello, Matthias!

I just realized there is a much simpler and safe way to deal with the problem.

So, d9d076222f5b94a85e0e318339cfc44b8f26022d(1) had a bug because the scan was not protected by a snapshot. At the same time, we want this snapshot to affect not all the relations, but only a subset of them. And there is already a proper way to achieve that - different types of visibility horizons!

So, to resolve the issue, we just need to create a separated horizon value for such situation as building an index concurrently.

For now, let's name it `VISHORIZON_BUILD_INDEX_CONCURRENTLY` for example. By default, its value is equal to `VISHORIZON_DATA`. But in some cases it "stops" propagating forward while concurrent index is building, like this:

            h->create_index_concurrently_oldest_nonremovable =TransactionIdOlder(h->create_index_concurrently_oldest_nonremovable, xmin);
            if (!(statusFlags & PROC_IN_SAFE_IC))
                        h->data_oldest_nonremovable = TransactionIdOlder(h->data_oldest_nonremovable, xmin);

The `PROC_IN_SAFE_IC` marks backend xmin as ignored by `VISHORIZON_DATA` but not by `VISHORIZON_BUILD_INDEX_CONCURRENTLY`.

After, we need to use appropriate horizon for relations which are processed by `PROC_IN_SAFE_IC` backends. There are a few ways to do it, we may start prototyping with `rd_indexisbuilding` from previous message:

        static inline GlobalVisHorizonKind
        GlobalVisHorizonKindForRel(Relation rel)
        ........
            if (rel != NULL && rel->rd_indexvalid && rel->rd_indexisbuilding)
                    return VISHORIZON_BUILD_INDEX_CONCURRENTLY;


There are few more moments need to be considered:

* Does it move the horizon backwards?

It is allowed for the horizon to move backwards (like said in `ComputeXidHorizons`) but anyway - in that case the horizon for particular relations just starts to lag behind the horizon for other relations.
Invariant is like that: `VISHORIZON_BUILD_INDEX_CONCURRENTLY` <= `VISHORIZON_DATA` <= `VISHORIZON_CATALOG` <= `VISHORIZON_SHARED`.

* What is about old cached versions of `Relation` objects without `rd_indexisbuilding` yet set?

This is not a problem because once the backend registers a new index, it waits for all transactions without that knowledge to end (`WaitForLockers`). So, new ones will also get information about new horizon for that particular relation.

* What is about TOAST?
To keep TOAST horizon aligned with relation building the index, we may do the next thing (as first implementation iteration):

        else if (rel != NULL && ((rel->rd_indexvalid && rel->rd_indexisbuilding) || IsToastRelation(rel)))
                return VISHORIZON_BUILD_INDEX_CONCURRENTLY;

For the normal case, `VISHORIZON_BUILD_INDEX_CONCURRENTLY` is equal to `VISHORIZON_DATA` - nothing is changed at all. But while the concurrent index is building, the TOAST horizon is guaranteed to be aligned with its parent relation. And yes, it is better to find an easy way to affect only TOAST relations related to the relation with index building in progress.

New horizon adds some complexity, but not too much, in my opinion. I am pretty sure it is worth being done because the ability to rebuild indexes without performance degradation is an extremely useful feature.
Things to be improved:
* better way to track relations with concurrent indexes being built (with mechanics to understood what index build was failed)
* better way to affect TOAST tables only related to concurrent index build
* better naming

Patch prototype in attachment.
Also, maybe it is worth committing test separately - it was based on Andrey Borodin work (2). The test fails well in the case of incorrect implementation.

[1]: https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d#diff-8879f0173be303070ab7931db7c757c96796d84402640b9e386a4150ed97b179R1779-R1793
[2]: https://github.com/x4m/postgres_g/commit/d0651e7d0d14862d5a4dac076355
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: 'trusted'/'untrusted' PL in DoD/DISA PostgreSQL STIGs
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Weird test mixup