Обсуждение: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

Поиск
Список
Период
Сортировка

Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:
Hello, everyone! 

(added to CC involved persons based on history part and previous thread activity)

This thread is an extraction of part of https://commitfest.postgresql.org/patch/4971/ (STIR) into a separate commitfest entry as was agreed in [0].

HISTORY

In 2021 Álvaro proposed [1] and committed [2] the feature: VACUUM ignores snapshots involved in concurrent indexing operations. This was a great feature in PG14.

But in 2022 a bug related to the tuples missing in indexes was detected, and a little bit later explained by Andres [3]. As a result, the feature was reverted [4].

This patch set brings back part of it (first phase of CIC) using snapshot resetting technique.
For the second phase - another solution is available in entry mentioned above.

STRUCTURE

It is based on Matthias' idea [5] - to just reset snapshots every so often during a concurrent index build. It may work only during the first scan (because we'll miss some tuples during validation scan with such an approach).
Logic is simple – since the index built by the first scan already misses a lot of tuples – we may not worry to miss a few more – the validation phase is going to fix it anyway. Of course, it is not so simple in case of unique indexes, but still possible.

Commits are:

- Add stress tests for concurrent index builds

This is a set of stress tests to ensure concurrent index operations are worked correctly.
It easily detects the bug from 2022 (and other different things I met during the development, some were in master and already fixed - [6] and [7]).

- Reset snapshots periodically in non-unique non-parallel concurrent index builds

Apply this technique to the simplest case – non-unique and non-parallel. Snapshot is changed "between" pages.
One possible place here to worry about – to ensure xmin advanced we need to call InvalidateCatalogSnapshot during each snapshot switch.
So, theoretically it may cause some issues, but the table is locked to changes during the process. At least commit [2] (which ignored xmin of
CIC backend) did the same thing in essence, actually. Another more "clear" option here - we may just extract a separate catalog snapshot horizon (one more field near xmin specially only for catalog snapshot), it seems to be a pretty straightforward change.

- Support snapshot resets in parallel concurrent index builds

Extend that technique to parallel builds. It is mostly about ensuring workers have an initial snapshot restored from the leader before the leader goes to reset it.

- Support snapshot resets in concurrent builds of unique indexes

The most tricky commit in the patch set – apply that to unique indexes. Changing of snapshots may cause issues with validation of unique constraints. Currently, validation is done during the sorting of tuples, but that doesn't work with tuples read with different snapshots (some of them are dead already).

To deal with it:
- in case we see two identical tuples during tuplesort – ignore if some of them are dead according to SnapshotSelf, but fail if two are alive. It is not a required part, it is just mechanics for fail-fast behavior and may be removed or limited in spent resources.
- to provide the guarantee – during _bt_load compare the inserted index value with previously inserted. If they are equal – make sure only a single SnapshotSelf alive tuple exists in the whole equal "group" (it may include more than two tuples in general).

Theoretically it may affect performance of _bt_load because of _bt_keep_natts(_fast) call for each tuple, but I was unable to notice any significant difference here.

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CAEze2Wg6d2M8hop4uwdQXeH-YkOmEHyqp83%2BaE7vEzKdmu7w-A%40mail.gmail.com#b5dbefccc537167bc8c1efe0ad063491
[1]: https://www.postgresql.org/message-id/flat/20210115142926.GA19300%40alvherre.pgsql#0988173cb0cf4b8eb710a6cdaa88fcac
[2]: https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d
[3]: https://www.postgresql.org/message-id/flat/20220524190133.j6ee7zh4f5edt5je%40alap3.anarazel.de#1781408f40034c414ad6738140c118ef
[4]: https://github.com/postgres/postgres/commit/e28bb885196916b0a3d898ae4f2be0e38108d81b
[5]: https://www.postgresql.org/message-id/flat/CAEze2WgW6pj48xJhG_YLUE1QS%2Bn9Yv0AZQwaWeb-r%2BX%3DHAxU_g%40mail.gmail.com#b3809c158de4481bb1b29894aaa63fae

Вложения

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Álvaro Herrera
Дата:
On 2026-Mar-20, Mihail Nikalayeu wrote:

> Rebased.

Thanks.  The test in 0001 is a bit on the slow side; should we make it
optional with PG_TEST_EXTRA?  It's even slower than the slowest test in
pg_amcheck, which is already somewhat too long.

1/9 postgresql:setup / tmp_install                               OK                0.84s
2/9 postgresql:setup / install_test_files                        OK                0.05s
3/9 postgresql:setup / initdb_cache                              OK                1.26s
4/9 postgresql:pg_amcheck / pg_amcheck/001_basic                 OK                0.20s   9 subtests passed
5/9 postgresql:pg_amcheck / pg_amcheck/005_opclass_damage        OK                4.53s   10 subtests passed
6/9 postgresql:pg_amcheck / pg_amcheck/002_nonesuch              OK                5.00s   107 subtests passed
7/9 postgresql:pg_amcheck / pg_amcheck/004_verify_heapam         OK                8.59s   32 subtests passed
8/9 postgresql:pg_amcheck / pg_amcheck/003_check                 OK               19.14s   75 subtests passed
9/9 postgresql:pg_amcheck / pg_amcheck/006_cic                   OK               26.74s   12 subtests passed

The last pgbench subtest mentions GIN in the test name but doesn't
actually run it.  Do we care?  Would it be good to make the table be
unlogged?

These 10ms sleeps look suspicious.  Should they be shorter?  Longer?  Is
it better if they are longer, because they help provide better coverage;
or how did you choose that value?  I think all-but-one backends will
complete all the 999 transactions in the first 10ms sleep that the one
backend running the CIC does.  Am I right about this?  Would it work to
have no sleeps at all?  Maybe instead of doing 1000 transactions, we
should run pgbench for 5 second or so; this would run several thousand
transactions.

Or should we just consider the test as not-for-commit, and only a
development aid?

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:
Hello, Álvaro!

Thanks for looking into it.

> Or should we just consider the test as not-for-commit, and only a
> development aid?

Initially I thought so, but looks like it is possible to make it committable.

> The test in 0001 is a bit on the slow side; should we make it
> optional with PG_TEST_EXTRA?

I made *parameters* to be depended  on PG_TEST_EXTRA ~= stress. It is
possible to apply the same pattern for other stress tests too.

> The last pgbench subtest mentions GIN in the test name but doesn't
> actually run it.  Do we care?  Would it be good to make the table be
> unlogged?

Fixed, it has its own pgbench because it has its own gin_index_check.

> Would it be good to make the table be unlogged?

Good idea, done.

> I think all-but-one backends will
>complete all the 999 transactions in the first 10ms sleep that the one
> backend running the CIC does.  Am I right about this?

It actually has enough time to do multiple CIC (I see it from the
log). I updated the test to random delay, for non-stress variants -
from 0 to 1, for stress - up to 10.

Also, fixed a few small styling issues + added additional fixes for
waiting for a snapshot to be restored by a parallel worker.

Best regards,
Mikhail.

Вложения

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата:
Rebased +

1) extract duplicate key error reporting into _bt_report_duplicate()
2) limit heap fetches in the tuplesort fail-fast duplicate check to a
configurable percentage


Best regards,
Mikhail.

Вложения

Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

От
Mihail Nikalayeu
Дата: