Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

Поиск
Список
Период
Сортировка
От Michail Nikolaev
Тема Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Дата
Msg-id CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Список pgsql-hackers
Hello, hackers.

While testing my work on (1) I was struggling with addressing a strange issue with ON CONFLICT UPDATE and REINDEX CONCURRENTLY.

After some time, I have realized the same issue persists on the master branch as well :)

I have prepared two TAP tests to reproduce the issues (2), also in attachment.

First one, does the next thing:

    CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
    CREATE INDEX idx ON tbl(i, updated_at); -- it is not required to reproduce but make it to happen faster

Then it runs next scripts with pgbench concurrently:

    1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
    2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
    3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();

Also, during pgbench the next command is run in the loop:

    REINDEX INDEX CONCURRENTLY tbl_pkey;

For some time, everything looks more-less fine (except live locks, but this is the issue for the next test).
But after some time, about a minute or so (on ~3000th REINDEX) it just fails like this:

                make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'

                # waiting for an about 3000, now is 2174, seconds passed : 84
                # waiting for an about 3000, now is 2175, seconds passed : 84
                # waiting for an about 3000, now is 2176, seconds passed : 84
                # waiting for an about 3000, now is 2177, seconds passed : 84
                # waiting for an about 3000, now is 2178, seconds passed : 84
                # waiting for an about 3000, now is 2179, seconds passed : 84
                # waiting for an about 3000, now is 2180, seconds passed : 84
                # waiting for an about 3000, now is 2181, seconds passed : 84
                # waiting for an about 3000, now is 2182, seconds passed : 84
                # waiting for an about 3000, now is 2183, seconds passed : 84
                # waiting for an about 3000, now is 2184, seconds passed : 84

                #   Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
                #   at t/006_concurrently_unique_fail.pl line 69.

                #   Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
                #   at t/006_concurrently_unique_fail.pl line 69.
                #                   'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # client 15 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccold"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
                # DETAIL:  Key (i)=(13) already exists.
                #  error: client 1 script 0 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
                # DETAIL:  Key (i)=(13) already exists.
                # pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey"
                # DETAIL:  Key (i)=(69) already exists.
                # pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(42) already exists.
                # pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(42) already exists.
                # pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR:  duplicate key value violates unique constraint "tbl_pkey_ccnew"
                # DETAIL:  Key (i)=(42) already exists.
                # pgbench: error: Run was aborted; the above results are incomplete.
                # '

Probably something wrong with arbiter index selection for different backends. I am afraid it could be a symptom of a more serious issue.

-------------------------------------

The second test shows an interesting live lock state in the similar situation.

    CREATE UNLOGGED TABLE tbl(i int primary key, n int);
    CREATE INDEX idx ON tbl(i, n);
    INSERT INTO tbl VALUES(13,1);

pgbench concurrently runs single command

    INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;

And also reindexing in the loop

    REINDEX INDEX CONCURRENTLY tbl_pkey;

After the start, a little bit strange issue happens

     make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'

    # going to start reindex, num tuples in table is 1
        # reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
        # going to start reindex, num tuples in table is 7
        # reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
        # going to start reindex, num tuples in table is 647
    #  current n is 808, 808 per one second
    #  current n is 808, 0 per one second
    #  current n is 808, 0 per one second
    #  current n is 808, 0 per one second
    #  current n is 808, 0 per one second
    #  current n is 811, 3 per one second
    #  current n is 917, 106 per one second
    #  current n is 1024, 107 per one second
        # reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
        # going to start reindex, num tuples in table is 1136
    #  current n is 1257, 233 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1257, 0 per one second
    #  current n is 1490, 233 per one second
        # reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
        # going to start reindex, num tuples in table is 1566

In some moments, all CPUs all hot but 30 connections are unable to do any upsert. I think it may be somehow caused by two arbiter indexes (old and new reindexed one).

Best regards,
Mikhail.

[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Fujii.Yuki@df.MitsubishiElectric.co.jp"
Дата:
Сообщение: Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: Windows: openssl & gssapi dislike each other