[PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE

From 王海洋
Subject [PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE
Date
Msg-id CACciXADOfErX9Bx0nzE_SkdfXr6Bbpo5R=v_B6MUTEYW4ya+cg@mail.gmail.com
Responses Re: [PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-bugs

Hi hackers,

I found that when wal_consistency_checking = brin is set, redo may abort: all standby nodes are lost, and the primary node cannot be restarted.

This bug appears to affect every PostgreSQL version that supports wal_consistency_checking.

The steps to reproduce are as follows:

    1. Create a primary instance, set wal_consistency_checking = brin, and start the primary instance.

        initdb -D pg_test
        echo "wal_consistency_checking = brin" >> pg_test/postgresql.conf
        echo "port=53320" >> pg_test/postgresql.conf
        pg_ctl start -D pg_test -l pg_test.logfile

    2. Create a standby instance.

        pg_basebackup -R -p 53320 -D pg_test_slave
        echo "wal_consistency_checking = brin" >> pg_test_slave/postgresql.conf
        echo "port=53321" >> pg_test_slave/postgresql.conf
        pg_ctl start -D pg_test_slave -l pg_test_slave.logfile

    3. Run brin_redo_abort.sql through psql against the primary; the standby instance then goes down.

        psql -p 53320 -f brin_redo_abort.sql

    4. The standby instance fails during redo with the following FATAL message:

        FATAL:  inconsistent page found, rel 1663/12978/16387, forknum 0, blkno 2

    5. The primary instance cannot be restarted through pg_ctl restart -mi.

        pg_ctl restart -D pg_test -mi -l pg_test.logfile

    6. Restarting the primary instance fails with the same FATAL message:

        FATAL:  inconsistent page found, rel 1663/12978/16387, forknum 0, blkno 2

My analysis of the cause is as follows:

    1. When brinRevmapExtend needs to extend the revmap,
    brin_start_evacuating_page may set the BRIN_EVACUATE_PAGE flag on a
    REGULAR_PAGE to prevent other concurrent backends from adding more
    BrinTuples to that page.

    2. During redo, however, there is no need to set the BRIN_EVACUATE_PAGE
    flag on that REGULAR_PAGE after brin_xlog_update removes the old
    BrinTuple, since nothing will add a BrinTuple to the page at
    that point.

    3. As a result, CheckXLogConsistency throws a FATAL error after redo
    because the BRIN_EVACUATE_PAGE flag does not match, and redo
    finally aborts.

    4. Therefore, the BRIN_EVACUATE_PAGE flag should be cleared before
    CheckXLogConsistency compares the pages.
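
To illustrate point 4, here is a minimal, self-contained sketch of the idea: clear a flag that is only meaningful on the primary before two page images are compared. It is not the attached patch and does not use PostgreSQL's real page layout. SketchPage, SKETCH_EVACUATE_FLAG, sketch_mask_page and sketch_pages_consistent are hypothetical names invented for this example; SKETCH_EVACUATE_FLAG merely plays the role of BRIN_EVACUATE_PAGE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in PostgreSQL's BRIN code. */
#define SKETCH_EVACUATE_FLAG 0x01   /* plays the role of BRIN_EVACUATE_PAGE */
#define SKETCH_PAGE_SIZE     64     /* tiny page body, for illustration only */

typedef struct SketchPage
{
    uint16_t flags;                    /* page-level flag bits */
    uint8_t  data[SKETCH_PAGE_SIZE];   /* page contents */
} SketchPage;

/*
 * Clear the flag that only matters to concurrent backends on the primary,
 * so that it cannot cause a mismatch during the comparison.
 */
static void
sketch_mask_page(SketchPage *page)
{
    page->flags &= (uint16_t) ~SKETCH_EVACUATE_FLAG;
}

/*
 * Compare two page images after masking private copies of both,
 * mimicking what a consistency check does after redo.
 */
static bool
sketch_pages_consistent(const SketchPage *primary, const SketchPage *replayed)
{
    SketchPage a = *primary;
    SketchPage b = *replayed;

    sketch_mask_page(&a);
    sketch_mask_page(&b);

    return a.flags == b.flags &&
           memcmp(a.data, b.data, SKETCH_PAGE_SIZE) == 0;
}
```

With this masking in place, a page image that carries the flag and a replayed page that does not are treated as consistent, while a genuine difference in the page contents is still reported.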


The patch, the SQL file, the shell script, and the log files are attached.

Best Regards!
Haiyang Wang