Обсуждение: The same 2PC data maybe recovered twice

Поиск

Список

Период

Сортировка

The same 2PC data maybe recovered twice

От

"蔡梦娟(玊于)"

Дата:

06 июля 2023 г., 02:02:15

Hi, all. I want to report a bug about recovery of 2pc data, in current implementation of crash recovery, there are two ways to recover 2pc data:

1、before redo, func restoreTwoPhaseData() will restore 2pc data those xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid;

2、during redo, func xact_redo() will add 2pc from wal;

The following scenario may cause the same 2pc to be added repeatedly：

1、start creating checkpoint_1, checkpoint_1.redo is set as curInsert;

2、before set checkPoint_1.nextXid, a new 2pc is prepared, suppose the xid of this 2pc is 100, and then ShmemVariableCache->nextXid will be advanced as 101;

3、checkPoint_1.nextXid is set as 101;

4、in CheckPointTwoPhase() of checkpoint_1, 2pc_100 won't be copied to disk because its prepare_end_lsn > checkpoint_1.redo;

5、checkPoint_1 is finished, after checkpoint_timeout, start creating checkpoint_2;

6、during checkpoint_2, data of 2pc_100 will be copied to disk;

7、before UpdateControlFile() of checkpoint_2, crash happened;

8、during crash recovery, redo will start from checkpoint_1, and 2pc_100 will be restored first by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101;

9、because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 will be added again by xact_redo() during wal replay, resulting in the same 2pc data being added twice;

10、In RecoverPreparedTransactions() -> lock_twophase_recover(), lock the same 2pc will cause panic.

Is the above scenario reasonable, and do you have any good ideas for fixing this bug?

Thanks & Best Regard

回复：The same 2PC data maybe recovered twice

От

"蔡梦娟(玊于)"

Дата:

07 июля 2023 г., 09:48:39

Hi, all

I add a patch for pg11 to fix this bug, hope you can check it.

Thanks & Best Regard

------------------------------------------------------------------
发件人：蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>
发送时间：2023年7月6日(星期四) 10:02
收件人：pgsql-hackers <pgsql-hackers@postgresql.org>
抄　送：pgsql-bugs <pgsql-bugs@postgresql.org>
主　题：The same 2PC data maybe recovered twice

Hi, all. I want to report a bug about recovery of 2pc data, in current implementation of crash recovery, there are two ways to recover 2pc data:
1、before redo, func restoreTwoPhaseData() will restore 2pc data those xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid;
2、during redo, func xact_redo() will add 2pc from wal;

The following scenario may cause the same 2pc to be added repeatedly：
1、start creating checkpoint_1, checkpoint_1.redo is set as curInsert;
2、before set checkPoint_1.nextXid, a new 2pc is prepared, suppose the xid of this 2pc is 100, and then ShmemVariableCache->nextXid will be advanced as 101;
3、checkPoint_1.nextXid is set as 101;
4、in CheckPointTwoPhase() of checkpoint_1, 2pc_100 won't be copied to disk because its prepare_end_lsn > checkpoint_1.redo;
5、checkPoint_1 is finished, after checkpoint_timeout, start creating checkpoint_2;
6、during checkpoint_2, data of 2pc_100 will be copied to disk;
7、before UpdateControlFile() of checkpoint_2, crash happened;
8、during crash recovery, redo will start from checkpoint_1, and 2pc_100 will be restored first by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101;
9、because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 will be added again by xact_redo() during wal replay, resulting in the same 2pc data being added twice;
10、In RecoverPreparedTransactions() -> lock_twophase_recover(), lock the same 2pc will cause panic.

Is the above scenario reasonable, and do you have any good ideas for fixing this bug?

Thanks & Best Regard

Вложения

v1-0001-Fix-a-2PC-transaction-maybe-recovered-twice_11.patch

Re: The same 2PC data maybe recovered twice

От

Andy Fan

Дата:

12 июля 2023 г., 02:57:44

Hi:

On Sat, Jul 8, 2023 at 2:53 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:

Hi, all
I add a patch for pg11 to fix this bug, hope you can check it.

Thanks for the bug report and patch! Usually we talk about bugs

against the master branch, no people want to check out a history

branch and do the discussion there:) This bug is reproducible on

the master IIUC.

I dislike the patch here because it uses more CPU cycles to detect

duplication for every 2pc record. How many CPU cycles we use

depends on how many 2pc are used. How about detecting such

duplication only at restoreTwoPhaseData stage? Instead of

ProcessTwoPhaseBuffer:

if (TransactionIdFollowsOrEquals(xid, origNextXid))
{
...
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
RemoveTwoPhaseFile(xid, true);
...
}

we use:

if (TwoPhaseFileHeader.startup_lsn > checkpoint.redo)
{
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
}

We have several advantages with this approach. a). We only care

about the restoreTwoPhaseData, not for every WAL record recovery.

b). We use constant comparison rather than an-array-for-loop. c).

It is better design since we avoid the issue at the first place rather

than allowing it at the first stage and fix that at the following stage.

The only blocker I know is currently we don't write startup_lsn into

the 2pc checkpoint file and if we do that, the decode on the old 2pc

file will fail. We also have several choices here.

a). Notify users to complete all the pending 2pc before upgrading

within manual. b). Use a different MAGIC NUMBER in the 2pc

checkpoint file to distinguish the 2 versions. Basically I prefer

the method a).

Any suggestion is welcome.

------------------------------------------------------------------
发件人：蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>
发送时间：2023年7月6日(星期四) 10:02
收件人：pgsql-hackers <pgsql-hackers@postgresql.org>
抄　送：pgsql-bugs <pgsql-bugs@postgresql.org>
主　题：The same 2PC data maybe recovered twice

Hi, all. I want to report a bug about recovery of 2pc data, in current implementation of crash recovery, there are two ways to recover 2pc data:
1、before redo, func restoreTwoPhaseData() will restore 2pc data those xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid;
2、during redo, func xact_redo() will add 2pc from wal;

The following scenario may cause the same 2pc to be added repeatedly：
1、start creating checkpoint_1, checkpoint_1.redo is set as curInsert;
2、before set checkPoint_1.nextXid, a new 2pc is prepared, suppose the xid of this 2pc is 100, and then ShmemVariableCache->nextXid will be advanced as 101;
3、checkPoint_1.nextXid is set as 101;
4、in CheckPointTwoPhase() of checkpoint_1, 2pc_100 won't be copied to disk because its prepare_end_lsn > checkpoint_1.redo;
5、checkPoint_1 is finished, after checkpoint_timeout, start creating checkpoint_2;
6、during checkpoint_2, data of 2pc_100 will be copied to disk;
7、before UpdateControlFile() of checkpoint_2, crash happened;
8、during crash recovery, redo will start from checkpoint_1, and 2pc_100 will be restored first by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101;
9、because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 will be added again by xact_redo() during wal replay, resulting in the same 2pc data being added twice;
10、In RecoverPreparedTransactions() -> lock_twophase_recover(), lock the same 2pc will cause panic.

Is the above scenario reasonable, and do you have any good ideas for fixing this bug?

Thanks & Best Regard

Best Regards