Обсуждение: The same 2PC data maybe recovered twice
------------------------------------------------------------------发件人:蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>发送时间:2023年7月6日(星期四) 10:02收件人:pgsql-hackers <pgsql-hackers@postgresql.org>抄 送:pgsql-bugs <pgsql-bugs@postgresql.org>主 题:The same 2PC data maybe recovered twiceHi, all. I want to report a bug about recovery of 2pc data, in current implementation of crash recovery, there are two ways to recover 2pc data:1、before redo, func restoreTwoPhaseData() will restore 2pc data those xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid;2、during redo, func xact_redo() will add 2pc from wal;The following scenario may cause the same 2pc to be added repeatedly:1、start creating checkpoint_1, checkpoint_1.redo is set as curInsert;2、before set checkPoint_1.nextXid, a new 2pc is prepared, suppose the xid of this 2pc is 100, and then ShmemVariableCache->nextXid will be advanced as 101;3、checkPoint_1.nextXid is set as 101;4、in CheckPointTwoPhase() of checkpoint_1, 2pc_100 won't be copied to disk because its prepare_end_lsn > checkpoint_1.redo;5、checkPoint_1 is finished, after checkpoint_timeout, start creating checkpoint_2;6、during checkpoint_2, data of 2pc_100 will be copied to disk;7、before UpdateControlFile() of checkpoint_2, crash happened;8、during crash recovery, redo will start from checkpoint_1, and 2pc_100 will be restored first by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101;9、because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 will be added again by xact_redo() during wal replay, resulting in the same 2pc data being added twice;10、In RecoverPreparedTransactions() -> lock_twophase_recover(), lock the same 2pc will cause panic.Is the above scenario reasonable, and do you have any good ideas for fixing this bug?Thanks & Best Regard
Вложения
Hi, allI add a patch for pg11 to fix this bug, hope you can check it.
{
...
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
RemoveTwoPhaseFile(xid, true);
...
}
{
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
}
------------------------------------------------------------------发件人:蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>发送时间:2023年7月6日(星期四) 10:02收件人:pgsql-hackers <pgsql-hackers@postgresql.org>抄 送:pgsql-bugs <pgsql-bugs@postgresql.org>主 题:The same 2PC data maybe recovered twiceHi, all. I want to report a bug about recovery of 2pc data, in current implementation of crash recovery, there are two ways to recover 2pc data:1、before redo, func restoreTwoPhaseData() will restore 2pc data those xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid;2、during redo, func xact_redo() will add 2pc from wal;The following scenario may cause the same 2pc to be added repeatedly:1、start creating checkpoint_1, checkpoint_1.redo is set as curInsert;2、before set checkPoint_1.nextXid, a new 2pc is prepared, suppose the xid of this 2pc is 100, and then ShmemVariableCache->nextXid will be advanced as 101;3、checkPoint_1.nextXid is set as 101;4、in CheckPointTwoPhase() of checkpoint_1, 2pc_100 won't be copied to disk because its prepare_end_lsn > checkpoint_1.redo;5、checkPoint_1 is finished, after checkpoint_timeout, start creating checkpoint_2;6、during checkpoint_2, data of 2pc_100 will be copied to disk;7、before UpdateControlFile() of checkpoint_2, crash happened;8、during crash recovery, redo will start from checkpoint_1, and 2pc_100 will be restored first by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101;9、because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 will be added again by xact_redo() during wal replay, resulting in the same 2pc data being added twice;10、In RecoverPreparedTransactions() -> lock_twophase_recover(), lock the same 2pc will cause panic.Is the above scenario reasonable, and do you have any good ideas for fixing this bug?Thanks & Best Regard
Hi, allI add a patch for pg11 to fix this bug, hope you can check it.
{
...
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
RemoveTwoPhaseFile(xid, true);
...
}
{
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
}
------------------------------------------------------------------发件人:蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>发送时间:2023年7月6日(星期四) 10:02收件人:pgsql-hackers <pgsql-hackers@postgresql.org>抄 送:pgsql-bugs <pgsql-bugs@postgresql.org>主 题:The same 2PC data maybe recovered twiceHi, all. I want to report a bug about recovery of 2pc data, in current implementation of crash recovery, there are two ways to recover 2pc data:1、before redo, func restoreTwoPhaseData() will restore 2pc data those xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid;2、during redo, func xact_redo() will add 2pc from wal;The following scenario may cause the same 2pc to be added repeatedly:1、start creating checkpoint_1, checkpoint_1.redo is set as curInsert;2、before set checkPoint_1.nextXid, a new 2pc is prepared, suppose the xid of this 2pc is 100, and then ShmemVariableCache->nextXid will be advanced as 101;3、checkPoint_1.nextXid is set as 101;4、in CheckPointTwoPhase() of checkpoint_1, 2pc_100 won't be copied to disk because its prepare_end_lsn > checkpoint_1.redo;5、checkPoint_1 is finished, after checkpoint_timeout, start creating checkpoint_2;6、during checkpoint_2, data of 2pc_100 will be copied to disk;7、before UpdateControlFile() of checkpoint_2, crash happened;8、during crash recovery, redo will start from checkpoint_1, and 2pc_100 will be restored first by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101;9、because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 will be added again by xact_redo() during wal replay, resulting in the same 2pc data being added twice;10、In RecoverPreparedTransactions() -> lock_twophase_recover(), lock the same 2pc will cause panic.Is the above scenario reasonable, and do you have any good ideas for fixing this bug?Thanks & Best Regard
Hi Michael,
Thank you for your reply. I reviewed the thread you mentioned, and it seems that the issue needing to be fixed is not the same as the one I previously raised.
I am considering whether the 2PC file check logic in PrepareRedoAdd() can be modified. Currently, each time a XLOG_XACT_PREPARE WAL entry is replayed, it checks for the corresponding 2PC file using access(). Each access operation creates a dentry in the OS, and in most cases, the file being accessed does not exist. When there are many 2PC transactions, this logic may lead to an increase in OS slab memory. In worse scenarios, it could cause the reference count of the parent directory's dentry to overflow, potentially leading to an OS crash. Typically, when accessing existing files, the disk will fill up before the dentry reference count overflows.
Therefore, I would like to propose a modification. Attached is my patch for your review, and I hope you can take a look at it.
Best regards,
suyu.cmj
Вложения
On Mon, Jul 14, 2025 at 02:46:25PM +0800, CAI, Mengjuan wrote: > Thank you for your reply. I reviewed the thread you mentioned, and > it seems that the issue needing to be fixed is not the same as the > one I previously raised. > I am considering whether the 2PC file check logic in > PrepareRedoAdd() can be modified. Currently, each time a > XLOG_XACT_PREPARE WAL entry is replayed, it checks for the > corresponding 2PC file using access(). Each access operation creates > a dentry in the OS, and in most cases, the file being accessed does > not exist. When there are many 2PC transactions, this logic may lead > to an increase in OS slab memory. In worse scenarios, it could cause > the reference count of the parent directory's dentry to overflow, > potentially leading to an OS crash. Typically, when accessing > existing files, the disk will fill up before the dentry reference > count overflows. > Therefore, I would like to propose a modification. Attached is my > patch for your review, and I hope you can take a look at it. @@ -2520,8 +2520,16 @@ PrepareRedoAdd(FullTransactionId fxid, char *buf, [...] - if (!XLogRecPtrIsInvalid(start_lsn)) + if (!XLogRecPtrIsInvalid(start_lsn) && !reachedConsistency) Actually, what you are doing is incorrect because we could miss some ERRORs for example if a base backup was incorrect if come files were present in pg_twophase? It's not really true that what you are changing here has no interaction with the beginning of recovery, the other thread is about the fact that reading the 2PC files from disk when !reachedConsistency is a bad concept that we should avoid, impacting the assumption the code path you are changing here relies on. At the end, it may be possible that we're able to remove this check entirely.. -- Michael
Вложения
> present in pg_twophase?
> It's not really true that what you are changing here has no
> interaction with the beginning of recovery, the other thread is about
> the fact that reading the 2PC files from disk when !reachedConsistency
> is a bad concept that we should avoid, impacting the assumption the
> code path you are changing here relies on. At the end, it may be
> possible that we're able to remove this check entirely..