Обсуждение: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding
Hi,
When working on another historic snapshot's bug in [1], I find the $subject.
Here is a test case, but we need to add some log in SnapBuildSerialize() first:
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..6d13b2d811b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1523,6 +1523,19 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
/* consistent snapshots have no next phase */
Assert(builder->next_phase_at == InvalidTransactionId);
+ StringInfoData logbuf;
+ initStringInfo(&logbuf);
+ appendStringInfo(&logbuf, "SnapBuildSerialize: lsn: %X/%08X xmin: %u, xmax: %u, committed: ",
+ LSN_FORMAT_ARGS(lsn), builder->xmin, builder->xmax);
+ for (size_t i = 0; i < builder->committed.xcnt; i++)
+ {
+ if (i > 0)
+ appendStringInfoString(&logbuf, ", ");
+ appendStringInfo(&logbuf, "%u", builder->committed.xip[i]);
+ }
+ elog(LOG, "%s", logbuf.data);
+ pfree(logbuf.data);
+
/*
* We identify snapshots by the LSN they are valid for. We don't need to
* include timelines in the name as each LSN maps to exactly one timeline
1) create table t (id int) with (user_catalog_table = true);
2) select pg_create_logical_replication_slot('s1', 'test_decoding');
3) select pg_create_logical_replication_slot('s2', 'test_decoding');
4) insert into t values (1);
5) select pg_replication_slot_advance('s1', pg_current_wal_lsn());
6) select pg_logical_slot_get_changes('s2', pg_current_wal_lsn(), null);
Then we will find some log like this:
LOG: SnapBuildSerialize: lsn: 0/017D1318 xmin: 768, xmax: 768, committed:
STATEMENT: select pg_replication_slot_advance('s1', pg_current_wal_lsn());
LOG: SnapBuildSerialize: lsn: 0/017D1318 xmin: 768, xmax: 769, committed: 768
STATEMENT: select pg_logical_slot_get_changes('s2', pg_current_wal_lsn(), null);
At the same lsn, we get two different historic snapshots, and the first one (which is incorrect) is serialized to disk.
The main reason is that we don't handle XLOG_HEAP2_NEW_CID during fast-forwarding, so we don't consider the insert as
havinga catalog change.
Attach a patch to fix it.
Looking forward to your reply.
[1]
https://www.postgresql.org/message-id/tencent_21E152AD504A814C071EDF41A4DD7BA84D06%40qq.com
--
Regards,
ChangAo Chen
Вложения
Re:[BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding
От
ocean_li_996
Дата:
Hi ChangAo,
Thanks for your analyze and report.
At 2025-11-23 00:07:59, "cca5507" <cca5507@qq.com> wrote:
>The main reason is that we don't handle XLOG_HEAP2_NEW_CID during fast-forwarding, so we don't consider the insert as having a catalog change.Yeah. Refers to code below:```case XLOG_HEAP2_NEW_CID:if (!ctx->fast_forward){xl_heap_new_cid *xlrec;xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);break;}```It is clear that transaction is skipped set has catalog change (in SnapBuildProcessNewCid) duraing fast forward.However, in practice this does not cause any actual issues, because for the "real" system catalogs (those not
specified via user_catalog_table and thus critical for decoding), any HEAP2_NEW_CID changes are always accompanied
by some invalid messages. As a result, they are ultimately marked as catalog changes.
Still, from a code perspective, I would prefer fixing this behavior for logical consistency across the snapshot
builder, even if it doesn’t manifest as a runtime problem today. But we can not provide a test case.
>Attach a patch to fix it.The patch in attachment is better for me. What do you think?Bset regards,Haiyang Li
Вложения
Hi Haiyang, > The patch in attachment is better for me. What do you think? The v2-0001 LGTM. A small suggestion: We should move the 'break' out of the 'if', because we don't want it fall through to XLOG_HEAP2_REWRITE if we are fast-forwarding. -- Regards, ChangAo Chen
Hi ChangAo,
At 2025-11-23 13:31:49, "cca5507" <cca5507@qq.com> wrote: >> The patch in attachment is better for me. What do you think? > >The v2-0001 LGTM. > >A small suggestion: > >We should move the 'break' out of the 'if', because we don't want it fall through to XLOG_HEAP2_REWRITE if we are fast-forwarding.Fair. The patch updated is provided in attachment.Best regards,Haiyang Li