Обсуждение: logical: fix recomputation required LSN on restart_lsn-only advancement
Hi, While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation(). In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked dirtyand saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called inside"if (updated_xmin)”. So for the restart-only case: * updated_restart = true * updated_xmin = false * ReplicationSlotSave() runs * ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelatedevent recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn,skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat. This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”. Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Вложения
On Tue, Apr 21, 2026 at 10:16 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi,
While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().
In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked dirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called inside "if (updated_xmin)”.
So for the restart-only case:
* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false
That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelated event recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn, skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat.
This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.
Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
I think this change is reasonable.
This makes the recomputation condition match the state it actually depends on.
This makes the recomputation condition match the state it actually depends on.
Regards,
Xunqi Hu
Chao Li <li.evan.chao@gmail.com> wrote: > While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation(). > > In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is markeddirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only calledinside "if (updated_xmin)”. > > So for the restart-only case: > > * updated_restart = true > * updated_xmin = false > * ReplicationSlotSave() runs > * ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false > > That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelatedevent recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn,skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat. > > This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”. FYI, this overlaps with another post in the REPACK thread [1]. > Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching. As REPACK in PG 19 does not let xmin advance (that should be fixed in the future), I think makes sense to apply [1] to v19. However, during logical replication, xmin (IMO) gets updated rather often, so the problem should not be that severe in earlier versions. [1] https://www.postgresql.org/message-id/TYRPR01MB14195633567DA00ABD42570B794592%40TYRPR01MB14195.jpnprd01.prod.outlook.com -- Antonin Houska Web: https://www.cybertec-postgresql.com
> On Apr 21, 2026, at 15:09, Antonin Houska <ah@cybertec.at> wrote: > > Chao Li <li.evan.chao@gmail.com> wrote: > >> While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation(). >> >> In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is markeddirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only calledinside "if (updated_xmin)”. >> >> So for the restart-only case: >> >> * updated_restart = true >> * updated_xmin = false >> * ReplicationSlotSave() runs >> * ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false >> >> That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelatedevent recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn,skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat. >> >> This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”. > > FYI, this overlaps with another post in the REPACK thread [1]. > >> Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching. > > As REPACK in PG 19 does not let xmin advance (that should be fixed in the > future), I think makes sense to apply [1] to v19. However, during logical > replication, xmin (IMO) gets updated rather often, so the problem should not > be that severe in earlier versions. > > [1] https://www.postgresql.org/message-id/TYRPR01MB14195633567DA00ABD42570B794592%40TYRPR01MB14195.jpnprd01.prod.outlook.com > > -- > Antonin Houska > Web: https://www.cybertec-postgresql.com Thanks for pointing out that. I will review that patch. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/