Обсуждение: logical: fix recomputation required LSN on restart_lsn-only advancement

Поиск
Список
Период
Сортировка

logical: fix recomputation required LSN on restart_lsn-only advancement

От
Chao Li
Дата:
Hi,

While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().

In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked
dirtyand saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called
inside"if (updated_xmin)”.  

So for the restart-only case:

* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false

That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later
unrelatedevent recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot
restat_lsn,skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat. 

This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.

Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth
back-patching.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/





Вложения

Re: logical: fix recomputation required LSN on restart_lsn-only advancement

От
Hu Xunqi
Дата:
On Tue, Apr 21, 2026 at 10:16 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi,

While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().

In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is marked dirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only called inside "if (updated_xmin)”.

So for the restart-only case:

* updated_restart = true
* updated_xmin = false
* ReplicationSlotSave() runs
* ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false

That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelated event recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot restat_lsn, skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat.

This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.

Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth back-patching.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

 
I think this change is reasonable.
This makes the recomputation condition match the state it actually depends on.

Regards,
Xunqi Hu

Re: logical: fix recomputation required LSN on restart_lsn-only advancement

От
Antonin Houska
Дата:
Chao Li <li.evan.chao@gmail.com> wrote:

> While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().
>
> In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is
markeddirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only
calledinside "if (updated_xmin)”.  
>
> So for the restart-only case:
>
> * updated_restart = true
> * updated_xmin = false
> * ReplicationSlotSave() runs
> * ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false
>
> That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later
unrelatedevent recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot
restat_lsn,skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat. 
>
> This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.

FYI, this overlaps with another post in the REPACK thread [1].

> Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth
back-patching.

As REPACK in PG 19 does not let xmin advance (that should be fixed in the
future), I think makes sense to apply [1] to v19. However, during logical
replication, xmin (IMO) gets updated rather often, so the problem should not
be that severe in earlier versions.

[1]
https://www.postgresql.org/message-id/TYRPR01MB14195633567DA00ABD42570B794592%40TYRPR01MB14195.jpnprd01.prod.outlook.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: logical: fix recomputation required LSN on restart_lsn-only advancement

От
Chao Li
Дата:

> On Apr 21, 2026, at 15:09, Antonin Houska <ah@cybertec.at> wrote:
>
> Chao Li <li.evan.chao@gmail.com> wrote:
>
>> While reading logical replication code, I found an issue in LogicalConfirmReceivedLocation().
>>
>> In LogicalConfirmReceivedLocation(), updated_restart is tracked independently from updated_xmin, and the slot is
markeddirty and saved when either one changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only
calledinside "if (updated_xmin)”.  
>>
>> So for the restart-only case:
>>
>> * updated_restart = true
>> * updated_xmin = false
>> * ReplicationSlotSave() runs
>> * ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is false
>>
>> That means the global retention point managed by XLogSetReplicationSlotMinimumLSN() can stay stale until some later
unrelatedevent recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the global minimum from slot
restat_lsn,skipping it after a restart-only advance can retain excess WAL and may lead to WAL bloat. 
>>
>> This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() under “if (updated_restart)”.
>
> FYI, this overlaps with another post in the REPACK thread [1].
>
>> Looks like this issue has been there for a long time, so if this analysis is correct, it may also be worth
back-patching.
>
> As REPACK in PG 19 does not let xmin advance (that should be fixed in the
> future), I think makes sense to apply [1] to v19. However, during logical
> replication, xmin (IMO) gets updated rather often, so the problem should not
> be that severe in earlier versions.
>
> [1]
https://www.postgresql.org/message-id/TYRPR01MB14195633567DA00ABD42570B794592%40TYRPR01MB14195.jpnprd01.prod.outlook.com
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com

Thanks for pointing out that. I will review that patch.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/