Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size

From: Kyotaro Horiguchi
Subject: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Msg-id: 20210715.142235.660428600377434237.horikyota.ntt@gmail.com
In response to: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size  (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
           Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size  (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-bugs

At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in 
> On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> > Useless WAL files will be removed after a checkpoint runs.
> >
> 
> They should be, but they are not.  That is the bug.   They just hang
> around, checkpoint after checkpoint.  Some of them do get cleaned up, to
> make up for new ones created during that cycle.  It treats
> max_slot_wal_keep_size the same way it treats wal_keep_size (but only if a
> "lost" slot is hanging around).  If you drop the lost slot, only then does
> it remove all the accumulated WAL at the next checkpoint.

Thanks! I can see the issue here.  Some investigation showed me dubious
behavior of XLogCtl->replicationSlotMinLSN.  Slot invalidation forgets
to recalculate it, and the stale value holds back the segment removal
horizon.
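
To illustrate the mechanism, here is a minimal standalone sketch.  It is
not PostgreSQL code: Lsn, Slot, WalState and all function names are
made up for illustration, and it works on plain byte positions rather
than WAL segments.  It only assumes that the cached minimum stands in
for XLogCtl->replicationSlotMinLSN and that the cap stands in for
max_slot_wal_keep_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;               /* hypothetical stand-in for XLogRecPtr */
#define INVALID_LSN ((Lsn) 0)       /* hypothetical stand-in for InvalidXLogRecPtr */

typedef struct Slot
{
    Lsn     restart_lsn;            /* INVALID_LSN once the slot is invalidated */
} Slot;

typedef struct WalState
{
    Slot   *slots;
    size_t  nslots;
    Lsn     cached_min_lsn;         /* models XLogCtl->replicationSlotMinLSN */
    Lsn     max_slot_keep;          /* models max_slot_wal_keep_size, in bytes */
} WalState;

/* Recompute the cached minimum over the slots that are still valid. */
void
compute_required_lsn(WalState *st)
{
    Lsn     min = INVALID_LSN;

    for (size_t i = 0; i < st->nslots; i++)
    {
        Lsn     lsn = st->slots[i].restart_lsn;

        if (lsn != INVALID_LSN && (min == INVALID_LSN || lsn < min))
            min = lsn;
    }
    st->cached_min_lsn = min;
}

/* Oldest position to keep: the cached minimum, capped by max_slot_keep. */
Lsn
keep_horizon(const WalState *st, Lsn redo)
{
    Lsn     keep = redo;

    if (st->cached_min_lsn != INVALID_LSN && st->cached_min_lsn < redo)
        keep = st->cached_min_lsn;
    if (redo - keep > st->max_slot_keep)
        keep = redo - st->max_slot_keep;
    return keep;
}

/* Invalidate slots that fell behind the horizon; the cache is left alone. */
size_t
invalidate_obsolete_slots(WalState *st, Lsn horizon)
{
    size_t  ninvalidated = 0;

    for (size_t i = 0; i < st->nslots; i++)
    {
        if (st->slots[i].restart_lsn != INVALID_LSN &&
            st->slots[i].restart_lsn < horizon)
        {
            st->slots[i].restart_lsn = INVALID_LSN;
            ninvalidated++;
        }
    }
    return ninvalidated;
}

/* One checkpoint; 'recompute' selects the behavior with the fix applied. */
Lsn
checkpoint_horizon(WalState *st, Lsn redo, bool recompute)
{
    Lsn     keep;

    assert(st != NULL);
    keep = keep_horizon(st, redo);
    invalidate_obsolete_slots(st, keep);
    if (recompute)
    {
        /* Some slots may have been invalidated; refresh the cache and retry. */
        compute_required_lsn(st);
        keep = keep_horizon(st, redo);
    }
    return keep;
}
```

With a single slot at position 100 and a cap of 50, calling
checkpoint_horizon() without recomputation leaves the cached minimum at
100 after the slot is invalidated, so every later checkpoint keeps the
last 50 bytes (950 at redo 1000, 1950 at redo 2000).  That matches the
report that the limit behaves like wal_keep_size while a lost slot is
around.  With recomputation the horizon returns to the redo position.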

So the attached patch worked for me.  I'll repost a polished version
including a test.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7c928f50b..0fc0feb88e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9301,6 +9301,15 @@ CreateCheckPoint(int flags)
     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
     KeepLogSeg(recptr, &_logSegNo);
     InvalidateObsoleteReplicationSlots(_logSegNo);
+
+    /*
+     * Some slots may have been invalidated; recalculate the segments to keep
+     * based on the remaining slots.
+     */
+    ReplicationSlotsComputeRequiredLSN();
+    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+    KeepLogSeg(recptr, &_logSegNo);
+
     _logSegNo--;
     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
@@ -9641,6 +9650,15 @@ CreateRestartPoint(int flags)
     endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
     KeepLogSeg(endptr, &_logSegNo);
     InvalidateObsoleteReplicationSlots(_logSegNo);
+
+    /*
+     * Some slots may have been invalidated; recalculate the segments to keep
+     * based on the remaining slots.
+     */
+    ReplicationSlotsComputeRequiredLSN();
+    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+    KeepLogSeg(endptr, &_logSegNo);
+
     _logSegNo--;
 
     /*
