Re: [HACKERS] Restricting maximum keep segments by repslots

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] Restricting maximum keep segments by repslots
Msg-id 20181026.112636.147537766.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Restricting maximum keep segments by repslots  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
At Thu, 25 Oct 2018 21:55:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp>
> > =# alter system set max_slot_wal_keep_size to '64MB'; -- while
> > wal_keep_segments is 0
> > =# select pg_reload_conf();
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> >  slot_name | wal_status |  remain  | remain_pretty
> > -----------+------------+----------+---------------
> >  1         | streaming  | 83885648 | 80 MB
> > (1 row)
> > 
> > ** consume 80MB WAL, and do CHECKPOINT **
> > 
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> >  slot_name | wal_status | remain | remain_pretty
> > -----------+------------+--------+---------------
> >  1         | lost       |      0 | 0 bytes
> > (1 row)
> > =# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
> >  count
> > -------
> >     15
> > (1 row)
> 
> Mmm. The function looks into the segment that was already open
> before the segment was lost in the file system (precisely, its
> directory entry has been deleted). So losing just 1 segment
> doesn't matter. Please try losing one more segment.

I considered this a bit more, and the attached patch makes
XLogReadRecord() check for segment removal every time it is
called and emit the following error in that case.

> =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> ERROR:  WAL record at 0/870001B0 no longer available
> DETAIL:  The segment for the record has been removed.

The reason for doing that in the function is that it can also
happen for physical replication when the walsender is active but
far behind. The removed (renamed) but still open segment may be
recycled and overwritten while it is being read, and that will be
caught by page/record validation. It is substantially lost in
that sense.  I don't think the strictness is useful for anything.
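
As a rough illustration of the check discussed above, here is a minimal
standalone sketch of the segment arithmetic. It is not the patch itself:
the helper names lsn_to_segno() and segment_removed() are hypothetical,
the last-removed segment number is passed in as a plain argument instead
of being fetched with XLogGetLastRemovedSegno(), and a 16MB segment size
is assumed in the usage note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a WAL position (LSN) to the number of the
 * segment that contains it, i.e. plain integer division by the segment
 * size in bytes.
 */
static uint64_t
lsn_to_segno(uint64_t lsn, uint32_t wal_segment_size)
{
    assert(wal_segment_size > 0);
    return lsn / wal_segment_size;
}

/*
 * Hypothetical helper: a record is treated as unavailable when its
 * segment number is at or below the last removed segment number.
 */
static bool
segment_removed(uint64_t lsn, uint32_t wal_segment_size,
                uint64_t last_removed_segno)
{
    return lsn_to_segno(lsn, wal_segment_size) <= last_removed_segno;
}
```

With a 16MB segment size, the position 0/870001B0 from the error message
falls in segment 0x87 (135), so segment_removed() reports it as lost once
the last removed segment number reaches 135.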

Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 775f6366d78ac6818023cc158e37c70119246e19 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 26 Oct 2018 10:07:05 +0900
Subject: [PATCH 5/5] Check removal of in-read segment file.

Checkpoint can remove or recycle a segment file while it is being read
by ReadRecord. This patch checks for that case and errors out
immediately.  Reading a recycled file is basically safe, and any
inconsistency caused by overwrites as a new segment will be caught by
page/record validation. So this is only for keeping consistency with
the wal_status shown in pg_replication_slots.
---
 src/backend/access/transam/xlogreader.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 0768ca7822..a6c97cf260 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -217,6 +217,7 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 {
     XLogRecord *record;
     XLogRecPtr    targetPagePtr;
+    XLogSegNo    targetSegNo;
     bool        randAccess;
     uint32        len,
                 total_len;
@@ -270,6 +271,18 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
     targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ);
     targetRecOff = RecPtr % XLOG_BLCKSZ;
 
+    /*
+     * A checkpoint can remove the segment we are currently looking for.  Make
+     * sure the segment still exists.  We check this only once per record.
+     */
+    XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
+    if (targetSegNo <= XLogGetLastRemovedSegno())
+        ereport(ERROR,
+                (errcode(ERRCODE_NO_DATA),
+                 errmsg("WAL record at %X/%X no longer available",
+                        (uint32)(RecPtr >> 32), (uint32) RecPtr),
+                 errdetail("The segment for the record has been removed.")));
+
     /*
      * Read the page containing the record into state->readBuf. Request enough
      * byte to cover the whole record header, or at least the part of it that
-- 
2.16.3

