Re: [HACKERS] Restricting maximum keep segments by repslots

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] Restricting maximum keep segments by repslots
Дата
Msg-id 20181220.162438.121484007.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] Restricting maximum keep segments by repslots  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
Thank you for piking this and sorry being late.

At Mon, 19 Nov 2018 13:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181119043958.GE4400@paquier.xyz>
> ereport should not be called within xlogreader.c as a base rule:

Ouch! I forgot that. Fixed to use report_invalid_record slightly
changing the message. The code is not required (or cannot be
used) on frontend so #ifndef FRONTENDed the code.

At Tue, 20 Nov 2018 14:07:44 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181120050744.GJ4400@paquier.xyz>
> On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> > I was just coming by to look at bit at the patch series, and bumped
> > into that:
> 
> So I have been looking at the last patch series 0001-0004 posted on this
> thread, and coming from here:
> https://postgr.es/m/20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp
> 
> /* check that the slot is gone */
> SELECT * FROM pg_replication_slots
> It could be an idea to switch to the expanded mode here, not that it
> matters much still..

No problem doing that. Done.

TAP test complains that it still uses recovery.conf. Fixed. On
the way doing that I added parameter primary_slot_name to
init_from_backup in PostgresNode.pm

> +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
> You mean Available here, not Avaiable.  This function is only used when
> scanning for slot information with pg_replication_slots, so wouldn't it
> be better to just return the status string in this case?

Mmm. Sure. Auto-completion hid it from my eyes. Fixed the name.
The fix sounds reasonable. The function was created as returning
boolean and the name doen't fit the current function. I renamed
the name to GetLsnAvailability() that returns a string.

> Not sure I see the point of the "remain" field, which can be found with
> a simple calculation using the current insertion LSN, the segment size
> and the amount of WAL that the slot is retaining.  It may be interesting
> to document a query to do that though.

It's not that simple. wal_segment_size, max_slot_wal_keep_size,
wal_keep_segments, max_slot_wal_keep_size and the current LSN are
invoved in the calculation which including several conditional
branches, maybe as you see upthread. We could show "the largest
current LSN until WAL is lost" but the "current LSN" is not shown
there. So it is showing the "remain".

> GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
> parallel, no?  How is it safe to scan pg_wal on a process querying
> pg_replication_slots while another process may manipulate its contents
> (aka the checkpointer or just the startup process with an
> end-of-recovery checkpoint.).  This routine relies on unsafe
> assumptions as this is not concurrent-safe.  You can avoid problems by
> making sure instead that lastRemovedSegNo is initialized correctly at
> startup, which would be normally one segment older than what's in
> pg_wal, which feels a bit hacky to rely on to track the oldest segment.

Concurrent recycling makes the function's result vary between the
segment numbers before and after it. It is unstable but doesn't
matter so much. The reason for the timing is to avoid extra
startup time by a scan over pg_wal that is unncecessary in most
cases.

Anyway the attached patch initializes lastRemovedSegNo in
StartupXLOG().

> It seems to me that GetOldestXLogFileSegNo() should also check for
> segments matching the current timeline, no?

RemoveOldXlogFiles() ignores timeline and the function is made to
behave the same way (in different manner). I added a comment for
the behavior in the function.

> +           if (prev_lost_segs != lost_segs)
> +               ereport(WARNING,
> +                       (errmsg ("some replication slots have lost
> required WAL segments"),
> +                        errdetail_plural(
> +                            "The mostly affected slot has lost %ld
> segment.",
> +                            "The mostly affected slot has lost %ld
> segments.",
> +                            lost_segs, lost_segs)));
> This can become very noisy with the time, and it would be actually
> useful to know which replication slot is impacted by that.

One message per one segment doen't seem so noisy. The reason for
not showing slot identifier individually is just to avoid
complexity comes from involving slot details. DBAs will see the
details in pg_stat_replication.

Anyway I did that in the attached patch. ReplicationSlotsBehind
returns the list of the slot names that behind specified
LSN. With this patch the messages looks as the follows:

WARNING:  some replication slots have lost required WAL segments
DETAIL:  Slot s1 lost 8 segment(s).
WARNING:  some replication slots have lost required WAL segments
DETAIL:  Slots s1, s2, s3 lost at most 9 segment(s).

> +      slot doesn't have valid restart_lsn, this field
> Missing a determinant here, and restart_lsn should have a <literal>
> markup.

structfield? Reworded as below:

|  non-negative. If <structfield>restart_lsn</structfield> is NULL, this
|  field is <literal>unknown</literal>.

I changed "the slot" with "this slot" in the two added fields
(wal_status, remain).

> +    many WAL segments that they fill up the space allotted
> s/allotted/allocated/.

Fixed.

> +      available. The last two states are seen only when
> +      <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
> +      slot doesn't have valid restart_lsn, this field
> +      is <literal>unknown</literal>.
> I am a bit confused by this statement.  The last two states are "lost"
> and "keeping", but shouldn't "keeping" be the state showing up by
> default as it means that all WAL segments are kept around.

It's "streaming".  I didn't came up with nice words to
distinguish the two states. I'm not sure "keep around" exactly
means but "keeping" here means rather "just not removed yet". The
states could be reworded as the follows:

streaming: kept/keeping/(secure, in the first version)
keeping  : mortal/about to be removed
lost/unkown : (lost/unknown)

Do you have any better wording?

> +# Advance WAL by ten segments (= 160MB) on master
> +advance_wal($node_master, 10);
> +$node_master->safe_psql('postgres', "CHECKPOINT;");
> This makes the tests very costly, which is something we should avoid as
> much as possible.  One trick which could be used here, on top of
> reducing the number of segment switches, is to use initdb
> --wal-segsize=1.

That sounds nice. Done. In the new version the number of segments
can be reduced and a new test item for the initial unkonwn state
as the first item.

Please find the attached new version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From c35115eab0148e44b59eb974821de28684899cd6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 21 Dec 2017 21:20:20 +0900
Subject: [PATCH 1/6] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c             | 127 +++++++++++++++++++++-----
 src/backend/replication/slot.c                |  57 ++++++++++++
 src/backend/utils/misc/guc.c                  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h                     |   1 +
 src/include/replication/slot.h                |   1 +
 6 files changed, 174 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..2a4cec1adf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -101,6 +101,7 @@ int            wal_level = WAL_LEVEL_MINIMAL;
 int            CommitDelay = 0;    /* precommit delay in microseconds */
 int            CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int            wal_retrieve_retry_interval = 5000;
+int            max_slot_wal_keep_size_mb = -1;
 
 #ifdef WAL_DEBUG
 bool        XLOG_DEBUG = false;
@@ -873,6 +874,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -9319,6 +9321,53 @@ CreateRestartPoint(int flags)
     return true;
 }
 
+/*
+ * Returns minimum segment number the next checkpoint must leave considering
+ * wal_keep_segments, replication slots and max_slot_wal_keep_size.
+ *
+ * currLSN is the current insert location
+ * minSlotLSN is the minimum restart_lsn of all active slots
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+{
+    uint64        keepSegs = 0;
+    XLogSegNo    currSeg;
+    XLogSegNo    minSlotSeg;
+
+    XLByteToSeg(currLSN, currSeg, wal_segment_size);
+    XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
+
+    /*
+     * Calculate keep segments by slots first. The second term of the
+     * condition is just a sanity check.
+     */
+    if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+        keepSegs = currSeg - minSlotSeg;
+
+    /* Cap keepSegs by max_slot_wal_keep_size */
+    if (max_slot_wal_keep_size_mb >= 0)
+    {
+        uint64 limitSegs;
+
+        limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+        /* Apply max_slot_wal_keep_size to keepSegs */
+        if (limitSegs < keepSegs)
+            keepSegs = limitSegs;
+    }
+
+    /* but, keep at least wal_keep_segments segments if any */
+    if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
+        keepSegs = wal_keep_segments;
+
+    /* avoid underflow, don't go below 1 */
+    if (currSeg <= keepSegs)
+        return 1;
+
+    return currSeg - keepSegs;
+}
+
 /*
  * Retreat *logSegNo to the last segment that we need to retain because of
  * either wal_keep_segments or replication slots.
@@ -9330,38 +9379,66 @@ CreateRestartPoint(int flags)
 static void
 KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 {
-    XLogSegNo    segno;
-    XLogRecPtr    keep;
+    XLogRecPtr    slotminptr = InvalidXLogRecPtr;
+    XLogSegNo    minSegNo;
+    XLogSegNo    slotSegNo;
 
-    XLByteToSeg(recptr, segno, wal_segment_size);
-    keep = XLogGetReplicationSlotMinimumLSN();
+    if (max_replication_slots > 0)
+        slotminptr = XLogGetReplicationSlotMinimumLSN();
 
-    /* compute limit for wal_keep_segments first */
-    if (wal_keep_segments > 0)
+    /*
+     * We should keep certain number of WAL segments after this checkpoint.
+     */
+    minSegNo = GetOldestKeepSegment(recptr, slotminptr);
+
+    /*
+     * warn if the checkpoint flushes the segments required by replication
+     * slots.
+     */
+    if (!XLogRecPtrIsInvalid(slotminptr))
     {
-        /* avoid underflow, don't go below 1 */
-        if (segno <= wal_keep_segments)
-            segno = 1;
+        static XLogSegNo prev_lost_segs = 0;    /* avoid duplicate messages */
+
+        XLByteToSeg(slotminptr, slotSegNo, wal_segment_size);
+
+        if (slotSegNo < minSegNo)
+        {
+            XLogSegNo lost_segs = minSegNo - slotSegNo;
+            if (prev_lost_segs != lost_segs)
+            {
+                /* We have lost a new segment, warn it.*/
+                XLogRecPtr minlsn;
+                char *slot_names;
+                int nslots;
+
+                XLogSegNoOffsetToRecPtr(minSegNo, 0, wal_segment_size, minlsn);
+                slot_names =
+                    ReplicationSlotsEnumerateBehinds(minlsn, ", ", &nslots);
+
+                /*
+                 * Some of the affected slots could have just been removed.
+                 * We don't need show anything here if no affected slot
+                 * remains.
+                 */
+                if (slot_names)
+                {
+                    ereport(WARNING,
+                            (errmsg ("some replication slots have lost required WAL segments"),
+                             errdetail_plural(
+                                 "Slot %s lost %ld segment(s).",
+                                 "Slots %s lost at most %ld segment(s).",
+                                 nslots, slot_names, lost_segs)));
+                }
+            }
+            prev_lost_segs = lost_segs;
+        }
         else
-            segno = segno - wal_keep_segments;
-    }
-
-    /* then check whether slots limit removal further */
-    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
-    {
-        XLogSegNo    slotSegNo;
-
-        XLByteToSeg(keep, slotSegNo, wal_segment_size);
-
-        if (slotSegNo <= 0)
-            segno = 1;
-        else if (slotSegNo < segno)
-            segno = slotSegNo;
+            prev_lost_segs = 0;
     }
 
     /* don't delete WAL segments newer than the calculated segment */
-    if (segno < *logSegNo)
-        *logSegNo = segno;
+    if (minSegNo < *logSegNo)
+        *logSegNo = minSegNo;
 }
 
 /*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1f2e7139a7..1805e23171 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,63 @@ ReplicationSlotReserveWal(void)
     }
 }
 
+/*
+ * Returns the list of replication slots restart_lsn of whch are behind
+ * specified LSN. Returs palloc'ed character array stuffed with slot names
+ * delimited by the givein separator.  Returns NULL if no slot matches.  If
+ * pnslots is given, the number of the returned slots is returned there.
+ */
+char *
+ReplicationSlotsEnumerateBehinds(XLogRecPtr target, char *separator, int *pnslots)
+{
+    static StringInfoData retstr;
+    static bool retstr_initialized = false;
+    bool insert_separator = false;
+    int i;
+    int nslots = 0;
+
+    Assert (separator);
+    if (max_replication_slots <= 0)
+        return NULL;
+
+    if (!retstr_initialized)
+    {
+        initStringInfo(&retstr);
+        retstr_initialized = true;
+    }
+    else
+        resetStringInfo(&retstr);
+
+    /* construct name list */
+    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+    for (i = 0 ; i < max_replication_slots ; i++)
+    {
+        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+
+        if (s->in_use && s->data.restart_lsn < target)
+        {
+            if (insert_separator)
+                appendStringInfoString(&retstr, separator);
+
+            /*
+             * slot names consist only with lower-case letters. we don't
+             * bother quoting.
+             */
+            appendStringInfoString(&retstr, NameStr(s->data.name));
+            insert_separator = true;
+            nslots++;
+        }
+    }
+    LWLockRelease(ReplicationSlotControlLock);
+
+    /* return the number of slots in the list if requested */
+    if (pnslots)
+        *pnslots = nslots;
+
+    /* return NULL if the result is an empty string */
+    return retstr.data[0] ? retstr.data : NULL;
+}
+
 /*
  * Flush all replication slots to disk.
  *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..438ff723d5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2593,6 +2593,18 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
+            gettext_noop("Sets the maximum size of extra WALs kept by replication slots."),
+         NULL,
+         GUC_UNIT_MB
+        },
+        &max_slot_wal_keep_size_mb,
+        -1, -1,
+        MAX_KILOBYTES, /* XXX: This is in megabytes, like max/min_wal_size */
+        NULL, NULL, NULL
+    },
+
     {
         {"wal_sender_timeout", PGC_USERSET, REPLICATION_SENDING,
             gettext_noop("Sets the maximum time to wait for WAL replication."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1fa02d2c93..7b2e07bea1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -279,6 +279,7 @@
 #max_wal_senders = 10        # max number of walsender processes
                 # (change requires restart)
 #wal_keep_segments = 0        # in logfile segments; 0 disables
+#max_slot_wal_keep_size = -1    # measured in bytes; -1 disables
 #wal_sender_timeout = 60s    # in milliseconds; 0 disables
 
 #max_replication_slots = 10    # max number of replication slots
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f3a7ba4d42..2cf9c9bc98 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -108,6 +108,7 @@ extern int    wal_segment_size;
 extern int    min_wal_size_mb;
 extern int    max_wal_size_mb;
 extern int    wal_keep_segments;
+extern int    max_slot_wal_keep_size_mb;
 extern int    XLOGbuffers;
 extern int    XLogArchiveTimeout;
 extern int    wal_retrieve_retry_interval;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 7964ae254f..69e4fccb5e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -199,6 +199,7 @@ extern void ReplicationSlotsComputeRequiredLSN(void);
 extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
+extern char *ReplicationSlotsEnumerateBehinds(XLogRecPtr target, char *separator, int *pnslots);
 
 extern void StartupReplicationSlots(void);
 extern void CheckPointReplicationSlots(void);
-- 
2.16.3

From 131ae101f3bc78d86c0629bef9f653a2f3b0bb93 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 21 Dec 2017 21:23:25 +0900
Subject: [PATCH 2/6] Add monitoring aid for max_slot_wal_keep_size.

Adds two columns "status" and "remain" in pg_replication_slot.
Setting max_slot_wal_keep_size, long-disconnected slots may lose sync.
The two columns show whether the slot is reconnectable or not, or
about to lose reserving WAL segments, and the remaining bytes of WAL
that can be advance until the slot loses reserving WAL records.
---
 contrib/test_decoding/expected/ddl.out |   4 +-
 contrib/test_decoding/sql/ddl.sql      |   2 +
 src/backend/access/transam/xlog.c      | 150 +++++++++++++++++++++++++++++++--
 src/backend/catalog/system_views.sql   |   4 +-
 src/backend/replication/slotfuncs.c    |  16 +++-
 src/include/access/xlog.h              |   1 +
 src/include/catalog/pg_proc.dat        |   6 +-
 src/test/regress/expected/rules.out    |   6 +-
 8 files changed, 172 insertions(+), 17 deletions(-)

diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index b7c76469fc..c5f52d6ee8 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -705,8 +705,8 @@ SELECT pg_drop_replication_slot('regression_slot');
 (1 row)
 
 /* check that the slot is gone */
+\x
 SELECT * FROM pg_replication_slots;
- slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |
restart_lsn| confirmed_flush_lsn 
 

------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 (0 rows)
 
+\x
diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql
index c4b10a4cf9..5040d5e85e 100644
--- a/contrib/test_decoding/sql/ddl.sql
+++ b/contrib/test_decoding/sql/ddl.sql
@@ -374,4 +374,6 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 SELECT pg_drop_replication_slot('regression_slot');
 
 /* check that the slot is gone */
+\x
 SELECT * FROM pg_replication_slots;
+\x
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2a4cec1adf..4a5ab3be40 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -874,7 +874,9 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
+static XLogSegNo GetOldestXLogFileSegNo(void);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr,
+                       XLogRecPtr targetLSN, uint64 *restBytes);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -6654,6 +6656,12 @@ StartupXLOG(void)
      */
     StartupReplicationOrigin();
 
+    /*
+     * Initialize lastRemovedSegNo looking pg_wal directory. The minimum
+     * segment number is 1 so no wrap-around can happen.
+     */
+    XLogCtl->lastRemovedSegNo = GetOldestXLogFileSegNo() - 1;
+
     /*
      * Initialize unlogged LSN. On a clean shutdown, it's restored from the
      * control file. On recovery, all unlogged relations are blown away, so
@@ -9321,19 +9329,115 @@ CreateRestartPoint(int flags)
     return true;
 }
 
+
+/*
+ * Finds the segment number of the oldest file in XLOG directory.
+ *
+ * This function is intended to be used for initialization of
+ * XLogCtl->lastRemovedSegNo.
+ */
+static XLogSegNo
+GetOldestXLogFileSegNo(void)
+{
+    DIR        *xldir;
+    struct dirent *xlde;
+    XLogSegNo segno = 0;
+
+    xldir = AllocateDir(XLOGDIR);
+    while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
+    {
+        TimeLineID tli;
+        XLogSegNo fsegno;
+
+        /* Ignore files that are not XLOG segments */
+        if (!IsXLogFileName(xlde->d_name) &&
+            !IsPartialXLogFileName(xlde->d_name))
+            continue;
+
+        XLogFromFileName(xlde->d_name, &tli, &fsegno, wal_segment_size);
+
+        /*
+         * get minimum segment ignoring timeline ID.  Since RemoveOldXlog
+         * works ignoring timeline ID, this function works the same way.
+         */
+        if (segno == 0 || fsegno < segno)
+            segno = fsegno;
+    }
+
+    FreeDir(xldir);
+
+    return segno;
+}
+
+/*
+ * Returns availability status of the record at given targetLSN
+ *
+ * Returns three kinds of value.
+ * "streaming" when the WAL record at targetLSN is available.
+ * "keeping" when still available but about to be removed by the next
+ * checkpoint.
+ * "lost" when the WAL record at targetLSN is already removed.
+ *
+ * If restBytes is not NULL, sets the remaining LSN bytes to advance until the
+ * segment that contains targetLSN will be removed.
+ */
+char *
+GetLsnAvailability(XLogRecPtr targetLSN, uint64 *restBytes)
+{
+    XLogRecPtr currpos;
+    XLogRecPtr slotPtr;
+    XLogSegNo targetSeg;
+    XLogSegNo tailSeg;
+    XLogSegNo oldestSeg;
+
+    Assert(!XLogRecPtrIsInvalid(targetLSN));
+    Assert(restBytes);
+
+    currpos = GetXLogWriteRecPtr();
+
+    SpinLockAcquire(&XLogCtl->info_lck);
+    oldestSeg = XLogCtl->lastRemovedSegNo;
+    SpinLockRelease(&XLogCtl->info_lck);
+
+    /* oldest segment is just after the last removed segment */
+    oldestSeg++;
+
+    XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+
+    slotPtr = XLogGetReplicationSlotMinimumLSN();
+    tailSeg = GetOldestKeepSegment(currpos, slotPtr, targetLSN, restBytes);
+
+    /* targetSeg is being reserved by slots */
+    if (tailSeg <= targetSeg)
+        return "streaming";
+
+    /* targetSeg is not reserved but still available */
+    if (oldestSeg <= targetSeg)
+        return "keeping";
+
+    /* targetSeg has gone */
+    return    "lost";
+}
+
 /*
  * Returns minimum segment number the next checkpoint must leave considering
  * wal_keep_segments, replication slots and max_slot_wal_keep_size.
  *
  * currLSN is the current insert location
  * minSlotLSN is the minimum restart_lsn of all active slots
+ * targetLSN is used when restBytes is not NULL.
+ *
+ * If restBytes is not NULL, sets the remaining LSN bytes to advance until the
+ * segment that contains targetLSN will be removed.
  */
 static XLogSegNo
-GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
+                     XLogRecPtr targetLSN, uint64 *restBytes)
 {
     uint64        keepSegs = 0;
     XLogSegNo    currSeg;
     XLogSegNo    minSlotSeg;
+    uint64        limitSegs = 0;
 
     XLByteToSeg(currLSN, currSeg, wal_segment_size);
     XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
@@ -9348,8 +9452,6 @@ GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
     /* Cap keepSegs by max_slot_wal_keep_size */
     if (max_slot_wal_keep_size_mb >= 0)
     {
-        uint64 limitSegs;
-
         limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
 
         /* Apply max_slot_wal_keep_size to keepSegs */
@@ -9357,9 +9459,40 @@ GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
             keepSegs = limitSegs;
     }
 
-    /* but, keep at least wal_keep_segments segments if any */
-    if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
-        keepSegs = wal_keep_segments;
+    if (wal_keep_segments > 0)
+    {
+        /* but, keep at least wal_keep_segments segments if any */
+        if (keepSegs < wal_keep_segments)
+            keepSegs = wal_keep_segments;
+
+        /* also, limitSegs should be raised if wal_keep_segments is larger */
+        if (limitSegs < wal_keep_segments)
+            limitSegs = wal_keep_segments;
+    }
+
+    /*
+     * If requested, return remaining LSN bytes to advance until the slot
+     * gives up reserving WAL records.
+     */
+    if (restBytes)
+    {
+        uint64 fragbytes;
+        XLogSegNo targetSeg;
+
+        *restBytes = 0;
+
+        XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+        if (max_slot_wal_keep_size_mb >= 0 && currSeg <= targetSeg + limitSegs)
+        {
+            /*
+             * This slot still has all required segments. Calculate how many
+             * LSN bytes the slot has until it loses targetLSN.
+             */
+            fragbytes = wal_segment_size - (currLSN % wal_segment_size);
+            XLogSegNoOffsetToRecPtr(targetSeg + limitSegs - currSeg, fragbytes,
+                                    wal_segment_size, *restBytes);
+        }
+    }
 
     /* avoid underflow, don't go below 1 */
     if (currSeg <= keepSegs)
@@ -9389,7 +9522,8 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
     /*
      * We should keep certain number of WAL segments after this checkpoint.
      */
-    minSegNo = GetOldestKeepSegment(recptr, slotminptr);
+    minSegNo =
+        GetOldestKeepSegment(recptr, slotminptr, InvalidXLogRecPtr, NULL);
 
     /*
      * warn if the checkpoint flushes the segments required by replication
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5253837b54..9ed00c7a7b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -798,7 +798,9 @@ CREATE VIEW pg_replication_slots AS
             L.xmin,
             L.catalog_xmin,
             L.restart_lsn,
-            L.confirmed_flush_lsn
+            L.confirmed_flush_lsn,
+            L.wal_status,
+            L.remain
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 8782bad4a2..a4a028f4d7 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -185,7 +185,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 11
+#define PG_GET_REPLICATION_SLOTS_COLS 13
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
     TupleDesc    tupdesc;
     Tuplestorestate *tupstore;
@@ -307,6 +307,20 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
         else
             nulls[i++] = true;
 
+        if (restart_lsn == InvalidXLogRecPtr)
+        {
+            values[i++] = CStringGetTextDatum("unknown");
+            values[i++] = LSNGetDatum(InvalidXLogRecPtr);
+        }
+        else
+        {
+            uint64    remaining_bytes;
+
+            values[i++] = CStringGetTextDatum(
+                GetLsnAvailability(restart_lsn, &remaining_bytes));
+            values[i++] = Int64GetDatum(remaining_bytes);
+        }
+
         tuplestore_putvalues(tupstore, tupdesc, values, nulls);
     }
     LWLockRelease(ReplicationSlotControlLock);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 2cf9c9bc98..4d4d8101f6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -302,6 +302,7 @@ extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
+extern char *GetLsnAvailability(XLogRecPtr targetLSN, uint64 *restBytes);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
 extern void UpdateFullPageWrites(void);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d0a571ef95..5a587e9685 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9655,9 +9655,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,remain}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e384cd2279..956c3c9525 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1451,8 +1451,10 @@ pg_replication_slots| SELECT l.slot_name,
     l.xmin,
     l.catalog_xmin,
     l.restart_lsn,
-    l.confirmed_flush_lsn
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin,
catalog_xmin,restart_lsn, confirmed_flush_lsn)
 
+    l.confirmed_flush_lsn,
+    l.wal_status,
+    l.remain
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin,
catalog_xmin,restart_lsn, confirmed_flush_lsn, wal_status, remain)
 
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
2.16.3

From 845986891fbda7dac41fea9eae76666212a362c4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Wed, 19 Dec 2018 12:43:57 +0900
Subject: [PATCH 3/6] Add primary_slot_name to init_from_backup in TAP test.

It is convenient that priary_slot_name can be specified on taking a
base backup. This adds a new parameter of the name to the perl
function.
---
 src/test/perl/PostgresNode.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8a2c6fc122..daca2e0085 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -672,11 +672,11 @@ sub init_from_backup
     chmod(0700, $data_path);
 
     # Base configuration for this node
-    $self->append_conf(
-        'postgresql.conf',
-        qq(
-port = $port
-));
+    $self->append_conf('postgresql.conf', qq(port = $port));
+    $self->append_conf('postgresql.conf',
+                       qq(primary_slot_name = $params{primary_slot_name}))
+      if (defined $params{primary_slot_name});
+
     $self->enable_streaming($root_node) if $params{has_streaming};
     $self->enable_restoring($root_node) if $params{has_restoring};
     return;
-- 
2.16.3

From 5714dcde4d6ca7d75cd0aae27fe1e5fc66d03f5d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 21 Dec 2017 17:33:53 +0900
Subject: [PATCH 4/6] TAP test for the slot limit feature

---
 src/test/recovery/t/016_replslot_limit.pl | 185 ++++++++++++++++++++++++++++++
 1 file changed, 185 insertions(+)
 create mode 100644 src/test/recovery/t/016_replslot_limit.pl

diff --git a/src/test/recovery/t/016_replslot_limit.pl b/src/test/recovery/t/016_replslot_limit.pl
new file mode 100644
index 0000000000..8d1d3a2275
--- /dev/null
+++ b/src/test/recovery/t/016_replslot_limit.pl
@@ -0,0 +1,185 @@
+# Test for replication slot limit
+# Ensure that max_slot_wal_keep_size limits the number of WAL files to
+# be kept by replication slot.
+
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+use Time::HiRes qw(usleep);
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
+$node_master->append_conf('postgresql.conf', qq(
+min_wal_size = 2MB
+max_wal_size = 3MB
+));
+$node_master->start;
+$node_master->safe_psql('postgres', "SELECT pg_create_physical_replication_slot('rep1')");
+
+# The slot state should be known before the first connection
+my $result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, remain FROM pg_replication_slots
WHEREslot_name = 'rep1'");
 
+is($result, "|unknown|0", 'non-reserved slot shows unknown');
+
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a standby linking to it using a replication slot
+my $node_standby = get_new_node('standby_1');
+$node_standby->init_from_backup($node_master, $backup_name, has_streaming => 1, primary_slot_name => 'rep1');
+
+$node_standby->start;
+
+# Wait until standby has replayed enough data on the standby
+my $start_lsn = $node_master->lsn('write');
+$node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
+
+# Stop standby
+$node_standby->stop;
+
+
+# Preparation done, currently the slot must be streaming state.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, remain FROM pg_replication_slots WHERE
slot_name= 'rep1'");
 
+is($result, "$start_lsn|streaming|0", 'check initial state of standby');
+
+# Advance WAL by five segments (= 5MB) on master
+advance_wal($node_master, 5);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
+
+# The slot is unconditionally "safe" with the default setting.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, remain FROM pg_replication_slots WHERE
slot_name= 'rep1'");
 
+is($result, "$start_lsn|streaming|0", 'check that slot is keeping all segments');
+
+# The stanby can connect to master
+$node_standby->start;
+
+$start_lsn = $node_master->lsn('write');
+$node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
+
+$node_standby->stop;
+
+# Set max_slot_wal_keep_size on master
+my $max_slot_wal_keep_size_mb = 3;
+$node_master->append_conf('postgresql.conf', qq(
+max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB
+));
+$node_master->reload;
+
+# The slot is in safe state. The remaing bytes should be almost just
+# (max_slot_wal_keep_size + 1) times as large as the segment size.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM
pg_replication_slotsWHERE slot_name = 'rep1'");
 
+is($result, "$start_lsn|streaming|4096 kB", 'check that remaining bytes is calculated');
+
+# Advance WAL again then checkpoint
+advance_wal($node_master, 2);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
+
+# The slot is still working.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM
pg_replication_slotsWHERE slot_name = 'rep1'");
 
+is($result, "$start_lsn|streaming|2048 kB", 'remaining byte should be reduced by 2MB');
+
+
+# wal_keep_segments can override
+$result = $node_master->safe_psql('postgres', "ALTER SYSTEM SET wal_keep_segments to 6; SELECT pg_reload_conf();");
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM
pg_replication_slotsWHERE slot_name = 'rep1'");
 
+is($result, "$start_lsn|streaming|5120 kB", 'check that wal_keep_segments overrides');
+
+# restore wal_keep_segments (no test)
+$result = $node_master->safe_psql('postgres', "ALTER SYSTEM SET wal_keep_segments to 0; SELECT pg_reload_conf();");
+
+# Advance WAL again without checkpoint
+advance_wal($node_master, 2);
+
+# Slot gets to 'keeping' state
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM
pg_replication_slotsWHERE slot_name = 'rep1'");
 
+is($result, "$start_lsn|keeping|0 bytes", 'check that some segments are about to be removed');
+
+# The stanby still can connect to master
+$node_standby->start;
+
+$start_lsn = $node_master->lsn('write');
+$node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
+
+$node_standby->stop;
+
+ok(!find_in_log($node_standby,
+                "requested WAL segment [0-9A-F]+ has already been removed"),
+   'check that no replication failure is caused by insecure state');
+
+# Advance WAL again, the slot loses some segments.
+my $logstart = get_log_size($node_master);
+advance_wal($node_master, 5);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
+
+# WARNING should be issued
+ok(find_in_log($node_master,
+               "some replication slots have lost required WAL segments\n".
+               ".*Slot rep1 lost 2 segment\\(s\\)\\.",
+               $logstart),
+   'check that warning is correctly logged');
+
+# This slot should be broken
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM
pg_replication_slotsWHERE slot_name = 'rep1'");
 
+is($result, "$start_lsn|lost|0 bytes", 'check that overflown segments have been removed');
+
+# The stanby no longer can connect to the master
+$logstart = get_log_size($node_standby);
+$node_standby->start;
+
+my $failed = 0;
+for (my $i = 0 ; $i < 10000 ; $i++)
+{
+    if (find_in_log($node_standby,
+                    "requested WAL segment [0-9A-F]+ has already been removed",
+                    $logstart))
+    {
+        $failed = 1;
+        last;
+    }
+    usleep(100_000);
+}
+ok($failed, 'check that replication has been broken');
+
+$node_standby->stop;
+
+#####################################
+# Advance WAL of $node by $n segments
+sub advance_wal
+{
+    my ($node, $n) = @_;
+
+    # Advance by $n segments (= (16 * $n) MB) on master
+    for (my $i = 0 ; $i < $n ; $i++)
+    {
+        $node->safe_psql('postgres', "CREATE TABLE t (); DROP TABLE t; SELECT pg_switch_wal();");
+    }
+}
+
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+    my ($node) = @_;
+
+    return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = TestLib::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}
-- 
2.16.3

From 6d456a2b8eeecd34b630816ca3388c7e1d8d68af Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 11 Jan 2018 15:00:32 +0900
Subject: [PATCH 5/6] Documentation for slot-limit feature

---
 doc/src/sgml/catalogs.sgml          | 28 ++++++++++++++++++++++++++++
 doc/src/sgml/config.sgml            | 23 +++++++++++++++++++++++
 doc/src/sgml/high-availability.sgml |  8 +++++---
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index af4d0625ea..7ec8764ce5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9825,6 +9825,34 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </entry>
      </row>
 
+     <row>
+      <entry><structfield>wal_status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+
+      <entry>Availability of WAL records claimed by this
+      slot. <literal>streaming</literal>, <literal>keeping</literal>,
+      <literal>lost</literal>
+      or <literal>unknown</literal>. <literal>streaming</literal> means that
+      the claimed records are available. <literal>keeping</literal> means that
+      some of them are to be removed by the next checkpoint.
+      <literal>lost</literal> means that some of them are no longer
+      available. The last two states are seen only when
+      <xref linkend="guc-max-slot-wal-keep-size"/> is
+      non-negative. If <structfield>restart_lsn</structfield> is NULL, this
+      field is <literal>unknown</literal>.
+      </entry>
+     </row>
+
+     <row>
+      <entry><structfield>remain</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>The amount in bytes that WAL location (LSN) can advance until
+        this slot may lose required WAL records.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a7121a51f..3d034ac0d1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3531,6 +3531,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </listitem>
       </varlistentry>
 
+      <varlistentry id="guc-max-slot-wal-keep-size" xreflabel="max_slot_wal_keep_size">
+       <term><varname>max_slot_wal_keep_size</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>max_slot_wal_keep_size</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+       <para>
+        Specify the maximum size of WAL files
+        that <link linkend="streaming-replication-slots">replication
+        slots</link> are allowed to retain in the <filename>pg_wal</filename>
+        directory at checkpoint time.
+        If <varname>max_slot_wal_keep_size</varname> is -1 (the default),
+        replication slots retain unlimited size of WAL files.  If restart_lsn
+        of a replication slot gets behind more than that bytes from the
+        current LSN, the standby using the slot may no longer be able to
+        reconnect due to removal of required WAL records. You can see the WAL
+        availability of replication slots
+        in <link linkend="view-pg-replication-slots">pg_replication_slots</link>.
+       </para>
+       </listitem>
+      </varlistentry>
+
      <varlistentry id="guc-wal-sender-timeout" xreflabel="wal_sender_timeout">
       <term><varname>wal_sender_timeout</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index d8fd195da0..e30eaaeebe 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -925,9 +925,11 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     <xref linkend="guc-archive-command"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
-    known to be needed.  An advantage of these methods is that they bound
-    the space requirement for <literal>pg_wal</literal>; there is currently no way
-    to do this using replication slots.
+    known to be needed.  On the other hand, replication slots can retain so
+    many WAL segments that they fill up the space allocated
+    for <literal>pg_wal</literal>;
+    <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files
+    retained by replication slots.
    </para>
    <para>
     Similarly, <xref linkend="guc-hot-standby-feedback"/>
-- 
2.16.3

From 44bdb5b92b651c524e6f901be4eaa0184714f04d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 26 Oct 2018 10:07:05 +0900
Subject: [PATCH 6/6] Check removal of in-reading segment file.

Checkpoint can remove or recycle a segment file while it is being read
by ReadRecord during logical decoding. This patch checks for the case
and error out immedaitely.  Reading recycled file is basically safe
and inconsistency caused by overwrites as new segment will be caught
by page/record validation. So this is only for keeping consistency
with the wal_status shown in pg_replication_slots.
---
 src/backend/access/transam/xlogreader.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c5e019bf77..117710c55b 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -26,6 +26,7 @@
 #include "replication/origin.h"
 
 #ifndef FRONTEND
+#include "access/xlog.h"
 #include "utils/memutils.h"
 #endif
 
@@ -224,7 +225,9 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
     uint32        pageHeaderSize;
     bool        gotheader;
     int            readOff;
-
+#ifndef FRONTEND    
+    XLogSegNo    targetSegNo;
+#endif
     /*
      * randAccess indicates whether to verify the previous-record pointer of
      * the record we're reading.  We only do this if we're reading
@@ -270,6 +273,21 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
     targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ);
     targetRecOff = RecPtr % XLOG_BLCKSZ;
 
+#ifndef FRONTEND
+    /*
+     * checkpoint can remove the segment currently looking for.  make sure the
+     * current segment still exists. we check this once per page. This cannot
+     * happen on frontend.
+     */
+    XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
+    if (targetSegNo <= XLogGetLastRemovedSegno())
+    {
+        report_invalid_record(state,
+                              "WAL segment for LSN %X/%X has been removed",
+                              (uint32)(RecPtr >> 32), (uint32) RecPtr);
+        goto err;
+    }
+#endif
     /*
      * Read the page containing the record into state->readBuf. Request enough
      * byte to cover the whole record header, or at least the part of it that
-- 
2.16.3


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gavin Flower
Дата:
Сообщение: Re: Using POPCNT and other advanced bit manipulation instructions
Следующее
От: "Ideriha, Takeshi"
Дата:
Сообщение: RE: Protect syscache from bloating with negative cache entries