Re: min_safe_lsn column in pg_replication_slots view

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: min_safe_lsn column in pg_replication_slots view
Дата
Msg-id 20200618.152216.1889381045723164350.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: min_safe_lsn column in pg_replication_slots view  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Ответы Re: min_safe_lsn column in pg_replication_slots view  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Thanks for the comments.

At Wed, 17 Jun 2020 21:37:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2020/06/15 16:35, Kyotaro Horiguchi wrote:
> > At Mon, 15 Jun 2020 13:44:31 +0900, Michael Paquier
> > <michael@paquier.xyz> wrote in
> >> On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote:
> >>> BTW, I just wonder why each row in pg_replication_slots needs to have
> >>> min_safe_lsn column? Basically min_safe_lsn should be the same between
> >>> every replication slots.
> >>
> >> Indeed, that's confusing in its current shape.  I would buy putting
> >> this value into pg_replication_slots if there were some consistency of
> >> the data to worry about because of locking issues, but here this data
> >> is controlled within info_lck, which is out of the the repslot LW
> >> lock.  So I think that it is incorrect to put this data in this view
> >> and that we should remove it, and that instead we had better push for
> >> a system view that maps with the contents of XLogCtl.
> 
> Agreed. But as you know it's too late to do that for v13...
> So firstly I'd like to fix the issues in pg_replication_slots view,
> and then we can improve the things later for v14 if necessary.
> 
> 
> > It was once the difference from the safe_lsn to restart_lsn which is
> > distinct among slots. Then it was changed to the safe_lsn. I agree to
> > the discussion above, but it is needed anywhere since no one can know
> > the margin until the slot goes to the "lost" state without it.  (Note
> > that currently even wal_status and min_safe_lsn can be inconsistent in
> > a line.)
> > Just for the need for table-consistency and in-line consistency, we
> > could just remember the result of XLogGetLastRemovedSegno() around
> > taking info_lock in the function.  That doesn't make a practical
> > difference but makes the view look consistent.
> 
> Agreed. Thanks for the patch. Here are the review comments:
> 
> 
> Not only the last removed segment but also current write position
> should be obtain at the beginning of pg_get_replication_slots()
> and should be given to GetWALAvailability(), for the consistency?

You are right.  Though I faintly thought that I didn't need that since
WriteRecPtr doesn't move by so wide steps as removed_segment, actually
it moves.

> Even after applying your patch, min_safe_lsn is calculated for
> each slot even though the calculated result is always the same.
> Which is a bit waste of cycle. We should calculate min_safe_lsn
> at the beginning and use it for each slot?

Agreed. That may results in a wastful calculation but it's better than
repeated wasteful calculations.

>             XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
> 
> Isn't it better to use 1 as the second argument of the above,
> in order to address the issue that I reported upthread?
> Otherwise, the WAL file name that pg_walfile_name(min_safe_lsn)
> returns
> would be confusing.

Mmm. pg_walfile_name seems too specialize to
pg_stop_backup(). (pg_walfile_name_offset() returns wrong result for
segment boundaries.)  I'm not willing to do that only to follow such
suspicious(?) specification, but surely it would practically be better
doing that. Please find the attached first patch.  I found that
there's no reason to hide min_safe_lsn when wal_status has certain
values. So I changed it to be shown always.

By the way, I noticed that when a replication slot reserves all
existing WAL segments, checkpoint cannot remove a file and
lastRemovedSegment is left being 0. The second attached forces
RemoveOldXlogFiles to initialize the variable even when no WAL
segments are removed.  It puts no additional loads on file system
since the directory is scanned anyway.  My old proposal to
unconditionally initialize it separately from checkpoint was rejected,
but I think this is acceptable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 6742e3b31ac53ef54458d64e34feeac7d4c710d2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 18 Jun 2020 13:36:23 +0900
Subject: [PATCH v2 1/2] Make pg_stat_replication view consistent

min_safe_lsn is not consistent within the result of a query. Make it
consistent. On the way fixing that min_safe_lsn is changed to show the
second byte of a segment instead of first byte in order to users can
get the intended segment file name from pg_walfile_name.
---
 src/backend/access/transam/xlog.c   | 15 +++++++------
 src/backend/replication/slotfuncs.c | 35 ++++++++++++++++++++---------
 src/include/access/xlog.h           |  4 +++-
 3 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..940f5fcb18 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9484,6 +9484,9 @@ CreateRestartPoint(int flags)
  * Report availability of WAL for the given target LSN
  *        (typically a slot's restart_lsn)
  *
+ * currWriteLSN and lastRemovedSeg is the results of XLogGetLastRemovedSegno()
+ * and GetXLogWriteRecPtr() respectively at the referring time.
+ *
  * Returns one of the following enum values:
  * * WALAVAIL_NORMAL means targetLSN is available because it is in the range
  *   of max_wal_size.
@@ -9498,9 +9501,9 @@ CreateRestartPoint(int flags)
  * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
  */
 WALAvailability
-GetWALAvailability(XLogRecPtr targetLSN)
+GetWALAvailability(XLogRecPtr targetLSN, XLogRecPtr currWriteLSN,
+                   XLogSegNo lastRemovedSeg)
 {
-    XLogRecPtr    currpos;        /* current write LSN */
     XLogSegNo    currSeg;        /* segid of currpos */
     XLogSegNo    targetSeg;        /* segid of targetLSN */
     XLogSegNo    oldestSeg;        /* actual oldest segid */
@@ -9513,21 +9516,19 @@ GetWALAvailability(XLogRecPtr targetLSN)
     if (XLogRecPtrIsInvalid(targetLSN))
         return WALAVAIL_INVALID_LSN;
 
-    currpos = GetXLogWriteRecPtr();
-
     /* calculate oldest segment currently needed by slots */
     XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
-    KeepLogSeg(currpos, &oldestSlotSeg);
+    KeepLogSeg(currWriteLSN, &oldestSlotSeg);
 
     /*
      * Find the oldest extant segment file. We get 1 until checkpoint removes
      * the first WAL segment file since startup, which causes the status being
      * wrong under certain abnormal conditions but that doesn't actually harm.
      */
-    oldestSeg = XLogGetLastRemovedSegno() + 1;
+    oldestSeg = lastRemovedSeg + 1;
 
     /* calculate oldest segment by max_wal_size and wal_keep_segments */
-    XLByteToSeg(currpos, currSeg, wal_segment_size);
+    XLByteToSeg(currWriteLSN, currSeg, wal_segment_size);
     keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
                               wal_segment_size) + 1;
 
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 1b929a603e..063c6dd435 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -243,6 +243,9 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
     MemoryContext per_query_ctx;
     MemoryContext oldcontext;
     int            slotno;
+    XLogSegNo    last_removed_seg;
+    XLogRecPtr    curr_write_lsn;
+    XLogRecPtr    min_safe_lsn = 0;
 
     /* check to see if caller supports us returning a tuplestore */
     if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -272,6 +275,24 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
     rsinfo->setResult = tupstore;
     rsinfo->setDesc = tupdesc;
 
+    /*
+     * Remember the last removed segment and current WAL write LSN at this
+     * point for the consistency in this table. Since there's no interlock
+     * between slot data and checkpointer, the segment can be removed
+     * in-between, but that doesn't make any practical difference.  Also we
+     * calculate safe_min_lsn here as the same value is shown for all slots.
+     */
+    last_removed_seg = XLogGetLastRemovedSegno();
+    curr_write_lsn = GetXLogWriteRecPtr();
+
+    /*
+     * Show the next byte after segment boundary as min_safe_lsn so that
+     * pg_walfile_name() returns the correct file name for the value.
+     */
+    if (max_slot_wal_keep_size_mb >= 0 && last_removed_seg != 0)
+        XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 1,
+                                wal_segment_size, min_safe_lsn);
+
     MemoryContextSwitchTo(oldcontext);
 
     LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
@@ -282,7 +303,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
         Datum        values[PG_GET_REPLICATION_SLOTS_COLS];
         bool        nulls[PG_GET_REPLICATION_SLOTS_COLS];
         WALAvailability walstate;
-        XLogSegNo    last_removed_seg;
         int            i;
 
         if (!slot->in_use)
@@ -342,7 +362,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
         else
             nulls[i++] = true;
 
-        walstate = GetWALAvailability(slot_contents.data.restart_lsn);
+        walstate = GetWALAvailability(slot_contents.data.restart_lsn,
+                                      curr_write_lsn, last_removed_seg);
 
         switch (walstate)
         {
@@ -366,16 +387,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
                 elog(ERROR, "invalid walstate: %d", (int) walstate);
         }
 
-        if (max_slot_wal_keep_size_mb >= 0 &&
-            (walstate == WALAVAIL_NORMAL || walstate == WALAVAIL_RESERVED) &&
-            ((last_removed_seg = XLogGetLastRemovedSegno()) != 0))
-        {
-            XLogRecPtr    min_safe_lsn;
-
-            XLogSegNoOffsetToRecPtr(last_removed_seg + 1, 0,
-                                    wal_segment_size, min_safe_lsn);
+        if (min_safe_lsn != 0)
             values[i++] = Int64GetDatum(min_safe_lsn);
-        }
         else
             nulls[i++] = true;
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index e917dfe92d..bbe00e7195 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -326,7 +326,9 @@ extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
-extern WALAvailability GetWALAvailability(XLogRecPtr restart_lsn);
+extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN,
+                                          XLogRecPtr currWriteLSN,
+                                          XLogSegNo last_removed_seg);
 extern XLogRecPtr CalculateMaxmumSafeLSN(void);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
-- 
2.18.4

From 10425b3c062ed3972c0ad067adb7c14c5dca43c8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 18 Jun 2020 14:49:48 +0900
Subject: [PATCH v2 2/2] Forcibly initialize lastRemovedSegNo at the first
 checkpoint

If we have a replication slot retaining all existing WAL segments, a
checkpoint doesn't initialize lastRemovedSegNo. That means
pg_replication_slots can not show min_safe_lsn until any slot loses a
segment. Forcibly initialize it even if no WAL segments are removed at
the first checkpoint.
---
 src/backend/access/transam/xlog.c | 33 +++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 940f5fcb18..282bac32ab 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4025,6 +4025,16 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr)
     DIR           *xldir;
     struct dirent *xlde;
     char        lastoff[MAXFNAMELEN];
+    char        oldest[MAXFNAMELEN];
+    bool        remember_oldest = false;
+
+    /*
+     * If we haven't set the initial last removed segno, force to set it even
+     * if we wouldn't have removed any segments.
+     */
+    oldest[0] = 0;
+    if (XLogGetLastRemovedSegno() == 0)
+        remember_oldest = true;
 
     /*
      * Construct a filename of the last segment to be kept. The timeline ID
@@ -4062,10 +4072,33 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr)
             {
                 /* Update the last removed location in shared memory first */
                 UpdateLastRemovedPtr(xlde->d_name);
+                remember_oldest = false;
 
                 RemoveXlogFile(xlde->d_name, lastredoptr, endptr);
             }
         }
+        /* Remember the oldest file left in the directotry */
+        else if(remember_oldest &&
+                (oldest[0] == 0 || strcmp(xlde->d_name + 8, oldest + 8) < 0))
+            strncpy(oldest, xlde->d_name, MAXFNAMELEN);
+    }
+
+    /*
+     * We haven't initialize the pointer, initialize it.
+     * The last removed pointer is the oldest existing segment minus 1.
+     */
+    if (remember_oldest && oldest[0] != 0)
+    {
+        uint32        tli;
+        XLogSegNo    segno;
+
+        XLogFromFileName(oldest, &tli, &segno, wal_segment_size);
+
+        if (segno > 1)
+        {
+            XLogFileName(oldest, tli, segno - 1, wal_segment_size);
+            UpdateLastRemovedPtr(oldest);
+        }
     }
 
     FreeDir(xldir);
-- 
2.18.4


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: Fast DSM segments
Следующее
От: "李杰(慎追)"
Дата:
Сообщение: 回复:回复:回复:回复:how to create index concurrently on partitioned table