Updating FSM on recovery

From: Heikki Linnakangas
Subject: Updating FSM on recovery
Msg-id: 49072021.7010801@enterprisedb.com
Replies: Re: Updating FSM on recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: Updating FSM on recovery  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers

The one remaining issue I'd like to address in the new FSM
implementation is the fact that the FSM is currently not updated at all
in WAL recovery. The old FSM wasn't updated on WAL recovery either, and
was in fact completely thrown away if the system wasn't shut down
cleanly. The difference is that after recovery, we used to start with no
FSM information at all, and all inserts would have to extend the
relations until the next vacuum, while now the inserts use the old data
in the FSM. In case of a PITR recovery or warm stand-by, the FSM
information would come from the last base backup, which could be *very* old.

The first inserter after the recovery might have to visit a lot of pages
that the FSM claimed had free space, but didn't in reality, before
finding a suitable target. In the absolute worst case, where the table
was almost empty when the base backup was taken, but is now full, it
might have to visit every single heap page. That's not good.
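
To make the worst case concrete, here is a toy model of it. This is not
PostgreSQL code: the arrays, NPAGES, and all function names are made up for
illustration, and the linear search only stands in for the real FSM lookup.
It assumes the inserter corrects a stale FSM entry each time it rejects a
page, and it counts how many heap pages get visited before the inserter
gives up and extends the relation.

```c
#include <stddef.h>

#define NPAGES 1000          /* hypothetical table size, in heap pages */

/*
 * Toy model, not PostgreSQL code: fsm_free[] is what the FSM claims,
 * actual_free[] is what is really free on each heap page.
 */
static size_t fsm_free[NPAGES];
static size_t actual_free[NPAGES];

/* Return the first page the FSM claims has room, or -1 if none. */
static int
fsm_search(size_t needed)
{
    for (int i = 0; i < NPAGES; i++)
        if (fsm_free[i] >= needed)
            return i;
    return -1;
}

/*
 * Find a target page for a tuple of 'needed' bytes. Every page that is
 * read is counted in *visited; if it turns out to be too full, its FSM
 * entry is corrected. Returns the target page, or -1 if the relation
 * must be extended.
 */
static int
find_target_page(size_t needed, int *visited)
{
    int         page;

    *visited = 0;
    while ((page = fsm_search(needed)) >= 0)
    {
        (*visited)++;
        if (actual_free[page] >= needed)
            return page;
        fsm_free[page] = actual_free[page];     /* fix the stale entry */
    }
    return -1;
}

/* Worst case: the base-backup FSM says "empty", but the table is now full. */
static void
setup_stale_fsm(void)
{
    for (int i = 0; i < NPAGES; i++)
    {
        fsm_free[i] = 8000;
        actual_free[i] = 0;
    }
}
```

In this model the first insert after setup_stale_fsm() visits all NPAGES
pages before extending; a second insert visits none, because the stale
entries have been corrected along the way.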

So we should try to update the FSM during recovery as well. It doesn't
need to be very accurate, as the FSM information isn't accurate anyway,
but we should try to avoid the worst case scenarios.

The attached patch is my first attempt at that. Arbitrarily, if after a
heap insert/update there's less than 20% of free space on the page, the
FSM is updated. Compared to updating it every time, that saves a lot of
overhead, while doing a pretty good job at marking full pages as full in
the FSM. My first thought was to update the FSM if there isn't enough
room on the page for a new tuple of the same size as the one just
inserted; that would be pretty close to the logic we have during normal
operation, where the FSM is updated when the tuple that we're about to
insert doesn't fit on the page. But because we don't know the fillfactor
during recovery, I don't think we can do that reliably.
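
Stripped of the replay machinery, the 20% rule amounts to the check below.
This is a standalone sketch: BLCKSZ is hard-coded to 8192 on the assumption
of the default block size, and page_is_low_on_space() is a name made up for
this example, not something in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192     /* assumption: the default PostgreSQL block size */

/*
 * Return true if a page with 'freespace' bytes free counts as "low on
 * space" under the 20% rule, i.e. the FSM should be updated during replay.
 * With BLCKSZ = 8192 the cutoff is 8192 / 5 = 1638 bytes (integer division).
 */
static bool
page_is_low_on_space(size_t freespace)
{
    return freespace < BLCKSZ / 5;
}
```

So with 8 kB pages, a page with 1637 bytes free triggers an FSM update and
one with 1638 bytes free does not. The cutoff is a fixed fraction of the
block size and ignores the table's fillfactor, which is why it is only a
rough approximation of the check done in normal operation.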

One issue with this patch is that it doesn't update the FSM at all when
pages are restored from full page images. It would require fetching the
page and checking the free space on it, or peeking into the size of the
backup block data, and I'm not sure if it's worth the extra code to do that.

Thoughts?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1d43b0b..a9bc17a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -54,6 +54,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
+#include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
@@ -4029,6 +4030,7 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
     int            nredirected;
     int            ndead;
     int            nunused;
+    Size        freespace;

     if (record->xl_info & XLR_BKP_BLOCK_1)
         return;
@@ -4068,6 +4070,15 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
     PageSetLSN(page, lsn);
     PageSetTLI(page, ThisTimeLineID);
     MarkBufferDirty(buffer);
+
+    /*
+     * update the FSM as well
+     *
+     * XXX: We don't get here if the page was restored from full page image
+     */
+    freespace = PageGetHeapFreeSpace(page);
+    XLogRecordPageWithFreeSpace(xlrec->node, xlrec->block, freespace);
+
     UnlockReleaseBuffer(buffer);
 }

@@ -4212,6 +4223,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
     HeapTupleHeader htup;
     xl_heap_header xlhdr;
     uint32        newlen;
+    Size        freespace;

     if (record->xl_info & XLR_BKP_BLOCK_1)
         return;
@@ -4271,6 +4283,19 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
     PageSetLSN(page, lsn);
     PageSetTLI(page, ThisTimeLineID);
     MarkBufferDirty(buffer);
+
+    /*
+     * If the page is running low on free space, update the FSM as well.
+     * Pretty arbitrarily, our definition of low is less than 20%. We can't
+     * do much better than that without knowing the fill-factor for the table.
+     *
+     * XXX: We don't get here if the page was restored from full page image
+     */
+    freespace = PageGetHeapFreeSpace(page);
+    if (freespace < BLCKSZ / 5)
+        XLogRecordPageWithFreeSpace(xlrec->target.node,
+                                    BufferGetBlockNumber(buffer), freespace);
+
     UnlockReleaseBuffer(buffer);
 }

@@ -4296,6 +4321,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool move, bool hot_update)
     xl_heap_header xlhdr;
     int            hsize;
     uint32        newlen;
+    Size        freespace;

     if (record->xl_info & XLR_BKP_BLOCK_1)
     {
@@ -4456,6 +4482,16 @@ newsame:;
     PageSetLSN(page, lsn);
     PageSetTLI(page, ThisTimeLineID);
     MarkBufferDirty(buffer);
+
+    /*
+     * If the page is running low on free space, update the FSM as well.
+     * XXX: We don't get here if the page was restored from full page image
+     */
+    freespace = PageGetHeapFreeSpace(page);
+    if (freespace < BLCKSZ / 5)
+        XLogRecordPageWithFreeSpace(xlrec->target.node,
+                                    BufferGetBlockNumber(buffer), freespace);
+
     UnlockReleaseBuffer(buffer);
 }

diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 17f733f..7aa72c9 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -203,6 +203,51 @@ RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk, Size spaceAvail)
 }

 /*
+ * XLogRecordPageWithFreeSpace - like RecordPageWithFreeSpace, for use in
+ *        WAL replay
+ */
+void
+XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
+                            Size spaceAvail)
+{
+    int            new_cat = fsm_space_avail_to_cat(spaceAvail);
+    FSMAddress    addr;
+    uint16        slot;
+    BlockNumber blkno;
+    Buffer        buf;
+
+    /* Get the location of the FSM byte representing the heap block */
+    addr = fsm_get_location(heapBlk, &slot);
+    blkno = fsm_logical_to_physical(addr);
+
+    /* If the page doesn't exist already, extend */
+    buf = XLogReadBufferWithFork(rnode, FSM_FORKNUM, blkno, false);
+    if (!BufferIsValid(buf))
+    {
+        /*
+         * There's no direct way to tell XLogReadBuffer() that it's OK
+         * if the page doesn't exist. It will log it as an invalid page,
+         * and error at the end of WAL replay. To avoid that, lie to
+         * xlogutils.c that the file was in fact truncated, and initialize
+         * the page.
+         *
+         * XXX: Perhaps we should change XLogReadBufferWithFork() so that
+         * instead of the 'init' boolean argument, make it an enum so
+         * that the third state means "silently extend the relation if the
+         * page doesn't exist".
+         */
+        XLogTruncateRelation(rnode, FSM_FORKNUM, blkno);
+        buf = XLogReadBufferWithFork(rnode, FSM_FORKNUM, blkno, true);
+        PageInit(BufferGetPage(buf), BLCKSZ, 0);
+    }
+    Assert(BufferIsValid(buf));
+
+    if (fsm_set_avail(BufferGetPage(buf), slot, new_cat))
+        MarkBufferDirty(buf);
+    UnlockReleaseBuffer(buf);
+}
+
+/*
  * GetRecordedFreePage - return the amount of free space on a particular page,
  *        according to the FSM.
  */
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 7a1664f..e17a8d5 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -27,6 +27,8 @@ extern BlockNumber RecordAndGetPageWithFreeSpace(Relation rel,
                               Size spaceNeeded);
 extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
                                     Size spaceAvail);
+extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
+                                        Size spaceAvail);

 extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
