Updating FSM on recovery
From | Heikki Linnakangas |
---|---|
Subject | Updating FSM on recovery |
Date | |
Msg-id | 49072021.7010801@enterprisedb.com |
Responses | Re: Updating FSM on recovery (Tom Lane <tgl@sss.pgh.pa.us>); Re: Updating FSM on recovery (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
The one remaining issue I'd like to address in the new FSM implementation is the fact that the FSM is currently not updated at all in WAL recovery. The old FSM wasn't updated on WAL recovery either, and was in fact completely thrown away if the system wasn't shut down cleanly. The difference is that after recovery, we used to start with no FSM information at all, and all inserts would have to extend the relations until the next vacuum, while now the inserts use the old data in the FSM. In the case of a PITR recovery or warm stand-by, the FSM information would come from the last base backup, which could be *very* old.

The first inserter after the recovery might have to visit a lot of pages that the FSM claimed had free space, but didn't in reality, before finding a suitable target. In the absolute worst case, where the table was almost empty when the base backup was taken, but is now full, it might have to visit every single heap page. That's not good.

So we should try to update the FSM during recovery as well. It doesn't need to be very accurate, as the FSM information isn't accurate anyway, but we should try to avoid the worst-case scenarios.

The attached patch is my first attempt at that. Arbitrarily, if after a heap insert/update there's less than 20% of free space on the page, the FSM is updated. Compared to updating it every time, that saves a lot of overhead, while doing a pretty good job at marking full pages as full in the FSM.

My first thought was to update the FSM if there isn't enough room on the page for a new tuple of the same size as the one just inserted; that would be pretty close to the logic we have during normal operation, where the FSM is updated when the tuple that we're about to insert doesn't fit on the page. But because we don't know the fillfactor during recovery, I don't think we can do that reliably.

One issue with this patch is that it doesn't update the FSM at all when pages are restored from full page images.
It would require fetching the page and checking the free space on it, or peeking into the size of the backup block data, and I'm not sure if it's worth the extra code to do that.

Thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1d43b0b..a9bc17a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -54,6 +54,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
+#include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
@@ -4029,6 +4030,7 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
     int         nredirected;
     int         ndead;
     int         nunused;
+    Size        freespace;
 
     if (record->xl_info & XLR_BKP_BLOCK_1)
         return;
@@ -4068,6 +4070,15 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
     PageSetLSN(page, lsn);
     PageSetTLI(page, ThisTimeLineID);
     MarkBufferDirty(buffer);
+
+    /*
+     * update the FSM as well
+     *
+     * XXX: We don't get here if the page was restored from full page image
+     */
+    freespace = PageGetHeapFreeSpace(page);
+    XLogRecordPageWithFreeSpace(xlrec->node, xlrec->block, freespace);
+
     UnlockReleaseBuffer(buffer);
 }
 
@@ -4212,6 +4223,7 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
     HeapTupleHeader htup;
     xl_heap_header xlhdr;
     uint32      newlen;
+    Size        freespace;
 
     if (record->xl_info & XLR_BKP_BLOCK_1)
         return;
@@ -4271,6 +4283,19 @@ heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
     PageSetLSN(page, lsn);
     PageSetTLI(page, ThisTimeLineID);
     MarkBufferDirty(buffer);
+
+    /*
+     * If the page is running low on free space, update the FSM as well.
+     * Pretty arbitrarily, our definition of low is less than 20%. We can't
+     * do much better than that without knowing the fill-factor for the table.
+     *
+     * XXX: We don't get here if the page was restored from full page image
+     */
+    freespace = PageGetHeapFreeSpace(page);
+    if (freespace < BLCKSZ / 5)
+        XLogRecordPageWithFreeSpace(xlrec->target.node,
+                                    BufferGetBlockNumber(buffer), freespace);
+
     UnlockReleaseBuffer(buffer);
 }
 
@@ -4296,6 +4321,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool move, bool hot_update)
     xl_heap_header xlhdr;
     int         hsize;
     uint32      newlen;
+    Size        freespace;
 
     if (record->xl_info & XLR_BKP_BLOCK_1)
     {
@@ -4456,6 +4482,16 @@ newsame:;
     PageSetLSN(page, lsn);
     PageSetTLI(page, ThisTimeLineID);
     MarkBufferDirty(buffer);
+
+    /*
+     * If the page is running low on free space, update the FSM as well.
+     * XXX: We don't get here if the page was restored from full page image
+     */
+    freespace = PageGetHeapFreeSpace(page);
+    if (freespace < BLCKSZ / 5)
+        XLogRecordPageWithFreeSpace(xlrec->target.node,
+                                    BufferGetBlockNumber(buffer), freespace);
+
     UnlockReleaseBuffer(buffer);
 }
 
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 17f733f..7aa72c9 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -203,6 +203,51 @@ RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk, Size spaceAvail)
 }
 
 /*
+ * XLogRecordPageWithFreeSpace - like RecordPageWithFreeSpace, for use in
+ *      WAL replay
+ */
+void
+XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
+                            Size spaceAvail)
+{
+    int         new_cat = fsm_space_avail_to_cat(spaceAvail);
+    FSMAddress  addr;
+    uint16      slot;
+    BlockNumber blkno;
+    Buffer      buf;
+
+    /* Get the location of the FSM byte representing the heap block */
+    addr = fsm_get_location(heapBlk, &slot);
+    blkno = fsm_logical_to_physical(addr);
+
+    /* If the page doesn't exist already, extend */
+    buf = XLogReadBufferWithFork(rnode, FSM_FORKNUM, blkno, false);
+    if (!BufferIsValid(buf))
+    {
+        /*
+         * There's no direct way to tell XLogReadBuffer() that it's OK
+         * if the page doesn't exist. It will log it as an invalid page,
+         * and error at the end of WAL replay. To avoid that, lie to
+         * xlogutils.c that the file was in fact truncated, and initialize
+         * the page.
+         *
+         * XXX: Perhaps we should change XLogReadBufferWithFork() so that
+         * instead of the 'init' boolean argument, make it an an enum so
+         * that the third state means "silently extend the relation if the
+         * page doesn't exist".
+         */
+        XLogTruncateRelation(rnode, FSM_FORKNUM, blkno);
+        buf = XLogReadBufferWithFork(rnode, FSM_FORKNUM, blkno, true);
+        PageInit(BufferGetPage(buf), BLCKSZ, 0);
+    }
+    Assert(BufferIsValid(buf));
+
+    if (fsm_set_avail(BufferGetPage(buf), slot, new_cat))
+        MarkBufferDirty(buf);
+    UnlockReleaseBuffer(buf);
+}
+
+/*
  * GetRecordedFreePage - return the amount of free space on a particular page,
  *      according to the FSM.
  */
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 7a1664f..e17a8d5 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -27,6 +27,8 @@ extern BlockNumber RecordAndGetPageWithFreeSpace(Relation rel,
                               Size spaceNeeded);
 extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
                                     Size spaceAvail);
+extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
+                                        Size spaceAvail);
 
 extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
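To make the 20% rule in the patch easier to reason about outside the backend, here is a minimal standalone sketch of the replay-time decision. It is not the patch's code and does not use any PostgreSQL headers: `toy_fsm`, `toy_space_to_cat`, `toy_replay_insert` and `TOY_NBLOCKS` are hypothetical names made up for illustration, the 8192-byte `BLCKSZ` is assumed to be the default block size, and the byte-per-page category mapping is a simplification that may differ from what `fsm_space_avail_to_cat` actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ          8192    /* assumed default block size */
#define TOY_NBLOCKS     8       /* hypothetical: pages in the toy relation */
#define TOY_CATEGORIES  256     /* one category byte per heap page */

/* Toy stand-in for the FSM: one free-space category per heap block. */
static uint8_t toy_fsm[TOY_NBLOCKS];

/*
 * Map free space in bytes to a one-byte category, rounding down.
 * Simplified assumption: equal steps of BLCKSZ / 256 = 32 bytes.
 */
static uint8_t
toy_space_to_cat(size_t avail)
{
    size_t      step = BLCKSZ / TOY_CATEGORIES;
    size_t      cat = avail / step;

    return (uint8_t) (cat > 255 ? 255 : cat);
}

/*
 * Replay-time heuristic described in the mail: after redoing an
 * insert/update, record the page in the FSM only if less than 20%
 * (BLCKSZ / 5) of the page is still free. Returns 1 if the toy FSM
 * was updated, 0 if it was left alone.
 */
static int
toy_replay_insert(unsigned blkno, size_t freespace)
{
    assert(blkno < TOY_NBLOCKS);

    if (freespace < BLCKSZ / 5)
    {
        toy_fsm[blkno] = toy_space_to_cat(freespace);
        return 1;
    }
    return 0;
}
```

Under these assumptions, a page left with 4000 free bytes keeps whatever stale category the base backup recorded, while a page left with 1000 free bytes (below the 1638-byte cutoff) is corrected to a low category, so the first inserter after recovery is steered away from it. The trade-off is the one described in the mail: nearly full pages get fixed cheaply, but pages with plenty of room are never refreshed during replay.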