Re: Unexpected page allocation behavior on insert-only tables

From: Tom Lane
Subject: Re: Unexpected page allocation behavior on insert-only tables
Msg-id: 19116.1275338859@sss.pgh.pa.us
In response to: Re: Unexpected page allocation behavior on insert-only tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Unexpected page allocation behavior on insert-only tables  (Bruce Momjian <bruce@momjian.us>)
           Re: Unexpected page allocation behavior on insert-only tables  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers

I wrote:
> In particular, now that there's a distinction between smgr flush
> and relcache flush, maybe we could associate targblock reset with
> smgr flush (only) and arrange to not flush the smgr level during
> ANALYZE --- basically, smgr flush would only be needed when truncating
> or reassigning the relfilenode.  I think this might work out nicely but
> haven't chased the details.

I looked into that a bit more and decided that it'd be a ticklish
change: the coupling between relcache and smgr cache is pretty tight,
and there just isn't any provision for having an smgr cache entry live
longer than its owning relcache entry.  Even if we could fix it to
work reliably, this approach does nothing for the case where a backend
actually exits after filling just part of a new page, as noted by
Takahiro-san.

The next most promising fix is to have RelationGetBufferForTuple tell
the FSM about the new page immediately on creation.  I made a draft
patch for that (attached).  It fixes Michael's scenario nicely ---
all pages get filled completely --- and a simple test with pgbench
didn't reveal any obvious change in performance.  However there is
clear *potential* for performance loss, due to both the extra FSM
access and the potential for increased contention because of multiple
backends piling into the same new page.  So it would be good to do
some real performance testing on insert-heavy scenarios before we
consider applying this.  Any volunteers?
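
To make the effect concrete, here is a toy model of the behavior described
above. It is not PostgreSQL code: ToyRel, toy_fsm_search, toy_insert,
simulate, and the fixed PAGE_CAPACITY are all hypothetical names invented for
illustration, and a "session" stands in for a backend that starts with no
cached target block (a fresh backend, or one whose relcache entry was
flushed). The only thing it tries to capture is whether a newly created page
is advertised in the free space map at creation time.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_CAPACITY 8   /* tuples per page in this toy model (assumption) */
#define MAX_PAGES     64
#define NO_PAGE       (-1)

typedef struct
{
    int used[MAX_PAGES];     /* tuples actually stored on each page */
    int fsm_free[MAX_PAGES]; /* free slots the toy FSM believes a page has */
    int npages;
} ToyRel;

/* Return any page the toy FSM believes has room, or NO_PAGE. */
static int
toy_fsm_search(const ToyRel *rel)
{
    for (int i = 0; i < rel->npages; i++)
        if (rel->fsm_free[i] > 0)
            return i;
    return NO_PAGE;
}

/* Insert one tuple; *target is the session-local cached target page. */
static void
toy_insert(ToyRel *rel, int *target, bool record_new_page)
{
    int page = (*target != NO_PAGE) ? *target : toy_fsm_search(rel);

    /* Skip full pages, correcting stale FSM entries as we go. */
    while (page != NO_PAGE && rel->used[page] >= PAGE_CAPACITY)
    {
        rel->fsm_free[page] = 0;
        page = toy_fsm_search(rel);
    }

    if (page == NO_PAGE)
    {
        /* Extend the relation by one page. */
        assert(rel->npages < MAX_PAGES);
        page = rel->npages++;
        rel->used[page] = 0;
        /* Patched behavior: advertise what is left after our own insert. */
        rel->fsm_free[page] = record_new_page ? PAGE_CAPACITY - 1 : 0;
    }

    rel->used[page]++;
    *target = page;
}

/* Run several short sessions and return how many pages were allocated. */
int
simulate(int sessions, int tuples_per_session, bool record_new_page)
{
    ToyRel rel = {0};

    for (int s = 0; s < sessions; s++)
    {
        int target = NO_PAGE;   /* each session starts with no cached target */

        for (int t = 0; t < tuples_per_session; t++)
            toy_insert(&rel, &target, record_new_page);
    }
    return rel.npages;
}
```

In this model, simulate(10, 3, false) allocates 10 pages (one partly filled
page per session), while simulate(10, 3, true) allocates 4. The model says
nothing about the cost side: it has no notion of the extra FSM access or of
several backends contending for the same new page, which is exactly what
would need real measurement.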

Note: patch is against HEAD but should work in 8.4, if you reverse out
the use of the rd_targblock access macros.

            regards, tom lane

Index: src/backend/access/heap/hio.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/heap/hio.c,v
retrieving revision 1.78
diff -c -r1.78 hio.c
*** src/backend/access/heap/hio.c    9 Feb 2010 21:43:29 -0000    1.78
--- src/backend/access/heap/hio.c    31 May 2010 20:44:29 -0000
***************
*** 354,384 ****
       * is empty (this should never happen, but if it does we don't want to
       * risk wiping out valid data).
       */
      page = BufferGetPage(buffer);

      if (!PageIsNew(page))
          elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
!              BufferGetBlockNumber(buffer),
!              RelationGetRelationName(relation));

      PageInit(page, BufferGetPageSize(buffer), 0);

!     if (len > PageGetHeapFreeSpace(page))
      {
          /* We should not get here given the test at the top */
          elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
      }

      /*
       * Remember the new page as our target for future insertions.
-      *
-      * XXX should we enter the new page into the free space map immediately,
-      * or just keep it for this backend's exclusive use in the short run
-      * (until VACUUM sees it)?    Seems to depend on whether you expect the
-      * current backend to make more insertions or not, which is probably a
-      * good bet most of the time.  So for now, don't add it to FSM yet.
       */
!     RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));

      return buffer;
  }
--- 354,388 ----
       * is empty (this should never happen, but if it does we don't want to
       * risk wiping out valid data).
       */
+     targetBlock = BufferGetBlockNumber(buffer);
      page = BufferGetPage(buffer);

      if (!PageIsNew(page))
          elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
!              targetBlock, RelationGetRelationName(relation));

      PageInit(page, BufferGetPageSize(buffer), 0);

!     pageFreeSpace = PageGetHeapFreeSpace(page);
!     if (len > pageFreeSpace)
      {
          /* We should not get here given the test at the top */
          elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
      }

      /*
+      * If using FSM, mark the page in FSM as having whatever amount of
+      * free space will be left after our insertion.  This is needed so that
+      * the free space won't be forgotten about if this backend doesn't use
+      * it up before exiting or flushing the rel's relcache entry.
+      */
+     if (use_fsm)
+         RecordPageWithFreeSpace(relation, targetBlock, pageFreeSpace - len);
+
+     /*
       * Remember the new page as our target for future insertions.
       */
!     RelationSetTargetBlock(relation, targetBlock);

      return buffer;
  }
