Обсуждение: WIP: Page space reservation (pgupgrade)

Поиск
Список
Период
Сортировка

WIP: Page space reservation (pgupgrade)

От
Zdenek Kotala
Дата:
Attached patch allows to setup storage parameter for space reservation. I use
reloptions capability for it. You can use it:

CREATE TABLE test(id int) with (reservedspace=10);

The idea is to reduce freespace value about reservedspace on places where
PageGet(Heap)FreeSpace is called.

I need perform more tests on this patch, however I need feedback if it is
reasonable way. It seems to me that the patch could be backported without any
problem.

I have still following doubts/questions:

1) GiST - gist uses gistnospace() function to check correct amount of space.
unfortunately, it does not have information about Relation. The function is
called from:

gistContinueInsert(), gistplacetopage(), and gistVacuumUpdate().

It seems to me that better that modify this function should be modified these
callers (exclude gistContinueInsert see 2)

2) WAL - I do not modify freespace during WAL replay. I think that when
reservedspace is set, then WAL record cannot break a space reservation.

3) vacuum - PageGetFreeSpaceWithFillFactor

It look likes that vacuum uses fill factor to check possible free space on page,
but it does not try to free space on page to satisfy fill factor criteria. Is it
correct or I'm wrong?

        Thanks for your comments.
            Zdenek


--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/common/reloptions.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/common/reloptions.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/common/reloptions.c    2008-11-08 23:19:47.795930570 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/common/reloptions.c    2008-11-08 23:19:47.910535376 +0100
***************
*** 286,330 ****
  default_reloptions(Datum reloptions, bool validate,
                     int minFillfactor, int defaultFillfactor)
  {
!     static const char *const default_keywords[1] = {"fillfactor"};
!     char       *values[1];
!     int            fillfactor;
      StdRdOptions *result;

!     parseRelOptions(reloptions, 1, default_keywords, values, validate);

      /*
       * If no options, we can just return NULL rather than doing anything.
       * (defaultFillfactor is thus not used, but we require callers to pass it
       * anyway since we would need it if more options were added.)
       */
!     if (values[0] == NULL)
          return NULL;

!     if (!parse_int(values[0], &fillfactor, 0, NULL))
      {
!         if (validate)
!             ereport(ERROR,
!                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                      errmsg("fillfactor must be an integer: \"%s\"",
!                             values[0])));
!         return NULL;
      }

!     if (fillfactor < minFillfactor || fillfactor > 100)
      {
!         if (validate)
!             ereport(ERROR,
!                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                      errmsg("fillfactor=%d is out of range (should be between %d and 100)",
!                             fillfactor, minFillfactor)));
!         return NULL;
      }

      result = (StdRdOptions *) palloc(sizeof(StdRdOptions));
      SET_VARSIZE(result, sizeof(StdRdOptions));

      result->fillfactor = fillfactor;

      return (bytea *) result;
  }
--- 286,360 ----
  default_reloptions(Datum reloptions, bool validate,
                     int minFillfactor, int defaultFillfactor)
  {
!     static const char *const default_keywords[2] = {"fillfactor","reservedspace"};
!     char       *values[2];
!     int            fillfactor=defaultFillfactor;
!     int            reservedspace=0;
      StdRdOptions *result;

!     parseRelOptions(reloptions, 2, default_keywords, values, validate);

      /*
       * If no options, we can just return NULL rather than doing anything.
       * (defaultFillfactor is thus not used, but we require callers to pass it
       * anyway since we would need it if more options were added.)
       */
!     if ((values[0] == NULL) && (values[1] == NULL))
          return NULL;

!     /* fill factor */
!     if (values[0] != NULL)
      {
!         if (!parse_int(values[0], &fillfactor, 0, NULL))
!         {
!             if (validate)
!                 ereport(ERROR,
!                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                          errmsg("fillfactor must be an integer: \"%s\"",
!                                 values[0])));
!             return NULL;
!         }
!
!         if (fillfactor < minFillfactor || fillfactor > 100)
!         {
!             if (validate)
!                 ereport(ERROR,
!                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                          errmsg("fillfactor=%d is out of range (should be between %d and 100)",
!                                 fillfactor, minFillfactor)));
!             return NULL;
!         }
      }

!     /* reserved space */
!     if (values[1] != NULL)
      {
!         if (!parse_int(values[1], &reservedspace, 0, NULL))
!         {
!             if (validate)
!                 ereport(ERROR,
!                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                          errmsg("reservedspace must be an integer: \"%s\"",
!                                 values[1])));
!             return NULL;
!         }
!
!         if (reservedspace < 0 || reservedspace > BLCKSZ/4)
!         {
!             if (validate)
!                 ereport(ERROR,
!                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                          errmsg("reservedspace=%d is out of range (should be between 0 and %d)",
!                                 reservedspace, BLCKSZ/4)));
!             return NULL;
!         }
      }

      result = (StdRdOptions *) palloc(sizeof(StdRdOptions));
      SET_VARSIZE(result, sizeof(StdRdOptions));

      result->fillfactor = fillfactor;
+     result->reservedspace = reservedspace;

      return (bytea *) result;
  }
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/gin/ginentrypage.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/gin/ginentrypage.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/gin/ginentrypage.c    2008-11-08 23:19:47.799408407 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/gin/ginentrypage.c    2008-11-08 23:19:47.911560880 +0100
***************
*** 314,320 ****
          itupsz = MAXALIGN(IndexTupleSize(itup)) + sizeof(ItemIdData);
      }

!     if (PageGetFreeSpace(page) + itupsz >= MAXALIGN(IndexTupleSize(btree->entry)) + sizeof(ItemIdData))
          return true;

      return false;
--- 314,321 ----
          itupsz = MAXALIGN(IndexTupleSize(itup)) + sizeof(ItemIdData);
      }

!     if (PageGetFreeSpace(page) - RelationGetReservedSpace(btree->index) + itupsz
!              >= MAXALIGN(IndexTupleSize(btree->entry)) + sizeof(ItemIdData))
          return true;

      return false;
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/hash/hashinsert.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/hash/hashinsert.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/hash/hashinsert.c    2008-11-08 23:19:47.802614678 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/hash/hashinsert.c    2008-11-08 23:19:47.912878140 +0100
***************
*** 106,112 ****
      Assert(pageopaque->hasho_bucket == bucket);

      /* Do the insertion */
!     while (PageGetFreeSpace(page) < itemsz)
      {
          /*
           * no space on this page; check for an overflow page
--- 106,112 ----
      Assert(pageopaque->hasho_bucket == bucket);

      /* Do the insertion */
!     while (PageGetFreeSpace(page)-RelationGetReservedSpace(rel) < itemsz)
      {
          /*
           * no space on this page; check for an overflow page
***************
*** 138,144 ****
              page = BufferGetPage(buf);

              /* should fit now, given test above */
!             Assert(PageGetFreeSpace(page) >= itemsz);
          }
          pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
          Assert(pageopaque->hasho_flag == LH_OVERFLOW_PAGE);
--- 138,144 ----
              page = BufferGetPage(buf);

              /* should fit now, given test above */
!             Assert(PageGetFreeSpace(page)-RelationGetReservedSpace(rel) >= itemsz);
          }
          pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
          Assert(pageopaque->hasho_flag == LH_OVERFLOW_PAGE);
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/hash/hashovfl.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/hash/hashovfl.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/hash/hashovfl.c    2008-11-08 23:19:47.810863015 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/hash/hashovfl.c    2008-11-08 23:19:47.915444493 +0100
***************
*** 656,662 ****
               * Walk up the bucket chain, looking for a page big enough for
               * this item.  Exit if we reach the read page.
               */
!             while (PageGetFreeSpace(wpage) < itemsz)
              {
                  Assert(!PageIsEmpty(wpage));

--- 656,662 ----
               * Walk up the bucket chain, looking for a page big enough for
               * this item.  Exit if we reach the read page.
               */
!             while (PageGetFreeSpace(wpage)-RelationGetReservedSpace(rel) < itemsz)
              {
                  Assert(!PageIsEmpty(wpage));

diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/hash/hashpage.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/hash/hashpage.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/hash/hashpage.c    2008-11-08 23:19:47.821028802 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/hash/hashpage.c    2008-11-08 23:19:47.918694933 +0100
***************
*** 856,862 ****
              itemsz = IndexTupleDSize(*itup);
              itemsz = MAXALIGN(itemsz);

!             if (PageGetFreeSpace(npage) < itemsz)
              {
                  /* write out nbuf and drop lock, but keep pin */
                  _hash_chgbufaccess(rel, nbuf, HASH_WRITE, HASH_NOLOCK);
--- 856,862 ----
              itemsz = IndexTupleDSize(*itup);
              itemsz = MAXALIGN(itemsz);

!             if (PageGetFreeSpace(npage)-RelationGetReservedSpace(rel) < itemsz)
              {
                  /* write out nbuf and drop lock, but keep pin */
                  _hash_chgbufaccess(rel, nbuf, HASH_WRITE, HASH_NOLOCK);
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/heap/heapam.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/heapam.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/heap/heapam.c    2008-11-08 23:19:47.868737614 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/heapam.c    2008-11-08 23:19:47.930477950 +0100
***************
*** 2593,2599 ****
                        HeapTupleHasExternal(newtup) ||
                        newtup->t_len > TOAST_TUPLE_THRESHOLD);

!     pagefree = PageGetHeapFreeSpace(page);

      newtupsize = MAXALIGN(newtup->t_len);

--- 2593,2599 ----
                        HeapTupleHasExternal(newtup) ||
                        newtup->t_len > TOAST_TUPLE_THRESHOLD);

!     pagefree = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(relation);

      newtupsize = MAXALIGN(newtup->t_len);

***************
*** 2658,2664 ****
              /* Re-acquire the lock on the old tuple's page. */
              LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
              /* Re-check using the up-to-date free space */
!             pagefree = PageGetHeapFreeSpace(page);
              if (newtupsize > pagefree)
              {
                  /*
--- 2658,2664 ----
              /* Re-acquire the lock on the old tuple's page. */
              LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
              /* Re-check using the up-to-date free space */
!             pagefree = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(relation);
              if (newtupsize > pagefree)
              {
                  /*
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/heap/hio.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/hio.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/heap/hio.c    2008-11-08 23:19:47.874146098 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/hio.c    2008-11-08 23:19:47.932167154 +0100
***************
*** 269,275 ****
           * we're done.
           */
          page = BufferGetPage(buffer);
!         pageFreeSpace = PageGetHeapFreeSpace(page);
          if (len + saveFreeSpace <= pageFreeSpace)
          {
              /* use this page as future insert target, too */
--- 269,275 ----
           * we're done.
           */
          page = BufferGetPage(buffer);
!         pageFreeSpace = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(relation);
          if (len + saveFreeSpace <= pageFreeSpace)
          {
              /* use this page as future insert target, too */
***************
*** 362,368 ****

      PageInit(page, BufferGetPageSize(buffer), 0);

!     if (len > PageGetHeapFreeSpace(page))
      {
          /* We should not get here given the test at the top */
          elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
--- 362,368 ----

      PageInit(page, BufferGetPageSize(buffer), 0);

!     if (len > PageGetHeapFreeSpace(page)-RelationGetReservedSpace(relation))
      {
          /* We should not get here given the test at the top */
          elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/heap/pruneheap.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/pruneheap.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/heap/pruneheap.c    2008-11-08 23:19:47.877897804 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/pruneheap.c    2008-11-08 23:19:47.933351529 +0100
***************
*** 100,106 ****
                                               HEAP_DEFAULT_FILLFACTOR);
      minfree = Max(minfree, BLCKSZ / 10);

!     if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
      {
          /* OK, try to get exclusive buffer lock */
          if (!ConditionalLockBufferForCleanup(buffer))
--- 100,106 ----
                                               HEAP_DEFAULT_FILLFACTOR);
      minfree = Max(minfree, BLCKSZ / 10);

!     if (PageIsFull(page) || PageGetHeapFreeSpace(page)-RelationGetReservedSpace(relation) < minfree)
      {
          /* OK, try to get exclusive buffer lock */
          if (!ConditionalLockBufferForCleanup(buffer))
***************
*** 112,118 ****
           * prune. (We needn't recheck PageIsPrunable, since no one else could
           * have pruned while we hold pin.)
           */
!         if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
          {
              /* OK to prune (though not to remove redirects) */
              (void) heap_page_prune(relation, buffer, OldestXmin, false, true);
--- 112,118 ----
           * prune. (We needn't recheck PageIsPrunable, since no one else could
           * have pruned while we hold pin.)
           */
!         if (PageIsFull(page) || PageGetHeapFreeSpace(page)-RelationGetReservedSpace(relation) < minfree)
          {
              /* OK to prune (though not to remove redirects) */
              (void) heap_page_prune(relation, buffer, OldestXmin, false, true);
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/heap/rewriteheap.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/rewriteheap.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/heap/rewriteheap.c    2008-11-08 23:19:47.881338490 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/heap/rewriteheap.c    2008-11-08 23:19:47.934390304 +0100
***************
*** 600,606 ****
      /* Now we can check to see if there's enough free space already. */
      if (state->rs_buffer_valid)
      {
!         pageFreeSpace = PageGetHeapFreeSpace(page);

          if (len + saveFreeSpace > pageFreeSpace)
          {
--- 600,606 ----
      /* Now we can check to see if there's enough free space already. */
      if (state->rs_buffer_valid)
      {
!         pageFreeSpace = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(state->rs_new_rel);

          if (len + saveFreeSpace > pageFreeSpace)
          {
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/nbtree/nbtinsert.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/nbtree/nbtinsert.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/nbtree/nbtinsert.c    2008-11-08 23:19:47.891455345 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/nbtree/nbtinsert.c    2008-11-08 23:19:47.938184262 +0100
***************
*** 445,451 ****
       */
      movedright = false;
      vacuumed = false;
!     while (PageGetFreeSpace(page) < itemsz)
      {
          Buffer        rbuf;

--- 445,451 ----
       */
      movedright = false;
      vacuumed = false;
!     while (PageGetFreeSpace(page)-RelationGetReservedSpace(rel) < itemsz)
      {
          Buffer        rbuf;

***************
*** 463,469 ****
               */
              vacuumed = true;

!             if (PageGetFreeSpace(page) >= itemsz)
                  break;            /* OK, now we have enough space */
          }

--- 463,469 ----
               */
              vacuumed = true;

!             if (PageGetFreeSpace(page)-RelationGetReservedSpace(rel) >= itemsz)
                  break;            /* OK, now we have enough space */
          }

***************
*** 579,585 ****
       * so this comparison is correct even though we appear to be accounting
       * only for the item and not for its line pointer.
       */
!     if (PageGetFreeSpace(page) < itemsz)
      {
          bool        is_root = P_ISROOT(lpageop);
          bool        is_only = P_LEFTMOST(lpageop) && P_RIGHTMOST(lpageop);
--- 579,585 ----
       * so this comparison is correct even though we appear to be accounting
       * only for the item and not for its line pointer.
       */
!     if (PageGetFreeSpace(page)-RelationGetReservedSpace(rel) < itemsz)
      {
          bool        is_root = P_ISROOT(lpageop);
          bool        is_only = P_LEFTMOST(lpageop) && P_RIGHTMOST(lpageop);
diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/access/nbtree/nbtsort.c
pgsql_spacereserve.c901d4bb8cca/src/backend/access/nbtree/nbtsort.c
*** pgsql_spacereserve.9e46c188067f/src/backend/access/nbtree/nbtsort.c    2008-11-08 23:19:47.894508546 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/access/nbtree/nbtsort.c    2008-11-08 23:19:47.941241651 +0100
***************
*** 462,468 ****
      nblkno = state->btps_blkno;
      last_off = state->btps_lastoff;

!     pgspc = PageGetFreeSpace(npage);
      itupsz = IndexTupleDSize(*itup);
      itupsz = MAXALIGN(itupsz);

--- 462,468 ----
      nblkno = state->btps_blkno;
      last_off = state->btps_lastoff;

!     pgspc = PageGetFreeSpace(npage)-RelationGetReservedSpace(wstate->index);
      itupsz = IndexTupleDSize(*itup);
      itupsz = MAXALIGN(itupsz);

diff -Nrc pgsql_spacereserve.9e46c188067f/src/backend/commands/vacuumlazy.c
pgsql_spacereserve.c901d4bb8cca/src/backend/commands/vacuumlazy.c
*** pgsql_spacereserve.9e46c188067f/src/backend/commands/vacuumlazy.c    2008-11-08 23:19:47.898479299 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/backend/commands/vacuumlazy.c    2008-11-08 23:19:47.945127252 +0100
***************
*** 343,349 ****
                  PageInit(page, BufferGetPageSize(buf), 0);
                  empty_pages++;
              }
!             freespace = PageGetHeapFreeSpace(page);
              MarkBufferDirty(buf);
              UnlockReleaseBuffer(buf);

--- 343,349 ----
                  PageInit(page, BufferGetPageSize(buf), 0);
                  empty_pages++;
              }
!             freespace = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(onerel);
              MarkBufferDirty(buf);
              UnlockReleaseBuffer(buf);

***************
*** 354,360 ****
          if (PageIsEmpty(page))
          {
              empty_pages++;
!             freespace = PageGetHeapFreeSpace(page);
              UnlockReleaseBuffer(buf);
              RecordPageWithFreeSpace(onerel, blkno, freespace);
              continue;
--- 354,360 ----
          if (PageIsEmpty(page))
          {
              empty_pages++;
!             freespace = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(onerel);
              UnlockReleaseBuffer(buf);
              RecordPageWithFreeSpace(onerel, blkno, freespace);
              continue;
***************
*** 524,530 ****
              vacuumed_pages++;
          }

!         freespace = PageGetHeapFreeSpace(page);

          /* Remember the location of the last page with nonremovable tuples */
          if (hastup)
--- 524,530 ----
              vacuumed_pages++;
          }

!         freespace = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(onerel);

          /* Remember the location of the last page with nonremovable tuples */
          if (hastup)
***************
*** 626,632 ****

          /* Now that we've compacted the page, record its available space */
          page = BufferGetPage(buf);
!         freespace = PageGetHeapFreeSpace(page);

          UnlockReleaseBuffer(buf);
          RecordPageWithFreeSpace(onerel, tblk, freespace);
--- 626,632 ----

          /* Now that we've compacted the page, record its available space */
          page = BufferGetPage(buf);
!         freespace = PageGetHeapFreeSpace(page)-RelationGetReservedSpace(onerel);

          UnlockReleaseBuffer(buf);
          RecordPageWithFreeSpace(onerel, tblk, freespace);
diff -Nrc pgsql_spacereserve.9e46c188067f/src/include/utils/rel.h
pgsql_spacereserve.c901d4bb8cca/src/include/utils/rel.h
*** pgsql_spacereserve.9e46c188067f/src/include/utils/rel.h    2008-11-08 23:19:47.900403231 +0100
--- pgsql_spacereserve.c901d4bb8cca/src/include/utils/rel.h    2008-11-08 23:19:47.946913616 +0100
***************
*** 214,219 ****
--- 214,220 ----
  {
      int32        vl_len_;        /* varlena header (do not touch directly!) */
      int            fillfactor;        /* page fill factor in percent (0..100) */
+     int            reservedspace;  /* page reserved space for in-place upgrade in bytes */
  } StdRdOptions;

  #define HEAP_MIN_FILLFACTOR            10
***************
*** 228,233 ****
--- 229,242 ----
       ((StdRdOptions *) (relation)->rd_options)->fillfactor : (defaultff))

  /*
+  * RelationGetReservedSpace
+  *        Returns the relation's reserved space.
+  */
+ #define RelationGetReservedSpace(relation) \
+     ((relation)->rd_options ? \
+      ((StdRdOptions *) (relation)->rd_options)->reservedspace : 0)
+
+ /*
   * RelationGetTargetPageUsage
   *        Returns the relation's desired space usage per page in bytes.
   */

Re: WIP: Page space reservation (pgupgrade)

От
Tom Lane
Дата:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Attached patch allows to setup storage parameter for space
> reservation.

What is the point of this?  We don't need it for 8.3->8.4, we aren't
going to back-patch such a thing into 8.2 or before (certainly not
before, since reloptions didn't exist before 8.2), and I would hope we
put in a more general solution than this for 8.4-to-later preparatory
fixes.
        regards, tom lane


Re: WIP: Page space reservation (pgupgrade)

От
"Jonah H. Harris"
Дата:
On Sat, Nov 8, 2008 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Attached patch allows to setup storage parameter for space
>> reservation.
>
> What is the point of this?

That's my question.  Why is this needed at all?

-- 
Jonah H. Harris, Senior DBA
myYearbook.com


Re: WIP: Page space reservation (pgupgrade)

От
Decibel!
Дата:
On Nov 8, 2008, at 8:35 PM, Jonah H. Harris wrote:
> On Sat, Nov 8, 2008 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>>> Attached patch allows to setup storage parameter for space
>>> reservation.
>>
>> What is the point of this?
>
> That's my question.  Why is this needed at all?

I suspect this is to deal with needing to reserve space in a cluster  
that you're planning on upgrading to a new version that would take  
more space, but I think the implementation is probably too simplistic.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828




Re: WIP: Page space reservation (pgupgrade)

От
"Jonah H. Harris"
Дата:
On Sun, Nov 9, 2008 at 7:55 PM, Decibel! <decibel@decibel.org> wrote:
> On Nov 8, 2008, at 8:35 PM, Jonah H. Harris wrote:
>> That's my question.  Why is this needed at all?
>
> I suspect this is to deal with needing to reserve space in a cluster that
> you're planning on upgrading to a new version that would take more space,
> but I think the implementation is probably too simplistic.

Well, if that's what it is, I think it's a fairly poor design
decision.  When I upgrade Oracle, SQL Server, or MySQL, I don't need
to plan the amount of free space in my blocks a year or more before an
upgrade.  In fact, I don't have to plan it at all... it's completely
handled by the in-place upgrade.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com


Re: WIP: Page space reservation (pgupgrade)

От
"Robert Haas"
Дата:
> Well, if that's what it is, I think it's a fairly poor design
> decision.  When I upgrade Oracle, SQL Server, or MySQL, I don't need
> to plan the amount of free space in my blocks a year or more before an
> upgrade.  In fact, I don't have to plan it at all... it's completely
> handled by the in-place upgrade.

Well, I think the proposal is that you would change the amount of free
space in your blocks immediately prior to performing the upgrade, but
I still think it's a poor decision to make in-place upgrade dependent
on support from the OLD version of the code.

Let's suppose, for example, that in 8.5 we decide to change some type
that is presently 16 bits to 32 bits, or 8 bits to 16 bits, etc.  This
will make some tuples bigger, and potentially much bigger, but since
it presumably won't be a commonly used data-type, most tuples won't
change at all.  However, the worst case scenario for how much free
space you might need to reserve will be very bad, and therefore a
mechanism that allows reserving a fixed amount of free space per page
won't be adequate.

Now, maybe someone will come up with some reason why that particular
example is unlikely or impossible or can be worked around.  But there
are LOTS of imaginable scenarios that could cause unpredictable
amounts of page expansion or contraction, and I find it hard to
believe that we can be confident that NONE of those things will EVER
happen.

...Robert


Re: WIP: Page space reservation (pgupgrade)

От
Zdenek Kotala
Дата:
Jonah H. Harris napsal(a):
> On Sun, Nov 9, 2008 at 7:55 PM, Decibel! <decibel@decibel.org> wrote:
>> On Nov 8, 2008, at 8:35 PM, Jonah H. Harris wrote:
>>> That's my question.  Why is this needed at all?
>> I suspect this is to deal with needing to reserve space in a cluster that
>> you're planning on upgrading to a new version that would take more space,
>> but I think the implementation is probably too simplistic.
> 
> Well, if that's what it is, I think it's a fairly poor design
> decision.  When I upgrade Oracle, SQL Server, or MySQL, I don't need
> to plan the amount of free space in my blocks a year or more before an
> upgrade.  In fact, I don't have to plan it at all... it's completely
> handled by the in-place upgrade.

It will be handled by PostgreSQL as well. The patch is about mechanism and 
configuration which will be used by in-place upgrade without any user activity. 
Only what user will have to do is run pg_upgrade_prepare() or something like 
this and when database will be ready then user can do upgrade.
    Zdenek




Re: WIP: Page space reservation (pgupgrade)

От
Martijn van Oosterhout
Дата:
On Sun, Nov 09, 2008 at 11:28:50PM -0500, Robert Haas wrote:
> > Well, if that's what it is, I think it's a fairly poor design
> > decision.  When I upgrade Oracle, SQL Server, or MySQL, I don't need
> > to plan the amount of free space in my blocks a year or more before an
> > upgrade.  In fact, I don't have to plan it at all... it's completely
> > handled by the in-place upgrade.
>
> Well, I think the proposal is that you would change the amount of free
> space in your blocks immediately prior to performing the upgrade, but
> I still think it's a poor decision to make in-place upgrade dependent
> on support from the OLD version of the code.

ISTM the only reason why people are talking about page reservation is
because people don't like the idea of an 8.4 backend being able to read
8.3 tuples without converting the whole page. If there's no hard
dealine on the page conversion then you can let the 8.4 vacuum deal
with it. Maybe it will take a few runs but it should get there
eventually.

I think you could probably sell page-reservation as: if you do this
then upgrades will happen quicker but you shouldn't rely on people doing
it as there are corner cases where it won't work.

But if you ask people with multi-TB database whether they'll take a 1%
CPU performance hit for never having to dump/restore again, which do
you think they'll choose? For an I/O bound database the choice is easy.

And if the performance is such a big deal provide two binaries, one to
run during the upgrade and one once the upgrade is complete.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP: Page space reservation (pgupgrade)

От
Zdenek Kotala
Дата:
Robert Haas napsal(a):

> Let's suppose, for example, that in 8.5 we decide to change some type
> that is presently 16 bits to 32 bits, or 8 bits to 16 bits, etc.  This
> will make some tuples bigger, and potentially much bigger, but since
> it presumably won't be a commonly used data-type, most tuples won't
> change at all.  However, the worst case scenario for how much free
> space you might need to reserve will be very bad, and therefore a
> mechanism that allows reserving a fixed amount of free space per page
> won't be adequate.

The problem with datatypes is different story. It is should be easy to manage
this problem with keeping the old datatype definition for "old" tables and for 
new create new datatype with new OID. You can use ALTER TABLE for converting 
data from old type to the new one.
    Zdenek


Re: WIP: Page space reservation (pgupgrade)

От
Zdenek Kotala
Дата:
Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Attached patch allows to setup storage parameter for space
>> reservation.
> 
> What is the point of this?  We don't need it for 8.3->8.4, we aren't
> going to back-patch such a thing into 8.2 or before (certainly not
> before, since reloptions didn't exist before 8.2), and I would hope we
> put in a more general solution than this for 8.4-to-later preparatory
> fixes.

There is whole idea.

By my opinion the space reservation is possible to split into three separate 
problems:

1) First mandatory is to implement space reservation functionality in the core. 
It is what this patch tries to do. It prevents to store new data in reserved 
area. The configuration is implemented with reloptions.

I think This part is possible simply backported back to 8.2 and it allows 
in-place online upgrade form 8.2 to 8.4 (and indirectly from 8.1). There is only 
limitation for toasted arrays and composite datatypes, which can be solve with 
offline conversion.

How Jim Nasby mentioned we need to add also per tuple space reservation which is 
not required for V3->V4 upgrade or it is negative, but it is important to have 
it for 8.4, because who know what will happen int 8.5 development.

By my opinion, this functionality is similar to fillfactor and it is 
independent. And can be committed separately.

2) Second part is pre-upgrade check/preparation process. For 8.2 it has to be 
store procedure which will work separately (no core changes). In 8.4 it will be 
implemented into core - new column in pg_class and pg_database. Vacuum adjusting 
and so on.

3) Last part is pre-upgrade configuration which correctly setup reserved space 
all non-system relations in database cluster.

Thats my idea. Let me know any comments.
        Thanks Zdenek



Re: WIP: Page space reservation (pgupgrade)

От
Tom Lane
Дата:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Tom Lane napsal(a):
>> What is the point of this?  We don't need it for 8.3->8.4, we aren't
>> going to back-patch such a thing into 8.2 or before (certainly not
>> before, since reloptions didn't exist before 8.2), and I would hope we
>> put in a more general solution than this for 8.4-to-later preparatory
>> fixes.

> ...
> I think This part is possible simply backported back to 8.2 and it allows 
> in-place online upgrade form 8.2 to 8.4 (and indirectly from 8.1).

I would never recommend to anyone that they depend on a patch like this
for pre-upgrade conversions.  It has no way to guarantee that all pages
in the database have been fixed.  And in any case, I would vote against
back-patching such a thing into 8.2, at least not without a whole lot
more field testing than it is likely to get.

As for planning that there will someday be 8.1->8.2 or even 8.2->8.3
online upgrade, I suggest that you quit wasting your time even thinking
about that.  It won't happen and expending more cycles on it is mostly
going to ensure that the entire upgrade project fails.  What we need now
is something that works for 8.3->8.4 and that we can extend and maintain
for future version updates.
        regards, tom lane