Better tracking of free space during SP-GiST index build

From: Tom Lane
Date:
Over in the thread about the SP-GiST inet opclass, I threatened to post
a patch like this, and here it is.

The basic idea is to track more than just the very latest page we've used
in each of the page categories that SP-GiST works with.  I started with an
arrangement that gave an equal number of cache slots to each category, but
soon realized that that was dumb, because there are usually way more leaf
pages than anything else.  So this version has a little table of how many
slots to give to each category.  The constants could maybe use a bit more
fiddling, if we have some more test data sets to try this on.
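
To make the slot convention concrete, here's a minimal sketch using
names from the patch below (illustrative only; the real lookup is in
SpGistGetBuffer()): the GBUF flag value is used directly as the index
into the new lastUsedPagesMap[], so each category owns a contiguous
slice of the cachedPage[] array.

    int     lupstart = lastUsedPagesMap[GBUF_LEAF].start;              /* 24 */
    int     lupstop  = lupstart + lastUsedPagesMap[GBUF_LEAF].count;   /* 84 */

    /* a leaf-page request then scans cachedPage[24..83] for free space */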

On the IRRExplorer data set we discussed in the other thread, this reduces
the index size from 132MB to 120MB.  Poking into that more closely with
pg_filedump, the total free space within the index drops from 42MB to
28MB.  If you think those numbers don't add up, you're right --- this
seems to result in more non-leaf tuples than before.  I'm not sure why;
maybe more aggressive sucking up of free space results in more splits.
(Maybe adjustment of the default spgist fillfactor would be in order
to counteract that?)  But the index search time doesn't seem to be hurt,
so perhaps there's nothing to worry about.

As coded, this makes no attempt to preferentially select pages with the
most or least free space.  I don't know if it'd be worth any cycles to
do that.

I'll put this in the commitfest queue.  It could use review from someone
with the time and motivation to do performance testing/tuning.

            regards, tom lane

diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index d570ae5..95c45fa 100644
*** a/src/backend/access/spgist/spgutils.c
--- b/src/backend/access/spgist/spgutils.c
***************
*** 27,32 ****
--- 27,56 ----


  /*
+  * This array defines how many entries of the lastUsedPages[] cache are
+  * reserved for index pages of each classification known to SpGistGetBuffer().
+  */
+ struct LUPMapEntry
+ {
+     int            count;
+     int            start;            /* must equal sum of preceding counts */
+ };
+
+ static const struct LUPMapEntry lastUsedPagesMap[] = {
+     {8, 0},                        /* inner pages, parity 0 */
+     {8, 8},                        /* inner pages, parity 1 */
+     {8, 16},                    /* inner pages, parity 2 */
+     {60, 24},                    /* leaf pages */
+     {2, 84},                    /* inner pages for nulls, parity 0 */
+     {2, 86},                    /* inner pages for nulls, parity 1 */
+     {2, 88},                    /* inner pages for nulls, parity 2 */
+     {10, 90}                    /* leaf pages for nulls */
+
+ #define LASTUSEDPAGESMAP_END 100    /* must equal SPGIST_CACHED_PAGES */
+ };
+
+
+ /*
   * SP-GiST handler function: return IndexAmRoutine with access method parameters
   * and callbacks.
   */
*************** spghandler(PG_FUNCTION_ARGS)
*** 35,40 ****
--- 59,67 ----
  {
      IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

+     StaticAssertStmt(LASTUSEDPAGESMAP_END == SPGIST_CACHED_PAGES,
+                      "lastUsedPagesMap[] does not match SPGIST_CACHED_PAGES");
+
      amroutine->amstrategies = 0;
      amroutine->amsupport = SPGISTNProc;
      amroutine->amcanorder = false;
*************** spgGetCache(Relation index)
*** 129,140 ****

          metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));

!         if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
              elog(ERROR, "index \"%s\" is not an SP-GiST index",
                   RelationGetRelationName(index));

-         cache->lastUsedPages = metadata->lastUsedPages;
-
          UnlockReleaseBuffer(metabuffer);

          index->rd_amcache = (void *) cache;
--- 156,179 ----

          metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));

!         if (metadata->magicNumber == SPGIST_MAGIC_NUMBER)
!             cache->lastUsedPages = metadata->lastUsedPages;
!         else if (metadata->magicNumber == SPGIST_OLD_MAGIC_NUMBER)
!         {
!             /*
!              * We could make an effort to slurp up the pages that exist in the
!              * old-format cache, but it's probably not worth the trouble. Just
!              * init our cache to empty.
!              */
!             int            i;
!
!             for (i = 0; i < SPGIST_CACHED_PAGES; i++)
!                 cache->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
!         }
!         else
              elog(ERROR, "index \"%s\" is not an SP-GiST index",
                   RelationGetRelationName(index));

          UnlockReleaseBuffer(metabuffer);

          index->rd_amcache = (void *) cache;
*************** SpGistUpdateMetaPage(Relation index)
*** 258,263 ****
--- 297,305 ----
          if (ConditionalLockBuffer(metabuffer))
          {
              metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
+
+             /* Update both the magic # and the cached page list */
+             metadata->magicNumber = SPGIST_MAGIC_NUMBER;
              metadata->lastUsedPages = cache->lastUsedPages;

              MarkBufferDirty(metabuffer);
*************** SpGistUpdateMetaPage(Relation index)
*** 270,279 ****
      }
  }

- /* Macro to select proper element of lastUsedPages cache depending on flags */
- /* Masking flags with SPGIST_CACHED_PAGES is just for paranoia's sake */
- #define GET_LUP(c, f)  (&(c)->lastUsedPages.cachedPage[((unsigned int) (f)) % SPGIST_CACHED_PAGES])
-
  /*
   * Allocate and initialize a new buffer of the type and parity specified by
   * flags.  The returned buffer is already pinned and exclusive-locked.
--- 312,317 ----
*************** SpGistUpdateMetaPage(Relation index)
*** 297,303 ****
  static Buffer
  allocNewBuffer(Relation index, int flags)
  {
-     SpGistCache *cache = spgGetCache(index);
      uint16        pageflags = 0;

      if (GBUF_REQ_LEAF(flags))
--- 335,340 ----
*************** allocNewBuffer(Relation index, int flags
*** 330,340 ****
              else
              {
                  /* Page has wrong parity, record it in cache and try again */
!                 if (pageflags & SPGIST_NULLS)
!                     blkFlags |= GBUF_NULLS;
!                 cache->lastUsedPages.cachedPage[blkFlags].blkno = blkno;
!                 cache->lastUsedPages.cachedPage[blkFlags].freeSpace =
!                     PageGetExactFreeSpace(BufferGetPage(buffer));
                  UnlockReleaseBuffer(buffer);
              }
          }
--- 367,373 ----
              else
              {
                  /* Page has wrong parity, record it in cache and try again */
!                 SpGistSetLastUsedPage(index, buffer);
                  UnlockReleaseBuffer(buffer);
              }
          }
*************** Buffer
*** 354,360 ****
  SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
  {
      SpGistCache *cache = spgGetCache(index);
!     SpGistLastUsedPage *lup;

      /* Bail out if even an empty page wouldn't meet the demand */
      if (needSpace > SPGIST_PAGE_CAPACITY)
--- 387,395 ----
  SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
  {
      SpGistCache *cache = spgGetCache(index);
!     int            lupstart,
!                 lupstop,
!                 i;

      /* Bail out if even an empty page wouldn't meet the demand */
      if (needSpace > SPGIST_PAGE_CAPACITY)
*************** SpGistGetBuffer(Relation index, int flag
*** 371,405 ****
                                                  SPGIST_DEFAULT_FILLFACTOR);
      needSpace = Min(needSpace, SPGIST_PAGE_CAPACITY);

!     /* Get the cache entry for this flags setting */
!     lup = GET_LUP(cache, flags);
!
!     /* If we have nothing cached, just turn it over to allocNewBuffer */
!     if (lup->blkno == InvalidBlockNumber)
!     {
!         *isNew = true;
!         return allocNewBuffer(index, flags);
!     }
!
!     /* fixed pages should never be in cache */
!     Assert(!SpGistBlockIsFixed(lup->blkno));

!     /* If cached freeSpace isn't enough, don't bother looking at the page */
!     if (lup->freeSpace >= needSpace)
      {
          Buffer        buffer;
          Page        page;

!         buffer = ReadBuffer(index, lup->blkno);

          if (!ConditionalLockBuffer(buffer))
          {
!             /*
!              * buffer is locked by another process, so return a new buffer
!              */
              ReleaseBuffer(buffer);
!             *isNew = true;
!             return allocNewBuffer(index, flags);
          }

          page = BufferGetPage(buffer);
--- 406,443 ----
                                                  SPGIST_DEFAULT_FILLFACTOR);
      needSpace = Min(needSpace, SPGIST_PAGE_CAPACITY);

!     /* Locate the cache entries for this flags setting */
!     lupstart = lastUsedPagesMap[flags].start;
!     lupstop = lastUsedPagesMap[flags].count + lupstart;

!     /*
!      * Search for a usable entry.  If there's more than one, we just choose
!      * the first one; is there a better heuristic?
!      */
!     for (i = lupstart; i < lupstop; i++)
      {
+         SpGistLastUsedPage *lup = &cache->lastUsedPages.cachedPage[i];
+         BlockNumber blkno = lup->blkno;
          Buffer        buffer;
          Page        page;

!         if (blkno == InvalidBlockNumber)
!             continue;
!
!         /* fixed pages should never be in cache */
!         Assert(!SpGistBlockIsFixed(blkno));
!
!         /* If cached freeSpace isn't enough, don't bother looking at the page */
!         if (lup->freeSpace < needSpace)
!             continue;
!
!         buffer = ReadBuffer(index, blkno);

          if (!ConditionalLockBuffer(buffer))
          {
!             /* buffer is locked by another process, so keep searching */
              ReleaseBuffer(buffer);
!             continue;
          }

          page = BufferGetPage(buffer);
*************** SpGistGetBuffer(Relation index, int flag
*** 414,419 ****
--- 452,459 ----
              if (GBUF_REQ_NULLS(flags))
                  pageflags |= SPGIST_NULLS;
              SpGistInitBuffer(buffer, pageflags);
+
+             /* Update freespace info and return the buffer */
              lup->freeSpace = PageGetExactFreeSpace(page) - needSpace;
              *isNew = true;
              return buffer;
*************** SpGistGetBuffer(Relation index, int flag
*** 437,445 ****
              }
          }

!         /*
!          * fallback to allocation of new buffer
!          */
          UnlockReleaseBuffer(buffer);
      }

--- 477,483 ----
              }
          }

!         /* Give up on this page, but keep scanning cache entries */
          UnlockReleaseBuffer(buffer);
      }

*************** SpGistGetBuffer(Relation index, int flag
*** 451,474 ****
  /*
   * Update lastUsedPages cache when done modifying a page.
   *
!  * We update the appropriate cache entry if it already contained this page
!  * (its freeSpace is likely obsolete), or if this page has more space than
!  * whatever we had cached.
   */
  void
  SpGistSetLastUsedPage(Relation index, Buffer buffer)
  {
      SpGistCache *cache = spgGetCache(index);
-     SpGistLastUsedPage *lup;
-     int            freeSpace;
      Page        page = BufferGetPage(buffer);
      BlockNumber blkno = BufferGetBlockNumber(buffer);
      int            flags;

      /* Never enter fixed pages (root pages) in cache, though */
      if (SpGistBlockIsFixed(blkno))
          return;

      if (SpGistPageIsLeaf(page))
          flags = GBUF_LEAF;
      else
--- 489,520 ----
  /*
   * Update lastUsedPages cache when done modifying a page.
   *
!  * If there's already a cache entry for this page, update its freeSpace.
!  * Otherwise, remember the page in the cache if it has more space than
!  * the least useful page we had before.
   */
  void
  SpGistSetLastUsedPage(Relation index, Buffer buffer)
  {
      SpGistCache *cache = spgGetCache(index);
      Page        page = BufferGetPage(buffer);
      BlockNumber blkno = BufferGetBlockNumber(buffer);
      int            flags;
+     int            freeSpace;
+     int            emptySlot,
+                 minFreeSlot,
+                 minFreeSpace;
+     int            lupstart,
+                 lupstop,
+                 i;

      /* Never enter fixed pages (root pages) in cache, though */
      if (SpGistBlockIsFixed(blkno))
          return;

+     /* Determine page's current free space and its category */
+     freeSpace = PageGetExactFreeSpace(page);
+
      if (SpGistPageIsLeaf(page))
          flags = GBUF_LEAF;
      else
*************** SpGistSetLastUsedPage(Relation index, Bu
*** 476,490 ****
      if (SpGistPageStoresNulls(page))
          flags |= GBUF_NULLS;

!     lup = GET_LUP(cache, flags);

!     freeSpace = PageGetExactFreeSpace(page);
!     if (lup->blkno == InvalidBlockNumber || lup->blkno == blkno ||
!         lup->freeSpace < freeSpace)
      {
!         lup->blkno = blkno;
!         lup->freeSpace = freeSpace;
      }
  }

  /*
--- 522,567 ----
      if (SpGistPageStoresNulls(page))
          flags |= GBUF_NULLS;

!     /* Locate the cache entries for this flags setting */
!     lupstart = lastUsedPagesMap[flags].start;
!     lupstop = lastUsedPagesMap[flags].count + lupstart;

!     /*
!      * If page is in the cache, just update its free space; otherwise identify
!      * best candidate slot to replace.
!      */
!     emptySlot = minFreeSlot = -1;
!     minFreeSpace = BLCKSZ + 1;
!     for (i = lupstart; i < lupstop; i++)
      {
!         SpGistLastUsedPage *lup = &cache->lastUsedPages.cachedPage[i];
!
!         if (lup->blkno == blkno)
!         {
!             lup->freeSpace = freeSpace;
!             return;
!         }
!         if (lup->blkno == InvalidBlockNumber)
!             emptySlot = i;
!         else if (lup->freeSpace < minFreeSpace)
!         {
!             minFreeSlot = i;
!             minFreeSpace = lup->freeSpace;
!         }
      }
+
+     if (emptySlot >= 0)
+         i = emptySlot;
+     else if (minFreeSlot >= 0 && minFreeSpace < freeSpace)
+         i = minFreeSlot;
+     else
+     {
+         /* There's no suitable slot, so just forget the page */
+         return;
+     }
+
+     cache->lastUsedPages.cachedPage[i].blkno = blkno;
+     cache->lastUsedPages.cachedPage[i].freeSpace = freeSpace;
  }

  /*
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index cb8fa9c..66a4fb4 100644
*** a/src/include/access/spgist_private.h
--- b/src/include/access/spgist_private.h
*************** typedef struct SpGistLastUsedPage
*** 80,87 ****
      int            freeSpace;        /* page's free space (could be obsolete!) */
  } SpGistLastUsedPage;

! /* Note: indexes in cachedPage[] match flag assignments for SpGistGetBuffer */
! #define SPGIST_CACHED_PAGES 8

  typedef struct SpGistLUPCache
  {
--- 80,91 ----
      int            freeSpace;        /* page's free space (could be obsolete!) */
  } SpGistLastUsedPage;

! /*
!  * See spgutils.c for the assignments of indexes in cachedPage[].  Note that
!  * making SPGIST_CACHED_PAGES much larger than this would fail with the
!  * minimum allowed value of BLCKSZ, ie 1kB.
!  */
! #define SPGIST_CACHED_PAGES 100

  typedef struct SpGistLUPCache
  {
*************** typedef struct SpGistMetaPageData
*** 97,103 ****
      SpGistLUPCache lastUsedPages;        /* shared storage of last-used info */
  } SpGistMetaPageData;

! #define SPGIST_MAGIC_NUMBER (0xBA0BABEE)

  #define SpGistPageGetMeta(p) \
      ((SpGistMetaPageData *) PageGetContents(p))
--- 101,110 ----
      SpGistLUPCache lastUsedPages;        /* shared storage of last-used info */
  } SpGistMetaPageData;

! /* For current format: */
! #define SPGIST_MAGIC_NUMBER (0xBA0BABE1)
! /* For pre-v10 format (which had a smaller lastUsedPages array): */
! #define SPGIST_OLD_MAGIC_NUMBER (0xBA0BABEE)

  #define SpGistPageGetMeta(p) \
      ((SpGistMetaPageData *) PageGetContents(p))
*************** typedef struct spgxlogVacuumRedirect
*** 599,605 ****
   * In addition, GBUF_NULLS can be OR'd in to get a page for storage of
   * null-valued tuples.
   *
!  * Note: these flag values are used as indexes into lastUsedPages.
   */
  #define GBUF_LEAF                0x03
  #define GBUF_INNER_PARITY(x)    ((x) % 3)
--- 606,613 ----
   * In addition, GBUF_NULLS can be OR'd in to get a page for storage of
   * null-valued tuples.
   *
!  * Note: these flag values are used as indexes into spgutils.c's
!  * lastUsedPagesMap[].
   */
  #define GBUF_LEAF                0x03
  #define GBUF_INNER_PARITY(x)    ((x) % 3)

Re: Better tracking of free space during SP-GiST index build

From: Tomas Vondra
Date:

On 08/25/2016 01:45 AM, Tom Lane wrote:
> Over in the thread about the SP-GiST inet opclass, I threatened to post
> a patch like this, and here it is.
>
> The basic idea is to track more than just the very latest page we've used
> in each of the page categories that SP-GiST works with.  I started with an
> arrangement that gave an equal number of cache slots to each category, but
> soon realized that that was dumb, because there are usually way more leaf
> pages than anything else.  So this version has a little table of how many
> slots to give to each category.  The constants could maybe use a bit more
> fiddling, if we have some more test data sets to try this on.
>
> On the IRRExplorer data set we discussed in the other thread, this reduces
> the index size from 132MB to 120MB.  Poking into that more closely with
> pg_filedump, the total free space within the index drops from 42MB to
> 28MB.  If you think those numbers don't add up, you're right --- this
> seems to result in more non-leaf tuples than before.  I'm not sure why;
> maybe more aggressive sucking up of free space results in more splits.
> (Maybe adjustment of the default spgist fillfactor would be in order
> to counteract that?)  But the index search time doesn't seem to be hurt,
> so perhaps there's nothing to worry about.
>
> As coded, this makes no attempt to preferentially select pages with the
> most or least free space.  I don't know if it'd be worth any cycles to
> do that.
>
> I'll put this in the commitfest queue.  It could use review from someone
> with the time and motivation to do performance testing/tuning.
>

I can do a bit of benchmarking on this, I guess - possibly next week, 
but I can't promise that 100%. I'm not an SP-GiST expert and I won't have 
time to dive into the code, so it'll be mostly black-box testing. Any 
hints on what would be worth/interesting to test?

ISTM it'd be interesting to test index creation, maintenance, and 
querying, while varying the fillfactor. What other parameters would be 
interesting to tweak, and what data sets might be useful?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better tracking of free space during SP-GiST index build

From: Tomas Vondra
Date:
On 08/25/2016 03:26 AM, Tomas Vondra wrote:
>
>
> On 08/25/2016 01:45 AM, Tom Lane wrote:
>> Over in the thread about the SP-GiST inet opclass, I threatened to
>> post a patch like this, and here it is.
>>
>> The basic idea is to track more than just the very latest page
>> we've used in each of the page categories that SP-GiST works with.
>> I started with an arrangement that gave an equal number of cache
>> slots to each category, but soon realized that that was dumb,
>> because there are usually way more leaf pages than anything else.
>> So this version has a little table of how many slots to give to
>> each category. The constants could maybe use a bit more fiddling,
>> if we have some more test data sets to try this on.
>>
>> On the IRRExplorer data set we discussed in the other thread, this
>> reduces the index size from 132MB to 120MB. Poking into that more
>> closely with pg_filedump, the total free space within the index
>> drops from 42MB to 28MB. If you think those numbers don't add up,
>> you're right --- this seems to result in more non-leaf tuples than
>> before. I'm not sure why; maybe more aggressive sucking up of free
>> space results in more splits. (Maybe adjustment of the default
>> spgist fillfactor would be in order to counteract that?) But the
>> index search time doesn't seem to be hurt, so perhaps there's
>> nothing to worry about.
>>
>> As coded, this makes no attempt to preferentially select pages with
>> the most or least free space. I don't know if it'd be worth any
>> cycles to do that.
>>
>> I'll put this in the commitfest queue. It could use review from
>> someone with the time and motivation to do performance
>> testing/tuning.
>>
>

I've been toying with this patch a bit today, particularly looking at 
(1) sizing of the cache, and (2) how beneficial it'd be to choose pages 
from the cache in a smarter way.


(1) sizing of the cache

I wonder why the patch sets the cache size to only 100 items, when we 
might fit many more entries into the ~8kB limit. Sure, it's going to be 
more expensive to maintain the cache, but if it results in a smaller 
index, it might be worth it. I've tried increasing the cache size to 768 
entries, with the vast majority of them (~600) allocated to leaf pages.

Sadly, this seems to only increase the CREATE INDEX duration a bit, 
without making the index significantly smaller (still ~120MB).


(2) page selection

I do believe this is due to the current simple selection of pages from 
the cache - we simply select pages more or less randomly (as long as 
the page has enough free space). My understanding is that this leads to 
pages having roughly the same amount of free space. Subsequently, when 
SpGistSetLastUsedPage() selects a page to evict from the cache, it finds 
only pages with roughly the same amount of free space, and even if it 
picks the fullest one, it wastes quite a bit of space.

I do think selecting the page with the least free space would save 
additional space. Instead of making SpGistGetBuffer() more complicated, 
I've simply shoved pg_qsort() calls into a few places, sorting the cache 
by free space, with unused slots at the end. Clearly, this is quite 
expensive and a proper patch would have to do that differently (e.g. by 
maintaining the order incrementally), but OTOH it'd allow some 
optimizations in SpGistGetBuffer() - the first page with enough free 
space would have the smallest amount of free space (more or less).
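
For concreteness, the comparator idea looks roughly like this (a sketch
only, not the exact code I ran):

    /* ascending freeSpace, with unused slots pushed to the end */
    static int
    lup_cmp(const void *a, const void *b)
    {
        const SpGistLastUsedPage *la = (const SpGistLastUsedPage *) a;
        const SpGistLastUsedPage *lb = (const SpGistLastUsedPage *) b;

        if (la->blkno == InvalidBlockNumber)
            return (lb->blkno == InvalidBlockNumber) ? 0 : 1;
        if (lb->blkno == InvalidBlockNumber)
            return -1;
        return la->freeSpace - lb->freeSpace;
    }

    /* applied per category, e.g.:
     *    pg_qsort(cache->lastUsedPages.cachedPage + lupstart,
     *             lupstop - lupstart, sizeof(SpGistLastUsedPage), lup_cmp);
     */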

This actually helped a bit, and the index size dropped by ~2MB. So not 
bad, but nowhere close to the initial 132MB -> 120MB improvement.

The following table shows a comparison of index sizes, and also the effect 
of fillfactor=100:

master: 132MB
master + fillfactor=100: 124MB

patch: 120MB
patch + fillfactor=100: 109MB

patch + 768 items + selection: 117MB
patch + 768 items + selection + fillfactor=100: 103MB

It's worth mentioning the spgist fillfactor is a bit crude, most likely 
thanks to splits - e.g. the 109MB index still contains ~10MB of free 
space on its pages (measured using pageinspect as upper-lower), so 
almost 10%. Perhaps it really is time to increase the spgist default 
fillfactor?

(3) random comments

It seems the patch keeps new/empty/deleted pages in the cache, and thus 
segregated by type. Is that intentional, or should 
SpGistSetLastUsedPage() remove such pages from the cache? Or maybe move 
them into a special category? It's true we'll reuse those pages, as 
allocNewBuffer() allocates the buffer directly, but those pages are 
unlikely to get evicted from the cache due to their high freeSpace value 
(despite possibly having been reused already).

Similarly for completely full pages (with freeSpace==0) - does it make 
sense to keep them in the cache? Although it's probably harmless, as 
those pages will get evicted first if needed.


Overall, I think the patch is good - it may be possible to improve it 
further in the future, but it's a solid improvement as it stands.

One thing I'd change is making the SpGistLUPCache dynamic, i.e. storing 
the size and the lastUsedPagesMap on the meta page. That should allow us 
to resize the cache and tweak the lastUsedPagesMap in the future.
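
Something along these lines, perhaps (the field names here are invented, 
purely to illustrate the idea):

    typedef struct SpGistLUPCache
    {
        uint16      cacheSize;          /* total entries in cachedPage[] */
        uint16      categoryStart[8];   /* first slot of each of the 8 page
                                         * categories, replacing the
                                         * compiled-in lastUsedPagesMap[] */
        SpGistLastUsedPage cachedPage[FLEXIBLE_ARRAY_MEMBER];
    } SpGistLUPCache;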


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better tracking of free space during SP-GiST index build

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 08/25/2016 01:45 AM, Tom Lane wrote:
>>> I'll put this in the commitfest queue. It could use review from
>>> someone with the time and motivation to do performance
>>> testing/tuning.

> I've been toying with this patch a bit today, particularly looking at 
> (1) sizing of the cache, and (2) how beneficial it'd be to choose pages 
> from the cache in a smarter way.

Thanks for reviewing!

> I wonder why the patch only sets the cache size to 100 items, when we 
> might fit many more entries into the ~8kB limit.

I chose that because it would still work with the minimum allowed page
size of 1K.  We could make the cache size variable depending on BLCKSZ,
but I'm not sure it's worth the trouble.  There is some cost to a larger
lastUsedPages array, in that you spend more time memcpy'ing it back and
forth between the metapage buffer and backend local memory; and I was
afraid of that outweighing the incremental gains from a larger cache.
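
(To spell out the arithmetic: each SpGistLastUsedPage is a 4-byte
BlockNumber plus a 4-byte int, so 100 entries take 800 bytes, and
together with the other SpGistMetaPageData fields that still fits in
the roughly 1000 bytes a 1kB page has left after its header.)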

> ... I've tried increasing the cache size to 768 
> entries, with the vast majority of them (~600) allocated to leaf pages.
> Sadly, this seems to only increase the CREATE INDEX duration a bit, 
> without making the index significantly smaller (still ~120MB).

Yeah, that's in line with my results: not much further gain from a
larger cache.  Though if you were testing with the same IRRExplorer
data, it's not surprising that our results would match.  Would be
good to try some other cases...

> I do think selecting the page with the least free space would save 
> additional space. Instead of making SpGistGetBuffer() more complicated, 
> I've simply shoved pg_qsort() calls into a few places, sorting the cache 
> by free space, with unused slots at the end. Clearly, this is quite 
> expensive and a proper patch would have to do that differently (e.g. by 
> maintaining the order incrementally), but OTOH it'd allow some 
> optimizations in SpGistGetBuffer() - the first page with enough free 
> space would have the smallest amount of free space (more or less).

> This actually helped a bit, and the index size dropped by ~2MB. So not 
> bad, but nowhere close to the initial 132MB -> 120MB improvement.

OK, I'll think about how to do that more efficiently.  The smaller
incremental improvement isn't surprising, because in this example the
index would still be 90-something MB if it had no free space at all,
so there's going to be decreasing returns from any additional work
to avoid wasted free space.  But if we can do it cheaply, this does
suggest that using pages in order by free space is of value.

> It's worth mentioning the spgist fillfactor is a bit crude, most likely 
> thanks to splits - e.g. the 109MB index still contains ~10MB of free 
> space on its pages (measured using pageinspect as upper-lower), so 
> almost 10%. Perhaps it really is time to increase the spgist default 
> fillfactor?

Maybe; I don't think anyone's put much effort into tuning that.

> It seems the patch keeps new/empty/deleted pages in the cache, and thus 
> segregated by type. Is that intentional, or should 
> SpGistSetLastUsedPage() remove such pages from the cache? Or maybe move 
> them into a special category? It's true we'll reuse those pages, as 
> allocNewBuffer() allocates the buffer directly, but those pages are 
> unlikely to get evicted from the cache due to their high freeSpace value 
> (despite possibly having been reused already).

My thought was that once we've found a new/empty/deleted page, putting
it in the cache is a cheap way of making sure it gets used soon.  Sure,
we could record it in the FSM instead, but that sounds relatively more
expensive.  That intuition could be wrong of course.

> Similarly for completely full pages (with freeSpace==0) - does it make 
> sense to keep them in the cache? Although it's probably harmless, as 
> those pages will get evicted first if needed.

IIRC, we don't have anything to replace the cache entry with at that
point, so there's no particular value in marking the entry unused
rather than used-but-with-zero-space.  It will be first priority for
replacement either way, so there seemed little point in writing
more code to make it unused.

> One thing I'd change is making the SpGistLUPCache dynamic, i.e. storing 
> the size and the lastUsedPagesMap on the meta page. That should allow us 
> to resize the cache and tweak the lastUsedPagesMap in the future.

Yeah, probably a good idea.  I had thought of bumping SPGIST_MAGIC_NUMBER
again if we want to revisit the cache size; but keeping it as a separate
field won't add noticeable cost, and it might save some trouble.
        regards, tom lane



Re: Better tracking of free space during SP-GiST index build

From: Tomas Vondra
Date:
On 09/22/2016 07:37 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>
>> ... I've tried increasing the cache size to 768
>> entries, with the vast majority of them (~600) allocated to leaf pages.
>> Sadly, this seems to only increase the CREATE INDEX duration a bit,
>> without making the index significantly smaller (still ~120MB).
>
> Yeah, that's in line with my results: not much further gain from a
> larger cache.  Though if you were testing with the same IRRExplorer
> data, it's not surprising that our results would match.  Would be
> good to try some other cases...
>

Agreed, but I don't have any other data sets at hand. One possibility 
would be to generate something randomly (e.g. it's not particularly 
difficult to generate random IP addresses), but I'd much rather use some 
real-world data sets.
>>
>> One thing I'd change is making the SpGistLUPCache dynamic, i.e.
>> storing the size and the lastUsedPagesMap on the meta page. That
>> should allow us to resize the cache and tweak the lastUsedPagesMap in
>> the future.
>
> Yeah, probably a good idea. I had thought of bumping
> SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but
> keeping it as a separate field won't add noticeable cost, and it
> might save some trouble.
>

I see you plan to track only the cache size, while I proposed to also 
track the map, i.e. the number of pages per category. I think that'd be 
useful in case we come up with better values (e.g. more entries for leaf 
pages), or even some adaptive scheme.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better tracking of free space during SP-GiST index build

From: Oleg Bartunov
Date:
On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 09/22/2016 07:37 PM, Tom Lane wrote:
>>
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>
>>> ... I've tried increasing the cache size to 768
>>> entries, with the vast majority of them (~600) allocated to leaf pages.
>>> Sadly, this seems to only increase the CREATE INDEX duration a bit,
>>> without making the index significantly smaller (still ~120MB).
>>
>>
>> Yeah, that's in line with my results: not much further gain from a
>> larger cache.  Though if you were testing with the same IRRExplorer
>> data, it's not surprising that our results would match.  Would be
>> good to try some other cases...
>>
>
> Agreed, but I don't have any other data sets at hand. One possibility would
> be to generate something randomly (e.g. it's not particularly difficult to
> generate random IP addresses), but I'd much rather use some real-world data
> sets.

Tomas, I have one real dataset, which I used for testing spgist
(https://www.postgresql.org/message-id/CAF4Au4zxd2XOV0A__FU7xoHxSiwJzm1z2xhs-FFaT1DzB9ub3Q@mail.gmail.com)
Let me know if you need it.

>
>>>
>>>
>>> One thing I'd change is making the SpGistLUPCache dynamic, i.e.
>>> storing the size and the lastUsedPagesMap on the meta page. That
>>> should allow us to resize the cache and tweak the lastUsedPagesMap in
>>> the future.
>>
>>
>> Yeah, probably a good idea. I had thought of bumping
>> SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but
>> keeping it as a separate field won't add noticeable cost, and it
>> might save some trouble.
>>
>
> I see you plan to track only the cache size, while I proposed to also track
> the map, i.e. the number of pages per category. I think that'd be useful in
> case we come up with better values (e.g. more entries for leaf pages), or
> even some adaptive scheme.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better tracking of free space during SP-GiST index build

From: Tomas Vondra
Date:
On 09/25/2016 08:33 PM, Oleg Bartunov wrote:
> On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> On 09/22/2016 07:37 PM, Tom Lane wrote:
>>>
>>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>
>>>> ... I've tried increasing the cache size to 768
>>>> entries, with the vast majority of them (~600) allocated to leaf pages.
>>>> Sadly, this seems to only increase the CREATE INDEX duration a bit,
>>>> without making the index significantly smaller (still ~120MB).
>>>
>>>
>>> Yeah, that's in line with my results: not much further gain from a
>>> larger cache.  Though if you were testing with the same IRRExplorer
>>> data, it's not surprising that our results would match.  Would be
>>> good to try some other cases...
>>>
>>
>> Agreed, but I don't have any other data sets at hand. One possibility would
>> be to generate something randomly (e.g. it's not particularly difficult to
>> generate random IP addresses), but I'd much rather use some real-world data
>> sets.
>
> Tomas, I have one real dataset, which I used for testing spgist
> (https://www.postgresql.org/message-id/CAF4Au4zxd2XOV0A__FU7xoHxSiwJzm1z2xhs-FFaT1DzB9ub3Q@mail.gmail.com)
> Let me know if you need it.
>

Sure, that would be useful.

I think it would be useful to make a repository of such data sets, so that 
patch authors & reviewers can get a reasonable collection of data sets 
if needed, instead of scrambling every time. Opinions?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better tracking of free space during SP-GiST index build

From: Robert Haas
Date:
On Sun, Sep 25, 2016 at 3:28 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Sure, that would be useful.
>
> I think it would be useful to make a repository of such data sets, so that
> patch authors & reviewers can get a reasonable collection of data sets if
> needed, instead of scrambling every time. Opinions?

In theory, great idea.  In practice, I suspect the problem will be
that nobody will know what the use case for a particular data set was
supposed to be, and therefore it'll become a collection of files
nobody knows what to do with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Better tracking of free space during SP-GiST index build

From: Robert Haas
Date:
On Thu, Sep 22, 2016 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> OK, I'll think about how to do that more efficiently.  The smaller
> incremental improvement isn't surprising, because in this example the
> index would still be 90-something MB if it had no free space at all,
> so there's going to be decreasing returns from any additional work
> to avoid wasted free space.  But if we can do it cheaply, this does
> suggest that using pages in order by free space is of value.

Tom, are you planning to do something about this patch yet this
CommitFest, or leave it until later?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Better tracking of free space during SP-GiST index build

From: Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 22, 2016 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> OK, I'll think about how to do that more efficiently.  The smaller
>> incremental improvement isn't surprising, because in this example the
>> index would still be 90-something MB if it had no free space at all,
>> so there's going to be decreasing returns from any additional work
>> to avoid wasted free space.  But if we can do it cheaply, this does
>> suggest that using pages in order by free space is of value.

> Tom, are you planning to do something about this patch yet this
> CommitFest, or leave it until later?

I doubt I will get to it this week, so let's mark it RWF for this fest.
        regards, tom lane



Re: Better tracking of free space during SP-GiST index build

From: Robert Haas
Date:
On Wed, Sep 28, 2016 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 22, 2016 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> OK, I'll think about how to do that more efficiently.  The smaller
>>> incremental improvement isn't surprising, because in this example the
>>> index would still be 90-something MB if it had no free space at all,
>>> so there's going to be decreasing returns from any additional work
>>> to avoid wasted free space.  But if we can do it cheaply, this does
>>> suggest that using pages in order by free space is of value.
>
>> Tom, are you planning to do something about this patch yet this
>> CommitFest, or leave it until later?
>
> I doubt I will get to it this week, so let's mark it RWF for this fest.

OK, done.  Thanks for the reply.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Better tracking of free space during SP-GiST index build

From: Jeff Janes
Date:
On Wed, Sep 28, 2016 at 10:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 25, 2016 at 3:28 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > Sure, that would be useful.
> >
> > I think it would be useful to make a repository of such data sets, so that
> > patch authors & reviewers can get a reasonable collection of data sets if
> > needed, instead of scrambling every time. Opinions?
>
> In theory, great idea.  In practice, I suspect the problem will be
> that nobody will know what the use case for a particular data set was
> supposed to be, and therefore it'll become a collection of files
> nobody knows what to do with.

We would have to store them together with the example queries they 
pertain to, somehow, rather than just a loose assemblage of files.  Best 
case is probably to have a generator for the data, rather than the data 
itself, when possible. What would be the best technology to host such a 
thing?
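
For instance, something as trivial as this sketch (illustrative only; it 
hard-codes the seed so runs are reproducible, and a real generator would 
want a documented distribution):

    /* emit N random IPv4 addresses, one per line, e.g. to COPY into
     * an inet column */
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
        long        n = (argc > 1) ? atol(argv[1]) : 1000000;
        long        i;

        srandom(42);            /* fixed seed for reproducibility */
        for (i = 0; i < n; i++)
            printf("%ld.%ld.%ld.%ld\n",
                   random() % 256, random() % 256,
                   random() % 256, random() % 256);
        return 0;
    }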
 
Cheers,

Jeff