Re: Index-only-scans, indexam API changes

From: Heikki Linnakangas
Subject: Re: Index-only-scans, indexam API changes
Date:
Msg-id: 4A5B5386.3070100@enterprisedb.com
In reply to: Re: Index-only-scans, indexam API changes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Index-only-scans, indexam API changes (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Tom Lane wrote:
> One thought here is that an AM call isn't really free, and doing two of
> them instead of one mightn't be such a good idea.  I would suggest
> either having a separate AM entry point to get both bits of data
> ("amgettupledata"?) or adding an optional parameter to amgettuple.

I'm thinking of adding a new flag to IndexScanDesc, "needIndexTuple".
When that is set, amgettuple stores a pointer to the IndexTuple in
another new field in IndexScanDesc, xs_itup. (In the case of B-tree, at
least, the IndexTuple is a palloc'd copy of the tuple on disk.)
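
For illustration, here is a rough sketch of the caller's side of that
contract (not part of the patch; it assumes the index_next()/index_fetch()
split from the indexam API patch mentioned below):

/*
 * Caller-side sketch, illustration only. heapRel, indexRel, snapshot,
 * nkeys and keys are assumed to be set up as for any index scan.
 */
IndexScanDesc scan = index_beginscan(heapRel, indexRel, snapshot,
                                     nkeys, keys);

scan->needIndexTuple = true;    /* ask the AM to hand back index tuples */

while (index_next(scan, ForwardScanDirection))
{
    if (scan->xs_itup != NULL)
    {
        /*
         * The AM returned a palloc'd copy of the index tuple, so quals
         * that touch only indexed columns can be checked here, before
         * paying for the heap fetch.
         */
    }
    else
    {
        /*
         * The AM gave up (for example, the page changed under us), so
         * we fall back to fetching and checking the heap tuple as before.
         */
    }
}
index_endscan(scan);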

> [ thinks a bit ... ]  At least for GIST, it is possible that whether
> data can be regurgitated will vary depending on the selected opclass.
> Some opclasses use the STORAGE modifier and some don't.  I am not sure
> how hard we want to work to support flexibility there.  Would it be
> sufficient to hard-code the check as "pgam says the AM can do it,
> and the opclass does not have a STORAGE property"?  Or do we need
> additional intelligence about GIST opclasses?

Well, the way I have it implemented is that the AM is not required to
return the index tuple, even if requested. I implemented the B-tree
changes similarly to how we implement "kill_prior_tuple": in btgettuple,
lock the index page and check whether the tuple is still at the position
we remembered in the scan opaque struct. If not (because of concurrent
changes to the page), we give up and don't return the index tuple. The
executor then performs a heap fetch as before.
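
In condensed form, the check looks like this (just a paraphrase of
_bt_getindextuple() in the attached patch, not additional code):

/* Condensed paraphrase of _bt_getindextuple() from the patch below. */
LockBuffer(so->currPos.buf, BT_READ);
ituple = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
if (ItemPointerEquals(&ituple->t_tid, &scan->xs_ctup.t_self))
{
    /* still where we remembered it: hand back a palloc'd copy */
    result = palloc(IndexTupleSize(ituple));
    memcpy(result, ituple, IndexTupleSize(ituple));
}
else
    result = NULL;    /* page changed under us: caller does a heap fetch */
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);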

Alternatively, we could copy all the matching index tuples to private
memory when we step to a new index page, but that seems pretty expensive
if we're only going to use the first few matching tuples (LIMIT), and I
fear the memory management gets complicated. But in any case, the GiST
issue would still be there.
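
For comparison, that rejected alternative would look very roughly like
this inside _bt_readpage() (a hypothetical sketch; the tupleCopy field
and the loop details are made up for illustration):

/*
 * Hypothetical: copy each matching tuple while _bt_readpage() still
 * holds the page locked. "tupleCopy" does not exist in the patch.
 */
for (offnum = minoff; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
{
    if (_bt_checkkeys(scan, page, offnum, dir, &continuescan))
    {
        IndexTuple  itup = (IndexTuple) PageGetItem(page,
                                                    PageGetItemId(page, offnum));
        Size        itupsz = IndexTupleSize(itup);

        /* copies pile up even if the caller stops after a few rows */
        so->currPos.items[itemIndex].tupleCopy = palloc(itupsz);
        memcpy(so->currPos.items[itemIndex].tupleCopy, itup, itupsz);
        itemIndex++;
    }
    if (!continuescan)
        break;
}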

Since we're discussing it, I'm attaching the prototype patch I have for
returning tuples from B-tree and using them to filter rows before heap
fetches. I was going to post it in the morning along with a description
of the planner and executor changes, but here goes. It applies on top of
the indexam API patch I started this thread with.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/index/genam.c
--- b/src/backend/access/index/genam.c
***************
*** 91,102 **** RelationGetIndexScan(Relation indexRelation,
--- 91,104 ----

      scan->kill_prior_tuple = false;
      scan->ignore_killed_tuples = true;    /* default setting */
+     scan->needIndexTuple = false;

      scan->opaque = NULL;

      ItemPointerSetInvalid(&scan->xs_ctup.t_self);
      scan->xs_ctup.t_data = NULL;
      scan->xs_cbuf = InvalidBuffer;
+     scan->xs_itup = NULL;

      /*
       * Let the AM fill in the key and any opaque data it wants.
*** a/src/backend/access/index/indexam.c
--- b/src/backend/access/index/indexam.c
***************
*** 398,404 **** index_restrpos(IndexScanDesc scan)
   *
   * Advances to the next index tuple satisfying the scan keys, returning TRUE
   * if there was one, FALSE otherwise. The heap TID pointer of the next match
!  * is stored in scan->xs_ctup.t_self.
   *
   * Note: caller must check scan->xs_recheck, and perform rechecking of the
   * scan keys if required.  We do not do that here because we don't have
--- 398,406 ----
   *
   * Advances to the next index tuple satisfying the scan keys, returning TRUE
   * if there was one, FALSE otherwise. The heap TID pointer of the next match
!  * is stored in scan->xs_ctup.t_self. If scan->needIndexTuple is TRUE, the
!  * index tuple itself is stored in scan->xs_itup, provided the indexam
!  * succeeded in fetching it.
   *
   * Note: caller must check scan->xs_recheck, and perform rechecking of the
   * scan keys if required.  We do not do that here because we don't have
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
***************
*** 280,285 **** btgettuple(PG_FUNCTION_ARGS)
--- 280,297 ----
      else
          res = _bt_first(scan, dir);

+     /* Fetch the index tuple if needed */
+     if (scan->needIndexTuple)
+     {
+         if (scan->xs_itup != NULL)
+         {
+             pfree(scan->xs_itup);
+             scan->xs_itup = NULL;
+         }
+         if (res)
+             scan->xs_itup = _bt_getindextuple(scan);
+     }
+
      PG_RETURN_BOOL(res);
  }

*** a/src/backend/access/nbtree/nbtsearch.c
--- b/src/backend/access/nbtree/nbtsearch.c
***************
*** 963,968 **** _bt_next(IndexScanDesc scan, ScanDirection dir)
--- 963,1031 ----
      return true;
  }

+
+ /*
+  * _bt_getindextuple - fetch index tuple at current position.
+  *
+  * The caller must have a pin on so->currPos.buf.
+  *
+  * The returned tuple is a palloc'd copy. This can fail to find the tuple if
+  * new tuples have been inserted into the page since we stepped onto this
+  * index page; NULL is returned in that case.
+  */
+ IndexTuple
+ _bt_getindextuple(IndexScanDesc scan)
+ {
+     BTScanOpaque so = (BTScanOpaque) scan->opaque;
+     Page        page;
+     BTPageOpaque opaque;
+     OffsetNumber minoff;
+     OffsetNumber maxoff;
+     IndexTuple    ituple, result;
+     int itemIndex;
+     BTScanPosItem *positem;
+     OffsetNumber offnum;
+
+     Assert(BufferIsValid(so->currPos.buf));
+
+     LockBuffer(so->currPos.buf, BT_READ);
+
+     page = BufferGetPage(so->currPos.buf);
+     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+     minoff = P_FIRSTDATAKEY(opaque);
+     maxoff = PageGetMaxOffsetNumber(page);
+
+     itemIndex = so->currPos.itemIndex;
+     positem = &so->currPos.items[itemIndex];
+     offnum = positem->indexOffset;
+
+     Assert(itemIndex >= so->currPos.firstItem &&
+            itemIndex <= so->currPos.lastItem);
+     if (offnum < minoff)
+     {
+         /* pure paranoia */
+         LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+         return NULL;
+     }
+
+     ituple = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+
+     if (ItemPointerEquals(&ituple->t_tid, &scan->xs_ctup.t_self))
+     {
+         /* found the item */
+         Size itupsz = IndexTupleSize(ituple);
+
+         result = palloc(itupsz);
+         memcpy(result, ituple, itupsz);
+     }
+     else
+         result = NULL;
+
+     LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+     return result;
+ }
+
  /*
   *    _bt_readpage() -- Load data from current index page into so->currPos
   *
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
***************
*** 828,833 **** explain_outNode(StringInfo str,
--- 828,838 ----
                             ((Scan *) plan)->scanrelid,
                             plan, outer_plan,
                             str, indent, es);
+             show_scan_qual(((IndexScan *) plan)->indexonlyqual,
+                            "Index-Only Filter",
+                            ((Scan *) plan)->scanrelid,
+                            plan, outer_plan,
+                            str, indent, es);
              show_scan_qual(plan->qual,
                             "Filter",
                             ((Scan *) plan)->scanrelid,
*** a/src/backend/executor/execTuples.c
--- b/src/backend/executor/execTuples.c
***************
*** 92,97 ****
--- 92,98 ----
  #include "postgres.h"

  #include "funcapi.h"
+ #include "access/itup.h"
  #include "catalog/pg_type.h"
  #include "nodes/nodeFuncs.h"
  #include "storage/bufmgr.h"
***************
*** 577,582 **** ExecStoreVirtualTuple(TupleTableSlot *slot)
--- 578,624 ----
      return slot;
  }

+ TupleTableSlot *
+ ExecStoreIndexTuple(IndexTuple tuple, TupleTableSlot *slot, bool shouldFree)
+ {
+     int attnum;
+
+     /*
+      * sanity checks
+      */
+     Assert(tuple != NULL);
+     Assert(slot != NULL);
+     Assert(slot->tts_tupleDescriptor != NULL);
+
+     /*
+      * Free any old physical tuple belonging to the slot.
+      */
+     if (slot->tts_shouldFree)
+         heap_freetuple(slot->tts_tuple);
+     if (slot->tts_shouldFreeMin)
+         heap_free_minimal_tuple(slot->tts_mintuple);
+
+     /*
+      * Store the new tuple into the specified slot.
+      */
+     slot->tts_isempty = false;
+     slot->tts_shouldFree = shouldFree;
+     slot->tts_shouldFreeMin = false;
+     slot->tts_tuple = NULL;
+     slot->tts_mintuple = NULL;
+
+     for (attnum = 1; attnum <= slot->tts_tupleDescriptor->natts; attnum++)
+     {
+         slot->tts_values[attnum - 1] =
+             index_getattr(tuple, attnum, slot->tts_tupleDescriptor,
+                           &slot->tts_isnull[attnum - 1]);
+     }
+
+     slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
+
+     return slot;
+ }
+
  /* --------------------------------
   *        ExecStoreAllNullTuple
   *            Set up the slot to contain a null in every column.
*** a/src/backend/executor/nodeIndexscan.c
--- b/src/backend/executor/nodeIndexscan.c
***************
*** 35,52 ****
  #include "utils/memutils.h"


- static TupleTableSlot *IndexNext(IndexScanState *node);
-
-
  /* ----------------------------------------------------------------
!  *        IndexNext
   *
   *        Retrieve a tuple from the IndexScan node's currentRelation
   *        using the index specified in the IndexScanState information.
   * ----------------------------------------------------------------
   */
! static TupleTableSlot *
! IndexNext(IndexScanState *node)
  {
      EState       *estate;
      ExprContext *econtext;
--- 35,49 ----
  #include "utils/memutils.h"


  /* ----------------------------------------------------------------
!  *        ExecIndexScan(node)
   *
   *        Retrieve a tuple from the IndexScan node's currentRelation
   *        using the index specified in the IndexScanState information.
   * ----------------------------------------------------------------
   */
! TupleTableSlot *
! ExecIndexScan(IndexScanState *node)
  {
      EState       *estate;
      ExprContext *econtext;
***************
*** 55,60 **** IndexNext(IndexScanState *node)
--- 52,89 ----
      Index        scanrelid;
      HeapTuple    tuple;
      TupleTableSlot *slot;
+     IndexScan *plan = (IndexScan *) node->ss.ps.plan;
+     ExprDoneCond isDone;
+     TupleTableSlot *resultSlot;
+
+     /*
+      * If we have runtime keys and they've not already been set up, do it now.
+      */
+     if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)
+         ExecReScan((PlanState *) node, NULL);
+
+     /*
+      * Check to see if we're still projecting out tuples from a previous scan
+      * tuple (because there is a function-returning-set in the projection
+      * expressions).  If so, try to project another one.
+      */
+     if (node->ss.ps.ps_TupFromTlist)
+     {
+         Assert(node->ss.ps.ps_ProjInfo); /* can't get here if not projecting */
+         resultSlot = ExecProject(node->ss.ps.ps_ProjInfo, &isDone);
+         if (isDone == ExprMultipleResult)
+             return resultSlot;
+         /* Done with that source tuple... */
+         node->ss.ps.ps_TupFromTlist = false;
+     }
+
+     /*
+      * Reset per-tuple memory context to free any expression evaluation
+      * storage allocated in the previous tuple cycle.  Note this can't happen
+      * until we're done projecting out tuples from a scan tuple.
+      */
+     econtext = node->ss.ps.ps_ExprContext;
+     ResetExprContext(econtext);

      /*
       * extract necessary information from index scan node
***************
*** 62,68 **** IndexNext(IndexScanState *node)
      estate = node->ss.ps.state;
      direction = estate->es_direction;
      /* flip direction if this is an overall backward scan */
!     if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
      {
          if (ScanDirectionIsForward(direction))
              direction = BackwardScanDirection;
--- 91,97 ----
      estate = node->ss.ps.state;
      direction = estate->es_direction;
      /* flip direction if this is an overall backward scan */
!     if (ScanDirectionIsBackward(plan->indexorderdir))
      {
          if (ScanDirectionIsForward(direction))
              direction = BackwardScanDirection;
***************
*** 72,78 **** IndexNext(IndexScanState *node)
      scandesc = node->iss_ScanDesc;
      econtext = node->ss.ps.ps_ExprContext;
      slot = node->ss.ss_ScanTupleSlot;
!     scanrelid = ((IndexScan *) node->ss.ps.plan)->scan.scanrelid;

      /*
       * Check if we are evaluating PlanQual for tuple of this relation.
--- 101,107 ----
      scandesc = node->iss_ScanDesc;
      econtext = node->ss.ps.ps_ExprContext;
      slot = node->ss.ss_ScanTupleSlot;
!     scanrelid = plan->scan.scanrelid;

      /*
       * Check if we are evaluating PlanQual for tuple of this relation.
***************
*** 106,162 **** IndexNext(IndexScanState *node)
      /*
       * ok, now that we have what we need, fetch the next tuple.
       */
!     while ((tuple = index_getnext(scandesc, direction)) != NULL)
      {
          /*
           * Store the scanned tuple in the scan tuple slot of the scan state.
!          * Note: we pass 'false' because tuples returned by amgetnext are
           * pointers onto disk pages and must not be pfree()'d.
           */
          ExecStoreTuple(tuple,    /* tuple to store */
                         slot,    /* slot to store in */
!                        scandesc->xs_cbuf,        /* buffer containing tuple */
                         false);    /* don't pfree */

          /*
           * If the index was lossy, we have to recheck the index quals using
           * the real tuple.
           */
          if (scandesc->xs_recheck)
          {
-             econtext->ecxt_scantuple = slot;
              ResetExprContext(econtext);
              if (!ExecQual(node->indexqualorig, econtext, false))
                  continue;        /* nope, so ask index for another one */
          }

!         return slot;
      }

      /*
       * if we get here it means the index scan failed so we are at the end of
       * the scan..
       */
-     return ExecClearTuple(slot);
- }

- /* ----------------------------------------------------------------
-  *        ExecIndexScan(node)
-  * ----------------------------------------------------------------
-  */
- TupleTableSlot *
- ExecIndexScan(IndexScanState *node)
- {
      /*
!      * If we have runtime keys and they've not already been set up, do it now.
       */
!     if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)
!         ExecReScan((PlanState *) node, NULL);

!     /*
!      * use IndexNext as access method
!      */
!     return ExecScan(&node->ss, (ExecScanAccessMtd) IndexNext);
  }

  /* ----------------------------------------------------------------
--- 135,245 ----
      /*
       * ok, now that we have what we need, fetch the next tuple.
       */
!     while (index_next(scandesc, direction))
      {
+         bool indextuple_fetched = false;
+
+         /* See if we can evaluate quals or targets using index tuple */
+         if (scandesc->needIndexTuple && scandesc->xs_itup != NULL)
+         {
+             indextuple_fetched = true;
+
+             slot = node->iss_IndexTupleSlot;
+             ExecStoreIndexTuple(scandesc->xs_itup, slot, false);
+
+             econtext->ecxt_scantuple = slot;
+             if (!ExecQual(node->iss_indexonlyqual, econtext, false))
+                 continue;
+         }
+
+         if ((tuple = index_fetch(scandesc)) == NULL)
+             continue;
+
+         slot = node->ss.ss_ScanTupleSlot;
          /*
           * Store the scanned tuple in the scan tuple slot of the scan state.
!          * Note: we pass 'false' because tuples returned by index_fetch are
           * pointers onto disk pages and must not be pfree()'d.
           */
          ExecStoreTuple(tuple,    /* tuple to store */
                         slot,    /* slot to store in */
!                        scandesc->xs_cbuf,    /* buffer containing tuple */
                         false);    /* don't pfree */

+         econtext->ecxt_scantuple = slot;
+
+         /* check index-only quals if we failed to get index tuple */
+         if (!indextuple_fetched && node->iss_indexonlyqual != NULL &&
+             !ExecQual(node->iss_indexonlyqual, econtext, false))
+             continue;
+
+         /* check additional quals */
+         if (node->ss.ps.qual &&
+             !ExecQual(node->ss.ps.qual, econtext, false))
+         {
+             /* Failed quals */
+             continue;
+         }
+
          /*
           * If the index was lossy, we have to recheck the index quals using
           * the real tuple.
           */
          if (scandesc->xs_recheck)
          {
              ResetExprContext(econtext);
              if (!ExecQual(node->indexqualorig, econtext, false))
                  continue;        /* nope, so ask index for another one */
          }

!         /*
!          * A visible tuple satisfying all the quals was found; project it
!          * if necessary, otherwise return the scan tuple directly.
!          */
!
!         if (node->ss.ps.ps_ProjInfo)
!         {
!             /*
!              * Form a projection tuple, store it in the result tuple slot
!              * and return it --- unless we find we can project no tuples
!              * from this scan tuple, in which case continue scan.
!              */
!             resultSlot = ExecProject(node->ss.ps.ps_ProjInfo, &isDone);
!             if (isDone != ExprEndResult)
!             {
!                 node->ss.ps.ps_TupFromTlist = (isDone == ExprMultipleResult);
!                 return resultSlot;
!             }
!         }
!         else
!         {
!             /*
!              * Here, we aren't projecting, so just return scan tuple.
!              */
!             return slot;
!         }
      }

      /*
       * if we get here it means the index scan failed so we are at the end of
       * the scan..
       */

      /*
!      * If the loop above found nothing, return an empty slot, being
!      * careful to use the projection result slot when we are projecting,
!      * so that the returned slot has the correct tupleDesc for this
!      * node's output.
       */
!     if (TupIsNull(slot))
!     {
!         if (node->ss.ps.ps_ProjInfo)
!             return ExecClearTuple(node->ss.ps.ps_ProjInfo->pi_slot);
!         else
!             return slot;
!     }

!     return ExecClearTuple(slot);
  }

  /* ----------------------------------------------------------------
***************
*** 431,436 **** ExecEndIndexScan(IndexScanState *node)
--- 514,520 ----
       */
      ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
      ExecClearTuple(node->ss.ss_ScanTupleSlot);
+     ExecClearTuple(node->iss_IndexTupleSlot);

      /*
       * close the index relation (no-op if we didn't open it)
***************
*** 519,531 **** ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
          ExecInitExpr((Expr *) node->indexqualorig,
                       (PlanState *) indexstate);

! #define INDEXSCAN_NSLOTS 2

      /*
       * tuple table initialization
       */
      ExecInitResultTupleSlot(estate, &indexstate->ss.ps);
      ExecInitScanTupleSlot(estate, &indexstate->ss);

      /*
       * open the base relation and acquire appropriate lock on it.
--- 603,624 ----
          ExecInitExpr((Expr *) node->indexqualorig,
                       (PlanState *) indexstate);

!     indexstate->iss_indexonlyqual = (List *)
!         ExecInitExpr((Expr *) node->indexonlyqual,
!                      (PlanState *) indexstate);
!
!     indexstate->iss_indexonlyqualorig = (List *)
!         ExecInitExpr((Expr *) node->indexonlyqualorig,
!                      (PlanState *) indexstate);
!
! #define INDEXSCAN_NSLOTS 3

      /*
       * tuple table initialization
       */
      ExecInitResultTupleSlot(estate, &indexstate->ss.ps);
      ExecInitScanTupleSlot(estate, &indexstate->ss);
+     indexstate->iss_IndexTupleSlot = ExecAllocTableSlot(estate->es_tupleTable);

      /*
       * open the base relation and acquire appropriate lock on it.
***************
*** 566,571 **** ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
--- 659,671 ----
                                       relistarget ? NoLock : AccessShareLock);

      /*
+      * Set up the slot descriptor for tuples fetched straight from the index
+      */
+     if (node->indexonlyqual != NIL)
+         ExecSetSlotDescriptor(indexstate->iss_IndexTupleSlot,
+                               RelationGetDescr(indexstate->iss_RelationDesc));
+
+     /*
       * Initialize index-specific scan state
       */
      indexstate->iss_RuntimeKeysReady = false;
***************
*** 611,616 **** ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
--- 711,718 ----
                                                 estate->es_snapshot,
                                                 indexstate->iss_NumScanKeys,
                                                 indexstate->iss_ScanKeys);
+     if (node->indexonlyqual != NIL)
+         indexstate->iss_ScanDesc->needIndexTuple = true;

      /*
       * all done.
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 313,318 **** _copyIndexScan(IndexScan *from)
--- 313,320 ----
      COPY_NODE_FIELD(indexqual);
      COPY_NODE_FIELD(indexqualorig);
      COPY_SCALAR_FIELD(indexorderdir);
+     COPY_NODE_FIELD(indexonlyqual);
+     COPY_NODE_FIELD(indexonlyqualorig);

      return newnode;
  }
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***************
*** 399,404 **** _outIndexScan(StringInfo str, IndexScan *node)
--- 399,406 ----
      WRITE_NODE_FIELD(indexqual);
      WRITE_NODE_FIELD(indexqualorig);
      WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+     WRITE_NODE_FIELD(indexonlyqual);
+     WRITE_NODE_FIELD(indexonlyqualorig);
  }

  static void
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
***************
*** 192,197 **** cost_seqscan(Path *path, PlannerInfo *root,
--- 193,200 ----
   *
   * 'index' is the index to be used
   * 'indexQuals' is the list of applicable qual clauses (implicit AND semantics)
+  * 'indexOnlyQuals' is the list of qual clauses that are applied to tuples
+  *        fetched from the index, before the heap tuple is fetched
   * 'outer_rel' is the outer relation when we are considering using the index
   *        scan as the inside of a nestloop join (hence, some of the indexQuals
   *        are join clauses, and we should expect repeated scans of the index);
***************
*** 213,218 **** void
--- 216,222 ----
  cost_index(IndexPath *path, PlannerInfo *root,
             IndexOptInfo *index,
             List *indexQuals,
+            List *indexOnlyQuals,
             RelOptInfo *outer_rel)
  {
      RelOptInfo *baserel = index->rel;
***************
*** 221,226 **** cost_index(IndexPath *path, PlannerInfo *root,
--- 225,231 ----
      Cost        indexStartupCost;
      Cost        indexTotalCost;
      Selectivity indexSelectivity;
+     Selectivity indexOnlyQualSelectivity;
      double        indexCorrelation,
                  csquared;
      Cost        min_IO_cost,
***************
*** 254,259 **** cost_index(IndexPath *path, PlannerInfo *root,
--- 259,267 ----
                       PointerGetDatum(&indexSelectivity),
                       PointerGetDatum(&indexCorrelation));

+     indexOnlyQualSelectivity =
+         clauselist_selectivity(root, indexOnlyQuals, 0, JOIN_INNER, NULL);
+
      /*
       * Save amcostestimate's results for possible use in bitmap scan planning.
       * We don't bother to save indexStartupCost or indexCorrelation, because a
***************
*** 267,273 **** cost_index(IndexPath *path, PlannerInfo *root,
      run_cost += indexTotalCost - indexStartupCost;

      /* estimate number of main-table tuples fetched */
!     tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);

      /*----------
       * Estimate number of main-table pages fetched, and compute I/O cost.
--- 275,282 ----
      run_cost += indexTotalCost - indexStartupCost;

      /* estimate number of main-table tuples fetched */
!     tuples_fetched = clamp_row_est(
!         indexOnlyQualSelectivity * indexSelectivity * baserel->tuples);

      /*----------
       * Estimate number of main-table pages fetched, and compute I/O cost.
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "optimizer/cost.h"
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
+ #include "optimizer/planmain.h"
  #include "optimizer/predtest.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/var.h"
***************
*** 290,295 **** find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
--- 291,297 ----
          IndexOptInfo *index = (IndexOptInfo *) lfirst(ilist);
          IndexPath  *ipath;
          List       *restrictclauses;
+         List       *indexonlyQuals;
          List       *index_pathkeys;
          List       *useful_pathkeys;
          bool        useful_predicate;
***************
*** 397,411 **** find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
              useful_pathkeys = NIL;

          /*
           * 3. Generate an indexscan path if there are relevant restriction
           * clauses in the current clauses, OR the index ordering is
           * potentially useful for later merging or final output ordering, OR
           * the index has a predicate that was proven by the current clauses.
           */
!         if (found_clause || useful_pathkeys != NIL || useful_predicate)
          {
              ipath = create_index_path(root, index,
                                        restrictclauses,
                                        useful_pathkeys,
                                        index_is_ordered ?
                                        ForwardScanDirection :
--- 399,451 ----
              useful_pathkeys = NIL;

          /*
+          * X. The index might be useful if data from it can be used to
+          * evaluate any restrictions not used as index keys.
+          */
+         {
+             ListCell *l, *l2;
+
+             indexonlyQuals = NIL;
+             foreach(l, rel->baserestrictinfo)
+             {
+                 RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+                 Expr *indexonlyQual;
+                 bool indexed = false;
+
+                 /*
+                  * Only consider restrictions that are not handled by the
+                  * index
+                  */
+                 foreach(l2, restrictclauses)
+                 {
+                     if (list_member((List *) lfirst(l2), rinfo))
+                     {
+                         indexed = true;
+                         break;
+                     }
+                 }
+                 if (!indexed)
+                 {
+                     indexonlyQual = make_indexonly_expr(root, index, rinfo->clause,
+                                                         NULL);
+                     if (indexonlyQual != NULL)
+                         indexonlyQuals = lappend(indexonlyQuals, rinfo->clause);
+                 }
+             }
+         }
+
+         /*
           * 3. Generate an indexscan path if there are relevant restriction
           * clauses in the current clauses, OR the index ordering is
           * potentially useful for later merging or final output ordering, OR
           * the index has a predicate that was proven by the current clauses.
           */
!         if (found_clause || useful_pathkeys != NIL || useful_predicate ||
!             indexonlyQuals != NIL)
          {
              ipath = create_index_path(root, index,
                                        restrictclauses,
+                                       indexonlyQuals,
                                        useful_pathkeys,
                                        index_is_ordered ?
                                        ForwardScanDirection :
***************
*** 429,434 **** find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
--- 469,475 ----
              {
                  ipath = create_index_path(root, index,
                                            restrictclauses,
+                                           indexonlyQuals,
                                            useful_pathkeys,
                                            BackwardScanDirection,
                                            outer_rel);
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 871,885 **** create_indexscan_plan(PlannerInfo *root,
--- 871,902 ----
      Index        baserelid = best_path->path.parent->relid;
      Oid            indexoid = best_path->indexinfo->indexoid;
      List       *qpqual;
+     List       *indexonlyqual;
      List       *stripped_indexquals;
+     List       *indexonlyqualorig;
      List       *fixed_indexquals;
      ListCell   *l;
      IndexScan  *scan_plan;
+     RelOptInfo *reloptinfo;
+     IndexOptInfo *indexoptinfo = NULL;

      /* it should be a base rel... */
      Assert(baserelid > 0);
      Assert(best_path->path.parent->rtekind == RTE_RELATION);

+     /* Find the index that was used */
+     reloptinfo = root->simple_rel_array[baserelid];
+     foreach(l, reloptinfo->indexlist)
+     {
+         indexoptinfo = (IndexOptInfo *) lfirst(l);
+         if (indexoptinfo->indexoid == indexoid)
+             break;
+         else
+             indexoptinfo = NULL;
+     }
+     if (indexoptinfo == NULL)
+         elog(ERROR, "could not find IndexOptInfo for index scan");
+
      /*
       * Build "stripped" indexquals structure (no RestrictInfos) to pass to
       * executor as indexqualorig
***************
*** 928,936 **** create_indexscan_plan(PlannerInfo *root,
--- 945,957 ----
       * plan so that they'll be properly rechecked by EvalPlanQual testing.
       */
      qpqual = NIL;
+     indexonlyqual = NIL;
+     indexonlyqualorig = NIL;
      foreach(l, scan_clauses)
      {
          RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+         Expr *indexonlyclause;
+         bool is_indexonly;

          Assert(IsA(rinfo, RestrictInfo));
          if (rinfo->pseudoconstant)
***************
*** 952,958 **** create_indexscan_plan(PlannerInfo *root,
                          continue;
              }
          }
!         qpqual = lappend(qpqual, rinfo);
      }

      /* Sort clauses into best execution order */
--- 973,989 ----
                          continue;
              }
          }
!
!         indexonlyclause = make_indexonly_expr(root, indexoptinfo,
!                                               (Expr *) rinfo->clause,
!                                               &is_indexonly);
!         if (is_indexonly)
!         {
!             indexonlyqual = lappend(indexonlyqual, indexonlyclause);
!             indexonlyqualorig = lappend(indexonlyqualorig, rinfo->clause);
!         }
!         else
!             qpqual = lappend(qpqual, rinfo);
      }

      /* Sort clauses into best execution order */
***************
*** 969,974 **** create_indexscan_plan(PlannerInfo *root,
--- 1000,1007 ----
                                 fixed_indexquals,
                                 stripped_indexquals,
                                 best_path->indexscandir);
+     scan_plan->indexonlyqual = indexonlyqual;
+     scan_plan->indexonlyqualorig = indexonlyqualorig;

      copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
      /* use the indexscan-specific rows estimate, not the parent rel's */
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
***************
*** 402,407 **** build_minmax_path(PlannerInfo *root, RelOptInfo *rel, MinMaxAggInfo *info)
--- 402,408 ----
          new_path = create_index_path(root, index,
                                       restrictclauses,
                                       NIL,
+                                      NIL,
                                       indexscandir,
                                       NULL);

*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
***************
*** 16,21 ****
--- 16,23 ----
  #include "postgres.h"

  #include "access/transam.h"
+ #include "catalog/pg_am.h"
+ #include "catalog/pg_opfamily.h"
  #include "catalog/pg_type.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
***************
*** 274,279 **** set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset)
--- 276,285 ----
                      fix_scan_list(glob, splan->indexqual, rtoffset);
                  splan->indexqualorig =
                      fix_scan_list(glob, splan->indexqualorig, rtoffset);
+                 splan->indexonlyqual =
+                     fix_scan_list(glob, splan->indexonlyqual, rtoffset);
+                 splan->indexonlyqualorig =
+                     fix_scan_list(glob, splan->indexonlyqualorig, rtoffset);
              }
              break;
          case T_BitmapIndexScan:
***************
*** 516,521 **** set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset)
--- 522,675 ----
  }

  /*
+  * replace_heapvars_with_indexvars
+  *
+  * Replaces all Vars in an expression with Vars that refer to columns
+  * of an index. The resulting expression can be used in the executor to
+  * evaluate an expression using an IndexTuple from an index, instead
+  * of a heap tuple.
+  *
+  * Returns NULL if the expression can't be evaluated using columns from
+  * the index only.
+  */
+ typedef struct
+ {
+     IndexOptInfo *index; /* index we're dealing with */
+     bool indexonly;
+ } replace_heapvars_context;
+
+ static Node *
+ replace_heapvars_with_indexvars(Node *node, replace_heapvars_context *context)
+ {
+     /*
+      * We represent index keys by Var nodes having the varno of the base table
+      * but varattno equal to the index's attribute number (index column
+      * position).  This is a bit hokey ... would be cleaner to use a
+      * special-purpose node type that could not be mistaken for a regular Var.
+      * But it will do for now.
+      */
+     Var           *result;
+     int            pos;
+     IndexOptInfo *index = context->index;
+     ListCell   *indexpr_item;
+
+     if (node == NULL || !context->indexonly)
+         return NULL;
+
+     if (IsA(node, Var))
+     {
+         Var *var = (Var *) node;
+
+         if (var->varno == index->rel->relid)
+         {
+             /* Try to match against simple index columns */
+             int            varatt = ((Var *) node)->varattno;
+
+             if (varatt != 0)
+             {
+                 for (pos = 0; pos < index->ncolumns; pos++)
+                 {
+                     if (index->indexkeys[pos] == varatt)
+                     {
+                         result = (Var *) copyObject(node);
+                         result->varattno = pos + 1;
+
+                         /*
+                          * XXX: The hack in the name b-tree opclass of
+                          * storing keys as cstrings instead of names
+                          * confuses the executor.
+                          */
+                         if (index->opfamily[pos] == NAME_BTREE_FAM_OID)
+                             result->vartype = CSTRINGOID;
+
+                         return (Node *) result;
+                     }
+                 }
+                 /* Not in index. Can't do an index-only scan */
+                 context->indexonly = false;
+                 return NULL;
+             }
+             else
+             {
+                 /* TODO: we don't handle full-row references yet. */
+                 context->indexonly = false;
+                 return NULL;
+             }
+         }
+         else
+         {
+             /* XXX: Can this happen? */
+             context->indexonly = false;
+             return NULL;
+         }
+     }
+
+     /* Try to match against index expressions */
+     indexpr_item = list_head(index->indexprs);
+     for (pos = 0; pos < index->ncolumns; pos++)
+     {
+         if (index->indexkeys[pos] == 0)
+         {
+             Node       *indexkey;
+
+             if (indexpr_item == NULL)
+                 elog(ERROR, "too few entries in indexprs list");
+             indexkey = (Node *) lfirst(indexpr_item);
+             if (indexkey && IsA(indexkey, RelabelType))
+                 indexkey = (Node *) ((RelabelType *) indexkey)->arg;
+             if (equal(node, indexkey))
+             {
+                 /* Found a match */
+                 result = makeVar(index->rel->relid, pos + 1,
+                                  exprType(lfirst(indexpr_item)), -1,
+                                  0);
+                 return (Node *) result;
+             }
+             indexpr_item = lnext(indexpr_item);
+         }
+     }
+
+     /* No match at this node; recurse into its children */
+     return expression_tree_mutator(node, replace_heapvars_with_indexvars,
+                                    context);
+ }
+
+ /*
+  * Given an expression, return an Expr tree that evaluates the expression
+  * using columns from an index. On success, *indexonly is set to TRUE.
+  * If the expression can't be evaluated using only index columns, *indexonly
+  * is set to FALSE and NULL is returned.
+  */
+ Expr *
+ make_indexonly_expr(PlannerInfo *root, IndexOptInfo *index,
+                     Expr *expr, bool *indexonly)
+ {
+     replace_heapvars_context context;
+
+     /*
+      * XXX: Right now we hard-code that only B-trees can return index tuples.
+      */
+     if (index->relam != BTREE_AM_OID)
+     {
+         if (indexonly)
+             *indexonly = false;
+         return NULL;
+     }
+
+     context.index = index;
+     context.indexonly = true;
+
+     expr = (Expr *) replace_heapvars_with_indexvars((Node *) expr, &context);
+
+     if (indexonly)
+         *indexonly = context.indexonly;
+     if (!context.indexonly)
+         return NULL;
+     else
+         return expr;
+ }
+
+ /*
   * set_subqueryscan_references
   *        Do set_plan_references processing on a SubqueryScan
   *
***************
*** 925,936 **** set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan,
           */
          IndexScan  *innerscan = (IndexScan *) inner_plan;
          List       *indexqualorig = innerscan->indexqualorig;

          /* No work needed if indexqual refers only to its own rel... */
          if (NumRelids((Node *) indexqualorig) > 1)
          {
-             Index        innerrel = innerscan->scan.scanrelid;
-
              /* only refs to outer vars get changed in the inner qual */
              innerscan->indexqualorig = fix_join_expr(glob,
                                                       indexqualorig,
--- 1079,1090 ----
           */
          IndexScan  *innerscan = (IndexScan *) inner_plan;
          List       *indexqualorig = innerscan->indexqualorig;
+         List       *indexonlyqualorig = innerscan->indexonlyqualorig;
+         Index        innerrel = innerscan->scan.scanrelid;

          /* No work needed if indexqual refers only to its own rel... */
          if (NumRelids((Node *) indexqualorig) > 1)
          {
              /* only refs to outer vars get changed in the inner qual */
              innerscan->indexqualorig = fix_join_expr(glob,
                                                       indexqualorig,
***************
*** 944,963 **** set_inner_join_references(PlannerGlobal *glob, Plan *inner_plan,
                                                   NULL,
                                                   innerrel,
                                                   0);

!             /*
!              * We must fix the inner qpqual too, if it has join clauses (this
!              * could happen if special operators are involved: some indexquals
!              * may get rechecked as qpquals).
!              */
!             if (NumRelids((Node *) inner_plan->qual) > 1)
!                 inner_plan->qual = fix_join_expr(glob,
!                                                  inner_plan->qual,
                                                   outer_itlist,
                                                   NULL,
                                                   innerrel,
                                                   0);
          }
      }
      else if (IsA(inner_plan, BitmapIndexScan))
      {
--- 1098,1135 ----
                                                   NULL,
                                                   innerrel,
                                                   0);
+         }

!         /* Likewise, skip if indexonlyqual refers only to its own rel... */
!         if (NumRelids((Node *) indexonlyqualorig) > 1)
!         {
!             /* only refs to outer vars get changed in the inner qual */
!             innerscan->indexonlyqualorig = fix_join_expr(glob,
!                                                      indexonlyqualorig,
!                                                      outer_itlist,
!                                                      NULL,
!                                                      innerrel,
!                                                      0);
!             innerscan->indexonlyqual = fix_join_expr(glob,
!                                                  innerscan->indexonlyqual,
                                                   outer_itlist,
                                                   NULL,
                                                   innerrel,
                                                   0);
          }
+
+         /*
+          * We must fix the inner qpqual too, if it has join clauses (this
+          * could happen if special operators are involved: some indexquals
+          * may get rechecked as qpquals).
+          */
+         if (NumRelids((Node *) inner_plan->qual) > 1)
+             inner_plan->qual = fix_join_expr(glob,
+                                              inner_plan->qual,
+                                              outer_itlist,
+                                              NULL,
+                                              innerrel,
+                                              0);
      }
      else if (IsA(inner_plan, BitmapIndexScan))
      {
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
***************
*** 414,419 **** create_seqscan_path(PlannerInfo *root, RelOptInfo *rel)
--- 414,421 ----
   * 'index' is a usable index.
   * 'clause_groups' is a list of lists of RestrictInfo nodes
   *            to be used as index qual conditions in the scan.
+  * 'indexonlyquals' is a list of expressions that can be evaluated using
+  *            data from the index.
   * 'pathkeys' describes the ordering of the path.
   * 'indexscandir' is ForwardScanDirection or BackwardScanDirection
   *            for an ordered index, or NoMovementScanDirection for
***************
*** 427,432 **** IndexPath *
--- 429,435 ----
  create_index_path(PlannerInfo *root,
                    IndexOptInfo *index,
                    List *clause_groups,
+                   List *indexonlyquals,
                    List *pathkeys,
                    ScanDirection indexscandir,
                    RelOptInfo *outer_rel)
***************
*** 504,510 **** create_index_path(PlannerInfo *root,
          pathnode->rows = rel->rows;
      }

!     cost_index(pathnode, root, index, indexquals, outer_rel);

      return pathnode;
  }
--- 507,513 ----
          pathnode->rows = rel->rows;
      }

!     cost_index(pathnode, root, index, indexquals, indexonlyquals, outer_rel);

      return pathnode;
  }
*** a/src/include/access/nbtree.h
--- b/src/include/access/nbtree.h
***************
*** 556,561 **** extern int32 _bt_compare(Relation rel, int keysz, ScanKey scankey,
--- 556,562 ----
  extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
  extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
  extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost);
+ extern IndexTuple _bt_getindextuple(IndexScanDesc scan);

  /*
   * prototypes for functions in nbtutils.c
*** a/src/include/access/relscan.h
--- b/src/include/access/relscan.h
***************
*** 16,21 ****
--- 16,22 ----

  #include "access/genam.h"
  #include "access/heapam.h"
+ #include "access/itup.h"


  typedef struct HeapScanDescData
***************
*** 64,69 **** typedef struct IndexScanDescData
--- 65,71 ----
      Snapshot    xs_snapshot;    /* snapshot to see */
      int            numberOfKeys;    /* number of scan keys */
      ScanKey        keyData;        /* array of scan key descriptors */
+     bool        needIndexTuple;    /* should scan store index tuple in xs_itup? */

      /* signaling to index AM about killing index tuples */
      bool        kill_prior_tuple;        /* last-returned tuple is dead */
***************
*** 77,82 **** typedef struct IndexScanDescData
--- 79,86 ----
      Buffer        xs_cbuf;        /* current heap buffer in scan, if any */
      /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
      bool        xs_recheck;        /* T means scan keys must be rechecked */
+
+     IndexTuple    xs_itup;        /* current index tuple, if any */
  } IndexScanDescData;

  /* Struct for heap-or-index scans of system tables */
*** a/src/include/executor/tuptable.h
--- b/src/include/executor/tuptable.h
***************
*** 15,20 ****
--- 15,21 ----
  #define TUPTABLE_H

  #include "access/htup.h"
+ #include "access/itup.h"
  #include "access/tupdesc.h"
  #include "storage/buf.h"

***************
*** 165,170 **** extern TupleTableSlot *ExecStoreTuple(HeapTuple tuple,
--- 166,174 ----
  extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
                        TupleTableSlot *slot,
                        bool shouldFree);
+ extern TupleTableSlot *ExecStoreIndexTuple(IndexTuple itup,
+                       TupleTableSlot *slot,
+                       bool shouldFree);
  extern TupleTableSlot *ExecClearTuple(TupleTableSlot *slot);
  extern TupleTableSlot *ExecStoreVirtualTuple(TupleTableSlot *slot);
  extern TupleTableSlot *ExecStoreAllNullTuple(TupleTableSlot *slot);
*** a/src/include/nodes/execnodes.h
--- b/src/include/nodes/execnodes.h
***************
*** 1124,1129 **** typedef struct IndexScanState
--- 1124,1133 ----
      ExprContext *iss_RuntimeContext;
      Relation    iss_RelationDesc;
      IndexScanDesc iss_ScanDesc;
+
+     TupleTableSlot *iss_IndexTupleSlot;    /* slot for storing index tuple */
+     List           *iss_indexonlyqual;
+     List           *iss_indexonlyqualorig;
  } IndexScanState;

  /* ----------------
*** a/src/include/nodes/plannodes.h
--- b/src/include/nodes/plannodes.h
***************
*** 263,269 **** typedef Scan SeqScan;
   * position, not the base table's column, even though varno is for the base
   * table).    This is a bit hokey ... would be cleaner to use a special-purpose
   * node type that could not be mistaken for a regular Var.    But it will do
!  * for now.
   * ----------------
   */
  typedef struct IndexScan
--- 263,269 ----
   * position, not the base table's column, even though varno is for the base
   * table).    This is a bit hokey ... would be cleaner to use a special-purpose
   * node type that could not be mistaken for a regular Var.    But it will do
!  * for now. (We do the same for indexonlyqual.)
   * ----------------
   */
  typedef struct IndexScan
***************
*** 273,278 **** typedef struct IndexScan
--- 273,280 ----
      List       *indexqual;        /* list of index quals (OpExprs) */
      List       *indexqualorig;    /* the same in original form */
      ScanDirection indexorderdir;    /* forward or backward or don't care */
+     List       *indexonlyqual;    /* quals that can be satisfied from index */
+     List       *indexonlyqualorig;    /* same in original form */
  } IndexScan;

  /* ----------------
*** a/src/include/optimizer/cost.h
--- b/src/include/optimizer/cost.h
***************
*** 66,72 **** extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
                      double index_pages, PlannerInfo *root);
  extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
  extern void cost_index(IndexPath *path, PlannerInfo *root, IndexOptInfo *index,
!            List *indexQuals, RelOptInfo *outer_rel);
  extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
                        Path *bitmapqual, RelOptInfo *outer_rel);
  extern void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root);
--- 66,72 ----
                      double index_pages, PlannerInfo *root);
  extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
  extern void cost_index(IndexPath *path, PlannerInfo *root, IndexOptInfo *index,
!                List *indexQuals, List *indexOnlyQuals, RelOptInfo *outer_rel);
  extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
                        Path *bitmapqual, RelOptInfo *outer_rel);
  extern void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root);
*** a/src/include/optimizer/pathnode.h
--- b/src/include/optimizer/pathnode.h
***************
*** 31,36 **** extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel);
--- 31,37 ----
  extern IndexPath *create_index_path(PlannerInfo *root,
                    IndexOptInfo *index,
                    List *clause_groups,
+                   List *indexonlyquals,
                    List *pathkeys,
                    ScanDirection indexscandir,
                    RelOptInfo *outer_rel);
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
***************
*** 119,122 **** extern void extract_query_dependencies(List *queries,
--- 119,125 ----
                             List **relationOids,
                             List **invalItems);

+ extern Expr *make_indexonly_expr(PlannerInfo *root, IndexOptInfo *index,
+                                  Expr *expr, bool *indexonly);
+
  #endif   /* PLANMAIN_H */
