Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
Дата
Msg-id 17170.1133221780@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> For a query like this, where the hash join is being done repeatedly,
> it might be useful for the executor itself to track how often each
> subplan has been seen to be empty.

I implemented a simple form of this, and it made 8.1 faster than 8.0
on the test case I was using.  Give it a try ...
        regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.2
diff -c -r1.75.2.2 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c    28 Nov 2005 17:14:47 -0000    1.75.2.2
--- src/backend/executor/nodeHashjoin.c    28 Nov 2005 23:41:28 -0000
***************
*** 120,135 ****          * since we aren't going to be able to skip the join on the strength          * of an empty
innerrelation anyway.)          *          * The only way to make the check is to try to fetch a tuple from the
* outer plan node.  If we succeed, we have to stash it away for later          * consumption by
ExecHashJoinOuterGetTuple.         */
 
!         if (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost ||
!             node->js.jointype == JOIN_LEFT)         {             node->hj_FirstOuterTupleSlot =
ExecProcNode(outerNode);            if (TupIsNull(node->hj_FirstOuterTupleSlot))                 return NULL;         }
       else             node->hj_FirstOuterTupleSlot = NULL;
 
--- 120,147 ----          * since we aren't going to be able to skip the join on the strength          * of an empty
innerrelation anyway.)          *
 
+          * If we are rescanning the join, we make use of information gained
+          * on the previous scan: don't bother to try the prefetch if the
+          * previous scan found the outer relation nonempty.  This is not
+          * 100% reliable since with new parameters the outer relation might
+          * yield different results, but it's a good heuristic.
+          *          * The only way to make the check is to try to fetch a tuple from the          * outer plan node.
Ifwe succeed, we have to stash it away for later          * consumption by ExecHashJoinOuterGetTuple.          */
 
!         if (node->js.jointype == JOIN_LEFT ||
!             (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!              !node->hj_OuterNotEmpty))         {             node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
          if (TupIsNull(node->hj_FirstOuterTupleSlot))
 
+             {
+                 node->hj_OuterNotEmpty = false;                 return NULL;
+             }
+             else
+                 node->hj_OuterNotEmpty = true;         }         else             node->hj_FirstOuterTupleSlot =
NULL;
***************
*** 159,164 ****
--- 171,183 ----          * scanning the outer relation          */         hashtable->nbatch_outstart =
hashtable->nbatch;
+ 
+         /*
+          * Reset OuterNotEmpty for scan.  (It's OK if we fetched a tuple
+          * above, because ExecHashJoinOuterGetTuple will immediately
+          * set it again.)
+          */
+         node->hj_OuterNotEmpty = false;     }      /*
***************
*** 454,459 ****
--- 473,479 ----     hjstate->js.ps.ps_TupFromTlist = false;     hjstate->hj_NeedNewOuter = true;
hjstate->hj_MatchedOuter= false;
 
+     hjstate->hj_OuterNotEmpty = false;      return hjstate; }
***************
*** 546,551 ****
--- 566,574 ----             *hashvalue = ExecHashGetHashValue(hashtable, econtext,
         hjstate->hj_OuterHashKeys); 
 
+             /* remember outer relation is not empty for possible rescan */
+             hjstate->hj_OuterNotEmpty = true;
+              return slot;         } 
***************
*** 810,816 ****         if (node->hj_HashTable->nbatch == 1 &&             ((PlanState *) node)->righttree->chgParam
==NULL)         {
 
!             /* okay to reuse the hash table; needn't rescan inner, either */         }         else         {
--- 833,851 ----         if (node->hj_HashTable->nbatch == 1 &&             ((PlanState *) node)->righttree->chgParam
==NULL)         {
 
!             /*
!              * okay to reuse the hash table; needn't rescan inner, either.
!              *
!              * What we do need to do is reset our state about the emptiness
!              * of the outer relation, so that the new scan of the outer will
!              * update it correctly if it turns out to be empty this time.
!              * (There's no harm in clearing it now because ExecHashJoin won't
!              * need the info.  In the other cases, where the hash table
!              * doesn't exist or we are destroying it, we leave this state
!              * alone because ExecHashJoin will need it the first time
!              * through.)
!              */
!             node->hj_OuterNotEmpty = false;         }         else         {
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.139.2.2
diff -c -r1.139.2.2 execnodes.h
*** src/include/nodes/execnodes.h    22 Nov 2005 18:23:28 -0000    1.139.2.2
--- src/include/nodes/execnodes.h    28 Nov 2005 23:41:28 -0000
***************
*** 1101,1106 ****
--- 1101,1107 ----  *        hj_FirstOuterTupleSlot    first tuple retrieved from outer plan  *        hj_NeedNewOuter
         true if need new outer tuple on next call  *        hj_MatchedOuter            true if found a join match for
currentouter
 
+  *        hj_OuterNotEmpty        true if outer relation known not empty  * ----------------  */ 
***************
*** 1125,1130 ****
--- 1126,1132 ----     TupleTableSlot *hj_FirstOuterTupleSlot;     bool        hj_NeedNewOuter;     bool
hj_MatchedOuter;
+     bool        hj_OuterNotEmpty; } HashJoinState;  


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: NVL vs COALESCE
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)