Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
| От | Tom Lane |
|---|---|
| Тема | Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off) |
| Дата | |
| Msg-id | 17170.1133221780@sss.pgh.pa.us обсуждение исходный текст |
| Ответ на | Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off) (Tom Lane <tgl@sss.pgh.pa.us>) |
| Список | pgsql-hackers |
I wrote:
> For a query like this, where the hash join is being done repeatedly,
> it might be useful for the executor itself to track how often each
> subplan has been seen to be empty.
I implemented a simple form of this, and it made 8.1 faster than 8.0
on the test case I was using. Give it a try ...
regards, tom lane
Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.2
diff -c -r1.75.2.2 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c 28 Nov 2005 17:14:47 -0000 1.75.2.2
--- src/backend/executor/nodeHashjoin.c 28 Nov 2005 23:41:28 -0000
***************
*** 120,135 **** * since we aren't going to be able to skip the join on the strength * of an empty
innerrelation anyway.) * * The only way to make the check is to try to fetch a tuple from the
* outer plan node. If we succeed, we have to stash it away for later * consumption by
ExecHashJoinOuterGetTuple. */
! if (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost ||
! node->js.jointype == JOIN_LEFT) { node->hj_FirstOuterTupleSlot =
ExecProcNode(outerNode); if (TupIsNull(node->hj_FirstOuterTupleSlot)) return NULL; }
else node->hj_FirstOuterTupleSlot = NULL;
--- 120,147 ---- * since we aren't going to be able to skip the join on the strength * of an empty
innerrelation anyway.) *
+ * If we are rescanning the join, we make use of information gained
+ * on the previous scan: don't bother to try the prefetch if the
+ * previous scan found the outer relation nonempty. This is not
+ * 100% reliable since with new parameters the outer relation might
+ * yield different results, but it's a good heuristic.
+ * * The only way to make the check is to try to fetch a tuple from the * outer plan node.
Ifwe succeed, we have to stash it away for later * consumption by ExecHashJoinOuterGetTuple. */
! if (node->js.jointype == JOIN_LEFT ||
! (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
! !node->hj_OuterNotEmpty)) { node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
if (TupIsNull(node->hj_FirstOuterTupleSlot))
+ {
+ node->hj_OuterNotEmpty = false; return NULL;
+ }
+ else
+ node->hj_OuterNotEmpty = true; } else node->hj_FirstOuterTupleSlot =
NULL;
***************
*** 159,164 ****
--- 171,183 ---- * scanning the outer relation */ hashtable->nbatch_outstart =
hashtable->nbatch;
+
+ /*
+ * Reset OuterNotEmpty for scan. (It's OK if we fetched a tuple
+ * above, because ExecHashJoinOuterGetTuple will immediately
+ * set it again.)
+ */
+ node->hj_OuterNotEmpty = false; } /*
***************
*** 454,459 ****
--- 473,479 ---- hjstate->js.ps.ps_TupFromTlist = false; hjstate->hj_NeedNewOuter = true;
hjstate->hj_MatchedOuter= false;
+ hjstate->hj_OuterNotEmpty = false; return hjstate; }
***************
*** 546,551 ****
--- 566,574 ---- *hashvalue = ExecHashGetHashValue(hashtable, econtext,
hjstate->hj_OuterHashKeys);
+ /* remember outer relation is not empty for possible rescan */
+ hjstate->hj_OuterNotEmpty = true;
+ return slot; }
***************
*** 810,816 **** if (node->hj_HashTable->nbatch == 1 && ((PlanState *) node)->righttree->chgParam
==NULL) {
! /* okay to reuse the hash table; needn't rescan inner, either */ } else {
--- 833,851 ---- if (node->hj_HashTable->nbatch == 1 && ((PlanState *) node)->righttree->chgParam
==NULL) {
! /*
! * okay to reuse the hash table; needn't rescan inner, either.
! *
! * What we do need to do is reset our state about the emptiness
! * of the outer relation, so that the new scan of the outer will
! * update it correctly if it turns out to be empty this time.
! * (There's no harm in clearing it now because ExecHashJoin won't
! * need the info. In the other cases, where the hash table
! * doesn't exist or we are destroying it, we leave this state
! * alone because ExecHashJoin will need it the first time
! * through.)
! */
! node->hj_OuterNotEmpty = false; } else {
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.139.2.2
diff -c -r1.139.2.2 execnodes.h
*** src/include/nodes/execnodes.h 22 Nov 2005 18:23:28 -0000 1.139.2.2
--- src/include/nodes/execnodes.h 28 Nov 2005 23:41:28 -0000
***************
*** 1101,1106 ****
--- 1101,1107 ---- * hj_FirstOuterTupleSlot first tuple retrieved from outer plan * hj_NeedNewOuter
true if need new outer tuple on next call * hj_MatchedOuter true if found a join match for
currentouter
+ * hj_OuterNotEmpty true if outer relation known not empty * ---------------- */
***************
*** 1125,1130 ****
--- 1126,1132 ---- TupleTableSlot *hj_FirstOuterTupleSlot; bool hj_NeedNewOuter; bool
hj_MatchedOuter;
+ bool hj_OuterNotEmpty; } HashJoinState;
В списке pgsql-hackers по дате отправления: