Re: Should HashSetOp go away
| From | Tom Lane |
|---|---|
| Subject | Re: Should HashSetOp go away |
| Date | |
| Msg-id | 3858818.1761938547@sss.pgh.pa.us |
| In reply to | Re: Should HashSetOp go away (Tom Lane <tgl@sss.pgh.pa.us>) |
| Replies | Re: Should HashSetOp go away |
| List | pgsql-hackers |
I wrote:
> Here's a pair of patches to try to do better. The first one
> is concerned with getting more realistic size estimates for
> TupleHashTables in the planner. The second is some mop-up
> that's been pending for a long time in the same area, namely
> getting rid of "long int" field types in Plan nodes.
Meh ... cfbot found a compiler warning that I'd not seen locally.
The attached v2 silences that, and I twiddled a couple of comments.
regards, tom lane
From 51766859aa3f694bc6c26e8569f6ebff61aeac8b Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 31 Oct 2025 15:14:26 -0400
Subject: [PATCH v2 1/2] Improve planner's estimates of tuple hash table sizes.
For several types of plan nodes that use TupleHashTables, the
planner estimated the expected size of the table as basically
numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)).
This is pretty far off, especially for small data widths, because
it doesn't account for the overhead of the simplehash.h hash table
nor for any per-tuple "additional space" the plan node may request.
Jeff Janes noted a case where the estimate was off by about a factor
of three, even though the obvious hazards such as inaccurate estimates
of numEntries or dataWidth didn't apply.
To improve matters, create functions provided by the relevant executor
modules that can estimate the required sizes with reasonable accuracy.
(We're still not accounting for effects like allocator padding, but
this at least gets the first-order effects correct.)
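As a rough illustration (a back-of-the-envelope sketch, assuming a
typical 64-bit build where MAXALIGN() rounds up to a multiple of 8
and SizeofHeapTupleHeader is 23): for 1000 entries of data width 8,
the old formula charged
    1000 * (MAXALIGN(8) + MAXALIGN(23)) = 1000 * 32 = 32000 bytes,
whereas the real table also pays for a simplehash entry array whose
length is rounded up to a power of two, plus any per-entry additional
space, so the true total for narrow tuples can approach the factor of
three observed above.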
I added functions that can estimate the tuple table sizes for
nodeSetOp and nodeSubplan; these rely on an estimator for
TupleHashTables in general, and that in turn relies on one for
simplehash.h hash tables. That feels like kind of a lot of mechanism,
but if we take any short-cuts we're violating modularity boundaries.
The other places that use TupleHashTables are nodeAgg, which took
pains to get its numbers right already, and nodeRecursiveunion.
I did not try to improve the situation for nodeRecursiveunion because
there's nothing to improve: we are not making an estimate of the hash
table size, and it wouldn't help us to do so because we have no
non-hashed alternative implementation. On top of that, our estimate
of the number of entries to be hashed in that module is so suspect
that we'd likely often choose the wrong implementation if we did have
two ways to do it.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
---
src/backend/executor/execGrouping.c | 59 ++++++++++++++++++++++++++
src/backend/executor/nodeSetOp.c | 9 ++++
src/backend/executor/nodeSubplan.c | 53 ++++++++++++++++++++++-
src/backend/optimizer/plan/subselect.c | 45 ++++++++++----------
src/backend/optimizer/util/pathnode.c | 12 +++---
src/include/executor/executor.h | 3 ++
src/include/executor/nodeSetOp.h | 2 +
src/include/executor/nodeSubplan.h | 4 ++
src/include/lib/simplehash.h | 47 +++++++++++++++++++-
9 files changed, 203 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index b4bdaa3c305..e1a3a813dd9 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/parallel.h"
#include "common/hashfn.h"
#include "executor/executor.h"
@@ -302,6 +303,64 @@ ResetTupleHashTable(TupleHashTable hashtable)
MemoryContextReset(hashtable->tuplescxt);
}
+/*
+ * Estimate the amount of space needed for a TupleHashTable with nentries
+ * entries, if the tuples have average data width tupleWidth and the caller
+ * requires additionalsize extra space per entry.
+ *
+ * Return SIZE_MAX if it'd overflow size_t.
+ *
+ * nentries is "double" because this is meant for use by the planner,
+ * which typically works with double rowcount estimates. So we'd need to
+ * clamp to integer somewhere and that might as well be here. We do expect
+ * the value not to be NaN or negative, else the result will be garbage.
+ */
+Size
+EstimateTupleHashTableSpace(double nentries,
+ Size tupleWidth,
+ Size additionalsize)
+{
+ Size sh_space;
+ double tuples_space;
+
+ /* First estimate the space needed for the simplehash table */
+ sh_space = tuplehash_estimate_space(nentries);
+
+ /* Give up if that's already too big */
+ if (sh_space >= SIZE_MAX)
+ return sh_space;
+
+ /*
+ * Compute space needed for hashed tuples with additional data. nentries
+ * must be somewhat sane, so it should be safe to compute this product.
+ *
+ * We assume that the hashed tuples will be kept in a BumpContext so that
+ * there is no additional per-tuple overhead.
+ *
+ * (Note that this is only accurate if MEMORY_CONTEXT_CHECKING is off,
+ * else bump.c will add a MemoryChunk header to each tuple. However, it
+ * seems undesirable for debug builds to make different planning choices
+ * than production builds, so we assume the production behavior always.)
+ */
+ tuples_space = nentries * (MAXALIGN(SizeofMinimalTupleHeader) +
+ MAXALIGN(tupleWidth) +
+ MAXALIGN(additionalsize));
+
+ /*
+ * Check for size_t overflow. This coding is trickier than it may appear,
+ * because on 64-bit machines SIZE_MAX cannot be represented exactly as a
+ * double. We must cast it explicitly to suppress compiler warnings about
+ * an inexact conversion, and we must trust that any double value that
+ * compares strictly less than "(double) SIZE_MAX" will cast to a
+ * representable size_t value.
+ */
+ if (sh_space + tuples_space >= (double) SIZE_MAX)
+ return SIZE_MAX;
+
+ /* We don't bother estimating size of the miscellaneous overhead data */
+ return (Size) (sh_space + tuples_space);
+}
+
/*
* Find or create a hashtable entry for the tuple group containing the
* given tuple. The tuple must be the same type as the hashtable entries.
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 7b223a7ca3a..5aabed18a09 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -111,6 +111,15 @@ build_hash_table(SetOpState *setopstate)
false);
}
+/* Planner support routine to estimate space needed for hash table */
+Size
+EstimateSetOpHashTableSpace(double nentries, Size tupleWidth)
+{
+ return EstimateTupleHashTableSpace(nentries,
+ tupleWidth,
+ sizeof(SetOpStatePerGroupData));
+}
+
/*
* We've completed processing a tuple group. Decide how many copies (if any)
* of its representative row to emit, and store the count into numOutput.
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 9f6e45bcb0b..96d900a7783 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -525,7 +525,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
node->tab_hash_funcs,
node->tab_collations,
nbuckets,
- 0,
+ 0, /* no additional data */
node->planstate->state->es_query_cxt,
node->tuplesContext,
innerecontext->ecxt_per_tuple_memory,
@@ -554,7 +554,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
node->tab_hash_funcs,
node->tab_collations,
nbuckets,
- 0,
+ 0, /* no additional data */
node->planstate->state->es_query_cxt,
node->tuplesContext,
innerecontext->ecxt_per_tuple_memory,
@@ -636,6 +636,55 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
MemoryContextSwitchTo(oldcontext);
}
+/* Planner support routine to estimate space needed for hash table(s) */
+Size
+EstimateSubplanHashTableSpace(double nentries,
+ Size tupleWidth,
+ bool unknownEqFalse)
+{
+ Size tab1space,
+ tab2space;
+
+ /* Estimate size of main hashtable */
+ tab1space = EstimateTupleHashTableSpace(nentries,
+ tupleWidth,
+ 0 /* no additional data */ );
+
+ /* Give up if that's already too big */
+ if (tab1space >= SIZE_MAX)
+ return tab1space;
+
+ /* Done if we don't need a hashnulls table */
+ if (unknownEqFalse)
+ return tab1space;
+
+ /*
+ * Adjust the rowcount estimate in the same way as above, except that we
+ * don't bother with the special case for a single hash column. (We could
+ * handle that, but it'd be notationally painful for our caller to provide
+ * the column count, and this table adds not that much to the total
+ * estimate anyway.)
+ */
+ nentries /= 16;
+ if (nentries < 1)
+ nentries = 1;
+
+ /*
+ * It might be sane to also reduce the tupleWidth, but on the other hand
+ * we are not accounting for the space taken by the tuples' null bitmaps.
+ * Leave it alone for now.
+ */
+ tab2space = EstimateTupleHashTableSpace(nentries,
+ tupleWidth,
+ 0 /* no additional data */ );
+
+ /* Guard against overflow */
+ if (tab2space >= SIZE_MAX - tab1space)
+ return SIZE_MAX;
+
+ return tab1space + tab2space;
+}
+
/*
* execTuplesUnequal
* Return true if two tuples are definitely unequal in the indicated
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 14192a13236..ff63d20f8d5 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -20,6 +20,7 @@
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "executor/executor.h"
+#include "executor/nodeSubplan.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -79,8 +80,8 @@ static Node *convert_testexpr(PlannerInfo *root,
List *subst_nodes);
static Node *convert_testexpr_mutator(Node *node,
convert_testexpr_context *context);
-static bool subplan_is_hashable(Plan *plan);
-static bool subpath_is_hashable(Path *path);
+static bool subplan_is_hashable(Plan *plan, bool unknownEqFalse);
+static bool subpath_is_hashable(Path *path, bool unknownEqFalse);
static bool testexpr_is_hashable(Node *testexpr, List *param_ids);
static bool test_opexpr_is_hashable(OpExpr *testexpr, List *param_ids);
static bool hash_ok_operator(OpExpr *expr);
@@ -283,7 +284,7 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
best_path = final_rel->cheapest_total_path;
/* Now we can check if it'll fit in hash_mem */
- if (subpath_is_hashable(best_path))
+ if (subpath_is_hashable(best_path, true))
{
SubPlan *hashplan;
AlternativeSubPlan *asplan;
@@ -524,7 +525,7 @@ build_subplan(PlannerInfo *root, Plan *plan, Path *path,
*/
if (subLinkType == ANY_SUBLINK &&
splan->parParam == NIL &&
- subplan_is_hashable(plan) &&
+ subplan_is_hashable(plan, unknownEqFalse) &&
testexpr_is_hashable(splan->testexpr, splan->paramIds))
splan->useHashTable = true;
@@ -711,19 +712,19 @@ convert_testexpr_mutator(Node *node,
* is suitable for hashing. We only look at the subquery itself.
*/
static bool
-subplan_is_hashable(Plan *plan)
+subplan_is_hashable(Plan *plan, bool unknownEqFalse)
{
- double subquery_size;
+ Size hashtablesize;
/*
- * The estimated size of the subquery result must fit in hash_mem. (Note:
- * we use heap tuple overhead here even though the tuples will actually be
- * stored as MinimalTuples; this provides some fudge factor for hashtable
- * overhead.)
+ * The estimated size of the hashtable holding the subquery result must
+ * fit in hash_mem. (Note: reject on equality, to ensure that an estimate
+ * of SIZE_MAX disables hashing regardless of the hash_mem limit.)
*/
- subquery_size = plan->plan_rows *
- (MAXALIGN(plan->plan_width) + MAXALIGN(SizeofHeapTupleHeader));
- if (subquery_size > get_hash_memory_limit())
+ hashtablesize = EstimateSubplanHashTableSpace(plan->plan_rows,
+ plan->plan_width,
+ unknownEqFalse);
+ if (hashtablesize >= get_hash_memory_limit())
return false;
return true;
@@ -735,19 +736,19 @@ subplan_is_hashable(Plan *plan)
* Identical to subplan_is_hashable, but work from a Path for the subplan.
*/
static bool
-subpath_is_hashable(Path *path)
+subpath_is_hashable(Path *path, bool unknownEqFalse)
{
- double subquery_size;
+ Size hashtablesize;
/*
- * The estimated size of the subquery result must fit in hash_mem. (Note:
- * we use heap tuple overhead here even though the tuples will actually be
- * stored as MinimalTuples; this provides some fudge factor for hashtable
- * overhead.)
+ * The estimated size of the hashtable holding the subquery result must
+ * fit in hash_mem. (Note: reject on equality, to ensure that an estimate
+ * of SIZE_MAX disables hashing regardless of the hash_mem limit.)
*/
- subquery_size = path->rows *
- (MAXALIGN(path->pathtarget->width) + MAXALIGN(SizeofHeapTupleHeader));
- if (subquery_size > get_hash_memory_limit())
+ hashtablesize = EstimateSubplanHashTableSpace(path->rows,
+ path->pathtarget->width,
+ unknownEqFalse);
+ if (hashtablesize >= get_hash_memory_limit())
return false;
return true;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 44ac5312edd..e4fd6950fad 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "access/htup_details.h"
+#include "executor/nodeSetOp.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/extensible.h"
@@ -3461,7 +3462,7 @@ create_setop_path(PlannerInfo *root,
}
else
{
- Size hashentrysize;
+ Size hashtablesize;
/*
* In hashed mode, we must read all the input before we can emit
@@ -3490,11 +3491,12 @@ create_setop_path(PlannerInfo *root,
/*
* Also disable if it doesn't look like the hashtable will fit into
- * hash_mem.
+ * hash_mem. (Note: reject on equality, to ensure that an estimate of
+ * SIZE_MAX disables hashing regardless of the hash_mem limit.)
*/
- hashentrysize = MAXALIGN(leftpath->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- if (hashentrysize * numGroups > get_hash_memory_limit())
+ hashtablesize = EstimateSetOpHashTableSpace(numGroups,
+ leftpath->pathtarget->width);
+ if (hashtablesize >= get_hash_memory_limit())
pathnode->path.disabled_nodes++;
}
pathnode->path.rows = outputRows;
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 8e7a5453064..086f52cff3d 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -157,6 +157,9 @@ extern TupleHashEntry FindTupleHashEntry(TupleHashTable hashtable,
ExprState *eqcomp,
ExprState *hashexpr);
extern void ResetTupleHashTable(TupleHashTable hashtable);
+extern Size EstimateTupleHashTableSpace(double nentries,
+ Size tupleWidth,
+ Size additionalsize);
#ifndef FRONTEND
/*
diff --git a/src/include/executor/nodeSetOp.h b/src/include/executor/nodeSetOp.h
index 024c6ba1fce..302936df8be 100644
--- a/src/include/executor/nodeSetOp.h
+++ b/src/include/executor/nodeSetOp.h
@@ -20,4 +20,6 @@ extern SetOpState *ExecInitSetOp(SetOp *node, EState *estate, int eflags);
extern void ExecEndSetOp(SetOpState *node);
extern void ExecReScanSetOp(SetOpState *node);
+extern Size EstimateSetOpHashTableSpace(double nentries, Size tupleWidth);
+
#endif /* NODESETOP_H */
diff --git a/src/include/executor/nodeSubplan.h b/src/include/executor/nodeSubplan.h
index a1cafbcc694..301c29d1f24 100644
--- a/src/include/executor/nodeSubplan.h
+++ b/src/include/executor/nodeSubplan.h
@@ -20,6 +20,10 @@ extern SubPlanState *ExecInitSubPlan(SubPlan *subplan, PlanState *parent);
extern Datum ExecSubPlan(SubPlanState *node, ExprContext *econtext, bool *isNull);
+extern Size EstimateSubplanHashTableSpace(double nentries,
+ Size tupleWidth,
+ bool unknownEqFalse);
+
extern void ExecReScanSetParamPlan(SubPlanState *node, PlanState *parent);
extern void ExecSetParamPlan(SubPlanState *node, ExprContext *econtext);
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 9622131ede6..82d66135ddf 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -125,6 +125,7 @@
#define SH_ITERATE SH_MAKE_NAME(iterate)
#define SH_ALLOCATE SH_MAKE_NAME(allocate)
#define SH_FREE SH_MAKE_NAME(free)
+#define SH_ESTIMATE_SPACE SH_MAKE_NAME(estimate_space)
#define SH_STAT SH_MAKE_NAME(stat)
/* internal helper functions (no externally visible prototypes) */
@@ -242,7 +243,10 @@ SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
/* <element> *<prefix>_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
SH_SCOPE SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
-/* void <prefix>_stat(<prefix>_hash *tb */
+/* size_t <prefix>_estimate_space(double nentries) */
+SH_SCOPE size_t SH_ESTIMATE_SPACE(double nentries);
+
+/* void <prefix>_stat(<prefix>_hash *tb) */
SH_SCOPE void SH_STAT(SH_TYPE * tb);
#endif /* SH_DECLARE */
@@ -305,7 +309,7 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb);
/*
* Compute allocation size for hashtable. Result can be passed to
- * SH_UPDATE_PARAMETERS.
+ * SH_UPDATE_PARAMETERS. (Keep SH_ESTIMATE_SPACE in sync with this!)
*/
static inline uint64
SH_COMPUTE_SIZE(uint64 newsize)
@@ -1068,6 +1072,44 @@ SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
return NULL;
}
+/*
+ * Estimate the amount of space needed for a hashtable with nentries entries.
+ * Return SIZE_MAX if that's too many entries.
+ *
+ * nentries is "double" because this is meant for use by the planner,
+ * which typically works with double rowcount estimates. So we'd need to
+ * clamp to integer somewhere and that might as well be here. We do expect
+ * the value not to be NaN or negative, else the result will be garbage.
+ */
+SH_SCOPE size_t
+SH_ESTIMATE_SPACE(double nentries)
+{
+ uint64 size;
+ uint64 space;
+
+ /* fail immediately if we'd overrun SH_MAX_SIZE entries */
+ if (nentries >= SH_MAX_SIZE * SH_FILLFACTOR)
+ return SIZE_MAX;
+
+ /* should be safe to convert to uint64 */
+ size = (uint64) nentries;
+
+ /* supporting zero sized hashes would complicate matters */
+ size = Max(size, 2);
+
+ /* round up size to the next power of 2, that's how bucketing works */
+ size = pg_nextpower2_64(size);
+
+ /* calculate space needed for ->data */
+ space = ((uint64) sizeof(SH_ELEMENT_TYPE)) * size;
+
+ /* verify that allocation of ->data is possible on this platform */
+ if (space >= SIZE_MAX / 2)
+ return SIZE_MAX;
+
+ return (size_t) space + sizeof(SH_TYPE);
+}
+
/*
* Report some statistics about the state of the hashtable. For
* debugging/profiling purposes only.
@@ -1195,6 +1237,7 @@ SH_STAT(SH_TYPE * tb)
#undef SH_ITERATE
#undef SH_ALLOCATE
#undef SH_FREE
+#undef SH_ESTIMATE_SPACE
#undef SH_STAT
/* internal function names */
--
2.43.7
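As a minimal standalone sketch of the double-vs-size_t overflow guard
that EstimateTupleHashTableSpace relies on above (illustrative only:
clamp_space_estimate is an invented name, and the fixed per-entry cost
is a stand-in for the patch's real arithmetic):

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative sketch, not PostgreSQL code.  On 64-bit machines SIZE_MAX
 * cannot be represented exactly as a double, so we compare strictly
 * against (double) SIZE_MAX and trust that any double value strictly
 * below it converts to a representable size_t, exactly as the patch's
 * comment explains.
 */
static size_t
clamp_space_estimate(double nentries, size_t per_entry)
{
    double space = nentries * (double) per_entry;

    if (space >= (double) SIZE_MAX)
        return SIZE_MAX;        /* "too big" marker, as in the patch */
    return (size_t) space;
}

int
main(void)
{
    printf("%zu\n", clamp_space_estimate(1000.0, 32)); /* 32000 */
    printf("%zu\n", clamp_space_estimate(1e30, 32));   /* SIZE_MAX */
    return 0;
}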
From cc7a0b06d13fd47d6c78a74291b470e9c07e48af Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 31 Oct 2025 15:18:20 -0400
Subject: [PATCH v2 2/2] Change "long" numGroups fields to be Cardinality
(i.e., double).
We've been nibbling away at removing uses of "long" for a long time,
since its width is platform-dependent. Here's one more: change the
remaining "long" fields in Plan nodes to Cardinality, since the three
surviving examples all represent group-count estimates. The upstream
planner code was converted to Cardinality some time ago; for example
the corresponding fields in Path nodes are type Cardinality, as are
the arguments of the make_foo_path functions. Downstream in the
executor, it turns out that these all feed to the table-size argument
of BuildTupleHashTable. Change that to "double" as well, and fix it
so that it safely clamps out-of-range values to the uint32 limit of
simplehash.h, as was not being done before.
Essentially, this is removing all the artificial datatype-dependent
limitations on these values from upstream processing, and applying
just one clamp at the moment where we're forced to do so by the
datatype choices of simplehash.h.
Also, remove BuildTupleHashTable's misguided attempt to enforce
work_mem/hash_mem_limit. It doesn't have enough information
(particularly not the expected tuple width) to do that accurately,
and it has no real business second-guessing the caller's choice.
For all these plan types, it's really the planner's responsibility
to not choose a hashed implementation if the hashtable is expected
to exceed hash_mem_limit. The previous patch improved the
accuracy of those estimates, and even if BuildTupleHashTable had
more information it should arrive at the same conclusions.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
---
src/backend/executor/execGrouping.c | 36 ++++++++++++-----------
src/backend/executor/nodeAgg.c | 24 +++++++--------
src/backend/executor/nodeRecursiveunion.c | 1 -
src/backend/executor/nodeSetOp.c | 1 -
src/backend/executor/nodeSubplan.c | 19 +++++-------
src/backend/optimizer/path/costsize.c | 26 ----------------
src/backend/optimizer/plan/createplan.c | 26 +++++-----------
src/include/executor/executor.h | 2 +-
src/include/nodes/plannodes.h | 6 ++--
src/include/optimizer/optimizer.h | 1 -
src/include/optimizer/planmain.h | 2 +-
11 files changed, 50 insertions(+), 94 deletions(-)
diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index e1a3a813dd9..8b64a625ca5 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -14,6 +14,8 @@
*/
#include "postgres.h"
+#include <math.h>
+
#include "access/htup_details.h"
#include "access/parallel.h"
#include "common/hashfn.h"
@@ -144,7 +146,7 @@ execTuplesHashPrepare(int numCols,
* eqfuncoids: OIDs of equality comparison functions to use
* hashfunctions: FmgrInfos of datatype-specific hashing functions to use
* collations: collations to use in comparisons
- * nbuckets: initial estimate of hashtable size
+ * nelements: initial estimate of hashtable size
* additionalsize: size of data that may be stored along with the hash entry
* metacxt: memory context for long-lived data and the simplehash table
* tuplescxt: memory context in which to store the hashed tuples themselves
@@ -187,7 +189,7 @@ BuildTupleHashTable(PlanState *parent,
const Oid *eqfuncoids,
FmgrInfo *hashfunctions,
Oid *collations,
- long nbuckets,
+ double nelements,
Size additionalsize,
MemoryContext metacxt,
MemoryContext tuplescxt,
@@ -195,12 +197,24 @@ BuildTupleHashTable(PlanState *parent,
bool use_variable_hash_iv)
{
TupleHashTable hashtable;
- Size entrysize;
- Size hash_mem_limit;
+ uint32 nbuckets;
MemoryContext oldcontext;
uint32 hash_iv = 0;
- Assert(nbuckets > 0);
+ /*
+ * tuplehash_create requires a uint32 element count, so we had better
+ * clamp the given nelements to fit in that. As long as we have to do
+ * that, we might as well protect against completely insane input like
+ * zero or NaN. But it is not our job here to enforce issues like staying
+ * within hash_mem: the caller should have done that, and we don't have
+ * enough info to second-guess.
+ */
+ if (isnan(nelements) || nelements <= 0)
+ nbuckets = 1;
+ else if (nelements >= PG_UINT32_MAX)
+ nbuckets = PG_UINT32_MAX;
+ else
+ nbuckets = (uint32) nelements;
/* tuplescxt must be separate, else ResetTupleHashTable breaks things */
Assert(metacxt != tuplescxt);
@@ -208,18 +222,6 @@ BuildTupleHashTable(PlanState *parent,
/* ensure additionalsize is maxalign'ed */
additionalsize = MAXALIGN(additionalsize);
- /*
- * Limit initial table size request to not more than hash_mem.
- *
- * XXX this calculation seems pretty misguided, as it counts only overhead
- * and not the tuples themselves. But we have no knowledge of the
- * expected tuple width here.
- */
- entrysize = sizeof(TupleHashEntryData) + additionalsize;
- hash_mem_limit = get_hash_memory_limit() / entrysize;
- if (nbuckets > hash_mem_limit)
- nbuckets = hash_mem_limit;
-
oldcontext = MemoryContextSwitchTo(metacxt);
hashtable = (TupleHashTable) palloc(sizeof(TupleHashTableData));
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 759ffeed2e6..058171d5133 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -402,12 +402,12 @@ static void find_cols(AggState *aggstate, Bitmapset **aggregated,
Bitmapset **unaggregated);
static bool find_cols_walker(Node *node, FindColsContext *context);
static void build_hash_tables(AggState *aggstate);
-static void build_hash_table(AggState *aggstate, int setno, long nbuckets);
+static void build_hash_table(AggState *aggstate, int setno, double nbuckets);
static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
bool nullcheck);
static void hash_create_memory(AggState *aggstate);
-static long hash_choose_num_buckets(double hashentrysize,
- long ngroups, Size memory);
+static double hash_choose_num_buckets(double hashentrysize,
+ double ngroups, Size memory);
static int hash_choose_num_partitions(double input_groups,
double hashentrysize,
int used_bits,
@@ -1469,7 +1469,7 @@ build_hash_tables(AggState *aggstate)
for (setno = 0; setno < aggstate->num_hashes; ++setno)
{
AggStatePerHash perhash = &aggstate->perhash[setno];
- long nbuckets;
+ double nbuckets;
Size memory;
if (perhash->hashtable != NULL)
@@ -1478,8 +1478,6 @@ build_hash_tables(AggState *aggstate)
continue;
}
- Assert(perhash->aggnode->numGroups > 0);
-
memory = aggstate->hash_mem_limit / aggstate->num_hashes;
/* choose reasonable number of buckets per hashtable */
@@ -1505,7 +1503,7 @@ build_hash_tables(AggState *aggstate)
* Build a single hashtable for this grouping set.
*/
static void
-build_hash_table(AggState *aggstate, int setno, long nbuckets)
+build_hash_table(AggState *aggstate, int setno, double nbuckets)
{
AggStatePerHash perhash = &aggstate->perhash[setno];
MemoryContext metacxt = aggstate->hash_metacxt;
@@ -2053,11 +2051,11 @@ hash_create_memory(AggState *aggstate)
/*
* Choose a reasonable number of buckets for the initial hash table size.
*/
-static long
-hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
+static double
+hash_choose_num_buckets(double hashentrysize, double ngroups, Size memory)
{
- long max_nbuckets;
- long nbuckets = ngroups;
+ double max_nbuckets;
+ double nbuckets = ngroups;
max_nbuckets = memory / hashentrysize;
@@ -2065,7 +2063,7 @@ hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
* Underestimating is better than overestimating. Too many buckets crowd
* out space for group keys and transition state values.
*/
- max_nbuckets >>= 1;
+ max_nbuckets /= 2;
if (nbuckets > max_nbuckets)
nbuckets = max_nbuckets;
@@ -3686,7 +3684,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
if (use_hashing)
{
Plan *outerplan = outerPlan(node);
- uint64 totalGroups = 0;
+ double totalGroups = 0;
aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
&TTSOpsMinimalTuple);
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index ebb7919b49b..cd0ad51dcd2 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -35,7 +35,6 @@ build_hash_table(RecursiveUnionState *rustate)
TupleDesc desc = ExecGetResultType(outerPlanState(rustate));
Assert(node->numCols > 0);
- Assert(node->numGroups > 0);
/*
* If both child plans deliver the same fixed tuple slot type, we can tell
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 5aabed18a09..9e0f9274fb1 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -88,7 +88,6 @@ build_hash_table(SetOpState *setopstate)
TupleDesc desc = ExecGetResultType(outerPlanState(setopstate));
Assert(node->strategy == SETOP_HASHED);
- Assert(node->numGroups > 0);
/*
* If both child plans deliver the same fixed tuple slot type, we can tell
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 96d900a7783..aa9fa482797 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -34,7 +34,6 @@
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
-#include "optimizer/optimizer.h"
#include "utils/array.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -481,7 +480,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
int ncols = node->numCols;
ExprContext *innerecontext = node->innerecontext;
MemoryContext oldcontext;
- long nbuckets;
+ double nentries;
TupleTableSlot *slot;
Assert(subplan->subLinkType == ANY_SUBLINK);
@@ -509,9 +508,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
node->havehashrows = false;
node->havenullrows = false;
- nbuckets = clamp_cardinality_to_long(planstate->plan->plan_rows);
- if (nbuckets < 1)
- nbuckets = 1;
+ nentries = planstate->plan->plan_rows;
if (node->hashtable)
ResetTupleHashTable(node->hashtable);
@@ -524,7 +521,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
node->tab_eq_funcoids,
node->tab_hash_funcs,
node->tab_collations,
- nbuckets,
+ nentries,
0, /* no additional data */
node->planstate->state->es_query_cxt,
node->tuplesContext,
@@ -534,12 +531,12 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
if (!subplan->unknownEqFalse)
{
if (ncols == 1)
- nbuckets = 1; /* there can only be one entry */
+ nentries = 1; /* there can only be one entry */
else
{
- nbuckets /= 16;
- if (nbuckets < 1)
- nbuckets = 1;
+ nentries /= 16;
+ if (nentries < 1)
+ nentries = 1;
}
if (node->hashnulls)
@@ -553,7 +550,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
node->tab_eq_funcoids,
node->tab_hash_funcs,
node->tab_collations,
- nbuckets,
+ nentries,
0, /* no additional data */
node->planstate->state->es_query_cxt,
node->tuplesContext,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 94077e6a006..8335cf5b5c5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -257,32 +257,6 @@ clamp_width_est(int64 tuple_width)
return (int32) tuple_width;
}
-/*
- * clamp_cardinality_to_long
- * Cast a Cardinality value to a sane long value.
- */
-long
-clamp_cardinality_to_long(Cardinality x)
-{
- /*
- * Just for paranoia's sake, ensure we do something sane with negative or
- * NaN values.
- */
- if (isnan(x))
- return LONG_MAX;
- if (x <= 0)
- return 0;
-
- /*
- * If "long" is 64 bits, then LONG_MAX cannot be represented exactly as a
- * double. Casting it to double and back may well result in overflow due
- * to rounding, so avoid doing that. We trust that any double value that
- * compares strictly less than "(double) LONG_MAX" will cast to a
- * representable "long" value.
- */
- return (x < (double) LONG_MAX) ? (long) x : LONG_MAX;
-}
-
/*
* cost_seqscan
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 63fe6637155..8af091ba647 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -226,7 +226,7 @@ static RecursiveUnion *make_recursive_union(List *tlist,
Plan *righttree,
int wtParam,
List *distinctList,
- long numGroups);
+ Cardinality numGroups);
static BitmapAnd *make_bitmap_and(List *bitmapplans);
static BitmapOr *make_bitmap_or(List *bitmapplans);
static NestLoop *make_nestloop(List *tlist,
@@ -301,7 +301,7 @@ static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
List *tlist, Plan *lefttree, Plan *righttree,
- List *groupList, long numGroups);
+ List *groupList, Cardinality numGroups);
static LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int epqParam);
static Result *make_gating_result(List *tlist, Node *resconstantqual,
Plan *subplan);
@@ -2564,7 +2564,6 @@ create_setop_plan(PlannerInfo *root, SetOpPath *best_path, int flags)
List *tlist = build_path_tlist(root, &best_path->path);
Plan *leftplan;
Plan *rightplan;
- long numGroups;
/*
* SetOp doesn't project, so tlist requirements pass through; moreover we
@@ -2575,16 +2574,13 @@ create_setop_plan(PlannerInfo *root, SetOpPath *best_path, int flags)
rightplan = create_plan_recurse(root, best_path->rightpath,
flags | CP_LABEL_TLIST);
- /* Convert numGroups to long int --- but 'ware overflow! */
- numGroups = clamp_cardinality_to_long(best_path->numGroups);
-
plan = make_setop(best_path->cmd,
best_path->strategy,
tlist,
leftplan,
rightplan,
best_path->groupList,
- numGroups);
+ best_path->numGroups);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -2604,7 +2600,6 @@ create_recursiveunion_plan(PlannerInfo *root, RecursiveUnionPath *best_path)
Plan *leftplan;
Plan *rightplan;
List *tlist;
- long numGroups;
/* Need both children to produce same tlist, so force it */
leftplan = create_plan_recurse(root, best_path->leftpath, CP_EXACT_TLIST);
@@ -2612,15 +2607,12 @@ create_recursiveunion_plan(PlannerInfo *root, RecursiveUnionPath *best_path)
tlist = build_path_tlist(root, &best_path->path);
- /* Convert numGroups to long int --- but 'ware overflow! */
- numGroups = clamp_cardinality_to_long(best_path->numGroups);
-
plan = make_recursive_union(tlist,
leftplan,
rightplan,
best_path->wtParam,
best_path->distinctList,
- numGroups);
+ best_path->numGroups);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -5845,7 +5837,7 @@ make_recursive_union(List *tlist,
Plan *righttree,
int wtParam,
List *distinctList,
- long numGroups)
+ Cardinality numGroups)
{
RecursiveUnion *node = makeNode(RecursiveUnion);
Plan *plan = &node->plan;
@@ -6582,15 +6574,11 @@ Agg *
make_agg(List *tlist, List *qual,
AggStrategy aggstrategy, AggSplit aggsplit,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
- List *groupingSets, List *chain, double dNumGroups,
+ List *groupingSets, List *chain, Cardinality numGroups,
Size transitionSpace, Plan *lefttree)
{
Agg *node = makeNode(Agg);
Plan *plan = &node->plan;
- long numGroups;
-
- /* Reduce to long, but 'ware overflow! */
- numGroups = clamp_cardinality_to_long(dNumGroups);
node->aggstrategy = aggstrategy;
node->aggsplit = aggsplit;
@@ -6822,7 +6810,7 @@ make_gather(List *qptlist,
static SetOp *
make_setop(SetOpCmd cmd, SetOpStrategy strategy,
List *tlist, Plan *lefttree, Plan *righttree,
- List *groupList, long numGroups)
+ List *groupList, Cardinality numGroups)
{
SetOp *node = makeNode(SetOp);
Plan *plan = &node->plan;
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 086f52cff3d..fa2b657fb2f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -138,7 +138,7 @@ extern TupleHashTable BuildTupleHashTable(PlanState *parent,
const Oid *eqfuncoids,
FmgrInfo *hashfunctions,
Oid *collations,
- long nbuckets,
+ double nelements,
Size additionalsize,
MemoryContext metacxt,
MemoryContext tuplescxt,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7cdd2b51c94..c4393a94321 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -475,7 +475,7 @@ typedef struct RecursiveUnion
Oid *dupCollations pg_node_attr(array_size(numCols));
/* estimated number of groups in input */
- long numGroups;
+ Cardinality numGroups;
} RecursiveUnion;
/* ----------------
@@ -1206,7 +1206,7 @@ typedef struct Agg
Oid *grpCollations pg_node_attr(array_size(numCols));
/* estimated number of groups in input */
- long numGroups;
+ Cardinality numGroups;
/* for pass-by-ref transition data */
uint64 transitionSpace;
@@ -1446,7 +1446,7 @@ typedef struct SetOp
bool *cmpNullsFirst pg_node_attr(array_size(numCols));
/* estimated number of groups in left input */
- long numGroups;
+ Cardinality numGroups;
} SetOp;
/* ----------------
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index a34113903c0..d0aa8ab0c1c 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -82,7 +82,6 @@ extern PGDLLIMPORT int effective_cache_size;
extern double clamp_row_est(double nrows);
extern int32 clamp_width_est(int64 tuple_width);
-extern long clamp_cardinality_to_long(Cardinality x);
/* in path/indxpath.c: */
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 09b48b26f8f..00addf15992 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -55,7 +55,7 @@ extern Sort *make_sort_from_sortclauses(List *sortcls, Plan *lefttree);
extern Agg *make_agg(List *tlist, List *qual,
AggStrategy aggstrategy, AggSplit aggsplit,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
- List *groupingSets, List *chain, double dNumGroups,
+ List *groupingSets, List *chain, Cardinality numGroups,
Size transitionSpace, Plan *lefttree);
extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount,
LimitOption limitOption, int uniqNumCols,
--
2.43.7
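And a similar standalone sketch of the single clamp this second patch
centralizes in BuildTupleHashTable (illustrative only: clamp_nelements
is an invented name). Unlike SIZE_MAX, UINT32_MAX is exactly
representable as a double, so this comparison is exact:

#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative sketch, not PostgreSQL code.  simplehash wants a uint32
 * element count, so the planner's double-valued estimate is clamped once
 * here; NaN and non-positive inputs are treated as insane and mapped
 * to 1, mirroring the patch's logic in BuildTupleHashTable.
 */
static uint32_t
clamp_nelements(double nelements)
{
    if (isnan(nelements) || nelements <= 0)
        return 1;
    if (nelements >= (double) UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t) nelements;
}

int
main(void)
{
    printf("%" PRIu32 "\n", clamp_nelements(123.9));   /* 123 */
    printf("%" PRIu32 "\n", clamp_nelements(1e12));    /* 4294967295 */
    printf("%" PRIu32 "\n", clamp_nelements(nan(""))); /* 1 */
    return 0;
}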