Re: Should HashSetOp go away

From: Tom Lane
Subject: Re: Should HashSetOp go away
Date:
Msg-id: 2219168.1761509780@sss.pgh.pa.us
In reply to: Re: Should HashSetOp go away  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
I wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I noticed some changes in this code in v18, so I wanted to revisit the
>> issue.  Under commit 27627929528e, it looks like it got 25% more memory
>> efficient, but the planner thinks it got 40% more efficient, so the
>> memory use got better but the estimation actually got worse.

> Hmm, so why not fix that estimation?

So I poked at this a little bit, and found a few factors affecting it:

* Tuple hash tables are typically given license to use twice work_mem;
see hash_mem_multiplier.  Not sure if you were accounting for that
in your test.
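
(For reference: the executor's hash memory budget comes from
get_hash_memory_limit() in nodeHash.c, which is roughly

    size_t
    get_hash_memory_limit(void)
    {
        double      mem_limit;

        /* work_mem is in kilobytes; scale by the GUC multiplier */
        mem_limit = (double) work_mem * hash_mem_multiplier * 1024.0;

        /* clamp in case it doesn't fit in size_t */
        mem_limit = Min(mem_limit, (double) SIZE_MAX);

        return (size_t) mem_limit;
    }

so with the default hash_mem_multiplier = 2.0 the hash table gets
license for twice work_mem.)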

* create_setop_path's required-space estimate of entrysize * numGroups
was rather lame before commit 5dfc19814, and it's even more so
afterwards.  It's basically only accounting for the tuples themselves,
and not either the hashtable overhead or the SetOpStatePerGroupData
counter space.  With wide tuples that might disappear into the noise,
but with only 16-ish data bytes per tuple it's all about the overhead.
On my machine this example uses 80 bytes per tuple in the "SetOp hash
table" context and another 16 or more in the simplehash hashtable.
So about triple what the planner thought.
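
To make that concrete, an honest per-entry estimate would have to
count something like the following (a sketch only; the helper is
hypothetical, though SizeofMinimalTupleHeader and
SetOpStatePerGroupData are the real macro/struct involved):

    /* Hypothetical sketch, not existing planner code */
    static Size
    setop_hash_entry_size(Size dataWidth)
    {
        Size        sz;

        /* the MinimalTuple itself: header plus data bytes */
        sz = MAXALIGN(SizeofMinimalTupleHeader + dataWidth);
        /* the per-group counters nodeSetOp.c keeps for each entry */
        sz += MAXALIGN(sizeof(SetOpStatePerGroupData));
        /* simplehash bucket overhead: entry struct plus fillfactor
         * slop; 16 bytes here is an assumed round number */
        sz += 16;
        return sz;
    }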

* We can buy some of this back nearly for free, by switching the
SetOp hash table context to be a BumpContext not an AllocSetContext.
That doesn't move the needle too much in this example with
MEMORY_CONTEXT_CHECKING on, but with it off, the SetOp hash table
consumption drops to 48 bytes/tuple.  (Also, for other tuple widths,
AllocSetContext's round-up-to-power-of-2 behavior could hurt a lot
more than it does here.)
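
(To illustrate the rounding point: an AllocSetContext rounds each
request up to the next power of 2 and adds a per-chunk header, while
a BumpContext just advances a pointer by the MAXALIGN'ed size.  A
sketch, with approximate numbers:

    MemoryContext cxt = BumpContextCreate(CurrentMemoryContext,
                                          "demo",
                                          ALLOCSET_DEFAULT_SIZES);
    char   *p = MemoryContextAlloc(cxt, 40);   /* ~MAXALIGN(40) bytes */

    /* In an AllocSetContext the same request would occupy a 64-byte
     * chunk plus a header.  The tradeoff: bump allocations cannot be
     * pfree'd individually; the whole context must be reset or
     * deleted. */
    MemoryContextDelete(cxt);

That last restriction is exactly why it's OK for the SetOp hash
table, which never deletes individual entries.)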

* To do better, we probably need to take this computation out of the
planner and have execGrouping.c expose a function to estimate the
TupleHashTable size for N entries and such-and-such average data
width.  That in turn will require simplehash.h to expose a function
for estimating the size of its tables, because AFAICS its callers are
not supposed to know such details.
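
As a pure API sketch (none of these functions exist today, and the
names are made up):

    /* the simplehash.h template would additionally generate, e.g.: */
    extern Size tuplehash_estimate_size(uint64 nentries);

    /* ... so that execGrouping.c could expose something like: */
    Size
    EstimateTupleHashTableSize(double nentries, Size tupleWidth,
                               Size additionalsize)
    {
        Size        per_entry;

        per_entry = MAXALIGN(SizeofMinimalTupleHeader + tupleWidth) +
                    MAXALIGN(additionalsize);
        return (Size) (nentries * per_entry) +
            tuplehash_estimate_size((uint64) nentries);
    }

The planner would then call that instead of multiplying
entrysize * numGroups itself.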

Attached is a quick-hack patch to use a BumpContext for this purpose
in nodeSetOp.c.  We should probably look at whether other users of
BuildTupleHashTable can do similarly.  I haven't thought hard about
what better-factorized space estimation would look like.

            regards, tom lane

diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4068481a523..ce42ea6a549 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -589,13 +589,15 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
     /*
      * If hashing, we also need a longer-lived context to store the hash
      * table.  The table can't just be kept in the per-query context because
-     * we want to be able to throw it away in ExecReScanSetOp.
+     * we want to be able to throw it away in ExecReScanSetOp.  We can use a
+     * BumpContext to save storage, because we will have no need to delete
+     * individual table entries.
      */
     if (node->strategy == SETOP_HASHED)
         setopstate->tableContext =
-            AllocSetContextCreate(CurrentMemoryContext,
-                                  "SetOp hash table",
-                                  ALLOCSET_DEFAULT_SIZES);
+            BumpContextCreate(CurrentMemoryContext,
+                              "SetOp hash table",
+                              ALLOCSET_DEFAULT_SIZES);

     /*
      * initialize child nodes
