Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Дата
Msg-id CA+hUKGJUSbkivOxBqdN6dB9KoMxVZ-F8s28=Kr58cNMqKASktA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (James Coleman <jtc331@gmail.com>)
Ответы Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (James Coleman <jtc331@gmail.com>)
Список pgsql-bugs
On Sat, Nov 9, 2019 at 1:23 PM James Coleman <jtc331@gmail.com> wrote:
> On Fri, Nov 8, 2019 at 6:30 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > On Fri, Nov 08, 2019 at 09:52:16PM +0000, PG Bug reporting form wrote:
> > >ERROR:  invalid DSA memory alloc request size 1375731712

> > >#3  0x000055a7c079bd17 in ExecParallelHashJoinSetUpBatches
> > >(hashtable=hashtable@entry=0x55a7c1db2740, nbatch=nbatch@entry=2097152) at

> > I've briefly looked at this today, and I think the root cause is
> > somewhat similar to what is described in [1] where we simply increase
> > the number of batches in an effort to keep batch contents in work_mem,
> > but ignoring that each batch requires quite a bit of memory. So we end
> > up with a lot of batches where each is small enough to fit into
> > work_mem, but we need much more than work_mem to track the batches.

Yeah.  So even when this is fixed, the query is going to perform
*terribly*, opening and closing millions of files in random order to
stream tuples into, if this is case where there really are tuples to
go to all partitions (and not just a case of extreme skew that our
extreme skew detector fails to detect because it only detects absolute
extreme skew).

> > This seems to be about the same problem, except that instead of
> > forgeting about BufFile, the parallel hash join ignores this:
> >
> >    pstate->batches =
> >      dsa_allocate0(hashtable->area,
> >                    EstimateParallelHashJoinBatch(hashtable) * nbatch);

Yeah, I failed to consider that possibility.  I suppose it could be
avoided with something like this (not tested, I will find a repro for
this on Monday to convince myself that it's right):

@@ -1246,7 +1246,10 @@
ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable)
                                }

                                /* Don't keep growing if it's not
helping or we'd overflow. */
-                               if (extreme_skew_detected ||
hashtable->nbatch >= INT_MAX / 2)
+                               if (extreme_skew_detected ||
+                                       hashtable->nbatch >= INT_MAX / 2 ||
+
!AllocSizeIsValid(EstimateParallelHashJoinBatch(hashtable) *
+
   hashtable->nbatch * 2))
                                        pstate->growth = PHJ_GROWTH_DISABLED;
                                else if (space_exhausted)
                                        pstate->growth =
PHJ_GROWTH_NEED_MORE_BATCHES;

But James's query is still going to be terrible.

Do you know if it's extreme skew (many tuples with the same key, just
a few scattered around in other keys), or simply too much data for
your work_mem setting?



В списке pgsql-bugs по дате отправления:

Предыдущее
От: James Coleman
Дата:
Сообщение: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Следующее
От: James Coleman
Дата:
Сообщение: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash