Re: Asymmetric partition-wise JOIN

From: Amul Sul
Subject: Re: Asymmetric partition-wise JOIN
Date:
Msg-id: CAAJ_b96=OF43+7k=ow5pFjSDLhGJkJsgNrbz7qchHiO7ont6vw@mail.gmail.com
In reply to: Re: Asymmetric partition-wise JOIN (Kohei KaiGai <kaigai@heterodb.com>)
List: pgsql-hackers
On Sat, Aug 24, 2019 at 2:03 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>
> On Sat, Aug 24, 2019 at 7:02 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > On Fri, Aug 23, 2019 at 4:05 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > We can consider the table join ptable X t1 above is equivalent to:
> > >   (ptable_p0 + ptable_p1 + ptable_p2) X t1
> > > = (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
> > > It returns an equivalent result; however, rows are already reduced by the
> > > HashJoin in each individual leaf of the Append, so the CPU cycles consumed
> > > by the Append node can be cheaper.
> > >
> > > On the other hand, it has a downside: t1 must be read 3 times and the
> > > hash table must also be built 3 times. This increases the expected cost,
> > > so the planner may not choose the asymmetric partition-wise join plan.
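In SQL terms, the decomposition above corresponds to the following minimal
sketch (table names follow the example; the join key ptable.dist = t1.foo is
assumed from the discussion below; the planner would produce this shape as an
Append over per-leaf HashJoins rather than by rewriting the SQL):

    -- Join against the partitioned parent:
    SELECT * FROM ptable p JOIN t1 ON p.dist = t1.foo;

    -- Conceptually equivalent per-leaf form; each branch becomes one
    -- HashJoin under a single Append node:
    SELECT * FROM ptable_p0 p JOIN t1 ON p.dist = t1.foo
    UNION ALL
    SELECT * FROM ptable_p1 p JOIN t1 ON p.dist = t1.foo
    UNION ALL
    SELECT * FROM ptable_p2 p JOIN t1 ON p.dist = t1.foo;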
> >
> > What if you include the partition constraint as a filter on t1?  So you get:
> >
> > ptable X t1 =
> >   (ptable_p0 X (σ hash(dist)%4=0 (t1))) +
> >   (ptable_p1 X (σ hash(dist)%4=1 (t1))) +
> >   (ptable_p2 X (σ hash(dist)%4=2 (t1))) +
> >   (ptable_p3 X (σ hash(dist)%4=3 (t1)))
> >
> > Pros:
> > 1.  The hash tables will not contain unnecessary junk.
> > 2.  You'll get the right answer if t1 is on the outer side of an outer join.
> > 3.  If this runs underneath a Parallel Append and t1 is big enough
> > then workers will hopefully cooperate and do a synchronised scan of
> > t1.
> > 4.  The filter might enable a selective and efficient plan like an index scan.
> >
> > Cons:
> > 1.  The filter might not enable a selective and efficient plan, and
> > therefore cause extra work.
> >
> > (It's a little weird in this example because you don't usually see hash
> > functions in WHERE clauses, but that could just as easily be dist
> > BETWEEN 1 AND 99 or any other partition constraint.)
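As a concrete SQL sketch of this filter pushdown (hash partitioning with
modulus 4 is assumed, matching the algebra above; PostgreSQL's built-in
satisfies_hash_partition() can express the per-partition hash check, provided
t1.dist has the same type as the partition key; only the first branch is
shown):

    -- One branch of the rewritten Append; remainders 1..3 follow the
    -- same pattern against ptable_p1..ptable_p3:
    SELECT *
    FROM ptable_p0 p
    JOIN (SELECT * FROM t1
          WHERE satisfies_hash_partition('ptable'::regclass, 4, 0, t1.dist)) f
      ON p.dist = f.dist;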
> >
> It requires that the join key include the partition key and that the join
> be an equality join, doesn't it?
> If ptable and t1 are joined using ptable.dist = t1.foo, we can indeed
> distribute t1 to each leaf table with "WHERE hash(foo)%4 = xxx" according
> to the partition bounds.
>
> In the case when some of the partition leaves are pruned, this is clearly
> beneficial, because rows that would only be referenced by the pruned child
> relations are a waste of memory.
>
> On the other hand, it eventually consumes an almost equivalent amount of
> memory to load the inner relations if no leaves are pruned, even if we
> could extend the Hash node to share the hash table with sibling join nodes.
>
> > > One idea I have is that sibling HashJoins share a hash table that was built
> > > once by any of the sibling Hash plans. Right now, it is not implemented yet.
> >
> > Yeah, I've thought a little bit about that in the context of Parallel
> > Repartition.  I'm interested in combining intra-node partitioning
> > (where a single plan node repartitions data among workers on the fly)
> > with inter-node partitioning (like PWJ, where partitions are handled
> > by different parts of the plan, considered at planning time); you
> > finish up needing to have nodes in the plan that 'receive' tuples for
> > each partition, to match up with the PWJ plan structure.  That's not
> > entirely unlike CTE references, and not entirely unlike your idea of
> > somehow sharing the same hash table.  I ran into a number of problems
> > while thinking about that, which I should write about in another
> > thread.
> >
> Hmm. Do you mean that the inner path may behave differently according to
> the partition-bounds definition of the outer path it is joined with?
> Let me investigate its pros & cons.
>
> The reasons why I think the idea of sharing the same hash table is
> reasonable in this scenario are:
> 1. We can easily extend the idea for parallel optimization. A hash table on
>     a DSM segment, once built, can be shared by all the siblings in all the
>     parallel workers.
> 2. We can save memory regardless of the join keys and partition keys, even
>     if these are not involved in the query.
>
> On the other hand, below are the downsides. Potentially, combined use of
> your idea may help in these cases:
> 3. The distributed inner relation cannot be the outer side of an XXX OUTER JOIN.
> 4. The hash table contains rows referenced only by pruned partition leaves.
>
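For illustration, the shared-hash plan shape under discussion would look
roughly like the following (a hypothetical sketch only; the sharing
annotations do not correspond to any existing PostgreSQL EXPLAIN output,
which is exactly the extension being proposed):

    Append
      ->  Hash Join
            ->  Seq Scan on ptable_p0
            ->  Hash                 -- built once, e.g. on a DSM segment
                  ->  Seq Scan on t1
      ->  Hash Join
            ->  Seq Scan on ptable_p1
            ->  Hash                 -- reuses the sibling's shared table
      ->  Hash Join
            ->  Seq Scan on ptable_p2
            ->  Hash                 -- reuses the sibling's shared table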

+ many for the sharable hash of the inner table of the join. IMHO, this
could be the most interesting and captivating part of this feature.
But it might be a complicated piece; is that still in the plan?

Regards,
Amul


