Re: Partition-wise join for join between (declaratively) partitioned tables

From Robert Haas
Subject Re: Partition-wise join for join between (declaratively) partitioned tables
Msg-id CA+Tgmob_PVrn6sgaiwMDCdxGJc_-St=bN+X=3Mk89YOp0Qp4Uw@mail.gmail.com
In reply to Re: Partition-wise join for join between (declaratively) partitioned tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Nov 14, 2016 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Nov 4, 2016 at 6:52 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Costing PartitionJoinPath needs more thought so that we don't end up
>>> with bad overall plans. Here's an idea. Partition-wise joins are
>>> better compared to the unpartitioned ones, because of the smaller
>>> sizes of partitions. If we think of join as O(MN) operation where M
>>> and N are sizes of unpartitioned tables being joined, partition-wise
>>> join computes P joins each with average O(M/P * N/P) order where P is
>>> the number of partitions, which is still O(MN) with constant factor
>>> reduced by P times. I think, we need to apply similar logic to
>>> costing. Let's say cost of a join is J(M, N) = S (M, N) + R (M, N)
>>> where S and R are setup cost and joining cost (for M, N rows) resp.
>>> Cost of partition-wise join would be P * J(M/P, N/P) = P * S(M/P, N/P)
>>> + P * R(M/P, N/P). Each of the join methods will have different S and
>>> R functions and may not be linear on the number of rows. So,
>>> PartitionJoinPath costs are obtained from corresponding regular path
>>> costs subjected to above transformation. This way, we will be
>>> protected from choosing a PartitionJoinPath when it's not optimal.
>
>> I'm not sure that I really understand the stuff with big-O notation
>> and M, N, and P.  But I think what you are saying is that we could
>> cost a PartitionJoinPath by costing some of the partitions (it might
>> be a good idea to choose the biggest ones) and assuming the cost for
>> the remaining ones will be roughly proportional.  That does seem like
>> a reasonable strategy to me.
>
> I'm not sure to what extent the above argument depends on the assumption
> that join is O(MN), but I will point out that in no case of practical
> interest for large tables is it actually O(MN).  That would be true
> only for the stupidest possible nested-loop join method.  It would be
> wise to convince ourselves that the argument holds for more realistic
> big-O costs, eg hash join is more like O(M+N) if all goes well.

Yeah, I agree.  To recap briefly, the problem we're trying to solve
here is how to build a path for a partitionwise join without an
explosion in the amount of memory the planner uses or the number of
paths created.  In the initial design, if there are N partitions per
relation, the total number of paths generated by the planner increases
by a factor of N+1, which gets ugly if, say, N = 1000, or even N =
100.  To rein that in, we want to do a rough cut at costing the
partitionwise join that will be good enough to let us throw away
obviously inferior paths, and then work out the exact paths we're
going to use only for partitionwise joins that are actually selected.
I think costing one or a few of the larger sub-joins and assuming
those costs are representative is probably a reasonable approach to
that problem.
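
To make the idea concrete, here is a minimal sketch of the "cost a few
of the larger sub-joins and extrapolate" approach, together with the
point about big-O behaviour raised upthread.  Everything in it is an
assumption made for illustration: the three toy cost functions, the
function names, and the choice to scale the sampled cost by row share
are not PostgreSQL's actual costing code or any proposed patch.

```python
import math

# Toy cost models, illustrative only; NOT PostgreSQL's real cost functions.
def nestloop_cost(m, n):
    # Naive nested loop: every outer row is compared with every inner row.
    return m * n

def hashjoin_cost(m, n):
    # Hash join when all goes well: build on one side, probe with the other.
    return m + n

def mergejoin_cost(m, n):
    # Sort both inputs, then make one merge pass over them.
    return m * math.log2(max(m, 2)) + n * math.log2(max(n, 2)) + m + n

def exact_partitionwise_cost(parts, cost_fn):
    # Cost every sub-join individually.  This is the expensive option
    # that the rough cut is meant to avoid.
    return sum(cost_fn(m, n) for m, n in parts)

def estimated_partitionwise_cost(parts, cost_fn, sample_size=2):
    # Cost only the `sample_size` largest sub-joins, then assume the
    # remaining ones cost the same per input row (hypothetical scaling rule).
    ranked = sorted(parts, key=lambda p: p[0] + p[1], reverse=True)
    sample = ranked[:sample_size]
    sampled_cost = sum(cost_fn(m, n) for m, n in sample)
    sampled_rows = sum(m + n for m, n in sample)
    total_rows = sum(m + n for m, n in parts)
    return sampled_cost * total_rows / sampled_rows

# Ten equal partitions of a 1000 x 1000 row join.
equal_parts = [(100, 100)] * 10
# Skewed partitions: (outer rows, inner rows) per matching partition pair.
skewed_parts = [(1000, 500), (800, 400), (100, 50), (10, 5)]

for name, fn in [("nestloop", nestloop_cost),
                 ("hash", hashjoin_cost),
                 ("merge", mergejoin_cost)]:
    print(name,
          "unpartitioned:", fn(1000, 1000),
          "partitionwise:", exact_partitionwise_cost(equal_parts, fn),
          "skewed exact:", exact_partitionwise_cost(skewed_parts, fn),
          "skewed estimate:", estimated_partitionwise_cost(skewed_parts, fn))
```

Under these toy models, with equal partitions the O(MN) cost drops by
a factor of P, the O(M+N) cost does not change at all, and the sort-based
cost improves only through the logarithmic term.  The extrapolation
is exact when the cost is linear in the row count and only
approximate otherwise, which is the sense in which it is a rough cut.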

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


