Re: Partition-wise join for join between (declaratively) partitioned tables
От | Ashutosh Bapat |
---|---|
Тема | Re: Partition-wise join for join between (declaratively) partitioned tables |
Дата | |
Msg-id | CAFjFpRcZ_M3-JxoiDkdoPS+-9Cok4ux9Si+4drcRL-62af=jWw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Partition-wise join for join between (declaratively) partitioned tables (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
(Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Список | pgsql-hackers |
Hi Robert, Sorry for delayed response. The attached patch implements following ideas: 1. At the time of creating paths - If the joining relations are both partitioned and join can use partition-wise join, we create paths for few child-joins. Similar to inheritance relations (set_append_rel_pathlist()), we collect paths with similar properties from all sampled child-joins and create one PartitionJoinPath with each set of paths. The cost of the PartitionJoinPath is obtained by multiplying the sum of costs of paths in the given set by the ratio of (number of rows estimated in the parent-join/sum of rows in child-joins). 2. If the PartitionJoinPath emerges as the best path, we create paths for each of the remaining child-joins. Then we collect paths with properties same as the given PartitionJoinPath, one from each child-join. These paths are converted into plans and a Merge/Append plan is created combing these plans. The paths and plans for child-join are created in a temporary memory context. The final plan for each child-join is copied into planner's context and the temporary memory context is reset. Right now, we choose 1% or 1 (whichever is higher) child-joins to base PartitionJoinPath costs on. Memory consumption ----------------------------- I tested a 5-way self-join for a table with 1000 partitions, each partition having 1M rows. The memory consumed in standard_planner() was measured with some granular tracking (mem_usage_func_wise_measurement_slabwise.patch). Partition-wise join consumed total of 289MB memory which is approx 6.6 times more than non-partition-wise join which consumed 44MB. That's much better than the earlier 16 times consumption for 5-way join with 100 partitions. The extra 245MB memory was consumed by child-join RelOptInfos (48MB), SpecialJoinInfos for child-joins (64MB), restrictlist translation (92MB), paths for sampled child-joins (1.5MB), building targetlists for child-joins (7MB). In order to choose representative child-joins based on the sizes of child-joins, we need to create all the child-join RelOptInfos. In order to estimate sizes of child-joins, we need to create SpecialJoinInfos and restrictlists for at least one join order for all child-joins. For every representative child-join, we need to create SpecialJoinInfo and restrictlist for all join orders for that child-join. We might be able to save of restrictlist translation, if we create restrict lists from joininfo similar to parent joins. I haven't tried that yet. Choosing representative child-joins: -------------------------------------------------- There's another angle to choosing representative child joins. In a partitioned N-way join, different joins covering different subsets of N relations, will have different size distributions across the partitions. This means that the child-joins costed for (N-k) joins, may be different for those required for (N-k+1) joins. With a factor of 1% sampling, N is such that a child-join participates in 100 joins, we will end up creating paths for all partitions before creating PartitionJoinPaths for the final N-way join. Hopefully that will be a rare case and usually we will end up using paths already created. We can not avoid creating PartitionJoinPaths for subset joins, as there might be cases when partition-wise join will be optimal for an N-k way join but not for N-way join. We may avoid this if we choose representative child-joins based on their positions, in which case, we may end up with some or all of those being empty and thus skewing the costs heavily. Partial paths ----------------- AFAIU, we create partial paths for append relation, when all the children have partial paths. Unlike parameterized paths or path with pathkeys, there is no way to create a partial path for a normal path. This means that unless we create paths for all child-joins, we can not create partial paths for appendrel comprising of child-joins, and thus can not use parallel query right now. This may not be that bad, since it would be more efficient to run each child-join in a separate worker, rather than using multiple workers for a single child-join. regression tests ---------------------- I observed that for small relations (1000 rows in each partition and 100 partitions), the size estimates in append relations and sum of those in child relations are very different. As a result, the extrapolated costs for PartitionJoinPaths as described above, are way higher than costs of join of appends (or even append of joins if we are to create paths for all child-joins). Thus with this approach, we choose partition-wise join for large number of partitions with large data (e.g. 1000 partitions with 1M rows each). These are certainly the cases when partition-wise join is a big win. I have not tried to find out a threshold above which partition-wise join gets chosen with above approach, but it's going to be a larger threshold. That makes writing regression tests difficult, as those will require large data. So, we have to find a way so that we can test partition-wise join with smaller data. There are few possibilities like 1. convert the fraction of representative child-joins into GUC and setting it to 100% would start choosing partition-wise joins for tables with a few hundred rows per partition, like it did in earlier approach, 2. provide a way to force partition-wise join whenever possible, by say costing partition-wise joins much lesser than non-partition-wise join when a GUC is set (e.g. enable_partition_wise_join with values always, never, optimal or something like that). -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Вложения
В списке pgsql-hackers по дате отправления: