Thread: Observations in Parallel Append
A few observations on the Parallel Append commit (ab727167):

1.
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
+#include "storage/spin.h"
..

There doesn't seem to be any need for including spin.h.  I think some
prior version of the patch might have needed it.  Patch attached to
remove it.

2.
+ * choose_next_subplan_for_worker
..
+ * We start from the first plan and advance through the list;
+ * when we get back to the end, we loop back to the first
+ * nonpartial plan.
..
+choose_next_subplan_for_worker(AppendState *node)
{
..
+    if (pstate->pa_next_plan < node->as_nplans - 1)
+    {
+        /* Advance to next plan. */
+        pstate->pa_next_plan++;
+    }
+    else if (append->first_partial_plan < node->as_nplans)
+    {
+        /* Loop back to first partial plan. */
+        pstate->pa_next_plan = append->first_partial_plan;
+    }
..

The code and the comment don't seem to match.  The comment says that
after reaching the end of the list we loop back to the first nonpartial
plan, whereas the code loops back to the first partial plan.  I think
one of those needs to be changed, unless I am missing something obvious.

3.
+cost_append(AppendPath *apath)
{
..
+    /*
+     * Apply parallel divisor to non-partial subpaths.  Also add the
+     * cost of partial paths to the total cost, but ignore non-partial
+     * paths for now.
+     */
+    if (i < apath->first_partial_path)
+        apath->path.rows += subpath->rows / parallel_divisor;
+    else
+    {
+        apath->path.rows += subpath->rows;
+        apath->path.total_cost += subpath->total_cost;
+    }
..
}

I think it is better to use clamp_row_est on the rows in the case where
we use parallel_divisor, so that the value of rows is always sane.

Also, don't we need to use parallel_divisor for partial paths instead
of non-partial paths, since those are what will actually be distributed
among the workers?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments
On Fri, Dec 22, 2017 at 6:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> There doesn't seem to be any need for including spin.h.  I think some
> prior version of the patch might need it.  Patch attached to remove
> it.

OK, good catch.

> The code and comment don't seem to match.  The comments indicate that
> after reaching the end of the list, we loop back to first nonpartial
> plan whereas code indicates that we loop back to first partial plan.
> I think one of those needs to be changed unless I am missing something
> obvious.

Yeah, the header comment should say partial, not nonpartial.

> 3.
> +cost_append(AppendPath *apath)
> {
> ..
> +    /*
> +     * Apply parallel divisor to non-partial subpaths.  Also add the
> +     * cost of partial paths to the total cost, but ignore non-partial
> +     * paths for now.
> +     */
> +    if (i < apath->first_partial_path)
> +        apath->path.rows += subpath->rows / parallel_divisor;
> +    else
> +    {
> +        apath->path.rows += subpath->rows;
> +        apath->path.total_cost += subpath->total_cost;
> +    }
> ..
> }
>
> I think it is better to use clamp_row_est for rows for the case where
> we use parallel_divisor so that the value of rows is always sane.

Good point.

> Also, don't we need to use parallel_divisor for partial paths instead
> of non-partial paths as those will be actually distributed among
> workers?

Uh, that seems backwards to me.  We're trying to estimate the average
number of rows per worker.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Dec 24, 2017 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 22, 2017 at 6:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> Also, don't we need to use parallel_divisor for partial paths instead
>> of non-partial paths as those will be actually distributed among
>> workers?
>
> Uh, that seems backwards to me.  We're trying to estimate the average
> number of rows per worker.
>

Okay, but is it appropriate to use the parallel_divisor?  The
parallel_divisor represents the contribution of all the workers (plus
the leader's contribution), whereas for a non-partial path only a
subset of the workers will ever operate on it.  Consider a case with
one non-partial subpath and five partial subpaths, with six as the
parallel_divisor: the current code divides the rows of the non-partial
subpath across six workers, but in reality exactly one worker will
execute that path.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 24, 2017 at 8:37 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Dec 24, 2017 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Dec 22, 2017 at 6:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>> Also, don't we need to use parallel_divisor for partial paths instead
>>> of non-partial paths as those will be actually distributed among
>>> workers?
>>
>> Uh, that seems backwards to me.  We're trying to estimate the average
>> number of rows per worker.
>
> Okay, but is it appropriate to use the parallel_divisor?  The
> parallel_divisor means the contribution of all the workers (+
> leader_contribution) whereas for non-partial paths there will be
> always only the subset of workers which will operate on them.
> Consider a case with one non-partial subpath and five partial subpaths
> with six as parallel_divisor, now the current code will try to divide
> the rows of non-partial subpath with respect to six workers.  However,
> in reality, there will always be one worker which will execute that
> path.

That's true, of course, but if five processes each return 0 rows and
the sixth process returns 600 rows, the average number of rows per
process is 100, not anything else.

Here's one way to look at it.  Suppose there is a table with 1000
partitions.  If we do a Parallel Append over a Parallel Seq Scan per
partition, we will come up with a row estimate by summing the
estimated row count across all partitions and dividing by the
parallel_divisor.  This will give us some answer.  If we instead do a
Parallel Append over a Seq Scan per partition, we should really come
up with the *same* estimate.  The only way to do that is to also
divide by the parallel_divisor in this case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 27, 2017 at 12:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 24, 2017 at 8:37 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Okay, but is it appropriate to use the parallel_divisor?  The
>> parallel_divisor means the contribution of all the workers (+
>> leader_contribution) whereas for non-partial paths there will be
>> always only the subset of workers which will operate on them.
>> Consider a case with one non-partial subpath and five partial subpaths
>> with six as parallel_divisor, now the current code will try to divide
>> the rows of non-partial subpath with respect to six workers.  However,
>> in reality, there will always be one worker which will execute that
>> path.
>
> That's true, of course, but if five processes each return 0 rows and
> the sixth process returns 600 rows, the average number of rows per
> process is 100, not anything else.
>
> Here's one way to look at it.  Suppose there is a table with 1000
> partitions.  If we do a Parallel Append over a Parallel Seq Scan per
> partition, we will come up with a row estimate by summing the
> estimated row count across all partitions and dividing by the
> parallel_divisor.  This will give us some answer.  If we instead do a
> Parallel Append over a Seq Scan per partition, we should really come
> up with the *same* estimate.  The only way to do that is to also
> divide by the parallel_divisor in this case.
>

The theory sounds good, but it looks like the code doesn't behave that
way.
To verify the theory, my colleague Dilip and I ran a few tests; the
output is below:

Test setup
-------------------
create table t(a int, b varchar) partition by range(a);
create table t1 partition of t for values from (0) to (100000);
create table t2 partition of t for values from (100000) to (200000);
create table t3 partition of t for values from (200000) to (300000);
create table t4 partition of t for values from (300000) to (400000);
insert into t values (generate_series(1,399999), repeat('x', 50));
analyze t;

Test
-------
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
set max_parallel_workers_per_gather=6;

postgres=# explain select * from t;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather  (cost=0.00..6476.94 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6476.94 rows=235295 width=55)
         ->  Parallel Seq Scan on t1  (cost=0.00..1619.23 rows=58823 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather  (cost=0.00..6888.70 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6888.70 rows=208730 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t2 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t3 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t4 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                              QUERY PLAN
-----------------------------------------------------------------------
 Gather  (cost=0.00..4061.99 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..4061.99 rows=129032 width=55)
         ->  Seq Scan on t2  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t3  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t4  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
(7 rows)

The value of rows in Parallel Append is different in each case:
(a) Row estimate for Parallel Append over a Parallel Seq Scan per
partition - 235295
(b) Row estimate for Parallel Append over a mix of Parallel Seq Scan
and Seq Scan per partition - 208730
(c) Row estimate for Parallel Append over a Seq Scan per partition - 129032

I think the reason for the different estimates is that a different
value of the parallel divisor is used for partial paths.  If the row
estimation for partial paths uses the scaled value of the parallel
divisor, then we get consistent row estimates.
I have tried the below change:

+        apath->path.rows += (subpath->rows * subpath_parallel_divisor /
+                             parallel_divisor);

The results after the change:

postgres=# explain select * from t;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather  (cost=0.00..6476.94 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6476.94 rows=129032 width=55)
         ->  Parallel Seq Scan on t1  (cost=0.00..1619.23 rows=58823 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=0.00..6888.70 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..6888.70 rows=129032 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
         ->  Parallel Seq Scan on t2  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t3  (cost=0.00..1619.24 rows=58824 width=55)
         ->  Parallel Seq Scan on t4  (cost=0.00..1619.24 rows=58824 width=55)
(7 rows)

postgres=# alter table t4 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t3 set (parallel_workers=0);
ALTER TABLE
postgres=# alter table t2 set (parallel_workers=0);
ALTER TABLE
postgres=# explain select * from t;
                              QUERY PLAN
----------------------------------------------------------------------
 Gather  (cost=0.00..4061.99 rows=399999 width=55)
   Workers Planned: 3
   ->  Parallel Append  (cost=0.00..4061.99 rows=129032 width=55)
         ->  Seq Scan on t2  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t3  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t4  (cost=0.00..2031.00 rows=100000 width=55)
         ->  Seq Scan on t1  (cost=0.00..2030.99 rows=99999 width=55)
(7 rows)

Note that after the change the row estimate (129032) for Parallel
Append is consistent across all three kinds of cases.

Attached, please find the patch which fixes this issue (thanks to
Dilip for helping me identify the above case and the fix).  I have
also modified the comment atop choose_next_subplan_for_worker() as
discussed above.  The change to remove the unnecessary inclusion of
spin.h is still in a separate patch, as posted above.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments
On Tue, Jan 2, 2018 at 11:11 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Attached, please find the patch which fixes this issue (Thanks to
> Dilip for helping me in identifying the above case and fix).  I have
> also modified the comment atop function
> choose_next_subplan_for_worker() as discussed above.  The change to
> remove unnecessary inclusion of spin.h is still in a separate patch as
> posted above.

Here's a combined patch with some cosmetic changes which I will commit
if it looks OK to you.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On Thu, Jan 4, 2018 at 9:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 2, 2018 at 11:11 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Attached, please find the patch which fixes this issue (Thanks to
>> Dilip for helping me in identifying the above case and fix).  I have
>> also modified the comment atop function
>> choose_next_subplan_for_worker() as discussed above.  The change to
>> remove unnecessary inclusion of spin.h is still in a separate patch as
>> posted above.
>
> Here's a combined patch with some cosmetic changes which I will commit
> if it looks OK to you.
>

Looks good to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 3, 2018 at 10:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Here's a combined patch with some cosmetic changes which I will commit
>> if it looks OK to you.
>
> Looks good to me.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company