Re: [HACKERS] parallelize queries containing initplans
От | Robert Haas |
---|---|
Тема | Re: [HACKERS] parallelize queries containing initplans |
Дата | |
Msg-id | CA+TgmobbYS0JwRYpNpMNjV2ivVAMFchJmRvwwCTnbx1MrecQfQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] parallelize queries containing initplans (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: [HACKERS] parallelize queries containing initplans
(Amit Kapila <amit.kapila16@gmail.com>)
|
Список | pgsql-hackers |
On Thu, Oct 5, 2017 at 1:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> Now, unless, I am missing something here, it won't be possible to >>> detect params in such cases during forming of join rels and hence we >>> need the tests in generate_gather_paths. Let me know if I am missing >>> something in this context or if you have any better ideas to make it >>> work? >> >> Hmm, in a case like this, I think we shouldn't need to detect it. The >> Var is perfectly parallel-safe, the problem is that we can't use a >> not-safe plan for the inner rel. I wonder why that's happening >> here... > > It is because the inner rel (Result Path) contains a reference to a > param which appears to be at the same query level. Basically due to > changes in max_parallel_hazard_walker(). I spent several hours debugging this issue this afternoon. I think you've misdiagnosed the problem. I think that the Param reference in the result path is parallel-safe; that doesn't seem to me to be wrong. If we see a Param reference for our own query level, then either we're below the Gather and the new logic added by this patch will pass down the value or we're above the Gather and we can access the value directly. Either way, no problem. However, I think that if you attach an InitPlan to a parallel-safe plan, it becomes parallel-restricted. This is obvious in the case where the InitPlan's plan isn't itself parallel-safe, but it's also arguably true even when the InitPlan's plan *is* parallel-safe, because pushing that below a Gather introduces a multiple-evaluation hazard. I think we should fix that problem someday by providing a DSA-based parameter store, but that's a job for another day. And it turns out that we actually have such logic already, but this patch removes it: @@ -2182,7 +2181,6 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo *final_rel) path->startup_cost += initplan_cost; path->total_cost += initplan_cost; - path->parallel_safe = false; } /* We needn't do set_cheapest() here, caller will do it */ Now, the header comment for SS_charge_for_initplans() is wrong; it claims we can't transmit initPlans to workers, but I think that's already wrong even before this patch. On the other hand, I think that the actual code is right even after this patch. If I put that line back but make contains_parallel_unsafe_param always return false, then I can still get plans like this (I modified EXPLAIN to show Parallel Safe markings)... rhaas=# explain select * from pgbench_accounts where bid = (select max(g) from generate_series(1,1000)g); QUERY PLAN -----------------------------------------------------------------------------------------Gather (cost=12.51..648066.51 rows=100000width=97) Parallel Safe: false Workers Planned: 2 Params Evaluated: $0 InitPlan 1 (returns $0) -> Aggregate (cost=12.50..12.51 rows=1 width=4) Parallel Safe: true -> Function Scan on generate_series g (cost=0.00..10.00 rows=1000 width=4) Parallel Safe: true -> Parallel Seq Scan on pgbench_accounts (cost=0.00..648054.00 rows=41667 width=97) Parallel Safe: true Filter: (bid = $0) (12 rows) ...but Kuntal's example no longer misbehaves: QUERY PLAN ----------------------------------------------------------------------Hash Semi Join Parallel Safe: false Output: t1.i,t1.j, t1.k Hash Cond: (t1.i = ((1 + $1))) -> Gather Parallel Safe: false Output: t1.i, t1.j, t1.k Workers Planned: 2 -> Parallel Seq Scan on public.t1 Parallel Safe: true Output:t1.i, t1.j, t1.k -> Hash Parallel Safe: false Output: ((1 + $1)) -> Result ParallelSafe: false Output: (1 + $1) InitPlan 1 (returns $1) -> Finalize Aggregate Parallel Safe: false Output: max(t3.j) -> Gather Parallel Safe: false Output: (PARTIAL max(t3.j)) Workers Planned: 2 -> Partial Aggregate ParallelSafe: true Output: PARTIAL max(t3.j) -> ParallelSeq Scan on public.t3 Parallel Safe: true Output: t3.j (31 rows) With your original path, the Result was getting marked parallel-safe, but now it doesn't, which is correct, and after that everything seems to just work. Notice that in my original example the topmost plan node doubly fails to be parallel-safe: it fails because it's a Gather, and it fails because it has an InitPlan attached. But that's all OK -- the InitPlan doesn't make anything *under* the Gather unsafe, so we can still use parallelism *at* the level where the InitPlan is attached, just not *above* that level. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
В списке pgsql-hackers по дате отправления:
Следующее
От: Sean ChittendenДата:
Сообщение: [HACKERS] Re: [BUGS] Re: [PATCH] BUG #13416: Postgres>= 9.3 doesn't use optimized shared memory on Solarisanymore