Re: [HACKERS] parallelize queries containing initplans

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: [HACKERS] parallelize queries containing initplans
Дата	6 октября 2017 г. 00:02:38
Msg-id	CA+TgmobbYS0JwRYpNpMNjV2ivVAMFchJmRvwwCTnbx1MrecQfQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] parallelize queries containing initplans (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: [HACKERS] parallelize queries containing initplans
Список	pgsql-hackers

Дерево обсуждения

On Thu, Oct 5, 2017 at 1:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Now, unless, I am missing something here, it won't be possible to
>>> detect params in such cases during forming of join rels and hence we
>>> need the tests in generate_gather_paths.  Let me know if I am missing
>>> something in this context or if you have any better ideas to make it
>>> work?
>>
>> Hmm, in a case like this, I think we shouldn't need to detect it.  The
>> Var is perfectly parallel-safe, the problem is that we can't use a
>> not-safe plan for the inner rel.  I wonder why that's happening
>> here...
>
> It is because the inner rel (Result Path) contains a reference to a
> param which appears to be at the same query level.  Basically due to
> changes in max_parallel_hazard_walker().

I spent several hours debugging this issue this afternoon.  I think
you've misdiagnosed the problem.  I think that the Param reference in
the result path is parallel-safe; that doesn't seem to me to be wrong.
If we see a Param reference for our own query level, then either we're
below the Gather and the new logic added by this patch will pass down
the value or we're above the Gather and we can access the value
directly.  Either way, no problem.

However, I think that if you attach an InitPlan to a parallel-safe
plan, it becomes parallel-restricted.  This is obvious in the case
where the InitPlan's plan isn't itself parallel-safe, but it's also
arguably true even when the InitPlan's plan *is* parallel-safe,
because pushing that below a Gather introduces a multiple-evaluation
hazard.  I think we should fix that problem someday by providing a
DSA-based parameter store, but that's a job for another day.

And it turns out that we actually have such logic already, but this
patch removes it:

@@ -2182,7 +2181,6 @@ SS_charge_for_initplans(PlannerInfo *root,
RelOptInfo *final_rel)
               path->startup_cost += initplan_cost;               path->total_cost += initplan_cost;
-               path->parallel_safe = false;       }
       /* We needn't do set_cheapest() here, caller will do it */

Now, the header comment for SS_charge_for_initplans() is wrong; it
claims we can't transmit initPlans to workers, but I think that's
already wrong even before this patch.  On the other hand, I think that
the actual code is right even after this patch.  If I put that line
back but make contains_parallel_unsafe_param always return false, then
I can still get plans like this (I modified EXPLAIN to show Parallel
Safe markings)...

rhaas=# explain select * from pgbench_accounts where bid = (select
max(g) from generate_series(1,1000)g);                                      QUERY PLAN
-----------------------------------------------------------------------------------------Gather  (cost=12.51..648066.51
rows=100000width=97)  Parallel Safe: false  Workers Planned: 2  Params Evaluated: $0  InitPlan 1 (returns $0)    ->
Aggregate (cost=12.50..12.51 rows=1 width=4)          Parallel Safe: true          ->  Function Scan on generate_series
g (cost=0.00..10.00

rows=1000 width=4)                Parallel Safe: true  ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..648054.00
rows=41667 width=97)        Parallel Safe: true        Filter: (bid = $0)
(12 rows)

...but Kuntal's example no longer misbehaves:
                             QUERY PLAN
----------------------------------------------------------------------Hash Semi Join  Parallel Safe: false  Output:
t1.i,t1.j, t1.k  Hash Cond: (t1.i = ((1 + $1)))  ->  Gather        Parallel Safe: false        Output: t1.i, t1.j, t1.k
      Workers Planned: 2        ->  Parallel Seq Scan on public.t1              Parallel Safe: true
Output:t1.i, t1.j, t1.k  ->  Hash        Parallel Safe: false        Output: ((1 + $1))        ->  Result
ParallelSafe: false              Output: (1 + $1)              InitPlan 1 (returns $1)                ->  Finalize
Aggregate                     Parallel Safe: false                      Output: max(t3.j)                      ->
Gather                           Parallel Safe: false                            Output: (PARTIAL max(t3.j))
               Workers Planned: 2                            ->  Partial Aggregate
ParallelSafe: true                                  Output: PARTIAL max(t3.j)                                  ->
ParallelSeq Scan on public.t3                                        Parallel Safe: true
       Output: t3.j

(31 rows)

With your original path, the Result was getting marked parallel-safe,
but now it doesn't, which is correct, and after that everything seems
to just work.

Notice that in my original example the topmost plan node doubly fails
to be parallel-safe: it fails because it's a Gather, and it fails
because it has an InitPlan attached.  But that's all OK -- the
InitPlan doesn't make anything *under* the Gather unsafe, so we can
still use parallelism *at* the level where the InitPlan is attached,
just not *above* that level.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] parallelize queries containing initplans