Обсуждение: [HACKERS] Parallel safety for extern params

Поиск
Список
Период
Сортировка

[HACKERS] Parallel safety for extern params

От
Amit Kapila
Дата:
As discussed in a nearby thread [1] (parallelize queries containing
initplans), it appears that there are cases where queries referring
PARAM_EXTERN params are treated as parallel-restricted even though
they should be parallel-safe.  I have done some further investigation
and found that actually for most of the cases they are treated as
parallel-restricted (due to tests in max_parallel_hazard_walker).  The
cases where such queries are treated as parallel-safe are when
eval_const_expressions_mutator converts the param to const.  So
whenever we select the generic plan, it won't work.  One simple
example is:

create table t1(c1 int, c2 char(500));
insert into t1 values(generate_series(1,10000),'aaa');
analyze t1;

set force_parallel_mode = on;
regression=# prepare t1_select(int) as Select c1 from t1 where c1 < $1;
PREPARE
regression=# explain (costs off, analyze) execute t1_select(10000);
                            QUERY PLAN
-------------------------------------------------------------------
 Gather (actual time=35.252..44.727 rows=9999 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Seq Scan on t1 (actual time=0.364..5.638 rows=9999 loops=1)
         Filter: (c1 < 10000)
         Rows Removed by Filter: 1
 Planning time: 2135.514 ms
 Execution time: 52.886 ms
(9 rows)

The next four executions will give the same plan, however, the sixth
execution will give below plan:
regression=# explain (costs off, analyze) execute t1_select(10005);
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on t1 (actual time=0.049..6.188 rows=10000 loops=1)
   Filter: (c1 < $1)
 Planning time: 22577.404 ms
 Execution time: 7.612 ms
(4 rows)

Attached patch fix_parallel_safety_for_extern_params_v1.patch fixes
this problem.  Note, for now, I have added a very simple test in
regression tests to cover prepared statement case, but we might want
to add something related to generic plan selection as I have shown
above. I am just not sure if it is a good idea to have multiple
executions just to get the generic plan.

After fixing this problem, when I ran the regression tests with
force_parallel_mode = regress, I saw multiple other failures.  All the
failures boil down to two kinds of cases:

1. There was an assumption while extracting simple expressions that
the target list of gather node can contain constants or Var's.  Now,
once the above patch allows extern params as parallel-safe, that
assumption no longer holds true.  We need to handle params as well.
Attached patch fix_simple_expr_interaction_gather_v1.patch handles
that case.

2. We don't allow execution to use parallelism if the plan can be
executed multiple times.  This has been enforced in ExecutePlan, but
it seems like that we miss to handle the case where we are already in
parallel mode by the time we enforce that condition.  So, what
happens, as a result, is that it will allow to use parallelism when it
shouldn't (when the same plan is executed multiple times) and lead to
a crash.  One way to fix is that we temporarily exit the parallel mode
in such cases and reenter in the same state once the current execution
is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
fixes this problem.

Thanks to Kuntal who has helped in investigating the regression
failures which happened as a result of making param extern params as
parallel-safe.



[1] - https://www.postgresql.org/message-id/CA%2BTgmobSN66o2_c76rY12JvS_sZa17zpKvpuyG-QzRvVLYE8Vg%40mail.gmail.com
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Parallel safety for extern params

От
Robert Haas
Дата:
On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> After fixing this problem, when I ran the regression tests with
> force_parallel_mode = regress, I saw multiple other failures.  All the
> failures boil down to two kinds of cases:
>
> 1. There was an assumption while extracting simple expressions that
> the target list of gather node can contain constants or Var's.  Now,
> once the above patch allows extern params as parallel-safe, that
> assumption no longer holds true.  We need to handle params as well.
> Attached patch fix_simple_expr_interaction_gather_v1.patch handles
> that case.

-     * referencing the child node's output ... but setrefs.c might also have
-     * copied a Const as-is.
+     * referencing the child node's output or a Param... but setrefs.c might
+     * also have copied a Const as-is.

I think the Param case should be mentioned after "... but" not before
- i.e. referencing the child node's output... but setrefs.c might also
have copied a Const or Param is-is.

> 2. We don't allow execution to use parallelism if the plan can be
> executed multiple times.  This has been enforced in ExecutePlan, but
> it seems like that we miss to handle the case where we are already in
> parallel mode by the time we enforce that condition.  So, what
> happens, as a result, is that it will allow to use parallelism when it
> shouldn't (when the same plan is executed multiple times) and lead to
> a crash.  One way to fix is that we temporarily exit the parallel mode
> in such cases and reenter in the same state once the current execution
> is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
> fixes this problem.

This seems completely unsafe.  If somebody's already entered parallel
mode, they are counting on having the parallel-mode restrictions
enforced until they exit parallel mode.  We can't just disable those
restrictions for a while in the middle and then put them back.

I think the bug is in ExecGather(Merge): it assumes that if we're in
parallel mode, it's OK to start workers.  But actually, it shouldn't
do this unless ExecutePlan ended up with use_parallel_mode == true,
which isn't quite the same thing.  I think we might need ExecutePlan
to set a flag in the estate that ExecGather(Merge) can check.

Thanks for working on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

От
Amit Kapila
Дата:
On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> After fixing this problem, when I ran the regression tests with
>> force_parallel_mode = regress, I saw multiple other failures.  All the
>> failures boil down to two kinds of cases:
>>
>> 1. There was an assumption while extracting simple expressions that
>> the target list of gather node can contain constants or Var's.  Now,
>> once the above patch allows extern params as parallel-safe, that
>> assumption no longer holds true.  We need to handle params as well.
>> Attached patch fix_simple_expr_interaction_gather_v1.patch handles
>> that case.
>
> -     * referencing the child node's output ... but setrefs.c might also have
> -     * copied a Const as-is.
> +     * referencing the child node's output or a Param... but setrefs.c might
> +     * also have copied a Const as-is.
>
> I think the Param case should be mentioned after "... but" not before
> - i.e. referencing the child node's output... but setrefs.c might also
> have copied a Const or Param is-is.
>

I am not sure if we can write the comment like that (.. copied a Const
or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a
special handling for Var and Param where constants are copied as-is
via expression_tree_mutator.  Also as proposed, the handling for
params is more like Var in exec_save_simple_expr.

>> 2. We don't allow execution to use parallelism if the plan can be
>> executed multiple times.  This has been enforced in ExecutePlan, but
>> it seems like that we miss to handle the case where we are already in
>> parallel mode by the time we enforce that condition.  So, what
>> happens, as a result, is that it will allow to use parallelism when it
>> shouldn't (when the same plan is executed multiple times) and lead to
>> a crash.  One way to fix is that we temporarily exit the parallel mode
>> in such cases and reenter in the same state once the current execution
>> is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
>> fixes this problem.
>
> This seems completely unsafe.  If somebody's already entered parallel
> mode, they are counting on having the parallel-mode restrictions
> enforced until they exit parallel mode.  We can't just disable those
> restrictions for a while in the middle and then put them back.
>

Right.

> I think the bug is in ExecGather(Merge): it assumes that if we're in
> parallel mode, it's OK to start workers.  But actually, it shouldn't
> do this unless ExecutePlan ended up with use_parallel_mode == true,
> which isn't quite the same thing.  I think we might need ExecutePlan
> to set a flag in the estate that ExecGather(Merge) can check.
>

Attached patch fixes the problem in a suggested way.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Parallel safety for extern params

От
Amit Kapila
Дата:
On Mon, Oct 16, 2017 at 4:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think the bug is in ExecGather(Merge): it assumes that if we're in
>> parallel mode, it's OK to start workers.  But actually, it shouldn't
>> do this unless ExecutePlan ended up with use_parallel_mode == true,
>> which isn't quite the same thing.  I think we might need ExecutePlan
>> to set a flag in the estate that ExecGather(Merge) can check.
>>
>
> Attached patch fixes the problem in a suggested way.
>
>

All the patches still apply and pass the regression test.  I have
added this to CF to avoid losing the track of this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

От
Robert Haas
Дата:
On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think the Param case should be mentioned after "... but" not before
>> - i.e. referencing the child node's output... but setrefs.c might also
>> have copied a Const or Param is-is.
>
> I am not sure if we can write the comment like that (.. copied a Const
> or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a
> special handling for Var and Param where constants are copied as-is
> via expression_tree_mutator.  Also as proposed, the handling for
> params is more like Var in exec_save_simple_expr.

I committed fix_parallel_mode_nested_execution_v2.patch with some
cosmetic tweaks.  I back-patched it to 10 and 9.6, then had to fix
some issues reported by Tom as followup commits.

With respect to the bit quoted above, I rephrased the comment in a
slightly different way that hopefully is a reasonable compromise,
combined it with the main patch, and pushed it to master.  Along the
way I also got rid of the if statement you introduced and just made
the Assert() more complicated instead, which seems better to me.

When I tried back-porting the patch to v10 I discovered that the
plpgsql changes conflict heavily and that ripping them out all the way
produces regression failures under force_parallel_mode.  I think I see
why that's happening but it's not obvious how to me how to adapt
b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code.  Do you want
to have a crack at it or should we just leave this as a master-only
fix?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

От
Amit Kapila
Дата:
On Sat, Oct 28, 2017 at 2:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> When I tried back-porting the patch to v10 I discovered that the
> plpgsql changes conflict heavily and that ripping them out all the way
> produces regression failures under force_parallel_mode.  I think I see
> why that's happening but it's not obvious how to me how to adapt
> b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code.  Do you want
> to have a crack at it or should we just leave this as a master-only
> fix?
>

I think we need to make changes in exec_simple_recheck_plan to make
the behavior similar to HEAD.  With the attached patch, all tests
passed with force_parallel_mode.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Parallel safety for extern params

От
Robert Haas
Дата:
On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think we need to make changes in exec_simple_recheck_plan to make
> the behavior similar to HEAD.  With the attached patch, all tests
> passed with force_parallel_mode.

Committed to REL_10_STABLE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

От
Amit Kapila
Дата:
On Sun, Oct 29, 2017 at 9:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think we need to make changes in exec_simple_recheck_plan to make
>> the behavior similar to HEAD.  With the attached patch, all tests
>> passed with force_parallel_mode.
>
> Committed to REL_10_STABLE.
>

Thanks, I have closed this entry in CF app.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers