Обсуждение: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

Поиск
Список
Период
Сортировка

[sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Andreas Seltenreich
Дата:
Hi,

the query below triggers a parallel worker assertion for me when run on
the regression database of master as of 0832f2d.  The plan sports a
couple of InitPlan nodes below Gather.

regards,
Andreas
Gather  (cost=1.64..84.29 rows=128 width=4)  Workers Planned: 1  Single Copy: true  ->  Limit  (cost=1.64..84.29
rows=128width=4)        ->  Subquery Scan on subq_0  (cost=1.64..451.06 rows=696 width=4)              Filter:
(subq_0.c6IS NOT NULL)              ->  Nested Loop Left Join  (cost=1.64..444.07 rows=699 width=145)
Join Filter: (sample_0.aa = sample_1.pageno)                    InitPlan 4 (returns $3)                      ->  Result
(cost=1.21..5.36 rows=15 width=0)                            One-Time Filter: ($0 AND ($1 = $2))
   InitPlan 1 (returns $0)                              ->  Result  (cost=0.00..0.00 rows=0 width=0)
               One-Time Filter: false                            InitPlan 2 (returns $1)
-> Limit  (cost=0.35..0.52 rows=1 width=4)                                    ->  Gather  (cost=0.00..1.04 rows=6
width=4)                                         Workers Planned: 1                                          ->
ParallelSeq Scan on reltime_tbl  (cost=0.00..1.04 rows=4 width=4)                            InitPlan 3 (returns $2)
                         ->  Limit  (cost=0.52..0.69 rows=1 width=4)                                    ->  Gather
(cost=0.00..1.04rows=6 width=4)                                          Workers Planned: 1
            ->  Parallel Seq Scan on reltime_tbl reltime_tbl_1  (cost=0.00..1.04 rows=4 width=4)
   ->  Sample Scan on pg_foreign_data_wrapper sample_2  (cost=1.21..5.36 rows=15 width=0)
  Sampling: system ('3.1'::real)                    ->  Nested Loop  (cost=0.15..382.85 rows=699 width=4)
          ->  Sample Scan on pg_largeobject sample_1  (cost=0.00..209.03 rows=699 width=8)
 Sampling: bernoulli ('2.9'::real)                                Filter: (pageno IS NOT NULL)
-> Index Only Scan using pg_foreign_table_relid_index on pg_foreign_table ref_0  (cost=0.15..0.24 rows=1 width=4)
                        Index Cond: (ftrelid = sample_1.loid)                    ->  Materialize  (cost=0.00..16.06
rows=4width=4)                          ->  Append  (cost=0.00..16.04 rows=4 width=4)                                ->
Sample Scan on a sample_0  (cost=0.00..4.01 rows=1 width=4)                                      Sampling: system
('5'::real)                               ->  Sample Scan on b sample_0_1  (cost=0.00..4.01 rows=1 width=4)
                        Sampling: system ('5'::real)                                ->  Sample Scan on c sample_0_2
(cost=0.00..4.01rows=1 width=4)                                      Sampling: system ('5'::real)
        ->  Sample Scan on d sample_0_3  (cost=0.00..4.01 rows=1 width=4)
Sampling:system ('5'::real)
 


--8<---------------cut here---------------start------------->8---
set force_parallel_mode to on;
set max_parallel_workers_per_gather to 2;

select 91 as c0
from (select       (select pfname from public.pslot limit 1 offset 3)          as c0,       ref_1.grandtot as c1,
(selectpg_catalog.min(procost) from pg_catalog.pg_proc)          as c2,       ref_0.ftoptions as c3,
ref_1.grandtotas c4,       sample_1.loid as c5,       pg_catalog.pg_rotate_logfile() as c6,       (select random from
public.hash_i4_heaplimit 1 offset 5)          as c7,       sample_1.loid as c8     from       public.a as sample_0
tablesamplesystem (5)           right join pg_catalog.pg_largeobject as sample_1 tablesample bernoulli (2.9)
inner join pg_catalog.pg_foreign_table as ref_0             on (sample_1.loid = ref_0.ftrelid )           on
(sample_0.aa= sample_1.pageno )         left join public.mvtest_tvv as ref_1         on (EXISTS (             select
            sample_2.fdwoptions as c0,                 sample_2.fdwhandler as c1,                 (select during from
public.test_range_excllimit 1 offset 89)                    as c2               from
pg_catalog.pg_foreign_data_wrapperas sample_2 tablesample system (3.1)               where (EXISTS (
select                      sample_3.b as c0,                       (select grantee from
information_schema.udt_privilegeslimit 1 offset 4)                          as c1,                       sample_3.b as
c2,                      sample_3.rf_a as c3,                       sample_3.b as c4,
sample_3.rf_aas c5,                       sample_3.rf_a as c6                     from
public.clstr_tst_sas sample_3 tablesample system (8.1)                     where sample_3.rf_a >= cast(null as int8)
                limit 141))                 and ((select f1 from public.reltime_tbl limit 1 offset 2)
  = (select f1 from public.reltime_tbl limit 1 offset 3)                     )               limit 49))     where
sample_1.pagenois not NULL) as subq_0
 
where subq_0.c6 is not NULL
limit 128;
--8<---------------cut here---------------end--------------->8---



Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Amit Kapila
Дата:
On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich
<seltenreich@gmx.de> wrote:
> Hi,
>
> the query below triggers a parallel worker assertion for me when run on
> the regression database of master as of 0832f2d.  The plan sports a
> couple of InitPlan nodes below Gather.
>

I think the reason of this issue is that in some cases where subplan
is at some node other than top_plan node, we allow them to be executed
in the worker in force_parallel_mode.  It seems to me that the
problematic code is below check in standard_planner()

if (force_parallel_mode != FORCE_PARALLEL_OFF &&
best_path->parallel_safe &&
top_plan->initPlan == NIL)

Here instead of checking whether top_plan has initPlan, it should
check whether initPlan is present anywhere in plan tree.  I think one
simple way could be to check *glob->subplans* instead of
top_plan->initPlan, another possibility is to traverse the whole tree
to see if initPlan is present.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Amit Kapila
Дата:
On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich
> <seltenreich@gmx.de> wrote:
>> Hi,
>>
>> the query below triggers a parallel worker assertion for me when run on
>> the regression database of master as of 0832f2d.  The plan sports a
>> couple of InitPlan nodes below Gather.
>>
>
> I think the reason of this issue is that in some cases where subplan
> is at some node other than top_plan node, we allow them to be executed
> in the worker in force_parallel_mode.  It seems to me that the
> problematic code is below check in standard_planner()
>
> if (force_parallel_mode != FORCE_PARALLEL_OFF &&
> best_path->parallel_safe &&
> top_plan->initPlan == NIL)
>
> Here instead of checking whether top_plan has initPlan, it should
> check whether initPlan is present anywhere in plan tree.  I think one
> simple way could be to check *glob->subplans* instead of
> top_plan->initPlan,
>

Patch based on above suggestion is attached with this mail.

Thoughts?


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Here instead of checking whether top_plan has initPlan, it should
>> check whether initPlan is present anywhere in plan tree.  I think one
>> simple way could be to check *glob->subplans* instead of
>> top_plan->initPlan,

> Patch based on above suggestion is attached with this mail.

I think this is the right fix for the moment, because the proximate cause
of the crash is that ExecSerializePlan doesn't transmit any part of the
PlannedStmt.subplans list, which glob->subplans is the preimage of.

Now, maybe I'm missing something, but it seems to me that ordinary
subplans could be transmitted to workers just fine.  The problem with
transmitting initplans is that you'd lose the expectation of single
evaluation.  (Even there, it might be okay if they're far enough down
in the plan tree, but I haven't thought about it in detail.)  So I'd
rather see us fix ExecSerializePlan to transmit the subplans list
and have some more-restrictive test here.  This code would still be
wrong as it stands though.
        regards, tom lane



Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Amit Kapila
Дата:
On Mon, Nov 21, 2016 at 9:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Here instead of checking whether top_plan has initPlan, it should
>>> check whether initPlan is present anywhere in plan tree.  I think one
>>> simple way could be to check *glob->subplans* instead of
>>> top_plan->initPlan,
>
>> Patch based on above suggestion is attached with this mail.
>
> I think this is the right fix for the moment, because the proximate cause
> of the crash is that ExecSerializePlan doesn't transmit any part of the
> PlannedStmt.subplans list, which glob->subplans is the preimage of.
>
> Now, maybe I'm missing something, but it seems to me that ordinary
> subplans could be transmitted to workers just fine.  The problem with
> transmitting initplans is that you'd lose the expectation of single
> evaluation.
>

Yes and I think we can handle it such that master backend evaluates
initplans and share the calculated value along with paramid with all
the workers. Workers will, in turn, restore it in
queryDesc->estate->es_param_exec_vals (or some other place where those
can be directly used, I have yet to evaluate on this matter).  I am
working on a patch to parallelize queries containing
initplans/subplans, so I will evaluate your suggestion of passing
subplans (maybe non-InitPlans) in ExecSerializePlan as part of that
patch.  I have yet to figure out what is the best way to share hashed
subplans, do we pass them as it is and let each worker evaluate and
store it's own copy of hash table or shall we try to form the hash
table once in master and then share the same with workers (which could
be tricky) or shall we restrict such queries which contain hashed
subplans based on assumption that it will be costly for each worker to
form the hash table.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Andreas Seltenreich
Дата:
Hi,

just caught another InitPlan below Gather with the recent patches in
(master as of 4cc6a3f).  Recipe below.

regards,
andreas

set max_parallel_workers_per_gather = 2;
set min_parallel_relation_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

explain select 1 from  public.quad_point_tbl as ref_0,  lateral (select        ref_0.p as c3,        sample_0.d as c5
  from        public.nv_child_2010 as sample_0          left join public.mvtest_tvv as ref_1          on ('x'< (select
contypefrom pg_catalog.pg_constraint limit 1))      limit 82) as subq_0;
 

--                                                QUERY PLAN
-- --------------------------------------------------------------------------------------------------------
--  Gather  (cost=0.19..13727.52 rows=902246 width=4)
--    Workers Planned: 2
--    ->  Nested Loop  (cost=0.19..13727.52 rows=902246 width=4)
--          ->  Parallel Seq Scan on quad_point_tbl ref_0  (cost=0.00..105.85 rows=4585 width=16)
--          ->  Limit  (cost=0.19..1.33 rows=82 width=20)
--                InitPlan 1 (returns $0)
--                  ->  Limit  (cost=0.00..0.19 rows=1 width=1)
--                        ->  Gather  (cost=0.00..10.22 rows=54 width=1)
--                              Workers Planned: 2
--                              ->  Parallel Seq Scan on pg_constraint  (cost=0.00..10.22 rows=22 width=1)
--                ->  Seq Scan on nv_child_2010 sample_0  (cost=0.00..35.50 rows=2550 width=20)



Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Amit Kapila
Дата:
On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Hi,
>
> just caught another InitPlan below Gather with the recent patches in
> (master as of 4cc6a3f).  Recipe below.
>

I think this problem exists since commit
110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
subqueries to be pushed to parallel workers.  I think we should
consider rel (where rtekind is RTE_SUBQUERY) to be parallel safe if
the subquery is also parallel safe.   Attached patch does that and
fixes the reported problem for me.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> just caught another InitPlan below Gather with the recent patches in
>> (master as of 4cc6a3f).  Recipe below.

> I think this problem exists since commit
> 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
> subqueries to be pushed to parallel workers.

The impression I got in poking at this for a few minutes, before
going off to stuff myself with turkey, was that we were allowing
a SubqueryScanPath to mark itself as parallel-safe even though the
resulting plan node would have an InitPlan attached.  So my thought
about fixing it was along the lines of if-subroot-contains-initplans-
then-dont-let-SubqueryScanPath-be-parallel-safe.  What you propose
here seems like it would shut off parallel query in more cases than
that.  But I'm still full of turkey and may be missing something.

There's another issue here, which is that the InitPlan is derived from an
outer join qual whose outer join seems to have been deleted entirely by
analyzejoins.c.  Up to now it was possible to avert our eyes from the fact
that join removal is lazy about getting rid of unused InitPlans, but if
they're going to disable parallelization of the surrounding query, maybe
we need to work harder on that.

Another question worth asking is whether it was okay for the subquery to
decide to use a Gather.  Are we OK with multiple Gathers per plan tree,
independently of the points above?
        regards, tom lane



Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Amit Kapila
Дата:
On Fri, Nov 25, 2016 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>>> just caught another InitPlan below Gather with the recent patches in
>>> (master as of 4cc6a3f).  Recipe below.
>
>> I think this problem exists since commit
>> 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
>> subqueries to be pushed to parallel workers.
>
> The impression I got in poking at this for a few minutes, before
> going off to stuff myself with turkey, was that we were allowing
> a SubqueryScanPath to mark itself as parallel-safe even though the
> resulting plan node would have an InitPlan attached.  So my thought
> about fixing it was along the lines of if-subroot-contains-initplans-
> then-dont-let-SubqueryScanPath-be-parallel-safe.
>

I think this will work for the reported case, see the patch attached.
However, don't we need to worry about if there is a subplan
(non-initplan) instead of initplan.  I have tried by tweaking the
above query such that it will generate a subplan and for such a case
it will make SubqueryScanPath as parallel-safe.

explain select 1 from public.quad_point_tbl as ref_0, lateral (select
ref_0.p as c3, sample_0.d as c5 from public.nv_child_2010 as sample_0
left join public.mvtest_tvv as ref_1 on ('x' = ANY (select contype
from pg_catalog.pg_constraint limit 1)) limit 82) as subq_0;

>  What you propose
> here seems like it would shut off parallel query in more cases than
> that.  But I'm still full of turkey and may be missing something.
>
> There's another issue here, which is that the InitPlan is derived from an
> outer join qual whose outer join seems to have been deleted entirely by
> analyzejoins.c.  Up to now it was possible to avert our eyes from the fact
> that join removal is lazy about getting rid of unused InitPlans, but if
> they're going to disable parallelization of the surrounding query, maybe
> we need to work harder on that.
>

Yeah, that makes sense, but not sure whether we should try it along
with this patch.

> Another question worth asking is whether it was okay for the subquery to
> decide to use a Gather.  Are we OK with multiple Gathers per plan tree,
> independently of the points above?
>

As of now, we don't support nested Gather case.  Example:

Not Okay Plan
---------------------
-> Gather
     -> Nested Loop
         -> Parallel Seq Scan
         -> Gather
              -> Parallel Seq Scan

Okay Plan
---------------------
-> Nested Loop
     -> Gather
          -> Parallel Seq Scan
     -> Gather
          -> Parallel Seq Scan



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Fri, Nov 25, 2016 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The impression I got in poking at this for a few minutes, before
>> going off to stuff myself with turkey, was that we were allowing
>> a SubqueryScanPath to mark itself as parallel-safe even though the
>> resulting plan node would have an InitPlan attached.  So my thought
>> about fixing it was along the lines of if-subroot-contains-initplans-
>> then-dont-let-SubqueryScanPath-be-parallel-safe.

> I think this will work for the reported case, see the patch attached.

After further thought, it seems to me that the fundamental error here
is that we're not accounting for initPlans while marking paths as
parallel safe or not.  They should be correctly marked when they come
out of subquery_planner, rather than expecting callers to know that
they can't trust that marking.  That way, we don't need to do anything
special in create_subqueryscan_path, and we can also get rid of that
kluge in the force_parallel_mode logic.

> However, don't we need to worry about if there is a subplan
> (non-initplan) instead of initplan.

I don't think so.  References to subplans are already known to be parallel
restricted.  The issue that we're fixing here is that if a plan node has
initPlans attached, ExecutorStart will try to access those subplans,
whether or not they actually get used while running.  That's why the plan
node itself has to be marked parallel-unsafe so it won't get shipped to
parallel workers.

>> There's another issue here, which is that the InitPlan is derived from an
>> outer join qual whose outer join seems to have been deleted entirely by
>> analyzejoins.c.  Up to now it was possible to avert our eyes from the fact
>> that join removal is lazy about getting rid of unused InitPlans, but if
>> they're going to disable parallelization of the surrounding query, maybe
>> we need to work harder on that.

> Yeah, that makes sense, but not sure whether we should try it along
> with this patch.

I'm not certain that sublinks in deletable outer join quals is a case
important enough to sweat over.  But this is the first time I've realized
that failure to remove those initPlans is capable of making a plan worse.
So it's something to keep an eye on.  (I still wish that we could make
join removal happen during join tree preprocessing; doing it where it is
is nothing but a kluge, with unpleasant side effects like this one.)
        regards, tom lane