Обсуждение: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

Поиск
Список
Период
Сортировка

[sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Andreas Seltenreich
Дата:
Hi,

when fuzz testing master as of c1543a8, parallel workers trigger the
following assertion in ExecInitSubPlan every couple hours.

    TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)

Sample backtraces of a worker and leader below, plan of leader attached.
The collected queries don't seem to reproduce it.  Curiously, running
explain on them on the failed instance after crash recovery never shows
any gather nodes…

regards,
andreas

Core was generated by `postgres: bgworker: parallel worker for PID 28062                   '.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
#3  0x000000000061bad2 in list_nth_cell (list=0x0, n=<optimized out>) at list.c:390
#4  0x000000000061bb26 in list_nth (list=<optimized out>, n=<optimized out>) at list.c:413
#5  0x00000000005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, parent=parent@entry=0x1538188) at
nodeSubplan.c:705
#6  0x00000000005e3b54 in ExecInitExpr (node=0x1522a08, parent=parent@entry=0x1538188) at execQual.c:4724
#7  0x00000000005e415c in ExecInitExpr (node=<optimized out>, parent=parent@entry=0x1538188) at execQual.c:5164
#8  0x00000000005ff3fc in ExecInitAlternativeSubPlan (asplan=asplan@entry=0x1522060, parent=parent@entry=0x1538188) at
nodeSubplan.c:1185
#9  0x00000000005e35c4 in ExecInitExpr (node=0x1522060, parent=parent@entry=0x1538188) at execQual.c:4740
#10 0x00000000005e3f8a in ExecInitExpr (node=0x1522978, parent=parent@entry=0x1538188) at execQual.c:4845
#11 0x00000000005e415c in ExecInitExpr (node=<optimized out>, parent=parent@entry=0x1538188) at execQual.c:5164
#12 0x00000000005e3687 in ExecInitExpr (node=0x1522920, parent=parent@entry=0x1538188) at execQual.c:4648
#13 0x00000000005e330f in ExecInitExpr (node=0x15228c8, parent=parent@entry=0x1538188) at execQual.c:5132
#14 0x00000000005e326f in ExecInitExpr (node=0x1522870, parent=parent@entry=0x1538188) at execQual.c:5152
#15 0x00000000005e415c in ExecInitExpr (node=<optimized out>, parent=parent@entry=0x1538188) at execQual.c:5164
#16 0x00000000005fbb62 in ExecInitSeqScan (node=0x1522728, estate=0x15379b8, eflags=16) at nodeSeqscan.c:192
#17 0x00000000005dd567 in ExecInitNode (node=0x1522728, estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at
execProcnode.c:192
#18 0x00000000005f12a5 in ExecInitHashJoin (node=0x1522530, estate=0x15379b8, eflags=16) at nodeHashjoin.c:489
#19 0x00000000005dd497 in ExecInitNode (node=node@entry=0x1522530, estate=estate@entry=0x15379b8,
eflags=eflags@entry=16)at execProcnode.c:275 
#20 0x00000000005dae6c in InitPlan (eflags=16, queryDesc=<optimized out>) at execMain.c:959
#21 standard_ExecutorStart (queryDesc=<optimized out>, eflags=16) at execMain.c:238
#22 0x00000000005dcac4 in ParallelQueryMain (seg=<optimized out>, toc=0x7f442d27b000) at execParallel.c:729
#23 0x00000000004e631b in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1033
#24 0x0000000000683af2 in StartBackgroundWorker () at bgworker.c:726
#25 0x000000000068ec32 in do_start_bgworker (rw=0x14c4a20) at postmaster.c:5531
#26 maybe_start_bgworker () at postmaster.c:5706
#27 0x000000000068f686 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4967
#28 <signal handler called>
#29 0x00007f442c839ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#30 0x000000000046c144 in ServerLoop () at postmaster.c:1654
#31 0x0000000000690aae in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x14a1560) at postmaster.c:1298
#32 0x000000000046d78d in main (argc=4, argv=0x14a1560) at main.c:228
(gdb) frame 5
#5  0x00000000005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, parent=parent@entry=0x1538188) at
nodeSubplan.c:705
(gdb) list
704        /* Link the SubPlanState to already-initialized subplan */
705        sstate->planstate = (PlanState *) list_nth(estate->es_subplanstates,
706                                                   subplan->plan_id - 1);

(gdb) attach 28062
Attaching to program: /home/smith/postgres/inst/master/bin/postgres, process 28062
(gdb) bt
#0  0x00007f442c840e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00000000006d1dde in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffdedd75a0, cur_timeout=-1,
set=0xe3eedb8)at latch.c:981 
#2  WaitEventSetWait (set=set@entry=0xe3eedb8, timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7fffdedd75a0,nevents=nevents@entry=1) at latch.c:935 
#3  0x00000000006d2226 in WaitLatchOrSocket (latch=0x7f442c0d9644, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1,
timeout=-1,timeout@entry=0) at latch.c:347 
#4  0x00000000006d22ed in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at
latch.c:302
#5  0x00000000005ef4e3 in gather_readnext (gatherstate=0xe3d7ce8) at nodeGather.c:384
#6  gather_getnext (gatherstate=0xe3d7ce8) at nodeGather.c:283
#7  ExecGather (node=node@entry=0xe3d7ce8) at nodeGather.c:229
#8  0x00000000005dd728 in ExecProcNode (node=node@entry=0xe3d7ce8) at execProcnode.c:515
#9  0x00000000005f995c in ExecNestLoop (node=node@entry=0x10ef9d90) at nodeNestloop.c:174
#10 0x00000000005dd7b8 in ExecProcNode (node=node@entry=0x10ef9d90) at execProcnode.c:476
#11 0x00000000005f4990 in ExecLimit (node=node@entry=0x10ef9ac0) at nodeLimit.c:91
#12 0x00000000005dd6b8 in ExecProcNode (node=node@entry=0x10ef9ac0) at execProcnode.c:531
#13 0x00000000005d993f in ExecutePlan (dest=0x5babba0, direction=<optimized out>, numberTuples=0, sendTuples=<optimized
out>,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x10ef9ac0, estate=0x368b6e8) at
execMain.c:1567
#14 standard_ExecutorRun (queryDesc=0x52a89f8, direction=<optimized out>, count=0) at execMain.c:338
#15 0x00000000006f7118 in PortalRunSelect (portal=portal@entry=0x3e84a10, forward=forward@entry=1 '\001', count=0,
count@entry=9223372036854775807,dest=dest@entry=0x5babba0) at pquery.c:946 
#16 0x00000000006f863e in PortalRun (portal=0x3e84a10, count=9223372036854775807, isTopLevel=<optimized out>,
dest=0x5babba0,altdest=0x5babba0, completionTag=0x7fffdedd7b30 "") at pquery.c:787 
#17 0x00000000006f5ef3 in exec_simple_query (query_string=<optimized out>) at postgres.c:1094
#18 PostgresMain (argc=65554960, argv=0xa38beb0, dbname=0x14a2578 "regression", username=0xa38c088 "\260\276\070\n") at
postgres.c:4059
#19 0x000000000046c82a in BackendRun (port=0x14c75b0) at postmaster.c:4258
#20 BackendStartup (port=0x14c75b0) at postmaster.c:3932
#21 ServerLoop () at postmaster.c:1690
#22 0x0000000000690aae in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x14a1560) at postmaster.c:1298
#23 0x000000000046d78d in main (argc=4, argv=0x14a1560) at main.c:228

(gdb) p debug_query_string
select ref_68.is_trigger_deletable as c0, (select d from
public.renamecolumnchild limit 1 offset 15) as c1,
ref_68.is_trigger_insertable_into as c2, 3 as c3 from
information_schema.role_udt_grants as ref_67 left join
information_schema.views as ref_68 on (ref_67.grantee =
ref_68.table_catalog ) where true limit 138;


Вложения

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Tom Lane
Дата:
Andreas Seltenreich <seltenreich@gmx.de> writes:
> when fuzz testing master as of c1543a8, parallel workers trigger the
> following assertion in ExecInitSubPlan every couple hours.
>     TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
> Sample backtraces of a worker and leader below, plan of leader attached.
> The collected queries don't seem to reproduce it.

Odd.  My understanding of the restrictions on parallel query is that
anything involving a SubPlan ought not be parallelized; and this query
doesn't seem to get parallelized when I try it.  It seems like what you're
seeing is that sometimes that restriction fails to be checked, but how
could that happen?
        regards, tom lane



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Fri, May 6, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andreas Seltenreich <seltenreich@gmx.de> writes:
> > when fuzz testing master as of c1543a8, parallel workers trigger the
> > following assertion in ExecInitSubPlan every couple hours.
> >     TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
> > Sample backtraces of a worker and leader below, plan of leader attached.
> > The collected queries don't seem to reproduce it.
>
> Odd.  My understanding of the restrictions on parallel query is that
> anything involving a SubPlan ought not be parallelized;
>

Subplan references are considered parallel-restricted, so parallel plan can be generated if there are subplans in a query, but they shouldn't be pushed to workers.  I have tried a somewhat simpler example to see if we pushdown anything parallel restricted to worker in case of joins and it turned out there are cases when that can happen.  Consider below example:

create or replace function parallel_func_select() returns integer
as $$
declare
    ret_val int;
begin
     ret_val := 1000;
     return ret_val;
end;
$$ language plpgsql Parallel Restricted;

CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 10000000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;

Explain Verbose SELECT t1.c1 + parallel_func_select(), t2.c1 FROM t1 JOIN t2 ON t1.c1 = t2.c1;

                                       QUERY PLAN

--------------------------------------------------------------------------------
--------
 Gather  (cost=32813.00..537284.53 rows=1000000 width=8)
   Output: ((t1.c1 + parallel_func_select())), t2.c1
   Workers Planned: 2
   ->  Hash Join  (cost=31813.00..436284.53 rows=1000000 width=8)
         Output: (t1.c1 + parallel_func_select()), t2.c1
         Hash Cond: (t1.c1 = t2.c1)
         ->  Parallel Seq Scan on public.t1  (cost=0.00..95721.08 rows=4166608 w
idth=4)
               Output: t1.c1, t1.c2
         ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=4)
               Output: t2.c1
               ->  Seq Scan on public.t2  (cost=0.00..15406.00 rows=1000000 widt
h=4)
                     Output: t2.c1
(12 rows)


From the above output it is clear that parallel restricted function is pushed down below gather node.  I found that though we have have care fully avoided to push pathtarget below GatherPath in apply_projection_to_path() if pathtarget contains any parallel unsafe or parallel restricted clause, but we are separately also trying to apply pathtarget to partialpath list which doesn't seem to be the correct way even if it is required.  I think this has been added during parallel aggregate patch and it seems to me this is not required after the changes related to GatherPath in apply_projection_to_path().

After applying the attached patch, it avoids to add parallel restricted clauses below gather path.

Now back to the original bug, if you notice in plan file attached in original bug report, the subplan is pushed below Gather node in target list, but not to immediate join, rather at one more level down to SeqScan path.  I am still not sure how it has manage to push the restricted clauses to that down the level.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Вложения

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Sat, May 7, 2016 at 6:37 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, May 6, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Andreas Seltenreich <seltenreich@gmx.de> writes:
> > > when fuzz testing master as of c1543a8, parallel workers trigger the
> > > following assertion in ExecInitSubPlan every couple hours.
> > >     TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
> > > Sample backtraces of a worker and leader below, plan of leader attached.
> > > The collected queries don't seem to reproduce it.
> >
> > Odd.  My understanding of the restrictions on parallel query is that
> > anything involving a SubPlan ought not be parallelized;
> >
>
> Subplan references are considered parallel-restricted, so parallel plan can be generated if there are subplans in a query, but they shouldn't be pushed to workers.  I have tried a somewhat simpler example to see if we pushdown anything parallel restricted to worker in case of joins and it turned out there are cases when that can happen.  Consider below example:
>
>  
>
> From the above output it is clear that parallel restricted function is pushed down below gather node.  I found that though we have have care fully avoided to push pathtarget below GatherPath in apply_projection_to_path() if pathtarget contains any parallel unsafe or parallel restricted clause, but we are separately also trying to apply pathtarget to partialpath list which doesn't seem to be the correct way even if it is required.  I think this has been added during parallel aggregate patch and it seems to me this is not required after the changes related to GatherPath in apply_projection_to_path().
>
> After applying the attached patch, it avoids to add parallel restricted clauses below gather path.
>
> Now back to the original bug, if you notice in plan file attached in original bug report, the subplan is pushed below Gather node in target list, but not to immediate join, rather at one more level down to SeqScan path.  I am still not sure how it has manage to push the restricted clauses to that down the level.
>

On further analysis, I think I know what is going on in the original bug report.  We add the Vars (build_base_rel_tlists) and PlaceholderVars (add_placeholders_to_base_rels()) to each relations (RelOptInfo) target during qurey_planner and the Subplans are added as PlaceHolderVars in target expressions.  Now while considering whether a particular rel can be parallel in set_rel_consider_parallel(), we don't check the target expressions to allow the relation for parallelism.  I think we can prohibit the relation to be considered for parallelism if it's target expressions contain any parallel restricted clause.  Fix on those lines is attached with this mail.

Thanks to Dilip Kumar for helping me in narrowing down this particular problem.  We were not able to generate the exact test, but I think the above theory is sufficient to prove that it can cause a problem as seen in the original bug report. 



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Вложения

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Thu, May 12, 2016 at 7:46 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On further analysis, I think I know what is going on in the original bug
> report.  We add the Vars (build_base_rel_tlists) and PlaceholderVars
> (add_placeholders_to_base_rels()) to each relations (RelOptInfo) target
> during qurey_planner and the Subplans are added as PlaceHolderVars in target
> expressions.  Now while considering whether a particular rel can be parallel
> in set_rel_consider_parallel(), we don't check the target expressions to
> allow the relation for parallelism.  I think we can prohibit the relation to
> be considered for parallelism if it's target expressions contain any
> parallel restricted clause.  Fix on those lines is attached with this mail.
>
> Thanks to Dilip Kumar for helping me in narrowing down this particular
> problem.  We were not able to generate the exact test, but I think the above
> theory is sufficient to prove that it can cause a problem as seen in the
> original bug report.

I could be wrong, but I thought that the target list for an expression
would always contain only Vars at this stage.  Non-default tlists get
injected at the end of scan/join planning.  Am I wrong?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I could be wrong, but I thought that the target list for an expression
> would always contain only Vars at this stage.  Non-default tlists get
> injected at the end of scan/join planning.  Am I wrong?

Target list for a relation, you mean?  See relation.h:
*      reltarget - Default Path output tlist for this rel; normally contains*                  Var and PlaceHolderVar
nodesfor the values we need to*                  output from this relation.*                  List is in no particular
order,but all rels of an*                  appendrel set must use corresponding orders.*                  NOTE: in an
appendrelchild relation, may contain*                  arbitrary expressions pulled up from a subquery!
 
        regards, tom lane



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Thu, May 12, 2016 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I could be wrong, but I thought that the target list for an expression
>> would always contain only Vars at this stage.  Non-default tlists get
>> injected at the end of scan/join planning.  Am I wrong?
>
> Target list for a relation, you mean?  See relation.h:
>
>  *      reltarget - Default Path output tlist for this rel; normally contains
>  *                  Var and PlaceHolderVar nodes for the values we need to
>  *                  output from this relation.
>  *                  List is in no particular order, but all rels of an
>  *                  appendrel set must use corresponding orders.
>  *                  NOTE: in an appendrel child relation, may contain
>  *                  arbitrary expressions pulled up from a subquery!

Err, wow.  That makes my head hurt.  Can you explain why this case
only arises for appendrel children, and not for plain rels?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
>> Target list for a relation, you mean?  See relation.h:
>> 
>> *      reltarget - Default Path output tlist for this rel; normally contains
>> *                  Var and PlaceHolderVar nodes for the values we need to
>> *                  output from this relation.
>> *                  List is in no particular order, but all rels of an
>> *                  appendrel set must use corresponding orders.
>> *                  NOTE: in an appendrel child relation, may contain
>> *                  arbitrary expressions pulled up from a subquery!

> Err, wow.  That makes my head hurt.  Can you explain why this case
> only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

But consider an appendrel representing
(SELECT x+1 FROM t1 UNION ALL SELECT y+2 FROM t2) ss(a)

The RTE for ss will have a reltarget list containing just "a".
Once we pull up the subqueries, the reltarget lists for the two child
appendrel members will need to contain "x+1" and "y+2" in order to be
equivalent to the parent's reltarget list.  See set_append_rel_size(),
which does that transformation.

This doesn't happen with ordinary subquery flattening because there
isn't a RelOptInfo corresponding to an ordinary subquery that's been
pulled up into the parent query.
        regards, tom lane



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Thu, May 12, 2016 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> >> Target list for a relation, you mean?  See relation.h:
> >>
> >> *      reltarget - Default Path output tlist for this rel; normally contains
> >> *                  Var and PlaceHolderVar nodes for the values we need to
> >> *                  output from this relation.
> >> *                  List is in no particular order, but all rels of an
> >> *                  appendrel set must use corresponding orders.
> >> *                  NOTE: in an appendrel child relation, may contain
> >> *                  arbitrary expressions pulled up from a subquery!
>
> > Err, wow.  That makes my head hurt.  Can you explain why this case
> > only arises for appendrel children, and not for plain rels?
>
> Well, plain rels only output Vars ;-)
>

Does this mean that base rels can't contain PlaceHolderVars?  Isn't it possible in below code:

query_planner()
{
..
/*
* Now distribute "placeholders" to base rels as needed.  This has to be
* done after join removal because removal could change whether a
* placeholder is evaluatable at a base rel.
*/
add_placeholders_to_base_rels(root);
..
}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Fri, May 13, 2016 at 9:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 12, 2016 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Robert Haas <robertmhaas@gmail.com> writes:
> > >> Target list for a relation, you mean?  See relation.h:
> > >>
> > >> *      reltarget - Default Path output tlist for this rel; normally contains
> > >> *                  Var and PlaceHolderVar nodes for the values we need to
> > >> *                  output from this relation.
> > >> *                  List is in no particular order, but all rels of an
> > >> *                  appendrel set must use corresponding orders.
> > >> *                  NOTE: in an appendrel child relation, may contain
> > >> *                  arbitrary expressions pulled up from a subquery!
> >
> > > Err, wow.  That makes my head hurt.  Can you explain why this case
> > > only arises for appendrel children, and not for plain rels?
> >
> > Well, plain rels only output Vars ;-)
> >
>
> Does this mean that base rels can't contain PlaceHolderVars?  Isn't it possible in below code:
>

Here I want to ask base rels which are plain rels?

It might be that I am missing something, but if we debug the serial plan for original query [1] for which this issue is reported, we have noticed that PlaceHolderVars that contain subplans are added to base rels for which RTE kind is (RTE_RELATION).


[1] - select ref_68.is_trigger_deletable as c0, (select d from
public.renamecolumnchild limit 1 offset 15) as c1,
ref_68.is_trigger_insertable_into as c2, 3 as c3 from
information_schema.role_udt_grants as ref_67 left join
information_schema.views as ref_68 on (ref_67.grantee =
ref_68.table_catalog ) where true limit 138;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Dilip Kumar
Дата:

On Fri, May 13, 2016 at 10:31 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Here I want to ask base rels which are plain rels?

It might be that I am missing something, but if we debug the serial plan for original query [1] for which this issue is reported, we have noticed that PlaceHolderVars that contain subplans are added to base rels for which RTE kind is (RTE_RELATION).


[1] - select ref_68.is_trigger_deletable as c0, (select d from

Finally, I could reproduce this issue, with just three tables.

If targetlist of top query has subquery references and if subquery is part of OUTER join, then while pulling up the subquery,
it will create PlaceHolderVar, which will have expressions.

After applying prohibit_parallel_clause_below_rel_v1.patch, it is not selecting parallel plan.
So now its clear that because of sub query pullup, we may get expression in targetlist while creating single table path list. So we need to avoid parallel plan if it contains expression.

Below is my test:
--------------------------

postgres=# CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
postgres-# generate_series(1, 10000000) g;

SELECT 10000000
postgres=# 
postgres=# CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
postgres-# generate_series(1, 1000000) g;
SELECT 1000000
postgres=# CREATE TABLE t3(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;
SELECT 1000000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
postgres=# analyze t3;
ANALYZE

postgres=# explain select t1.c1, y.x from t1 left join (select (select t3.c1 from t3 where t3.c1=t2.c1) as x from t2)y on (y.x=t1.c1);
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Hash Right Join  (cost=319113.85..96991333895.85 rows=9999860 width=8)
   Hash Cond: ((SubPlan 2) = t1.c1)
   ->  Gather  (cost=1000.00..7460943906.00 rows=1000000 width=8)
         Workers Planned: 2
         ->  Parallel Seq Scan on t2  (cost=0.00..7460842906.00 rows=416667 width=8)
               SubPlan 1
                 ->  Seq Scan on t3  (cost=0.00..17906.00 rows=1 width=4)
                       Filter: (c1 = t2.c1)
   ->  Hash  (cost=154053.60..154053.60 rows=9999860 width=4)
         ->  Seq Scan on t1  (cost=0.00..154053.60 rows=9999860 width=4)
   SubPlan 2
     ->  Seq Scan on t3 t3_1  (cost=0.00..17906.00 rows=1 width=4)
           Filter: (c1 = t2.c1)
(13 rows)

postgres=# select t1.c1, y.x from t1 left join (select (select t3.c1 from t3 where t3.c1=t2.c1) as x from t2)y on (y.x=t1.c1);
LOG:  worker process: parallel worker for PID 109446 (PID 109483) was terminated by signal 11: Segmentation fault
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Andreas Seltenreich
Дата:
Amit Kapila writes:

> avoid_restricted_clause_below_gather_v1.patch
> prohibit_parallel_clause_below_rel_v1.patch

I didn't observe any parallel worker related coredumps since applying
these.  The same amount of testing done before applying them yielded
about a dozend.

Dilip Kumar writes:

> So now its clear that because of sub query pullup, we may get expression in
> targetlist while creating single table path list. So we need to avoid
> parallel plan if it contains expression.

This sounds like a rather heavy restriction though…

regards,
Andreas



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Sun, May 22, 2016 at 9:32 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>
> Amit Kapila writes:
>
> > avoid_restricted_clause_below_gather_v1.patch
> > prohibit_parallel_clause_below_rel_v1.patch
>
> I didn't observe any parallel worker related coredumps since applying
> these.  The same amount of testing done before applying them yielded
> about a dozend.
>

Thanks for verification.

> Dilip Kumar writes:
>
> > So now its clear that because of sub query pullup, we may get expression in
> > targetlist while creating single table path list. So we need to avoid
> > parallel plan if it contains expression.
>
> This sounds like a rather heavy restriction though…
>

I think what Dilip means by above statement is to avoid parallel plan if target list contains parallel unsafe or restricted expressions.  We already restrict generation of parallel plans if qualification for a relation contains such expressions (refer set_rel_consider_parallel()), so this doesn't sound to be heavy restriction.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Thu, May 12, 2016 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> >> Target list for a relation, you mean?  See relation.h:
> >>
> >> *      reltarget - Default Path output tlist for this rel; normally contains
> >> *                  Var and PlaceHolderVar nodes for the values we need to
> >> *                  output from this relation.
> >> *                  List is in no particular order, but all rels of an
> >> *                  appendrel set must use corresponding orders.
> >> *                  NOTE: in an appendrel child relation, may contain
> >> *                  arbitrary expressions pulled up from a subquery!
>
> > Err, wow.  That makes my head hurt.  Can you explain why this case
> > only arises for appendrel children, and not for plain rels?
>
> Well, plain rels only output Vars ;-)
>
> But consider an appendrel representing
>
>         (SELECT x+1 FROM t1 UNION ALL SELECT y+2 FROM t2) ss(a)
>
> The RTE for ss will have a reltarget list containing just "a".
> Once we pull up the subqueries, the reltarget lists for the two child
> appendrel members will need to contain "x+1" and "y+2" in order to be
> equivalent to the parent's reltarget list.  See set_append_rel_size(),
> which does that transformation.
>

I think this can also lead to an issue, as currently for child relations, we just copy consider_parallel flag of parent.  Now, if the child rel target list is adjusted such that it contains parallel restricted expression, then we are bound to fail.  I think to handle append rel case appropriately, we should compute the consider_parallel flag for each child relation in set_append_rel_size() and then later when computing the flag for parent rel, consider the flag for child rels (basically if any child rel has consider_parallel as false, then set consider_parallel for parent as false).  To achieve this, we need to call  set_rel_consider_parallel() after calling set_rel_size().

Thoughts?

Just to summarize, apart from above issue, we have discussed two different issues related to parallel query in this thread.
a. Push down of parallel restricted clauses to nodes below gather.  Patch to fix same is posted upthread [1].
b. Wrong assumption that target list can only contain Vars.  Patch to fix same is posted upthread [2].  Test which prove our assumption is wrong is also posted upthread [3].

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Noah Misch
Дата:
On Thu, May 26, 2016 at 05:08:31PM +0530, Amit Kapila wrote:
> Just to summarize, apart from above issue, we have discussed two different
> issues related to parallel query in this thread.
> a. Push down of parallel restricted clauses to nodes below gather.  Patch
> to fix same is posted upthread [1].
> b. Wrong assumption that target list can only contain Vars.  Patch to fix
> same is posted upthread [2].  Test which prove our assumption is wrong is
> also posted upthread [3].
> 
> I will add this issue in list of open issues for 9.6 @
> https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items
> 
> [1] -
> https://www.postgresql.org/message-id/CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com
> [2] -
> https://www.postgresql.org/message-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
> [3] -
> https://www.postgresql.org/message-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Sat, May 7, 2016 at 9:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> From the above output it is clear that parallel restricted function is
> pushed down below gather node.  I found that though we have have care fully
> avoided to push pathtarget below GatherPath in apply_projection_to_path() if
> pathtarget contains any parallel unsafe or parallel restricted clause, but
> we are separately also trying to apply pathtarget to partialpath list which
> doesn't seem to be the correct way even if it is required.  I think this has
> been added during parallel aggregate patch and it seems to me this is not
> required after the changes related to GatherPath in
> apply_projection_to_path().

Good catch.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Thu, May 12, 2016 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Err, wow.  That makes my head hurt.  Can you explain why this case
>> only arises for appendrel children, and not for plain rels?
>
> Well, plain rels only output Vars ;-)

Hmm.  Dilip's example in
https://www.postgresql.org/message-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com
seems to show that it's possible to end up with a targetlist
containing non-Vars even for a baserel.  In that example,
pull_up_subqueries() injects t2 into the parent query with a target
list containing two items, one of which is a PlaceHolderVar
referencing the subplan for t3.

Unfortunately, that makes it a bit hard to avoid wasting cycles here.
Amit's patch in
https://www.postgresql.org/message-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
is a bit brute force: it just checks the target list for every
relation for parallel-restricted constructs.  In most cases, no
subqueries having been pulled up, it will find none, but it'll still
have to walk the whole target list to figure that out.  If we had some
way of knowing that nothing was pulled up into the current rel, we
could avoid that.  This seems to apply equally to baserels and
appendrels; both will commonly have a target list containing only
Vars, but both can end up with non-Vars in the target list after
subqueries are pulled up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Wed, Jun 1, 2016 at 9:35 PM, Noah Misch <noah@leadboat.com> wrote:
> [Action required within 72 hours.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

Status update:

I'm working on it.  I think there are several issues here, of which
I've committed a fix for just one.  I hope to have them all fixed by
next Friday.  If not, I'll send another update then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Sat, Jun 4, 2016 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, May 12, 2016 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Err, wow.  That makes my head hurt.  Can you explain why this case
> >> only arises for appendrel children, and not for plain rels?
> >
> > Well, plain rels only output Vars ;-)
>
> Hmm.  Dilip's example in
> https://www.postgresql.org/message-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com
> seems to show that it's possible to end up with a targetlist
> containing non-Vars even for a baserel.  In that example,
> pull_up_subqueries() injects t2 into the parent query with a target
> list containing two items, one of which is a PlaceHolderVar
> referencing the subplan for t3.
>
> Unfortunately, that makes it a bit hard to avoid wasting cycles here.
> Amit's patch in
> https://www.postgresql.org/message-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
> is a bit brute force: it just checks the target list for every
> relation for parallel-restricted constructs.  In most cases, no
> subqueries having been pulled up, it will find none, but it'll still
> have to walk the whole target list to figure that out.  If we had some
> way of knowing that nothing was pulled up into the current rel, we
> could avoid that.  This seems to apply equally to baserels and
> appendrels;
>

That seems doable, as for such rels we can only have Vars and PlaceHolderVars in targetlist.  Basically, whenever we are adding PlaceHolderVars to a relation, just remember that information and use it later.  The patch doing the same is attached with this mail.  Now still, this won't cover the case of ChildRels for an Append Relation as for that we adjust target list separately in set_append_rel_size.  I think we need to deal with it separately.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Вложения

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Mon, Jun 6, 2016 at 6:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> That seems doable, as for such rels we can only have Vars and
> PlaceHolderVars in targetlist.  Basically, whenever we are adding
> PlaceHolderVars to a relation, just remember that information and use it
> later.  The patch doing the same is attached with this mail.  Now still,
> this won't cover the case of ChildRels for an Append Relation as for that we
> adjust target list separately in set_append_rel_size.  I think we need to
> deal with it separately.

This approach looks pretty good to me.  Here's a revised version of
your patch, with some renaming and other adjustments.  I'm not sure
exactly what you're referring to in set_append_rel_size, but I did add
a line of code there that might be relevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Thu, Jun 9, 2016 at 8:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 6, 2016 at 6:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > That seems doable, as for such rels we can only have Vars and
> > PlaceHolderVars in targetlist.  Basically, whenever we are adding
> > PlaceHolderVars to a relation, just remember that information and use it
> > later.  The patch doing the same is attached with this mail.  Now still,
> > this won't cover the case of ChildRels for an Append Relation as for that we
> > adjust target list separately in set_append_rel_size.  I think we need to
> > deal with it separately.
>
> This approach looks pretty good to me.  Here's a revised version of
> your patch, with some renaming and other adjustments.
>

Your version looks good to me.

>  I'm not sure
> exactly what you're referring to in set_append_rel_size,
>

The below code:

set_append_rel_size()
{
..

/*

* CE failed, so finish copying/modifying targetlist and join quals.

*
* NB: the resulting childrel->reltarget->exprs may contain arbitrary
* expressions, which otherwise would not occur in a rel's targetlist
.
..
*/

childrel->reltarget->exprs = (List *)
adjust_appendrel_attrs(root,
  (Node *) rel->reltarget->exprs,
  appinfo);

What I mean to say is if above code lead to some additional expressions in childrel targetlist, then those can lead to some unsafe expressions in child rel target list. This can cause some problem as we are blindly considering that if parent rel is parallel safe, then child rel will also be parallel safe (in below code:)

set_append_rel_size()
{
..
/* Copy consider_parallel flag from parent. */
childrel->consider_parallel = rel->consider_parallel;


> but I did add
> a line of code there that might be relevant.
>

I am not sure if that can address the above problem.  I have posted my analysis on the same in mail [1].



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Thu, Jun 9, 2016 at 9:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jun 9, 2016 at 8:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jun 6, 2016 at 6:07 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > That seems doable, as for such rels we can only have Vars and
>> > PlaceHolderVars in targetlist.  Basically, whenever we are adding
>> > PlaceHolderVars to a relation, just remember that information and use it
>> > later.  The patch doing the same is attached with this mail.  Now still,
>> > this won't cover the case of ChildRels for an Append Relation as for
>> > that we
>> > adjust target list separately in set_append_rel_size.  I think we need
>> > to
>> > deal with it separately.
>>
>> This approach looks pretty good to me.  Here's a revised version of
>> your patch, with some renaming and other adjustments.
>
> Your version looks good to me.
>
>>  I'm not sure
>> exactly what you're referring to in set_append_rel_size,
>
> The below code:
>
> set_append_rel_size()
> {
> ..
>
> /*
>
> * CE failed, so finish copying/modifying targetlist and join quals.
>
> *
> * NB: the resulting childrel->reltarget->exprs may contain arbitrary
> * expressions, which otherwise would not occur in a rel's targetlist.
> ..
> */
>
> childrel->reltarget->exprs = (List *)
> adjust_appendrel_attrs(root,
>   (Node *) rel->reltarget->exprs,
>   appinfo);
>
> What I mean to say is if above code lead to some additional expressions in
> childrel targetlist, then those can lead to some unsafe expressions in child
> rel target list. This can cause some problem as we are blindly considering
> that if parent rel is parallel safe, then child rel will also be parallel
> safe (in below code:)
>
> set_append_rel_size()
> {
> ..
> /* Copy consider_parallel flag from parent. */
> childrel->consider_parallel = rel->consider_parallel;
>
>
>> but I did add
>> a line of code there that might be relevant.
>>
>
> I am not sure if that can address the above problem.  I have posted my
> analysis on the same in mail [1].
>
>
> [1] -
> https://www.postgresql.org/message-id/CAA4eK1Jz5tG2D2PCNYqYob2cb4dKowKYwVJ7OmP5vwyRyCMx4g%40mail.gmail.com

Do you have a test case that demonstrates the problem with this patch
applied?  I agree there may be a problem, but it would be easier to be
sure whether a given change fixes it if we had a good test case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Amit Kapila
Дата:
On Thu, Jun 9, 2016 at 7:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 9, 2016 at 9:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Jun 9, 2016 at 8:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Mon, Jun 6, 2016 at 6:07 AM, Amit Kapila <amit.kapila16@gmail.com>
> >> wrote:
> >> > That seems doable, as for such rels we can only have Vars and
> >> > PlaceHolderVars in targetlist.  Basically, whenever we are adding
> >> > PlaceHolderVars to a relation, just remember that information and use it
> >> > later.  The patch doing the same is attached with this mail.  Now still,
> >> > this won't cover the case of ChildRels for an Append Relation as for
> >> > that we
> >> > adjust target list separately in set_append_rel_size.  I think we need
> >> > to
> >> > deal with it separately.
> >>
> >> This approach looks pretty good to me.  Here's a revised version of
> >> your patch, with some renaming and other adjustments.
> >
> > Your version looks good to me.
> >
> >>  I'm not sure
> >> exactly what you're referring to in set_append_rel_size,
> >
> > The below code:
> >
> > set_append_rel_size()
> > {
> > ..
> >
> > /*
> >
> > * CE failed, so finish copying/modifying targetlist and join quals.
> >
> > *
> > * NB: the resulting childrel->reltarget->exprs may contain arbitrary
> > * expressions, which otherwise would not occur in a rel's targetlist.
> > ..
> > */
> >
> > childrel->reltarget->exprs = (List *)
> > adjust_appendrel_attrs(root,
> >   (Node *) rel->reltarget->exprs,
> >   appinfo);
> >
> > What I mean to say is if above code lead to some additional expressions in
> > childrel targetlist, then those can lead to some unsafe expressions in child
> > rel target list. This can cause some problem as we are blindly considering
> > that if parent rel is parallel safe, then child rel will also be parallel
> > safe (in below code:)
> >
> > set_append_rel_size()
> > {
> > ..
> > /* Copy consider_parallel flag from parent. */
> > childrel->consider_parallel = rel->consider_parallel;
> >
> >
> >> but I did add
> >> a line of code there that might be relevant.
> >>
> >
> > I am not sure if that can address the above problem.  I have posted my
> > analysis on the same in mail [1].
> >
> >
> > [1] -
> > https://www.postgresql.org/message-id/CAA4eK1Jz5tG2D2PCNYqYob2cb4dKowKYwVJ7OmP5vwyRyCMx4g%40mail.gmail.com
>
> Do you have a test case that demonstrates the problem with this patch
> applied?
>

No, I don't have a test case.  I have tried a bit, but it seems even if there is a problem, one needs to spend quite some time to generate an exact test. I think the current patch fixes the reproducible problem [1], so lets go with it.  We can look into Append case problem if some one has a reproducible case or if I could come up with one.  In any case, I think it is a separate problem then what is reported in this thread.


Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

От
Robert Haas
Дата:
On Thu, Jun 9, 2016 at 10:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> No, I don't have a test case.  I have tried a bit, but it seems even if
> there is a problem, one needs to spend quite some time to generate an exact
> test. I think the current patch fixes the reproducible problem [1], so lets
> go with it.  We can look into Append case problem if some one has a
> reproducible case or if I could come up with one.  In any case, I think it
> is a separate problem then what is reported in this thread.

OK, committed, then.  I have an idea about what that other problem
might be that I'll investigate separately.  For now, I'm marking this
open item as closed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company