Re: Parallelize correlated subqueries that execute within each worker

Поиск
Список
Период
Сортировка
От vignesh C
Тема Re: Parallelize correlated subqueries that execute within each worker
Дата
Msg-id CALDaNm3XwCCf=CXsrCrYSa3h1x9PXr2xV+=gMb+j8XEnpnQ8hw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallelize correlated subqueries that execute within each worker  (James Coleman <jtc331@gmail.com>)
Ответы Re: Parallelize correlated subqueries that execute within each worker  (Akshat Jaimini <destrex271@gmail.com>)
Re: Parallelize correlated subqueries that execute within each worker  (James Coleman <jtc331@gmail.com>)
Список pgsql-hackers
On Tue, 4 Jul 2023 at 06:56, James Coleman <jtc331@gmail.com> wrote:
>
> On Sun, Jun 11, 2023 at 10:23 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > ...
> > > And while trying the v9 patch I came across a crash with the query
> > > below.
> > >
> > > set min_parallel_table_scan_size to 0;
> > > set parallel_setup_cost to 0;
> > > set parallel_tuple_cost to 0;
> > >
> > > explain (costs off)
> > > select * from pg_description t1 where objoid in
> > >     (select objoid from pg_description t2 where t2.description = t1.description);
> > >                        QUERY PLAN
> > > --------------------------------------------------------
> > >  Seq Scan on pg_description t1
> > >    Filter: (SubPlan 1)
> > >    SubPlan 1
> > >      ->  Gather
> > >            Workers Planned: 2
> > >            ->  Parallel Seq Scan on pg_description t2
> > >                  Filter: (description = t1.description)
> > > (7 rows)
> > >
> > > select * from pg_description t1 where objoid in
> > >     (select objoid from pg_description t2 where t2.description = t1.description);
> > > WARNING:  terminating connection because of crash of another server process
> > >
> > > Seems something is wrong when extracting the argument from the Param in
> > > parallel worker.
> >
> > With what I'm trying to change I don't think this plan should ever be
> > generated since it means we'd have to pass a param from the outer seq
> > scan into the parallel subplan, which we can't do (currently).
> >
> > I've attached the full backtrace to the email, but as you hinted at
> > the parallel worker is trying to get the param (in this case
> > detoasting it), but the param doesn't exist on the worker, so it seg
> > faults. Looking at this further I think there's an existing test case
> > that exposes the misplanning here (the one right under the comment
> > "Parallel Append is not to be used when the subpath depends on the
> > outer param" in select_parallel.sql), but it doesn't seg fault because
> > the param is an integer, doesn't need to be detoasted, and therefore
> > (I think) we skate by (but probably with wrong results in depending on
> > the dataset).
> >
> > Interestingly this is one of the existing test queries my original
> > patch approach didn't change, so this gives me something specific to
> > work with improving the path. Thanks for testing this and bringing
> > this to my attention!
>
> Here's what I've found debugging this:
>
> There's only a single gather path ever created when planning this
> query, making it easy to know which one is the problem. That gather
> path is created with this stacktrace:
>
>     frame #0: 0x0000000105291590
> postgres`create_gather_path(root=0x000000013081ae78,
> rel=0x000000013080c8e8, subpath=0x000000013081c080,
> target=0x000000013081c8c0, required_outer=0x0000000000000000,
> rows=0x0000000000000000) at pathnode.c:1971:2
>     frame #1: 0x0000000105208e54
> postgres`generate_gather_paths(root=0x000000013081ae78,
> rel=0x000000013080c8e8, override_rows=false) at allpaths.c:3097:4
>     frame #2: 0x00000001052090ec
> postgres`generate_useful_gather_paths(root=0x000000013081ae78,
> rel=0x000000013080c8e8, override_rows=false) at allpaths.c:3241:2
>     frame #3: 0x0000000105258754
> postgres`apply_scanjoin_target_to_paths(root=0x000000013081ae78,
> rel=0x000000013080c8e8, scanjoin_targets=0x000000013081c978,
> scanjoin_targets_contain_srfs=0x0000000000000000,
> scanjoin_target_parallel_safe=true, tlist_same_exprs=true) at
> planner.c:7696:3
>     frame #4: 0x00000001052533cc
> postgres`grouping_planner(root=0x000000013081ae78, tuple_fraction=0.5)
> at planner.c:1611:3
>     frame #5: 0x0000000105251e9c
> postgres`subquery_planner(glob=0x00000001308188d8,
> parse=0x000000013080caf8, parent_root=0x000000013080cc38,
> hasRecursion=false, tuple_fraction=0.5) at planner.c:1062:2
>     frame #6: 0x000000010526b134
> postgres`make_subplan(root=0x000000013080cc38,
> orig_subquery=0x000000013080ff58, subLinkType=ANY_SUBLINK,
> subLinkId=0, testexpr=0x000000013080d848, isTopQual=true) at
> subselect.c:221:12
>     frame #7: 0x0000000105268b8c
> postgres`process_sublinks_mutator(node=0x000000013080d6d8,
> context=0x000000016b0998f8) at subselect.c:1950:10
>     frame #8: 0x0000000105268ad8
> postgres`SS_process_sublinks(root=0x000000013080cc38,
> expr=0x000000013080d6d8, isQual=true) at subselect.c:1923:9
>     frame #9: 0x00000001052527b8
> postgres`preprocess_expression(root=0x000000013080cc38,
> expr=0x000000013080d6d8, kind=0) at planner.c:1169:10
>     frame #10: 0x0000000105252954
> postgres`preprocess_qual_conditions(root=0x000000013080cc38,
> jtnode=0x000000013080d108) at planner.c:1214:14
>     frame #11: 0x0000000105251580
> postgres`subquery_planner(glob=0x00000001308188d8,
> parse=0x0000000137010d68, parent_root=0x0000000000000000,
> hasRecursion=false, tuple_fraction=0) at planner.c:832:2
>     frame #12: 0x000000010525042c
> postgres`standard_planner(parse=0x0000000137010d68,
> query_string="explain (costs off)\nselect * from pg_description t1
> where objoid in\n    (select objoid from pg_description t2 where
> t2.description = t1.description);", cursorOptions=2048,
> boundParams=0x0000000000000000) at planner.c:411:9
>
> There aren't any lateral markings on the rels. Additionally the
> partial path has param_info=null (I found out from Tom in a separate
> thread [1] that this is only set for outer relations from the same
> query level).
>
> The only param that I could easily find at first was a single param of
> type PARAM_EXTERN in root->plan_params in make_subplan().
>
> I spent a lot of time trying to figure out where we could find the
> PARAM_EXEC param that's being fed into the subplan, but it doesn't
> seem like we have access to any of these things at the point in the
> path creation process that it's interesting to us when inserting the
> gather nodes.
>
> Given all of that I settled on this approach:
> 1. Modify is_parallel_safe() to by default ignore PARAM_EXEC params.
> 2. Add is_parallel_safe_with_params() that checks for the existence of
> such params.
> 3. Store the required params in a bitmapset on each base rel.
> 4. Union the bitmapset on join rels.
> 5. Only insert a gather node if that bitmapset is empty.
>
> I have an intuition that there's some spot (e.g. joins) that we should
> be removing params from this set (e.g., when we've satisfied them),
> but I haven't been able to come up with such a scenario as yet.
>
> The attached v11 fixes the issue you reported.

One of the tests has failed in CFBot at [1] with:
+++ /tmp/cirrus-ci-build/build/testrun/pg_upgrade/002_pg_upgrade/data/results/select_parallel.out
2023-12-20 20:08:42.480004000 +0000
@@ -137,23 +137,24 @@
 explain (costs off)
  select (select max((select pa1.b from part_pa_test pa1 where pa1.a = pa2.a)))
  from part_pa_test pa2;
-                          QUERY PLAN
---------------------------------------------------------------
- Aggregate
+                             QUERY PLAN
+--------------------------------------------------------------------
+ Finalize Aggregate
    ->  Gather
          Workers Planned: 3
-         ->  Parallel Append
-               ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
-               ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
+         ->  Partial Aggregate
+               ->  Parallel Append
+                     ->  Parallel Seq Scan on part_pa_test_p1 pa2_1
+                     ->  Parallel Seq Scan on part_pa_test_p2 pa2_2
+               SubPlan 1
+                 ->  Append
+                       ->  Seq Scan on part_pa_test_p1 pa1_1
+                             Filter: (a = pa2.a)
+                       ->  Seq Scan on part_pa_test_p2 pa1_2
+                             Filter: (a = pa2.a)
    SubPlan 2
      ->  Result
-   SubPlan 1
-     ->  Append
-           ->  Seq Scan on part_pa_test_p1 pa1_1
-                 Filter: (a = pa2.a)
-           ->  Seq Scan on part_pa_test_p2 pa1_2
-                 Filter: (a = pa2.a)
-(14 rows)
+(15 rows)

More details of the failure is available at [2].

[1] - https://cirrus-ci.com/task/5685696451575808
[2] -
https://api.cirrus-ci.com/v1/artifact/task/5685696451575808/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade

Regards,
Vignesh



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: speed up a logical replica setup
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: INFORMATION_SCHEMA note