Re: Get rid of runtime handling of AlternativeSubPlan?

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: Get rid of runtime handling of AlternativeSubPlan?
Дата
Msg-id CAKU4AWox8HiWMMLR7TjX80+KDfm12b1zJWfNcoSioN9+QvORWQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Get rid of runtime handling of AlternativeSubPlan?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Get rid of runtime handling of AlternativeSubPlan?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers


On Sun, Aug 30, 2020 at 7:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Back in bd3daddaf232d95b0c9ba6f99b0170a0147dd8af, which introduced
> AlternativeSubPlans, I wrote:
>   There is a lot more that could be done based on this infrastructure: in
>   particular it's interesting to consider switching to the hash plan if we start
>   out using the non-hashed plan but find a lot more upper rows going by than we
>   expected.  I have therefore left some minor inefficiencies in place, such as
>   initializing both subplans even though we will currently only use one.
>
> That commit will be twelve years old come August, and nobody has either
> built anything else atop it or shown any interest in making the plan choice
> switchable mid-run.  So it seems like kind of a failed experiment.
>
> Therefore, I'm considering the idea of ripping out all executor support
> for AlternativeSubPlan and instead having the planner replace an
> AlternativeSubPlan with the desired specific SubPlan somewhere late in
> planning, possibly setrefs.c.

Here's a proposed patchset for that.  This runs with the idea I'd had
that setrefs.c could be smarter than the executor about which plan node
subexpressions will be executed how many times.  I did not take it very
far, for fear of adding an undue number of planning cycles, but it's still
better than what we have now.

For ease of review, 0001 adds the new planner logic, while 0002 removes
the now-dead executor support.

There's one bit of dead code that I left in place for the moment, which is
ruleutils.c's support for printing AlternativeSubPlans.  I'm not sure if
that's worth keeping or not --- it's dead code for normal use, but if
someone tried to use ruleutils.c to print partially-planned expression
trees, maybe there'd be a use for it?

(It's also arguable that readfuncs.c's support is now dead code, but
I have little interest in stripping that out.)

                        regards, tom lane


Thank you for this code!  I still have some confusion about when a SubPlan
should be executed when a join is involved.  I care about this because this 
has an impact on when we can get the num_exec for a subplan.

The subplan in a target list,  it is executed after the join in my case.  The subplan
can be execute after the scan of T1(see below example) and it can also be executed
after the join. Which one is better depends on which methods make the num_exec
smaller.  Is it something we already considered? I drill-down to 
populate_joinrel_with_paths and not find this logic. 

# explain (costs off) select (select a from t2 where t2.b = t1.b) from t1, t3;
          QUERY PLAN
------------------------------
 Nested Loop
   ->  Seq Scan on t1
   ->  Materialize
         ->  Seq Scan on t3
   SubPlan 1
     ->  Seq Scan on t2
           Filter: (b = t1.b)
(7 rows)


When the subplan is in a Qual, it is supposed to be executed as soon as possible,
The current implementation matches the below cases.  So can we say we 
knows the num_execs of SubPlan just after we plan the dependent rels?  
(In Q1 below the dependent rel is t1 vs t3,  in Q2 it is t1 only) If we can choose 
a subplan and recost the related path during(not after) creating the best path,  will
we get better results for some cases (due to the current cost method for AlternativeSubPlan[1])? 

-- the subplan depends on the result of t1 join t3
# explain (costs off) select t1.* from t1, t3 where 
   t1.a > (select max(a) from t2 where t2.b = t1.b and t2.c  = t3.c);
                     QUERY PLAN
-----------------------------------------------------
 Nested Loop
   Join Filter: (t1.a > (SubPlan 1))
   ->  Seq Scan on t1
   ->  Materialize
         ->  Seq Scan on t3
   SubPlan 1
     ->  Aggregate
           ->  Seq Scan on t2
                 Filter: ((b = t1.b) AND (c = t3.c))
(9 rows)

-- the subplan only depends on t1.
# explain (costs off) select t1.* from t1, t3 where 
t1.a > (select max(a) from t2 where t2.b = t1.b);
                   QUERY PLAN
------------------------------------------------
 Nested Loop
   ->  Seq Scan on t3
   ->  Materialize
         ->  Seq Scan on t1
               Filter: (a > (SubPlan 1))
               SubPlan 1
                 ->  Aggregate
                       ->  Seq Scan on t2
                             Filter: (b = t1.b)
(9 rows)
 

At last,  I want to use the commonly used table in src/test/regress/sql/create_table.sql
when providing an example, but I always have issues running the create_table.sql which
makes me uncomfortable to use that. Am I missing something? 

CREATE TABLE fail_part PARTITION OF range_parted3 FOR VALUES FROM (1, minvalue) TO (1, maxvalue);
psql:src/test/regress/sql/create_table.sql:611: ERROR:  partition "fail_part" would overlap partition "part10"

CREATE TABLE fail_part PARTITION OF hash_parted2 FOR VALUES WITH (MODULUS 2, REMAINDER 1);
psql:src/test/regress/sql/create_table.sql:622: ERROR:  partition "fail_part" would overlap partition "h2part_4"

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tatsuro Yamada
Дата:
Сообщение: Re: list of extended statistics on psql
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Terminate the idle sessions