Re: Get rid of runtime handling of AlternativeSubPlan?

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: Get rid of runtime handling of AlternativeSubPlan?
Дата	31 августа 2020 г. 00:23:53
Msg-id	CAKU4AWox8HiWMMLR7TjX80+KDfm12b1zJWfNcoSioN9+QvORWQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Get rid of runtime handling of AlternativeSubPlan? (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Get rid of runtime handling of AlternativeSubPlan?
Список	pgsql-hackers

Дерево обсуждения

On Sun, Aug 30, 2020 at 7:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:
> Back in bd3daddaf232d95b0c9ba6f99b0170a0147dd8af, which introduced
> AlternativeSubPlans, I wrote:
> There is a lot more that could be done based on this infrastructure: in
> particular it's interesting to consider switching to the hash plan if we start
> out using the non-hashed plan but find a lot more upper rows going by than we
> expected. I have therefore left some minor inefficiencies in place, such as
> initializing both subplans even though we will currently only use one.
>
> That commit will be twelve years old come August, and nobody has either
> built anything else atop it or shown any interest in making the plan choice
> switchable mid-run. So it seems like kind of a failed experiment.
>
> Therefore, I'm considering the idea of ripping out all executor support
> for AlternativeSubPlan and instead having the planner replace an
> AlternativeSubPlan with the desired specific SubPlan somewhere late in
> planning, possibly setrefs.c.

Here's a proposed patchset for that. This runs with the idea I'd had
that setrefs.c could be smarter than the executor about which plan node
subexpressions will be executed how many times. I did not take it very
far, for fear of adding an undue number of planning cycles, but it's still
better than what we have now.

For ease of review, 0001 adds the new planner logic, while 0002 removes
the now-dead executor support.

There's one bit of dead code that I left in place for the moment, which is
ruleutils.c's support for printing AlternativeSubPlans. I'm not sure if
that's worth keeping or not --- it's dead code for normal use, but if
someone tried to use ruleutils.c to print partially-planned expression
trees, maybe there'd be a use for it?

(It's also arguable that readfuncs.c's support is now dead code, but
I have little interest in stripping that out.)

regards, tom lane

Thank you for this code! I still have some confusion about when a SubPlan

should be executed when a join is involved. I care about this because this

has an impact on when we can get the num_exec for a subplan.

The subplan in a target list, it is executed after the join in my case. The subplan

can be execute after the scan of T1(see below example) and it can also be executed

after the join. Which one is better depends on which methods make the num_exec

smaller. Is it something we already considered? I drill-down to

populate_joinrel_with_paths and not find this logic.

# explain (costs off) select (select a from t2 where t2.b = t1.b) from t1, t3;
QUERY PLAN
------------------------------
Nested Loop
-> Seq Scan on t1
-> Materialize
-> Seq Scan on t3
SubPlan 1
-> Seq Scan on t2
Filter: (b = t1.b)
(7 rows)

When the subplan is in a Qual, it is supposed to be executed as soon as possible,

The current implementation matches the below cases. So can we say we

knows the num_execs of SubPlan just after we plan the dependent rels?

(In Q1 below the dependent rel is t1 vs t3, in Q2 it is t1 only) If we can choose

a subplan and recost the related path during(not after) creating the best path, will

we get better results for some cases (due to the current cost method for AlternativeSubPlan[1])?

-- the subplan depends on the result of t1 join t3

# explain (costs off) select t1.* from t1, t3 where

t1.a > (select max(a) from t2 where t2.b = t1.b and t2.c = t3.c);
QUERY PLAN
-----------------------------------------------------
Nested Loop
Join Filter: (t1.a > (SubPlan 1))
-> Seq Scan on t1
-> Materialize
-> Seq Scan on t3
SubPlan 1
-> Aggregate
-> Seq Scan on t2
Filter: ((b = t1.b) AND (c = t3.c))
(9 rows)

-- the subplan only depends on t1.
# explain (costs off) select t1.* from t1, t3 where

t1.a > (select max(a) from t2 where t2.b = t1.b);
QUERY PLAN
------------------------------------------------
Nested Loop
-> Seq Scan on t3
-> Materialize
-> Seq Scan on t1
Filter: (a > (SubPlan 1))
SubPlan 1
-> Aggregate
-> Seq Scan on t2
Filter: (b = t1.b)
(9 rows)

At last, I want to use the commonly used table in src/test/regress/sql/create_table.sql

when providing an example, but I always have issues running the create_table.sql which

makes me uncomfortable to use that. Am I missing something?

CREATE TABLE fail_part PARTITION OF range_parted3 FOR VALUES FROM (1, minvalue) TO (1, maxvalue);
psql:src/test/regress/sql/create_table.sql:611: ERROR: partition "fail_part" would overlap partition "part10"

CREATE TABLE fail_part PARTITION OF hash_parted2 FOR VALUES WITH (MODULUS 2, REMAINDER 1);
psql:src/test/regress/sql/create_table.sql:622: ERROR: partition "fail_part" would overlap partition "h2part_4"

[1] https://www.postgresql.org/message-id/07b3fa88-aa4e-2e13-423d-8389eb1712cf%40imap.cc

Best Regards

Andy Fan

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Get rid of runtime handling of AlternativeSubPlan?