Обсуждение: Add ALL_CANDIDATES option to EXPLAIN

Поиск
Список
Период
Сортировка

Add ALL_CANDIDATES option to EXPLAIN

От
Anthonin Bonnefoy
Дата:
Hi,

I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
of this option is to print all plan candidates instead of only the
cheapest plan. It will output the plans from the most expensive at the
top to the cheapest. Here's an example:

explain (all_candidates) select * from pgbench_accounts where aid=1;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Plan 1
   ->  Gather  (cost=1000.00..3375.39 rows=1 width=97)
         Workers Planned: 1
         ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..2375.29 rows=1 width=97)
               Filter: (aid = 1)
 Plan 2
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2890.00 rows=1 width=97)
         Filter: (aid = 1)
 Plan 3
   ->  Bitmap Heap Scan on pgbench_accounts  (cost=4.30..8.31 rows=1 width=97)
         Recheck Cond: (aid = 1)
         ->  Bitmap Index Scan on pgbench_accounts_pkey
(cost=0.00..4.30 rows=1 width=0)
               Index Cond: (aid = 1)
 Plan 4
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.29..8.31 rows=1 width=97)
         Index Cond: (aid = 1)

This can provide very useful insight on the planner's decisions like
whether it tried to use a specific index and how much cost difference
there is with the top plan. Additionally, it makes it possible to spot
discrepancies in generated plans like incorrect row estimation [1].

The plan list is generated from the upper_rel's pathlist. However, due
to how planning mutates the PlannerGlobal state, we can't directly
iterate the path list generated by the subquery_planner and create a
planned statement for them. Instead, we need to plan from scratch for
every path in the pathlist to generate the list of PlannedStmt.

The patch is split in two mostly to ease the review:
001: Propagate PlannerInfo root to add_path. This is needed to prevent
add_path from discarding path if all_candidates is enabled which will
be stored in PlannerGlobal.
002: Add the planner_all_candidates function and print of candidate
list in explain

[1] https://www.postgresql.org/message-id/flat/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com

Regards,
Anthonin

Вложения

Re: Add ALL_CANDIDATES option to EXPLAIN

От
Tom Lane
Дата:
Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> writes:
> I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
> of this option is to print all plan candidates instead of only the
> cheapest plan. It will output the plans from the most expensive at the
> top to the cheapest.

This doesn't seem feasible at all to me.  If we don't prune plans
fairly aggressively at lower plan levels, we'll have a combinatorial
explosion in the amount of work the planner has to do.  Have you
tried this on even slightly complex plans --- say, a dozen-way join?
I do not think you'll like the runtime, the amount of memory consumed,
or the verboseness of the output.  (I wonder how it interacts with
GEQO, too.)

            regards, tom lane



Re: Add ALL_CANDIDATES option to EXPLAIN

От
Robert Haas
Дата:
On Fri, Jul 26, 2024 at 12:59 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
> of this option is to print all plan candidates instead of only the
> cheapest plan. It will output the plans from the most expensive at the
> top to the cheapest. Here's an example:

It's difficult for me to understand how this can work. Either it's not
really going to print out all candidates, or it's going to print out
gigabytes of output for moderately complex queries.

I've thought about trying to figure out some way of identifying and
printing out plans that are "interestingly different" from the chosen
plan, with the costs they would have had, but I haven't been able to
come up with a good algorithm. Printing out absolutely everything
doesn't seem viable, because planning would be slow and use amazing
amounts of memory and the output would be so large as to be useless.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add ALL_CANDIDATES option to EXPLAIN

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I've thought about trying to figure out some way of identifying and
> printing out plans that are "interestingly different" from the chosen
> plan, with the costs they would have had, but I haven't been able to
> come up with a good algorithm.

I wonder how far you'd get by just printing the surviving paths
(that is, something like Anthonin's patch, but without lobotomizing
add_path).  The survivors would have to dominate the cheapest-total
path along one of the other metrics add_path considers, which
seems like a rough approximation of "interestingly different".

            regards, tom lane



Re: Add ALL_CANDIDATES option to EXPLAIN

От
Robert Haas
Дата:
On Fri, Jul 26, 2024 at 1:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder how far you'd get by just printing the surviving paths
> (that is, something like Anthonin's patch, but without lobotomizing
> add_path).  The survivors would have to dominate the cheapest-total
> path along one of the other metrics add_path considers, which
> seems like a rough approximation of "interestingly different".

My guess is it wouldn't be that great. It seems easy to imagine that
the key decision for a particular plan might be whether to use table A
or B as the driving table, or whether to sequential scan or index scan
some table involved in the query. It could well be that you end up
with the same output ordering either way (or no output ordering) for
one reason or another. I'm actually not sure "interestingly different"
can be defined in a useful, general way, because how is the computer
to know what the human being cares about in a particular case? In
practice, it feels like what you'd often like to do is say "show me
the plan if you { started with table | used scan method Y on table X |
did not use index X | whatever }". I haven't given up on the idea that
there could be some way of defining interesting-ness that avoids the
need for the user to say what they think is interesting, but it
certainly feels like one needs to be a lot more clever to make it work
without user input.

--
Robert Haas
EDB: http://www.enterprisedb.com