Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG

Поиск
Список
Период
Сортировка
От Vladimir Churyukin
Тема Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
Дата
Msg-id CAFSGpE3pdH4pgUWvJm0X0CozXgNmCALZZEP0GdT4HvBdo_Vtzw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers


On Tue, Jan 3, 2023 at 9:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> The thing I had in mind was some mode that would record additional
> details during planning that could be tagged onto the final plan in
> createplan.c so that EXPLAIN could display them. I just think that
> EXPLAIN is the place where people go to learn about _what_ the plan is
> and it might also be the place where they might expect to go to find
> out more details about _why_ that plan was chosen. I by no means have
> a fully bakes idea on what that would look like, but I just think that
> dumping a bunch of lines to stdout is not going to be useful to many
> people and we need to think of something better in order to properly
> make this useful.

There's a number of problems in this area, but I think the really
fundamental issue is that for speed reasons the planner wants to
reject losing plan alternatives as quickly as possible.  So we simply
don't pursue those alternatives far enough to produce anything that
could serve as input for EXPLAIN (in its current form, anyway).

That's not necessarily a fundamental issue for EXPLAIN (well, in theory, not sure if there are fundamental limitations of the current implementation).
When somebody runs EXPLAIN, they don't necessarily care that much about its performance, as long as it returns results in reasonable time.
So if the planner does some extra work in that mode to better display why the specific path was chosen, it should probably be ok from the performance perspective.
 
What that means is that a trace of add_path decisions just can't be
very useful to an end user: there isn't enough data to present the
decisions in a recognizable form, besides which there is too much
noise because most of the rejected options are in fact silly.
So indeed we find that even hard-core developers aren't interested
in consuming the data in that form.

Even if the output is not very human-readable, it still can be useful, if there are tools that consume the output and extract
meaningful data while omitting meaningless noise (if the meaningful data exists there of course).
 
Another issue is that frequently the problem is that we never
considered the desired plan at all, so that even if you had a
perfectly readable add_path trace it wouldn't show you what you want
to see.  This might happen because the planner is simply incapable of
producing that plan shape from the given query, but often it happens
for reasons like "function F() used in the query is marked volatile,
so we didn't flatten a subquery or consider an indexscan or whatever".
I'm not sure how we could produce output that would help people
discover that kind of problem ... but I am sure that an add_path
trace won't do it.

So, not only am I pretty down on exposing OPTIMIZER_DEBUG in
its current form, but I don't really believe that adding hooks
to add_path would allow an extension to produce anything of value.
I'd for sure want to see a convincing demonstration to the contrary
before we slow down that hot code path by adding hook calls. 

Pardon my ignorance, but I'm curious, how changes in planner code are currently validated?
Let's say, you add some extra logic that introduces different paths in some cases, or adjust some constants. How do you validate this logic doesn't slow down something else dramatically?
I see some EXPLAIN output checks in regression tests (not that many though), so I'm curious how regressions in planning are currently tested.
Not the simple ones, when you have a small input and predictable plan/output, but something that can happen with more or less real data distribution on medium / large datasets.

-Vladimir Churyukin

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ankit Kumar Pandey
Дата:
Сообщение: Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
Следующее
От: Dean Rasheed
Дата:
Сообщение: Re: Underscores in numeric literals