Re: How to retain lesser paths at add_path()?

Поиск
Список
Период
Сортировка
От Kohei KaiGai
Тема Re: How to retain lesser paths at add_path()?
Дата
Msg-id CAOP8fzZhW8=Yns1euo-H4S2Y=h1NL_xQ3-0OVnpOPFdKpB93yQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: How to retain lesser paths at add_path()?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: How to retain lesser paths at add_path()?  (Kohei KaiGai <kaigai@heterodb.com>)
Список pgsql-hackers
2019年8月1日(木) 19:24 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>
> On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> >2019年8月1日(木) 16:19 Richard Guo <riguo@pivotal.io>:
> >>
> >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> >>>
> >>> 2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
> >>> >
> >>> > Robert Haas <robertmhaas@gmail.com> writes:
> >>> > > Yeah, but I have to admit that this whole design makes me kinda
> >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> >>> > > merit, it increases not only the number of paths retained but also the
> >>> > > cost of comparing two paths to possibly reject one of them. A few
> >>> > > years ago, you came up with the (good) idea of rejecting some join
> >>> > > paths before actually creating the paths, and I wonder if we ought to
> >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> >>> > > been saying, we ought to think about planning top-down with
> >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> >>> > > It just feels like the whole idea of a list of paths ordered by cost
> >>> > > breaks down when there are so many ways that a not-cheapest path can
> >>> > > still be worth keeping. Not sure exactly what would be better, though.
> >>> >
> >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> >>> > know what to do instead though.  Changing to a top-down design
> >>> > sounds like it would solve some problems while introducing others
> >>> > (not to mention the amount of work and breakage involved).
> >>> >
> >>> Hmm... It looks the problem we ought to revise about path construction
> >>> is much larger than my expectation, and uncertain for me how much works
> >>> are needed.
> >>>
> >>> Although it might be a workaround until fundamental reconstruction,
> >>> how about to have a margin of estimated cost to reject paths?
> >>> Current add_path() immediately rejects lesser paths if its cost is
> >>> even a little more expensive than the compared one. One the other hands,
> >>
> >>
> >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> >> user-controllable.
> >>
> >Ah, sorry, I oversight this logic...
> >
>
> FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> solution, because how would you know what value is the right one? Why ould
> 10% be the right threshold, for example? In my experience these these
> hard-coded coefficients imply behavior that's difficult to predict and
> explain to users.
>
Ah... That's exactly hard task to explain to users.

> >>> I understand it is not an essential re-design of path-construction logic, and
> >>> may have limitation. However, amount of works are reasonable and no side-
> >>> effect. (current behavior = 0% threshold).
> >>> How about your opinions?
> >>>
> >>
> >> How's about Tom's suggestion on adding another dimension in add_path()
> >> to be considered, just like how it considers paths of better sort order
> >> or parallel-safe?
> >>
> >Robert also mentioned it makes comparison operation more complicated.
> >If we try to have another dimension here, a callback function in Path node
> >may be able to tell the core optimizer whether "dominated path" shall be
> >dropped or not, without further complexity. It is just an idea.
> >
>
> I think adding a hook to add_path() allowing to override the decidion
> should be OK. The chance of getting that committed in the near future
> seems much higher than for a patch that completely reworks add_path().
>
> There's one caveat, though - AFAICS various places in the planner use
> things like cheapest_total_path, cheapest_startup_path and even
> get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> considers startup/total cost. It might happen that even after keeping
> additional paths, the planner still won't use them :-(
>
Even if existing code looks at only cheapest_xxx_path, I don't think it is
a problematic behavior because these paths are exactly cheapest at a level,
but they may use more expensive paths in the deeper level.
If a hook can prevent dropping a path, not cheapest, in a particular level,
this path shall not appear on the cheapest_xxx_path, however, upper level
path construction logic can pick up these paths as a candidate of input.
If it has special discount factor here, the startup/total cost of the
upper level
path may have cheaper cost in spite of expensive input cost.

In this scenario, this hook gives a decision whether dominated path-node
shall be dropped or not. So, core optimizer still compares path-node using
estimated cost value.

In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
construction, GpuPreAgg + GpuJoin may be cheaper than others because of
data exchange on GPU device memory.
As long as GpuJoin is not removed from the pathlist, extension can build its
custom-path with cheaper cost.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: Store FullTransactionId in TwoPhaseFileHeader/GlobalTransactionData
Следующее
От: Sehrope Sarkuni
Дата:
Сообщение: Fix typos