Re: How to retain lesser paths at add_path()?

Поиск
Список
Период
Сортировка
От Kohei KaiGai
Тема Re: How to retain lesser paths at add_path()?
Дата
Msg-id CAOP8fzbM+6wSngKTdTY7xpf=DAAmK=CNeACJVy2VpCtGwx8=WQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: How to retain lesser paths at add_path()?  (Kohei KaiGai <kaigai@heterodb.com>)
Ответы Re: How to retain lesser paths at add_path()?  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hello,

For implementation of the concept, this patch puts a hook on add_path
/ add_partial_path
to override the path removal decision by extensions, according to its
own viewpoint.
We don't add any metrics other than path's cost and path keys, so
set_cheapest() still picks
up paths based on its cost for each depth.
As we are currently doing, extensions (FDW / CSP) are responsible to
construct and add
paths with reasonable cost values, then PostgreSQL optimizer chooses
the "best" path
according to the (self-reported) cost. On the other hands, an
expensive path at a particular
level is not always expensive from upper viewpoint, if combination of
path-A and path-B
has special optimization, like a reduction of DMA transfer between
host and device, or omit
of network transfer between local and remote.
In these cases, extension can get a control to override a decision
whether old_path that
is dominated by new_path shall be removed, or not. If old_path
survived, extension can
re-use the path at the upper level to construct a special path.

Best regards,

2019年8月1日(木) 21:14 Kohei KaiGai <kaigai@heterodb.com>:
>
> 2019年8月1日(木) 19:24 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
> >
> > On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> > >2019年8月1日(木) 16:19 Richard Guo <riguo@pivotal.io>:
> > >>
> > >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > >>>
> > >>> 2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
> > >>> >
> > >>> > Robert Haas <robertmhaas@gmail.com> writes:
> > >>> > > Yeah, but I have to admit that this whole design makes me kinda
> > >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> > >>> > > merit, it increases not only the number of paths retained but also the
> > >>> > > cost of comparing two paths to possibly reject one of them. A few
> > >>> > > years ago, you came up with the (good) idea of rejecting some join
> > >>> > > paths before actually creating the paths, and I wonder if we ought to
> > >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > >>> > > been saying, we ought to think about planning top-down with
> > >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> > >>> > > It just feels like the whole idea of a list of paths ordered by cost
> > >>> > > breaks down when there are so many ways that a not-cheapest path can
> > >>> > > still be worth keeping. Not sure exactly what would be better, though.
> > >>> >
> > >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> > >>> > know what to do instead though.  Changing to a top-down design
> > >>> > sounds like it would solve some problems while introducing others
> > >>> > (not to mention the amount of work and breakage involved).
> > >>> >
> > >>> Hmm... It looks the problem we ought to revise about path construction
> > >>> is much larger than my expectation, and uncertain for me how much works
> > >>> are needed.
> > >>>
> > >>> Although it might be a workaround until fundamental reconstruction,
> > >>> how about to have a margin of estimated cost to reject paths?
> > >>> Current add_path() immediately rejects lesser paths if its cost is
> > >>> even a little more expensive than the compared one. One the other hands,
> > >>
> > >>
> > >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> > >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> > >> user-controllable.
> > >>
> > >Ah, sorry, I oversight this logic...
> > >
> >
> > FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> > solution, because how would you know what value is the right one? Why ould
> > 10% be the right threshold, for example? In my experience these these
> > hard-coded coefficients imply behavior that's difficult to predict and
> > explain to users.
> >
> Ah... That's exactly hard task to explain to users.
>
> > >>> I understand it is not an essential re-design of path-construction logic, and
> > >>> may have limitation. However, amount of works are reasonable and no side-
> > >>> effect. (current behavior = 0% threshold).
> > >>> How about your opinions?
> > >>>
> > >>
> > >> How's about Tom's suggestion on adding another dimension in add_path()
> > >> to be considered, just like how it considers paths of better sort order
> > >> or parallel-safe?
> > >>
> > >Robert also mentioned it makes comparison operation more complicated.
> > >If we try to have another dimension here, a callback function in Path node
> > >may be able to tell the core optimizer whether "dominated path" shall be
> > >dropped or not, without further complexity. It is just an idea.
> > >
> >
> > I think adding a hook to add_path() allowing to override the decidion
> > should be OK. The chance of getting that committed in the near future
> > seems much higher than for a patch that completely reworks add_path().
> >
> > There's one caveat, though - AFAICS various places in the planner use
> > things like cheapest_total_path, cheapest_startup_path and even
> > get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> > considers startup/total cost. It might happen that even after keeping
> > additional paths, the planner still won't use them :-(
> >
> Even if existing code looks at only cheapest_xxx_path, I don't think it is
> a problematic behavior because these paths are exactly cheapest at a level,
> but they may use more expensive paths in the deeper level.
> If a hook can prevent dropping a path, not cheapest, in a particular level,
> this path shall not appear on the cheapest_xxx_path, however, upper level
> path construction logic can pick up these paths as a candidate of input.
> If it has special discount factor here, the startup/total cost of the
> upper level
> path may have cheaper cost in spite of expensive input cost.
>
> In this scenario, this hook gives a decision whether dominated path-node
> shall be dropped or not. So, core optimizer still compares path-node using
> estimated cost value.
>
> In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
> construction, GpuPreAgg + GpuJoin may be cheaper than others because of
> data exchange on GPU device memory.
> As long as GpuJoin is not removed from the pathlist, extension can build its
> custom-path with cheaper cost.
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kaigai@heterodb.com>

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: SegFault on 9.6.14
Следующее
От: Kohei KaiGai
Дата:
Сообщение: Asymmetric partition-wise JOIN