Обсуждение: How to retain lesser paths at add_path()?

Поиск
Список
Период
Сортировка

How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
Hello,

When we add a new path using add_path(), it checks estimated cost and path-keys,
then it also removes dominated paths, if any.
Do we have a reasonable way to retain these "dominated" paths? Once it
is considered
lesser paths at a level, however, it may have a combined cheaper cost
with upper pathnode.

In my case, PG-Strom adds CustomPath to support JOIN/GROUP BY
workloads that utilizes
GPU and NVME storage. If GpuPreAgg and GpuJoin are executed
continuously, output buffer
of GpuJoin simultaneously performs as input buffer of GpuPreAgg on GPU
device memory.
So, it allows to avoid senseless DMA transfer between GPU and CPU/RAM.
This behavior
affects cost estimation during path construction steps - GpuPreAgg
discount DMA cost if its
input path is GpuJoin.
On the other hands, it looks to me add_path() does not consider upper
level optimization
other than sorting path-keys. As long as we can keep these "lesser"
pathnodes that has
further optimization chance, it will help making more reasonable query plan.

Do we have any reasonable way to retain these paths at add_path() even
if it is dominated
by other paths? Any idea welcome.

Best regards,

[*] GpuJoin + GpuPreAgg combined GPU kernel
https://www.slideshare.net/kaigai/20181016pgconfeussd2gpumulti/13
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
Kohei KaiGai <kaigai@heterodb.com> writes:
> When we add a new path using add_path(), it checks estimated cost and path-keys,
> then it also removes dominated paths, if any.
> Do we have a reasonable way to retain these "dominated" paths? Once it
> is considered
> lesser paths at a level, however, it may have a combined cheaper cost
> with upper pathnode.

You do *not* want to have add_path fail to remove dominated paths in
general.  Don't even think about it, because otherwise you will have
plenty of time to regret your folly while you wait for the planner
to chew through an exponential number of possible join plans.

What you'd want to do for something like the above, I think, is to
have some kind of figure of merit or other special marking for paths
that will have some possible special advantage in later planning
steps.  Then you can teach add_path that that's another dimension it
should consider, in the same way that paths with different sort orders
or parallizability attributes don't dominate each other.

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Robert Haas
Дата:
On Wed, Jul 31, 2019 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What you'd want to do for something like the above, I think, is to
> have some kind of figure of merit or other special marking for paths
> that will have some possible special advantage in later planning
> steps.  Then you can teach add_path that that's another dimension it
> should consider, in the same way that paths with different sort orders
> or parallizability attributes don't dominate each other.

Yeah, but I have to admit that this whole design makes me kinda
uncomfortable.  Every time somebody comes up with a new figure of
merit, it increases not only the number of paths retained but also the
cost of comparing two paths to possibly reject one of them. A few
years ago, you came up with the (good) idea of rejecting some join
paths before actually creating the paths, and I wonder if we ought to
try to go further with that somehow. Or maybe, as Peter Geoghegan, has
been saying, we ought to think about planning top-down with
memoization instead of bottom up (yeah, I know that's a huge change).
It just feels like the whole idea of a list of paths ordered by cost
breaks down when there are so many ways that a not-cheapest path can
still be worth keeping. Not sure exactly what would be better, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Yeah, but I have to admit that this whole design makes me kinda
> uncomfortable.  Every time somebody comes up with a new figure of
> merit, it increases not only the number of paths retained but also the
> cost of comparing two paths to possibly reject one of them. A few
> years ago, you came up with the (good) idea of rejecting some join
> paths before actually creating the paths, and I wonder if we ought to
> try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> been saying, we ought to think about planning top-down with
> memoization instead of bottom up (yeah, I know that's a huge change).
> It just feels like the whole idea of a list of paths ordered by cost
> breaks down when there are so many ways that a not-cheapest path can
> still be worth keeping. Not sure exactly what would be better, though.

Yeah, I agree that add_path is starting to feel creaky.  I don't
know what to do instead though.  Changing to a top-down design
sounds like it would solve some problems while introducing others
(not to mention the amount of work and breakage involved).

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > Yeah, but I have to admit that this whole design makes me kinda
> > uncomfortable.  Every time somebody comes up with a new figure of
> > merit, it increases not only the number of paths retained but also the
> > cost of comparing two paths to possibly reject one of them. A few
> > years ago, you came up with the (good) idea of rejecting some join
> > paths before actually creating the paths, and I wonder if we ought to
> > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > been saying, we ought to think about planning top-down with
> > memoization instead of bottom up (yeah, I know that's a huge change).
> > It just feels like the whole idea of a list of paths ordered by cost
> > breaks down when there are so many ways that a not-cheapest path can
> > still be worth keeping. Not sure exactly what would be better, though.
>
> Yeah, I agree that add_path is starting to feel creaky.  I don't
> know what to do instead though.  Changing to a top-down design
> sounds like it would solve some problems while introducing others
> (not to mention the amount of work and breakage involved).
>
Hmm... It looks the problem we ought to revise about path construction
is much larger than my expectation, and uncertain for me how much works
are needed.

Although it might be a workaround until fundamental reconstruction,
how about to have a margin of estimated cost to reject paths?
Current add_path() immediately rejects lesser paths if its cost is
even a little more expensive than the compared one. One the other hands,
a little cost difference may turn back final optimizer decision in some cases.
For example, if we retain lesser paths that have less then 10% expensive
cost than the current cheapest path in the same level, we may be able to
keep the number of lesser paths retained and still use simple cost comparison
for path survive decision.

I understand it is not an essential re-design of path-construction logic, and
may have limitation. However, amount of works are reasonable and no side-
effect. (current behavior = 0% threshold).
How about your opinions?

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Richard Guo
Дата:
On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > Yeah, but I have to admit that this whole design makes me kinda
> > uncomfortable.  Every time somebody comes up with a new figure of
> > merit, it increases not only the number of paths retained but also the
> > cost of comparing two paths to possibly reject one of them. A few
> > years ago, you came up with the (good) idea of rejecting some join
> > paths before actually creating the paths, and I wonder if we ought to
> > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > been saying, we ought to think about planning top-down with
> > memoization instead of bottom up (yeah, I know that's a huge change).
> > It just feels like the whole idea of a list of paths ordered by cost
> > breaks down when there are so many ways that a not-cheapest path can
> > still be worth keeping. Not sure exactly what would be better, though.
>
> Yeah, I agree that add_path is starting to feel creaky.  I don't
> know what to do instead though.  Changing to a top-down design
> sounds like it would solve some problems while introducing others
> (not to mention the amount of work and breakage involved).
>
Hmm... It looks the problem we ought to revise about path construction
is much larger than my expectation, and uncertain for me how much works
are needed.

Although it might be a workaround until fundamental reconstruction,
how about to have a margin of estimated cost to reject paths?
Current add_path() immediately rejects lesser paths if its cost is
even a little more expensive than the compared one. One the other hands,

Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
costs of two paths, although the fuzz factor (1%) is hard coded and not
user-controllable.


I understand it is not an essential re-design of path-construction logic, and
may have limitation. However, amount of works are reasonable and no side-
effect. (current behavior = 0% threshold).
How about your opinions?


How's about Tom's suggestion on adding another dimension in add_path()
to be considered, just like how it considers paths of better sort order
or parallel-safe?

Thanks
Richard

Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
2019年8月1日(木) 16:19 Richard Guo <riguo@pivotal.io>:
>
> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>>
>> 2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
>> >
>> > Robert Haas <robertmhaas@gmail.com> writes:
>> > > Yeah, but I have to admit that this whole design makes me kinda
>> > > uncomfortable.  Every time somebody comes up with a new figure of
>> > > merit, it increases not only the number of paths retained but also the
>> > > cost of comparing two paths to possibly reject one of them. A few
>> > > years ago, you came up with the (good) idea of rejecting some join
>> > > paths before actually creating the paths, and I wonder if we ought to
>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
>> > > been saying, we ought to think about planning top-down with
>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>> > > It just feels like the whole idea of a list of paths ordered by cost
>> > > breaks down when there are so many ways that a not-cheapest path can
>> > > still be worth keeping. Not sure exactly what would be better, though.
>> >
>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
>> > know what to do instead though.  Changing to a top-down design
>> > sounds like it would solve some problems while introducing others
>> > (not to mention the amount of work and breakage involved).
>> >
>> Hmm... It looks the problem we ought to revise about path construction
>> is much larger than my expectation, and uncertain for me how much works
>> are needed.
>>
>> Although it might be a workaround until fundamental reconstruction,
>> how about to have a margin of estimated cost to reject paths?
>> Current add_path() immediately rejects lesser paths if its cost is
>> even a little more expensive than the compared one. One the other hands,
>
>
> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> costs of two paths, although the fuzz factor (1%) is hard coded and not
> user-controllable.
>
Ah, sorry, I oversight this logic...

>> I understand it is not an essential re-design of path-construction logic, and
>> may have limitation. However, amount of works are reasonable and no side-
>> effect. (current behavior = 0% threshold).
>> How about your opinions?
>>
>
> How's about Tom's suggestion on adding another dimension in add_path()
> to be considered, just like how it considers paths of better sort order
> or parallel-safe?
>
Robert also mentioned it makes comparison operation more complicated.
If we try to have another dimension here, a callback function in Path node
may be able to tell the core optimizer whether "dominated path" shall be
dropped or not, without further complexity. It is just an idea.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Tomas Vondra
Дата:
On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
>2019年8月1日(木) 16:19 Richard Guo <riguo@pivotal.io>:
>>
>> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>>>
>>> 2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
>>> >
>>> > Robert Haas <robertmhaas@gmail.com> writes:
>>> > > Yeah, but I have to admit that this whole design makes me kinda
>>> > > uncomfortable.  Every time somebody comes up with a new figure of
>>> > > merit, it increases not only the number of paths retained but also the
>>> > > cost of comparing two paths to possibly reject one of them. A few
>>> > > years ago, you came up with the (good) idea of rejecting some join
>>> > > paths before actually creating the paths, and I wonder if we ought to
>>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
>>> > > been saying, we ought to think about planning top-down with
>>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>>> > > It just feels like the whole idea of a list of paths ordered by cost
>>> > > breaks down when there are so many ways that a not-cheapest path can
>>> > > still be worth keeping. Not sure exactly what would be better, though.
>>> >
>>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
>>> > know what to do instead though.  Changing to a top-down design
>>> > sounds like it would solve some problems while introducing others
>>> > (not to mention the amount of work and breakage involved).
>>> >
>>> Hmm... It looks the problem we ought to revise about path construction
>>> is much larger than my expectation, and uncertain for me how much works
>>> are needed.
>>>
>>> Although it might be a workaround until fundamental reconstruction,
>>> how about to have a margin of estimated cost to reject paths?
>>> Current add_path() immediately rejects lesser paths if its cost is
>>> even a little more expensive than the compared one. One the other hands,
>>
>>
>> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
>> costs of two paths, although the fuzz factor (1%) is hard coded and not
>> user-controllable.
>>
>Ah, sorry, I oversight this logic...
>

FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
solution, because how would you know what value is the right one? Why ould
10% be the right threshold, for example? In my experience these these
hard-coded coefficients imply behavior that's difficult to predict and
explain to users.

>>> I understand it is not an essential re-design of path-construction logic, and
>>> may have limitation. However, amount of works are reasonable and no side-
>>> effect. (current behavior = 0% threshold).
>>> How about your opinions?
>>>
>>
>> How's about Tom's suggestion on adding another dimension in add_path()
>> to be considered, just like how it considers paths of better sort order
>> or parallel-safe?
>>
>Robert also mentioned it makes comparison operation more complicated.
>If we try to have another dimension here, a callback function in Path node
>may be able to tell the core optimizer whether "dominated path" shall be
>dropped or not, without further complexity. It is just an idea.
>

I think adding a hook to add_path() allowing to override the decidion
should be OK. The chance of getting that committed in the near future
seems much higher than for a patch that completely reworks add_path().

There's one caveat, though - AFAICS various places in the planner use
things like cheapest_total_path, cheapest_startup_path and even
get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
considers startup/total cost. It might happen that even after keeping
additional paths, the planner still won't use them :-(

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
2019年8月1日(木) 19:24 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>
> On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> >2019年8月1日(木) 16:19 Richard Guo <riguo@pivotal.io>:
> >>
> >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> >>>
> >>> 2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
> >>> >
> >>> > Robert Haas <robertmhaas@gmail.com> writes:
> >>> > > Yeah, but I have to admit that this whole design makes me kinda
> >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> >>> > > merit, it increases not only the number of paths retained but also the
> >>> > > cost of comparing two paths to possibly reject one of them. A few
> >>> > > years ago, you came up with the (good) idea of rejecting some join
> >>> > > paths before actually creating the paths, and I wonder if we ought to
> >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> >>> > > been saying, we ought to think about planning top-down with
> >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> >>> > > It just feels like the whole idea of a list of paths ordered by cost
> >>> > > breaks down when there are so many ways that a not-cheapest path can
> >>> > > still be worth keeping. Not sure exactly what would be better, though.
> >>> >
> >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> >>> > know what to do instead though.  Changing to a top-down design
> >>> > sounds like it would solve some problems while introducing others
> >>> > (not to mention the amount of work and breakage involved).
> >>> >
> >>> Hmm... It looks the problem we ought to revise about path construction
> >>> is much larger than my expectation, and uncertain for me how much works
> >>> are needed.
> >>>
> >>> Although it might be a workaround until fundamental reconstruction,
> >>> how about to have a margin of estimated cost to reject paths?
> >>> Current add_path() immediately rejects lesser paths if its cost is
> >>> even a little more expensive than the compared one. One the other hands,
> >>
> >>
> >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> >> user-controllable.
> >>
> >Ah, sorry, I oversight this logic...
> >
>
> FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> solution, because how would you know what value is the right one? Why ould
> 10% be the right threshold, for example? In my experience these these
> hard-coded coefficients imply behavior that's difficult to predict and
> explain to users.
>
Ah... That's exactly hard task to explain to users.

> >>> I understand it is not an essential re-design of path-construction logic, and
> >>> may have limitation. However, amount of works are reasonable and no side-
> >>> effect. (current behavior = 0% threshold).
> >>> How about your opinions?
> >>>
> >>
> >> How's about Tom's suggestion on adding another dimension in add_path()
> >> to be considered, just like how it considers paths of better sort order
> >> or parallel-safe?
> >>
> >Robert also mentioned it makes comparison operation more complicated.
> >If we try to have another dimension here, a callback function in Path node
> >may be able to tell the core optimizer whether "dominated path" shall be
> >dropped or not, without further complexity. It is just an idea.
> >
>
> I think adding a hook to add_path() allowing to override the decidion
> should be OK. The chance of getting that committed in the near future
> seems much higher than for a patch that completely reworks add_path().
>
> There's one caveat, though - AFAICS various places in the planner use
> things like cheapest_total_path, cheapest_startup_path and even
> get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> considers startup/total cost. It might happen that even after keeping
> additional paths, the planner still won't use them :-(
>
Even if existing code looks at only cheapest_xxx_path, I don't think it is
a problematic behavior because these paths are exactly cheapest at a level,
but they may use more expensive paths in the deeper level.
If a hook can prevent dropping a path, not cheapest, in a particular level,
this path shall not appear on the cheapest_xxx_path, however, upper level
path construction logic can pick up these paths as a candidate of input.
If it has special discount factor here, the startup/total cost of the
upper level
path may have cheaper cost in spite of expensive input cost.

In this scenario, this hook gives a decision whether dominated path-node
shall be dropped or not. So, core optimizer still compares path-node using
estimated cost value.

In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
construction, GpuPreAgg + GpuJoin may be cheaper than others because of
data exchange on GPU device memory.
As long as GpuJoin is not removed from the pathlist, extension can build its
custom-path with cheaper cost.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
Hello,

For implementation of the concept, this patch puts a hook on add_path
/ add_partial_path
to override the path removal decision by extensions, according to its
own viewpoint.
We don't add any metrics other than path's cost and path keys, so
set_cheapest() still picks
up paths based on its cost for each depth.
As we are currently doing, extensions (FDW / CSP) are responsible to
construct and add
paths with reasonable cost values, then PostgreSQL optimizer chooses
the "best" path
according to the (self-reported) cost. On the other hands, an
expensive path at a particular
level is not always expensive from upper viewpoint, if combination of
path-A and path-B
has special optimization, like a reduction of DMA transfer between
host and device, or omit
of network transfer between local and remote.
In these cases, extension can get a control to override a decision
whether old_path that
is dominated by new_path shall be removed, or not. If old_path
survived, extension can
re-use the path at the upper level to construct a special path.

Best regards,

2019年8月1日(木) 21:14 Kohei KaiGai <kaigai@heterodb.com>:
>
> 2019年8月1日(木) 19:24 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
> >
> > On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> > >2019年8月1日(木) 16:19 Richard Guo <riguo@pivotal.io>:
> > >>
> > >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > >>>
> > >>> 2019年8月1日(木) 1:41 Tom Lane <tgl@sss.pgh.pa.us>:
> > >>> >
> > >>> > Robert Haas <robertmhaas@gmail.com> writes:
> > >>> > > Yeah, but I have to admit that this whole design makes me kinda
> > >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> > >>> > > merit, it increases not only the number of paths retained but also the
> > >>> > > cost of comparing two paths to possibly reject one of them. A few
> > >>> > > years ago, you came up with the (good) idea of rejecting some join
> > >>> > > paths before actually creating the paths, and I wonder if we ought to
> > >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > >>> > > been saying, we ought to think about planning top-down with
> > >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> > >>> > > It just feels like the whole idea of a list of paths ordered by cost
> > >>> > > breaks down when there are so many ways that a not-cheapest path can
> > >>> > > still be worth keeping. Not sure exactly what would be better, though.
> > >>> >
> > >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> > >>> > know what to do instead though.  Changing to a top-down design
> > >>> > sounds like it would solve some problems while introducing others
> > >>> > (not to mention the amount of work and breakage involved).
> > >>> >
> > >>> Hmm... It looks the problem we ought to revise about path construction
> > >>> is much larger than my expectation, and uncertain for me how much works
> > >>> are needed.
> > >>>
> > >>> Although it might be a workaround until fundamental reconstruction,
> > >>> how about to have a margin of estimated cost to reject paths?
> > >>> Current add_path() immediately rejects lesser paths if its cost is
> > >>> even a little more expensive than the compared one. One the other hands,
> > >>
> > >>
> > >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> > >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> > >> user-controllable.
> > >>
> > >Ah, sorry, I oversight this logic...
> > >
> >
> > FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> > solution, because how would you know what value is the right one? Why ould
> > 10% be the right threshold, for example? In my experience these these
> > hard-coded coefficients imply behavior that's difficult to predict and
> > explain to users.
> >
> Ah... That's exactly hard task to explain to users.
>
> > >>> I understand it is not an essential re-design of path-construction logic, and
> > >>> may have limitation. However, amount of works are reasonable and no side-
> > >>> effect. (current behavior = 0% threshold).
> > >>> How about your opinions?
> > >>>
> > >>
> > >> How's about Tom's suggestion on adding another dimension in add_path()
> > >> to be considered, just like how it considers paths of better sort order
> > >> or parallel-safe?
> > >>
> > >Robert also mentioned it makes comparison operation more complicated.
> > >If we try to have another dimension here, a callback function in Path node
> > >may be able to tell the core optimizer whether "dominated path" shall be
> > >dropped or not, without further complexity. It is just an idea.
> > >
> >
> > I think adding a hook to add_path() allowing to override the decidion
> > should be OK. The chance of getting that committed in the near future
> > seems much higher than for a patch that completely reworks add_path().
> >
> > There's one caveat, though - AFAICS various places in the planner use
> > things like cheapest_total_path, cheapest_startup_path and even
> > get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> > considers startup/total cost. It might happen that even after keeping
> > additional paths, the planner still won't use them :-(
> >
> Even if existing code looks at only cheapest_xxx_path, I don't think it is
> a problematic behavior because these paths are exactly cheapest at a level,
> but they may use more expensive paths in the deeper level.
> If a hook can prevent dropping a path, not cheapest, in a particular level,
> this path shall not appear on the cheapest_xxx_path, however, upper level
> path construction logic can pick up these paths as a candidate of input.
> If it has special discount factor here, the startup/total cost of the
> upper level
> path may have cheaper cost in spite of expensive input cost.
>
> In this scenario, this hook gives a decision whether dominated path-node
> shall be dropped or not. So, core optimizer still compares path-node using
> estimated cost value.
>
> In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
> construction, GpuPreAgg + GpuJoin may be cheaper than others because of
> data exchange on GPU device memory.
> As long as GpuJoin is not removed from the pathlist, extension can build its
> custom-path with cheaper cost.
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kaigai@heterodb.com>

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Вложения

Re: How to retain lesser paths at add_path()?

От
Robert Haas
Дата:
On Mon, Aug 12, 2019 at 12:28 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> For implementation of the concept, this patch puts a hook on add_path
> / add_partial_path
> to override the path removal decision by extensions, according to its
> own viewpoint.

I don't think this hook is a very useful approach to this problem, and
I'm concerned that it might have a measurable performance cost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to retain lesser paths at add_path()?

От
Tomas Vondra
Дата:
On Fri, Oct 04, 2019 at 12:19:06PM -0400, Robert Haas wrote:
>On Mon, Aug 12, 2019 at 12:28 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
>> For implementation of the concept, this patch puts a hook on add_path
>> / add_partial_path
>> to override the path removal decision by extensions, according to its
>> own viewpoint.
>
>I don't think this hook is a very useful approach to this problem, and
>I'm concerned that it might have a measurable performance cost.
>

Can you be more specific why you don't think this approach is not
useful? I'm not sure whether you consider all hooks to have this issue
or just this proposed one.

As for the performance impact, I think that's not difficult to measure.
I'd be surprised if it has measurable impact on cases with no hook
installed (there's plenty more expensive stuff going on). Of course, it
may have some impact for cases when the hook retains many more paths
and/or does something expensive, but that's kinda up to whoever writes
that particular hook. I think the assumption is that the savings from
building better plans far outweight that extra cost.

That does not necessarily mean the proposed hook is correct - I only
briefly looked at it, and it's not clear to me why would it be OK to
call the hook for remove_old=true but not also for accept_new=false? How
do we know whether the "better" path arrives first?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: How to retain lesser paths at add_path()?

От
Robert Haas
Дата:
On Sun, Oct 6, 2019 at 3:23 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> >I don't think this hook is a very useful approach to this problem, and
> >I'm concerned that it might have a measurable performance cost.
>
> Can you be more specific why you don't think this approach is not
> useful? I'm not sure whether you consider all hooks to have this issue
> or just this proposed one.

I'll start by admitting that that remark was rather off-the-cuff.  On
further reflection, add_path() is not really a crazy place to try to
add a new dimension of merit, which is really what KaiGai wants to do
here.  On the other hand, as Tom and I noted upthread, that system is
creaking under its weight as it is, and making it extensible seems
like it might therefore be a bad idea - specifically, because it might
slow down planning quite a bit on large join problems, either because
of the additional cost testing the additional dimension of merit or
because of the additional cost of dealing with the extra paths that
get kept.

It is maybe worth noting that join/aggregate pushdown for FDWs has a
somewhat similar problem, and we didn't solve it this way. Should we
have? Maybe it would have worked better and been less buggy. But
slowing down all planning for the benefit of that feature also sounds
bad. I think any changes in this area need careful though.

> As for the performance impact, I think that's not difficult to measure.
> I'd be surprised if it has measurable impact on cases with no hook
> installed (there's plenty more expensive stuff going on). Of course, it
> may have some impact for cases when the hook retains many more paths
> and/or does something expensive, but that's kinda up to whoever writes
> that particular hook. I think the assumption is that the savings from
> building better plans far outweight that extra cost.

You might be right, but add_path() is a pretty darn hot path in the
planner.  You probably wouldn't see a significant overhead on a
single-table query, but on a query with many tables I would not be
surprised if even the overhead of an empty function call was
measurable. But I could be wrong, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Oct 6, 2019 at 3:23 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Can you be more specific why you don't think this approach is not
>> useful? I'm not sure whether you consider all hooks to have this issue
>> or just this proposed one.

> I'll start by admitting that that remark was rather off-the-cuff.  On
> further reflection, add_path() is not really a crazy place to try to
> add a new dimension of merit, which is really what KaiGai wants to do
> here.  On the other hand, as Tom and I noted upthread, that system is
> creaking under its weight as it is, and making it extensible seems
> like it might therefore be a bad idea - specifically, because it might
> slow down planning quite a bit on large join problems, either because
> of the additional cost testing the additional dimension of merit or
> because of the additional cost of dealing with the extra paths that
> get kept.

FWIW, I think that the patch as proposed would most certainly have
nasty performance problems.  To make intelligent decisions, the
hook function would basically have to re-do a large fraction of
the calculations that add_path itself does.  It's also fairly
unclear (or at least undocumented) how things would work if multiple
path providers wanted to make use of the hook; except that that
performance issue would get worse, as each of them redoes that work.

Also, as Robert says, the real goal here is to allow some additional
comparison dimension to be considered.  Simply preventing pre-existing
paths from being removed isn't sufficient for that --- you have to be
able to change the accept_new decisions as well, as Tomas already
worried about.  But if we phrase that as an additional hook that
only concerns itself with accept_new, then the duplicate-calculation
problem gets really substantially worse: I think actual use of a hook
like that would require reconsidering the entire existing path list.

I'm not very sure what a good design for adding new comparison dimensions
would look like, but I think this isn't it.

We could imagine, maybe, that a hook for the purpose of allowing an
additional dimension to be considered would be essentially a path
comparison function, returning -1, +1, or 0 depending on whether
path A is dominated by path B (on this new dimension), dominates
path B, or neither.  However, I do not see how multiple extensions
could usefully share use of such a hook.

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Robert Haas
Дата:
On Mon, Oct 7, 2019 at 9:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We could imagine, maybe, that a hook for the purpose of allowing an
> additional dimension to be considered would be essentially a path
> comparison function, returning -1, +1, or 0 depending on whether
> path A is dominated by path B (on this new dimension), dominates
> path B, or neither.  However, I do not see how multiple extensions
> could usefully share use of such a hook.

Typically, we support hook-sharing mostly by ad-hoc methods: when
installing a hook, you remember the previous value of the function
pointer, and arrange to call that function yourself.  That's not a
great method.  One problem with it is that you can't reasonably
uninstall a hook function, which would be a nice thing to be able to
do. We could do better by reusing the technique from on_shmem_exit or
RegisterXactCallbacks: keep an array of pointers, and call them in
order. I wish we'd retrofit all of our hooks to work more like that;
being able to unload shared libraries would be a good feature.

But if we want to stick with the ad-hoc method, we could also just
have four possible return values: dominates, dominated-by, both, or
neither.

Still, this doesn't feel like very scalable paradigm, because this
code gets called a lot.  Unless both calling the hook functions and
the hook functions themselves are dirt-cheap, it's going to hurt, and
TBH, I wonder if even the cost of detecting that the hook is unused
might be material.

I wonder whether we might get a nicer solution to this problem if our
method of managing paths looked less something invented by a LISP
hacker, but I don't have a specific proposal in mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could imagine, maybe, that a hook for the purpose of allowing an
>> additional dimension to be considered would be essentially a path
>> comparison function, returning -1, +1, or 0 depending on whether
>> path A is dominated by path B (on this new dimension), dominates
>> path B, or neither.  However, I do not see how multiple extensions
>> could usefully share use of such a hook.

> ... if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.

Right, and then *each* user of the hook would have to be prepared
to merge its result with the result from the previous user(s),
which is a complicated bit of logic that somebody would surely
get wrong, especially if (a) there's no prototype to copy from
and (b) testing only their own extension would not exercise it.

[ thinks a bit... ]  Maybe that could be improved if we can express
the result as a bitmask, defined in such a way that OR'ing (or maybe
AND'ing? haven't worked it out) the results from different comparisons
does the right thing.

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.

Yeah, I'm worried about that too.  This is quite a hot code path,
and so I don't think we can just assume that changes are free.
Still, if we could come up with a cleaner paradigm, maybe we could
buy back a few cycles in the core-code comparison logic, and thus
not come out behind from adding a hook.

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
2019年10月7日(月) 23:44 Robert Haas <robertmhaas@gmail.com>:
>
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > We could imagine, maybe, that a hook for the purpose of allowing an
> > additional dimension to be considered would be essentially a path
> > comparison function, returning -1, +1, or 0 depending on whether
> > path A is dominated by path B (on this new dimension), dominates
> > path B, or neither.  However, I do not see how multiple extensions
> > could usefully share use of such a hook.
>
> Typically, we support hook-sharing mostly by ad-hoc methods: when
> installing a hook, you remember the previous value of the function
> pointer, and arrange to call that function yourself.  That's not a
> great method.  One problem with it is that you can't reasonably
> uninstall a hook function, which would be a nice thing to be able to
> do. We could do better by reusing the technique from on_shmem_exit or
> RegisterXactCallbacks: keep an array of pointers, and call them in
> order. I wish we'd retrofit all of our hooks to work more like that;
> being able to unload shared libraries would be a good feature.
>
> But if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.
>
It seems to me this is a bit different from the purpose of this hook.
I never intend to overwrite existing cost-based decision by this hook.
The cheapest path at a particular level is the cheapest one regardless
of the result of this hook. However, this hook enables to prevent
immediate elimination of a particular path that we (extension) want to
use it later and may have potentially cheaper cost (e.g; a pair of
custom GpuAgg + GpuJoin by reduction of DMA cost).

So, I think expected behavior when multiple extensions would use
this hook is clear. If any of call-chain on the hook wanted to preserve
the path, it should be kept on the pathlist. (Of couse, it is not a cheapest
one)

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.
>
> I wonder whether we might get a nicer solution to this problem if our
> method of managing paths looked less something invented by a LISP
> hacker, but I don't have a specific proposal in mind.
>
One other design in my mind is, add a callback function pointer on Path
structure. Only if Path structure has valid pointer (not-NULL), add_path()
calls extension's own logic to determine whether the Path can be
eliminated now.
This design may minimize the number of callback invocation.

One potential downside of this approach is, function pointer makes
hard to implement readfuncs of Path nodes, even though we have
no read handler of them, right now.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
Kohei KaiGai <kaigai@heterodb.com> writes:
> 2019年10月7日(月) 23:44 Robert Haas <robertmhaas@gmail.com>:
>> But if we want to stick with the ad-hoc method, we could also just
>> have four possible return values: dominates, dominated-by, both, or
>> neither.

> It seems to me this is a bit different from the purpose of this hook.
> I never intend to overwrite existing cost-based decision by this hook.
> The cheapest path at a particular level is the cheapest one regardless
> of the result of this hook. However, this hook enables to prevent
> immediate elimination of a particular path that we (extension) want to
> use it later and may have potentially cheaper cost (e.g; a pair of
> custom GpuAgg + GpuJoin by reduction of DMA cost).

I do not think this will work for the purpose you wish, for the reason
Tomas already pointed out: if you don't also modify the accept_new
decision, there's no guarantee that the path you want will get into
the relation's path list in the first place.

Another problem with trying to manage this only by preventing removals
is that it is likely to lead to keeping extra paths and thereby wasting
planning effort.  What if there's more than one path having the property
you want to keep?

Given the way that add_path works, you really have to think about the
problem as adding an additional figure-of-merit or comparison dimension.
Anything else is going to have some unpleasant behaviors.

> One other design in my mind is, add a callback function pointer on Path
> structure. Only if Path structure has valid pointer (not-NULL), add_path()
> calls extension's own logic to determine whether the Path can be
> eliminated now.

While I'm not necessarily against having a per-path callback, I don't
see how it helps us solve this problem, especially not in the presence
of multiple extensions trying to add different paths.  I do not wish
to see things ending up with extensions saying "don't delete my path
no matter what", because that's going to be costly in terms of later
planning effort.  But I'm not seeing how this wouldn't degenerate to
pretty much that behavior.

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
2019年10月8日(火) 1:56 Tom Lane <tgl@sss.pgh.pa.us>:
>
> Kohei KaiGai <kaigai@heterodb.com> writes:
> > 2019年10月7日(月) 23:44 Robert Haas <robertmhaas@gmail.com>:
> >> But if we want to stick with the ad-hoc method, we could also just
> >> have four possible return values: dominates, dominated-by, both, or
> >> neither.
>
> > It seems to me this is a bit different from the purpose of this hook.
> > I never intend to overwrite existing cost-based decision by this hook.
> > The cheapest path at a particular level is the cheapest one regardless
> > of the result of this hook. However, this hook enables to prevent
> > immediate elimination of a particular path that we (extension) want to
> > use it later and may have potentially cheaper cost (e.g; a pair of
> > custom GpuAgg + GpuJoin by reduction of DMA cost).
>
> I do not think this will work for the purpose you wish, for the reason
> Tomas already pointed out: if you don't also modify the accept_new
> decision, there's no guarantee that the path you want will get into
> the relation's path list in the first place.
>
Ah, it is right, indeed. We may need to have a variation of add_path()
that guarantee to preserve a path newly added.
We may be utilize the callback to ask extension whether it allows the
new path to be dominated by the existing cheapest path also.

> Another problem with trying to manage this only by preventing removals
> is that it is likely to lead to keeping extra paths and thereby wasting
> planning effort.  What if there's more than one path having the property
> you want to keep?
>
My assumption is that upper path tries to walk on the path-list, not only
cheapest one, to construct upper paths with lesser paths if they are capable
for special optimization.
Of course, it is not a cheap way, however, majority of path-nodes are not
interested in the lesser paths, as its sub-path. Only limited number of
extension will walk on the lesser path also?
A separated list is probably an idea, to keep the lesser paths. It is not
referenced at the usual path, however, extension can walk on the list
to lookup another opportunity more than the cheapest path.
In this case, length of the path_list is not changed.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Tomas Vondra
Дата:
Hi,

I wonder what is the status of this patch/thread. There was quite a bit
of discussion about possible approaches, but we currently don't have any
patch for review, AFAICS. Not sure what's the plan?

So "needs review" status seems wrong, and considering we haven't seen
any patch since August (so in the past two CFs) I propose marking it as
returned with feedback. Any objections?

FWIW I think we may well need a more elaborate logic which paths to
keep, but I'd prefer re-adding it back to the CF when we actually have a
new patch.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
Hi,

The proposition I posted at 10th-Oct proposed to have a separate list to retain
lesser paths not to expand the path_list length, but here are no comments by
others at that time.
Indeed, the latest patch has not been updated yet.
Please wait for a few days. I'll refresh the patch again.

Thanks,

2020年1月11日(土) 11:01 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>
> Hi,
>
> I wonder what is the status of this patch/thread. There was quite a bit
> of discussion about possible approaches, but we currently don't have any
> patch for review, AFAICS. Not sure what's the plan?
>
> So "needs review" status seems wrong, and considering we haven't seen
> any patch since August (so in the past two CFs) I propose marking it as
> returned with feedback. Any objections?
>
> FWIW I think we may well need a more elaborate logic which paths to
> keep, but I'd prefer re-adding it back to the CF when we actually have a
> new patch.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: How to retain lesser paths at add_path()?

От
Tomas Vondra
Дата:
On Sat, Jan 11, 2020 at 05:07:11PM +0900, Kohei KaiGai wrote:
>Hi,
>
>The proposition I posted at 10th-Oct proposed to have a separate list to retain
>lesser paths not to expand the path_list length, but here are no comments by
>others at that time.
>Indeed, the latest patch has not been updated yet.
>Please wait for a few days. I'll refresh the patch again.
>

OK, thanks for the update. I've marked the patch as "waiting on author".


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: How to retain lesser paths at add_path()?

От
Kohei KaiGai
Дата:
The v2 patch is attached.

This adds two dedicated lists on the RelOptInfo to preserve lesser paths
if extension required to retain the path-node to be removed in usual manner.
These lesser paths are kept in the separated list, so it never expand the length
of pathlist and partial_pathlist. That was the arguable point in the discussion
at the last October.

The new hook is called just before the path-node removal operation, and
gives extension a chance for extra decision.
If extension considers the path-node to be removed can be used in the upper
path construction stage, they can return 'true' as a signal to preserve this
lesser path-node.
In case when same kind of path-node already exists in the preserved_pathlist
and the supplied lesser path-node is cheaper than the old one, extension can
remove the worse one arbitrarily to keep the length of preserved_pathlist.
(E.g, PG-Strom may need one GpuJoin path-node either pathlist or preserved-
pathlist for further opportunity of combined usage with GpuPreAgg path-node.
It just needs "the best GpuJoin path-node" somewhere, not two or more.)

Because PostgreSQL core has no information which preserved path-node can
be removed, extensions that uses path_removal_decision_hook() has responsibility
to keep the length of preserved_(partial_)pathlist reasonable.


BTW, add_path() now removes the lesser path-node by pfree(), not only detach
from the path-list. (IndexPath is an exception)
Does it really make sense? It only releases the path-node itself, so may not
release entire objects. So, efficiency of memory usage is limited. And
ForeignScan
/ CustomScan may references the path-node to be removed. It seems to me here
is no guarantee lesser path-nodes except for IndexPath nodes are safe
to release.

Best regards,

2020年1月11日(土) 21:27 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>
> On Sat, Jan 11, 2020 at 05:07:11PM +0900, Kohei KaiGai wrote:
> >Hi,
> >
> >The proposition I posted at 10th-Oct proposed to have a separate list to retain
> >lesser paths not to expand the path_list length, but here are no comments by
> >others at that time.
> >Indeed, the latest patch has not been updated yet.
> >Please wait for a few days. I'll refresh the patch again.
> >
>
> OK, thanks for the update. I've marked the patch as "waiting on author".
>
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Вложения

Re: How to retain lesser paths at add_path()?

От
Dmitry Dolgov
Дата:
> On Tue, Jan 14, 2020 at 12:46:02AM +0900, Kohei KaiGai wrote:
> The v2 patch is attached.
>
> This adds two dedicated lists on the RelOptInfo to preserve lesser paths
> if extension required to retain the path-node to be removed in usual manner.
> These lesser paths are kept in the separated list, so it never expand the length
> of pathlist and partial_pathlist. That was the arguable point in the discussion
> at the last October.
>
> The new hook is called just before the path-node removal operation, and
> gives extension a chance for extra decision.
> If extension considers the path-node to be removed can be used in the upper
> path construction stage, they can return 'true' as a signal to preserve this
> lesser path-node.
> In case when same kind of path-node already exists in the preserved_pathlist
> and the supplied lesser path-node is cheaper than the old one, extension can
> remove the worse one arbitrarily to keep the length of preserved_pathlist.
> (E.g, PG-Strom may need one GpuJoin path-node either pathlist or preserved-
> pathlist for further opportunity of combined usage with GpuPreAgg path-node.
> It just needs "the best GpuJoin path-node" somewhere, not two or more.)
>
> Because PostgreSQL core has no information which preserved path-node can
> be removed, extensions that uses path_removal_decision_hook() has responsibility
> to keep the length of preserved_(partial_)pathlist reasonable.

Hi,

Thanks for the patch! I had a quick look at it and have a few questions:

* What would be the exact point/hook at which an extension can use
  preserved pathlists? I guess it's important, since I can imagine it's
  important for one of the issues mentioned in the thread about such an
  extension have to re-do significant part of the calculations from
  add_path.

* Do you have any benchmark results with some extension using this
  hook? The idea with another pathlist of "discarded" paths sounds like
  a lightweight solution, and indeed I've performed few tests with two
  workloads (simple queries, queries with joins of 10 tables) and the
  difference between master and patched versions is rather small (no
  stable difference for the former, couple of percent for the latter).
  But it's of course with an empty hook, so it would be good to see
  other benchmarks as well.

* Does it make sense to something similar with add_path_precheck,
  which also in some situations excluding paths?

* This part sounds dangerous for me:

> Because PostgreSQL core has no information which preserved path-node can
> be removed, extensions that uses path_removal_decision_hook() has responsibility
> to keep the length of preserved_(partial_)pathlist reasonable.

  since an extension can keep limited number of paths in the list, but
  then the same hook could be reused by another extension which will
  also try to limit such paths, but together they'll explode.



Re: How to retain lesser paths at add_path()?

От
Jaime Casanova
Дата:
On Sat, Mar 06, 2021 at 06:50:13PM +0900, Kohei KaiGai wrote:
> 
> On the other hand, I'm uncertain whether the pfree(new_path) at the tail
> of add_path makes sense on the modern hardware, because they allow to
> recycle just small amount of memory, then entire memory consumption
> by the optimizer shall be released by MemoryContext mechanism.
> If add_path does not release path-node, the above portion is much simpler.
> 

Hi Kaigai-san,

Do you have an updated patch? Please feel free to resubmit to next CF.
Current CF entry has been marked as RwF.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: How to retain lesser paths at add_path()?

От
Greg Stark
Дата:
On Wed, 31 Jul 2019 at 11:45, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 31, 2019 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What you'd want to do for something like the above, I think, is to
> > have some kind of figure of merit or other special marking for paths
> > that will have some possible special advantage in later planning
> > steps.  Then you can teach add_path that that's another dimension it
> > should consider, in the same way that paths with different sort orders
> > or parallizability attributes don't dominate each other.
>
> Yeah, but I have to admit that this whole design makes me kinda
> uncomfortable.  Every time somebody comes up with a new figure of
> merit, it increases not only the number of paths retained but also the
> cost of comparing two paths to possibly reject one of them.

But this is a fundamental problem with having lots of possible reasons
a path might be good. Not a problem with the algorithm.

I'm imagining that you're both right. If we had some sort of way to
look at the shape of the query and make decisions early on about what
figures of merit might be relevant then we might be able to pick just
a few. Sort of like how we currently only check paths that match some
join or other query feature.


-- 
greg



Re: How to retain lesser paths at add_path()?

От
Nils Dijk
Дата:
Hi,

While researching the viability to change Citus' planner in a way to
take more advantage of the postgres planner I ran into the same
limitations in add_path as described on this thread. For Citus we are
investigating a possibility to introduce Path nodes that describe the
operation of transer tuples over the network. Due to the pruning rules
in add_path we currently lose many paths that have great opportunities
of future optimizations.

Having looked at the patches provided I don't think that adding a new
List to the Path structure where we retain 'lesser' paths is the right
approach. First of all, this would require extension developers to
interpret all these paths, where many would dominate others on cost,
sorting, parameterization etc.

Instead I like the approach suggested by both Robert Haas and Tom Lane.
> have some kind of figure of merit or other special marking for paths
> that will have some possible special advantage in later planning
> steps.

This is well in line with how the current logic works in add_path
where cost, sorting and parameterization, rowcount and parallel safety
are dimensions on which paths are compared. IMHO extensions should be
able to add dimensions of their interest to this list of current
dimensions used.

The thoughts Tom Lane expressed earlier struc a chord with me.
> [ thinks a bit... ]  Maybe that could be improved if we can express
> the result as a bitmask, defined in such a way that OR'ing (or maybe
> AND'ing? haven't worked it out) the results from different comparisons
> does the right thing.

Attached you will find 3 patches that implement a way for extensions
to introduce 'a figure of merit' by the means of path comparisons.
 - The first patch refactors the decision logic out of the forloop
into their own function as to make it easier to reason about what is
being compared. This refactor also changes the direction of some
if-statements as to provide clear early decision points.
 - The second patch rewrites the original if/else/switch tree into 5
logical or operations of a bitmask expressing the comparison between
paths, together with early escapes once we know the paths are
different. To keep the original logic there is a slight deviation from
simply or-ing 5 comparisons. After comparing cost, sorting and
parameterization we only continue or-ing rows and parallelism if the
paths are leaning either to path1 (new_path) or path2 (old_path). If
the first 3 comparisons result in the paths being equal the original
code prefers parallel paths over paths with less rows.
 - The third and last path builds on the logical or operations
introduced in the middle patch. After the first three dimensions
postgres compares paths on we allow extensions to compare paths in the
same manner. Their result gets merged into the compounded comparison.

To come to the three patches above I have decomposed the orignal
behaviour into 3 possible decisions add_path can take per iteration in
the loop. It either REJECTS the new path, DOMINATES the old path or
CONTINUES with the next path in the list. The decision for any of the
3 actions is based on 6 different input parameters:
 - cost std fuzz factor
 - sort order / pathkeys
 - parameterizations / bitmap subset of required outer relid's
 - row count
 - parallel safety
 - cost smaller fuzz factor

To verify the correct decisions being made in the refactor of the
second patch I modelled both implementations in a scripting language
and passed in all possible comparisons for the six dimensions above.
With the change of behaviour after the first 3 dimensions I came to an
exact one-to-one mapping of the decisions being made before and after
patch 2.

Throughout this thread an emphasis on performance has been expressed
by many participants. I want to acknowledge their stance. Due to the
nature of the current planner the code in add_path gets invoked on an
exponential scale when the number of joins increases and extra paths
are retained. Especially with my proposed patch being called inside
the loop where paths are being compared makes that the hook gets
called ever more often compared to the earlier proposed patches on
this thread.

To reason about the performance impact of a patch in this area I think
we need to get to a mutual understanding and agreement on how to
evaluate the performance.

Given many if not most installations will run without any hook
installed in this area we should aim for minimal to no measurable
overhead of the code without a hook installed. This is also why I made
sure, via modelling of the decision logic in a scripting language the
behaviour of which paths are retained is not changed with these
patches when no hook is installed.

When an extension needs to add a dimension to the comparison of paths
the impact of this patch is twofold:
 - A dynamic function gets invoked to compare every path to every
other path. Both the dynamic function call and the logics to compare
the paths will introduce extra time spent in add_path
 - A hook in this area will by definition always retain more paths.
This is just the nature of how the comparison of paths in this area
works.

Both of the dimensions above will make that extensions requiring this
hook to retain more paths will have to think carefully about the work
they do, and on how many dimensions paths are compared in the end. As
Tomas Vondra pointed out earlier in this thread:
> I think the assumption is that the savings from
> building better plans far outweight that extra cost.

For Citus this translates into less network traffic by having more
optimization possibilities of pushing down expensive and row reducing
operations like joins and groupings, etc to the nodes hosting the
data. For PG-Strom this translates into amortising the DMA transfer
cost between host and GPU. Due to the gravity of data we always prefer
extra planning time for complex queries over transferring many tuples.
Frankly, our planner is already introducing much overhead currently
and we expect to reduce this overhead by having the possibility to
hook into postgres in areas like this.

To understand the impact these patches have when no hook is installed
I performed the following two comparisons.
 - Run a benchmark on a patched and a non-patched version of postgres
14.1. I used HammerDB for this as we have tooling around to quickly
run these. Given the execution time dominates such a benchmark I don't
think it gives good insights. At Least it shows that both versions
perform in the same range of performance.
 - Analysed the machine code on Ubuntu 20.04 x86_64. The machine code
is very much alike. The patched version has slightly less jumps and
slightly less instructions. My fear was that the excessive breaking
into small functions and switches to translate enums into each other
would cause many layers of indirection to be introduced. Instead all
code gets inlined when compiled with the same configuration as the
14.1 release.

I don't think the above two are enough to conclude this patch doesn't
introduce overhead when the hook is empty. I invite everyone with an
interest in this area to perform their own measurements and report
their findings.

Best,
-- Nils
Citus Data / Microsoft

Вложения

Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
Nils Dijk <me@thanod.nl> writes:
> Attached you will find 3 patches that implement a way for extensions
> to introduce 'a figure of merit' by the means of path comparisons.

I took a brief look through this.  I quite like your idea of expressing
PathComparison merging as an OR of suitably-chosen values.  I do have
some minor criticisms of the patch, which potentially make for small
performance improvements so I've not bothered yet to try to measure
performance.

* I think you could do without PATH_COMPARISON_MASK.  Use of the enum
already implies that we only allow valid values of the enum, and given
that the inputs of path_comparison_combine are valid, so is the output
of the "|".  There's no need to expend cycles masking it, and if there
were it would be dubious whether the masking restores correctness.
What *is* needed, though, is a comment pointing out that the values of
PathComparison are chosen with malice aforethought to cause ORing of
them to give semantically-correct results.

* I'd also drop enum AddPathDecision and add_path_decision(),
and just let add_path() do what it must based on the PathComparison
result.  I don't think the extra layer of mapping adds anything,
and it's probably costing some cycles.

* Perhaps it's worth explicitly marking the new small functions
as "static inline"?  Probably modern compilers will do that without
being prompted, but we might as well be clear about what we want.

* Some more attention to comments is needed, eg the header comment
for compare_path_costs_fuzzily still refers to COSTS_DIFFERENT.
(However, on the whole I'm not sure s/COSTS_DIFFERENT/PATHS_DIFFERENT/
etc is an improvement.  Somebody looking at this code for the first
time would probably think the answer should always be that the two
paths are "different", because one would hope we're not generating
redundant identical paths.  What we want to say is that the paths'
figures of merit are different; but "PATH_FIGURES_OF_MERIT_DIFFERENT"
is way too verbose.  Unless somebody has got a good proposal for
a short name I'd lean to sticking with the COSTS_XXX names.)

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Tom Lane
Дата:
... BTW, a question I think we're going to be forced to face up to
if we put a hook here is: is path cost/FOM comparison guaranteed
transitive?  That is, if we find that path A dominates path B
and that path B dominates path C, is it guaranteed that comparing
A directly to C would conclude that A dominates C?  add_path()
currently assumes that such transitivity holds, because if the
new_path dominates an old_path, we immediately discard old_path.
This is unjustifiable if new_path later gets rejected because it
is dominated by some later list element: we just lost a path and
replaced it with nothing.  (Presumably, that can only happen if
neither existing list entry dominates the other.)

TBH, I'm not entirely convinced that transitivity is guaranteed
even today, now that we've got things like parallel safety in
the mix.  For sure I doubt that we should assume that injecting
multiple hook functions each with their own agendas will result
in transitivity-preserving comparisons.

The most honest way to deal with that would be to convert
add_path to a two-pass implementation.  In the first pass,
see if new_path is dominated by any existing list entry;
if so, stop immediately, discarding new_path.  If we get
through that, we will add new_path, so now identify which
old paths it dominates and remove them.  We could avoid
running path_compare() twice by retaining state from the
first pass to the second, but that's hardly free.  On the
other hand, if you assume that most add_path calls end in
rejecting new_path, having a streamlined route to determining
that could be a win.

A possibly-cheaper answer could be to say that if new_path is
found to dominate any old_path, add it, even if it's later found
to be dominated.  This'd only require some rejiggering of the way
the accept_new flag is tracked, I think.  On the other hand this
way might be penny wise and pound foolish, if it ends in keeping
more paths than we really need.

Thoughts?

            regards, tom lane



Re: How to retain lesser paths at add_path()?

От
Andres Freund
Дата:
Hi,

On 2022-07-31 16:05:20 -0400, Tom Lane wrote:
> Thoughts?

As the patch got some feedback ~2 months ago, I'm updating the status to
waiting-for-author.

Minor note: cfbot complains about a cpluspluscheck violation:

[12:24:50.124] time make -s cpluspluscheck EXTRAFLAGS='-fmax-errors=10'
[12:25:17.757] In file included from /tmp/cpluspluscheck.AoEDdi/test.cpp:3:
[12:25:17.757] /tmp/cirrus-ci-build/src/include/optimizer/pathnode.h: In function ‘PathComparison
path_comparison_combine(PathComparison,PathComparison)’:
 
[12:25:17.757] /tmp/cirrus-ci-build/src/include/optimizer/pathnode.h:39:19: error: invalid conversion from ‘int’ to
‘PathComparison’[-fpermissive]
 
[12:25:17.757]    39 |  return (c1 | c2) & PATH_COMPARISON_MASK;
[12:25:17.757]       |                   ^
[12:25:17.757]       |                   |
[12:25:17.757]       |                   int
[12:25:33.857] make: *** [GNUmakefile:141: cpluspluscheck] Error 1

Greetings,

Andres Freund