Обсуждение: add_partial_path() may remove dominated path but still in use

Поиск
Список
Период
Сортировка

add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
Hello,

I've investigated a crash report of PG-Strom for a few days, then I doubt
add_partial_path() can unexpectedly release dominated old partial path
but still referenced by other Gather node, and it leads unexpected system
crash.

Please check at the gpuscan.c:373
https://github.com/heterodb/pg-strom/blob/master/src/gpuscan.c#L373

The create_gpuscan_path() constructs a partial custom-path, then it is
added to the partial_pathlist of the baserel.
If both of old and new partial paths have no pathkeys and old-path has
larger cost, add_partial_path detaches the old path from the list, then
calls pfree() to release old_path itself.

On the other hands, the baserel may have GatherPath which references
the partial-path on its pathlist. Here is no check whether the old partial-
paths are referenced by others, or not.

To ensure my assumption, I tried to inject elog() before/after the call of
add_partial_path() and just before the pfree(old_path) in add_partial_path().

----------------------------------------------------------------
dbt3=# explain select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
    and l_discount between 0.08 - 0.01 and 0.08 + 0.01
    and l_quantity < 24 limit 1;
INFO:  GpuScan:389  GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{PATH :pathtype 18 :parent_relids (b 1) :required_outer (b)
:parallel_aware true :parallel_safe true :parallel_workers 2 :rows
63254 :startup_cost 0.00 :total_cost 325579.73 :pathkeys <>}
:single_copy false :num_workers 2}
INFO:  add_partial_path:830 old_path(0x28f3f88) is removed
WARNING:  could not dump unrecognized node type: 2139062143
INFO:  GpuScan:401  GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{(HOGE)} :single_copy false :num_workers 2}
----------------------------------------------------------------

At the L389, GatherPath in the baresel->pathlist is healthy. Its
subpath (0x28f3f88) is
a valid T_Scan path node.
Then, gpuscan.c adds a cheaper path-node so add_partial_path()
considers the above
subpath (0x28f3f88) is dominated by the new custom-path, and release it.
So, elog() at L401 says subpath has unrecognized node type: 2139062143
== 0x7f7f7f7f
that implies the memory region was already released by pfree().

Reference counter or other mechanism to tack referenced paths may be an idea
to avoid unintentional release of path-node.
On the other hands, it seems to me the pfree() at add_path /
add_partial_path is not
a serious memory management because other objects referenced by the path-node
are not released here.
It is sufficient if we detach dominated path-node from the pathlist /
partial_pathlist.

How about your opinions?

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Tom Lane
Дата:
Kohei KaiGai <kaigai@heterodb.com> writes:
> I've investigated a crash report of PG-Strom for a few days, then I doubt
> add_partial_path() can unexpectedly release dominated old partial path
> but still referenced by other Gather node, and it leads unexpected system
> crash.

Hm.  This seems comparable to the special case in plain add_path, where it
doesn't attempt to free IndexPaths because of the risk that they're still
referenced.  So maybe we should just drop the pfree here.

However, first I'd like to know why this situation is arising in the first
place.  To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for.  Why is that a good thing to do?  It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.

            regards, tom lane


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
>
> Kohei KaiGai <kaigai@heterodb.com> writes:
> > I've investigated a crash report of PG-Strom for a few days, then I doubt
> > add_partial_path() can unexpectedly release dominated old partial path
> > but still referenced by other Gather node, and it leads unexpected system
> > crash.
>
> Hm.  This seems comparable to the special case in plain add_path, where it
> doesn't attempt to free IndexPaths because of the risk that they're still
> referenced.  So maybe we should just drop the pfree here.
>
> However, first I'd like to know why this situation is arising in the first
> place.  To have the situation you're describing, we'd have to have
> attempted to make some Gather paths before we have all the partial paths
> for the relation they're for.  Why is that a good thing to do?  It seems
> like such Gathers are necessarily being made with incomplete information,
> and we'd be better off to fix things so that none are made till later.
>
Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths(). Even if extension adds some partial paths later,
the first generate_gather_paths() has to generate Gather node based on
incomplete information.
If we could ensure Gather node shall be made after all the partial nodes
are added, it may be a solution for the problem.

Of course, relocation of the hook may have a side-effect. Anyone may
expect the pathlist contains all the path-node including Gather node, for
editorialization of the pathlist.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Tom Lane
Дата:
Kohei KaiGai <kaigai@heterodb.com> writes:
> 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
>> However, first I'd like to know why this situation is arising in the first
>> place.  To have the situation you're describing, we'd have to have
>> attempted to make some Gather paths before we have all the partial paths
>> for the relation they're for.  Why is that a good thing to do?  It seems
>> like such Gathers are necessarily being made with incomplete information,
>> and we'd be better off to fix things so that none are made till later.

> Because of the hook location, Gather-node shall be constructed with built-in
> and foreign partial scan node first, then extension gets a chance to add its
> custom paths (partial and full).
> At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> generate_gather_paths().

Hmm.  I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.

            regards, tom lane


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
>
> Kohei KaiGai <kaigai@heterodb.com> writes:
> > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> >> However, first I'd like to know why this situation is arising in the first
> >> place.  To have the situation you're describing, we'd have to have
> >> attempted to make some Gather paths before we have all the partial paths
> >> for the relation they're for.  Why is that a good thing to do?  It seems
> >> like such Gathers are necessarily being made with incomplete information,
> >> and we'd be better off to fix things so that none are made till later.
>
> > Because of the hook location, Gather-node shall be constructed with built-in
> > and foreign partial scan node first, then extension gets a chance to add its
> > custom paths (partial and full).
> > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > generate_gather_paths().
>
> Hmm.  I'm inclined to think that we should have a separate hook
> in which extensions are allowed to add partial paths, and that
> set_rel_pathlist_hook should only be allowed to add regular paths.
>
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.

On the other hands, the later hook must be dedicated to add regular paths,
and also provides a chance for extensions to manipulate pre-built path-list
including Gather-path.
As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
a particular path-node, including Gather-node, by manipulation of the cost
value. Horiguchi-san, is it right?
Likely, this kind of extension needs to use the later hook.

I expect these hooks are located as follows:

set_rel_pathlist(...)
{
        :
    <snip>
        :
    /* for partial / regular paths */
    if (set_rel_pathlist_hook)
      (*set_rel_pathlist_hook) (root, rel, rti, rte);
    /* generate Gather-node */
    if (rel->reloptkind == RELOPT_BASEREL)
        generate_gather_paths(root, rel);
    /* for regular paths and manipulation */
    if (post_rel_pathlist_hook)
      (*post_rel_pathlist_hook) (root, rel, rti, rte);

    set_cheapest();
}

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Amit Kapila
Дата:
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> >
> > Kohei KaiGai <kaigai@heterodb.com> writes:
> > > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> > >> However, first I'd like to know why this situation is arising in the first
> > >> place.  To have the situation you're describing, we'd have to have
> > >> attempted to make some Gather paths before we have all the partial paths
> > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > >> like such Gathers are necessarily being made with incomplete information,
> > >> and we'd be better off to fix things so that none are made till later.
> >
> > > Because of the hook location, Gather-node shall be constructed with built-in
> > > and foreign partial scan node first, then extension gets a chance to add its
> > > custom paths (partial and full).
> > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > generate_gather_paths().
> >
> > Hmm.  I'm inclined to think that we should have a separate hook
> > in which extensions are allowed to add partial paths, and that
> > set_rel_pathlist_hook should only be allowed to add regular paths.

+1.  This idea sounds sensible to me.

> >
> I have almost same opinion, but the first hook does not need to be
> dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> add both of partial and regular paths here, then generate_gather_paths()
> may generate a Gather-path on top of the best partial-path.
>

Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use.  What
advantage you are seeing in allowing both partial and full paths in
the first hook?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
>
> On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> > >
> > > Kohei KaiGai <kaigai@heterodb.com> writes:
> > > > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> > > >> However, first I'd like to know why this situation is arising in the first
> > > >> place.  To have the situation you're describing, we'd have to have
> > > >> attempted to make some Gather paths before we have all the partial paths
> > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > >> like such Gathers are necessarily being made with incomplete information,
> > > >> and we'd be better off to fix things so that none are made till later.
> > >
> > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > custom paths (partial and full).
> > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > generate_gather_paths().
> > >
> > > Hmm.  I'm inclined to think that we should have a separate hook
> > > in which extensions are allowed to add partial paths, and that
> > > set_rel_pathlist_hook should only be allowed to add regular paths.
>
> +1.  This idea sounds sensible to me.
>
> > >
> > I have almost same opinion, but the first hook does not need to be
> > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > add both of partial and regular paths here, then generate_gather_paths()
> > may generate a Gather-path on top of the best partial-path.
> >
>
> Won't it be confusing for users if we allow both partial and full
> paths in first hook and only full paths in the second hook?
> Basically, in many cases, the second hook won't be of much use.  What
> advantage you are seeing in allowing both partial and full paths in
> the first hook?
>
Two advantages. The first one is, it follows same manner of
set_foreign_pathlist()
which allows to add both of full and partial path if FDW supports parallel-scan.
The second one is practical. During the path construction, extension needs to
check availability to run (e.g, whether operators in WHERE-clause is supported
on GPU device...), calculate its estimated cost and so on. Not a small
portion of
them are common for both of full and partial path. So, if the first
hook accepts to
add both kind of paths at once, extension can share the common properties.

Probably, the second hook is only used for a few corner case where an extension
wants to manipulate path-list already built, like pg_hint_plan.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Amit Kapila
Дата:
On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>
> 2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
> >
> > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> > > >
> > > > Kohei KaiGai <kaigai@heterodb.com> writes:
> > > > > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > >> However, first I'd like to know why this situation is arising in the first
> > > > >> place.  To have the situation you're describing, we'd have to have
> > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > >> and we'd be better off to fix things so that none are made till later.
> > > >
> > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > custom paths (partial and full).
> > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > generate_gather_paths().
> > > >
> > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > in which extensions are allowed to add partial paths, and that
> > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> >
> > +1.  This idea sounds sensible to me.
> >
> > > >
> > > I have almost same opinion, but the first hook does not need to be
> > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > add both of partial and regular paths here, then generate_gather_paths()
> > > may generate a Gather-path on top of the best partial-path.
> > >
> >
> > Won't it be confusing for users if we allow both partial and full
> > paths in first hook and only full paths in the second hook?
> > Basically, in many cases, the second hook won't be of much use.  What
> > advantage you are seeing in allowing both partial and full paths in
> > the first hook?
> >
> Two advantages. The first one is, it follows same manner of
> set_foreign_pathlist()
> which allows to add both of full and partial path if FDW supports parallel-scan.
> The second one is practical. During the path construction, extension needs to
> check availability to run (e.g, whether operators in WHERE-clause is supported
> on GPU device...), calculate its estimated cost and so on. Not a small
> portion of
> them are common for both of full and partial path. So, if the first
> hook accepts to
> add both kind of paths at once, extension can share the common properties.
>

You have a point, though I am not sure how much difference it can
create for cost computation as ideally, both will have different
costing model.  I understand there are some savings by avoiding some
common work, is there any way to cache the required information?

> Probably, the second hook is only used for a few corner case where an extension
> wants to manipulate path-list already built, like pg_hint_plan.
>

Okay, but it could be some work for extension authors who are using
the current hook, not sure they would like to divide the work between
first and second hook.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
2018年12月31日(月) 22:25 Amit Kapila <amit.kapila16@gmail.com>:
>
> On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> >
> > 2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
> > >
> > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > >
> > > > > Kohei KaiGai <kaigai@heterodb.com> writes:
> > > > > > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > > >> However, first I'd like to know why this situation is arising in the first
> > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > > >> and we'd be better off to fix things so that none are made till later.
> > > > >
> > > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > > custom paths (partial and full).
> > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > > generate_gather_paths().
> > > > >
> > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > in which extensions are allowed to add partial paths, and that
> > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > >
> > > +1.  This idea sounds sensible to me.
> > >
> > > > >
> > > > I have almost same opinion, but the first hook does not need to be
> > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > may generate a Gather-path on top of the best partial-path.
> > > >
> > >
> > > Won't it be confusing for users if we allow both partial and full
> > > paths in first hook and only full paths in the second hook?
> > > Basically, in many cases, the second hook won't be of much use.  What
> > > advantage you are seeing in allowing both partial and full paths in
> > > the first hook?
> > >
> > Two advantages. The first one is, it follows same manner of
> > set_foreign_pathlist()
> > which allows to add both of full and partial path if FDW supports parallel-scan.
> > The second one is practical. During the path construction, extension needs to
> > check availability to run (e.g, whether operators in WHERE-clause is supported
> > on GPU device...), calculate its estimated cost and so on. Not a small
> > portion of
> > them are common for both of full and partial path. So, if the first
> > hook accepts to
> > add both kind of paths at once, extension can share the common properties.
> >
>
> You have a point, though I am not sure how much difference it can
> create for cost computation as ideally, both will have different
> costing model.  I understand there are some savings by avoiding some
> common work, is there any way to cache the required information?
>
I have no idea for the clean way.
We may be able to have an opaque pointer for extension usage, however,
it may be problematic if multiple extension uses the hook.

> > Probably, the second hook is only used for a few corner case where an extension
> > wants to manipulate path-list already built, like pg_hint_plan.
> >
>
> Okay, but it could be some work for extension authors who are using
> the current hook, not sure they would like to divide the work between
> first and second hook.
>
I guess they don't divide their code, but choose either of them.
In case of PG-Strom, even if there are two hooks around the point, it will use
the first hook only, unless it does not prohibit to call add_path() here.
However, some adjustments are required. Its current implementation makes
GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
is called after the generate_gather_paths().
Once we could choose the first hook, no need to make a GatherPath by itself,
because PostgreSQL-core will make the path if partial custom-path is enough
reasonable cost. Likely, this adjustment is more preferable one.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
I tried to make a patch to have dual hooks at set_rel_pathlist(), and
adjusted PG-Strom for the new design. It stopped to create GatherPath
by itself, just added a partial path for the base relation.
It successfully made a plan using parallel custom-scan node, without
system crash.

As I mentioned above, it does not use the new "post_rel_pathlist_hook"
because we can add both of partial/regular path-node at the first hook
with no particular problems.

Thanks,

dbt3=# explain select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
    and l_discount between 0.08 - 0.01 and 0.08 + 0.01
    and l_quantity < 24 limit 1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=144332.62..144332.63 rows=1 width=4)
   ->  Aggregate  (cost=144332.62..144332.63 rows=1 width=4)
         ->  Gather  (cost=144285.70..144329.56 rows=408 width=4)
               Workers Planned: 2
               ->  Parallel Custom Scan (GpuPreAgg) on lineitem
(cost=143285.70..143288.76 rows=204 width=4)
                     Reduction: NoGroup
                     Outer Scan: lineitem  (cost=1666.67..143246.16
rows=63254 width=8)
                     Outer Scan Filter: ((l_discount >= '0.07'::double
precision) AND
                                                   (l_discount <=
'0.09'::double precision) AND
                                                   (l_quantity <
'24'::double precision) AND
                                                   (l_shipdate <
'1995-01-01'::date) AND
                                                   (l_shipdate >=
'1994-01-01'::date))
(8 rows)

Thanks,

2019年1月2日(水) 22:34 Kohei KaiGai <kaigai@heterodb.com>:
>
> 2018年12月31日(月) 22:25 Amit Kapila <amit.kapila16@gmail.com>:
> >
> > On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > >
> > > 2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
> > > >
> > > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > > 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > > >
> > > > > > Kohei KaiGai <kaigai@heterodb.com> writes:
> > > > > > > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > > > >> However, first I'd like to know why this situation is arising in the first
> > > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > > > >> and we'd be better off to fix things so that none are made till later.
> > > > > >
> > > > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > > > custom paths (partial and full).
> > > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > > > generate_gather_paths().
> > > > > >
> > > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > > in which extensions are allowed to add partial paths, and that
> > > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > > >
> > > > +1.  This idea sounds sensible to me.
> > > >
> > > > > >
> > > > > I have almost same opinion, but the first hook does not need to be
> > > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > > may generate a Gather-path on top of the best partial-path.
> > > > >
> > > >
> > > > Won't it be confusing for users if we allow both partial and full
> > > > paths in first hook and only full paths in the second hook?
> > > > Basically, in many cases, the second hook won't be of much use.  What
> > > > advantage you are seeing in allowing both partial and full paths in
> > > > the first hook?
> > > >
> > > Two advantages. The first one is, it follows same manner of
> > > set_foreign_pathlist()
> > > which allows to add both of full and partial path if FDW supports parallel-scan.
> > > The second one is practical. During the path construction, extension needs to
> > > check availability to run (e.g, whether operators in WHERE-clause is supported
> > > on GPU device...), calculate its estimated cost and so on. Not a small
> > > portion of
> > > them are common for both of full and partial path. So, if the first
> > > hook accepts to
> > > add both kind of paths at once, extension can share the common properties.
> > >
> >
> > You have a point, though I am not sure how much difference it can
> > create for cost computation as ideally, both will have different
> > costing model.  I understand there are some savings by avoiding some
> > common work, is there any way to cache the required information?
> >
> I have no idea for the clean way.
> We may be able to have an opaque pointer for extension usage, however,
> it may be problematic if multiple extension uses the hook.
>
> > > Probably, the second hook is only used for a few corner case where an extension
> > > wants to manipulate path-list already built, like pg_hint_plan.
> > >
> >
> > Okay, but it could be some work for extension authors who are using
> > the current hook, not sure they would like to divide the work between
> > first and second hook.
> >
> I guess they don't divide their code, but choose either of them.
> In case of PG-Strom, even if there are two hooks around the point, it will use
> the first hook only, unless it does not prohibit to call add_path() here.
> However, some adjustments are required. Its current implementation makes
> GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
> is called after the generate_gather_paths().
> Once we could choose the first hook, no need to make a GatherPath by itself,
> because PostgreSQL-core will make the path if partial custom-path is enough
> reasonable cost. Likely, this adjustment is more preferable one.
>
> Thanks,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kaigai@heterodb.com>



--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Вложения

Re: add_partial_path() may remove dominated path but still in use

От
Kyotaro HORIGUCHI
Дата:
At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai <kaigai@heterodb.com> wrote in
<CAOP8fzY1Oqf-LGdrZT+TAu+JajwPGn1uYnpWWUPL=2LiattjYA@mail.gmail.com>
> 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> On the other hands, the later hook must be dedicated to add regular paths,
> and also provides a chance for extensions to manipulate pre-built path-list
> including Gather-path.
> As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
> a particular path-node, including Gather-node, by manipulation of the cost
> value. Horiguchi-san, is it right?

Mmm. I haven't expected that it is mentioned here.

Actually in the hook, it changes enable_* planner variables, or
directory manipuraltes path costs or even can clear and
regenerate the path list and gather paths for the parallel
case. It will be happy if we had a chance to manpurate partial
paths before genrating gahter paths.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
2019年1月9日(水) 13:18 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
>
> At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai <kaigai@heterodb.com> wrote in
<CAOP8fzY1Oqf-LGdrZT+TAu+JajwPGn1uYnpWWUPL=2LiattjYA@mail.gmail.com>
> > 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> > On the other hands, the later hook must be dedicated to add regular paths,
> > and also provides a chance for extensions to manipulate pre-built path-list
> > including Gather-path.
> > As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
> > a particular path-node, including Gather-node, by manipulation of the cost
> > value. Horiguchi-san, is it right?
>
> Mmm. I haven't expected that it is mentioned here.
>
> Actually in the hook, it changes enable_* planner variables, or
> directory manipuraltes path costs or even can clear and
> regenerate the path list and gather paths for the parallel
> case. It will be happy if we had a chance to manpurate partial
> paths before genrating gahter paths.
>
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Robert Haas
Дата:
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> So, is it sufficient if set_rel_pathlist_hook is just relocated in
> front of the generate_gather_paths?
> If we have no use case for the second hook, here is little necessity
> to have the post_rel_pathlist_hook() here.
> (At least, PG-Strom will use the first hook only.)

+1.  That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
>
> On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > front of the generate_gather_paths?
> > If we have no use case for the second hook, here is little necessity
> > to have the post_rel_pathlist_hook() here.
> > (At least, PG-Strom will use the first hook only.)
>
> +1.  That seems like the best way to be consistent with the principle
> that we need to have all the partial paths before generating any
> Gather paths.
>
Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Вложения

Re: add_partial_path() may remove dominated path but still in use

От
Robert Haas
Дата:
On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> 2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
> > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > front of the generate_gather_paths?
> > > If we have no use case for the second hook, here is little necessity
> > > to have the post_rel_pathlist_hook() here.
> > > (At least, PG-Strom will use the first hook only.)
> >
> > +1.  That seems like the best way to be consistent with the principle
> > that we need to have all the partial paths before generating any
> > Gather paths.
> >
> Patch was updated, just for relocation of the set_rel_pathlist_hook.
> Please check it.

Seems reasonable to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: add_partial_path() may remove dominated path but still in use

От
Kyotaro HORIGUCHI
Дата:
Hello, sorry for the absence.

At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoYyxBgkfN_APBdxdutFMukb=P-EgGNY-NbauRcL7mGnmA@mail.gmail.com>
> On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > 2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
> > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > front of the generate_gather_paths?
> > > > If we have no use case for the second hook, here is little necessity
> > > > to have the post_rel_pathlist_hook() here.
> > > > (At least, PG-Strom will use the first hook only.)
> > >
> > > +1.  That seems like the best way to be consistent with the principle
> > > that we need to have all the partial paths before generating any
> > > Gather paths.
> > >
> > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > Please check it.
> 
> Seems reasonable to me.

Also seems reasonable to me.  The extension can call
generate_gather_paths redundantly as is but it almost doesn't
harm, so it is acceptable even in a minor release.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
Let me remind the thread.
If no more comments, objections, or better ideas, please commit this fix.

Thanks,

2019年1月17日(木) 18:29 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
>
> Hello, sorry for the absence.
>
> At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoYyxBgkfN_APBdxdutFMukb=P-EgGNY-NbauRcL7mGnmA@mail.gmail.com>
> > On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > 2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
> > > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > > front of the generate_gather_paths?
> > > > > If we have no use case for the second hook, here is little necessity
> > > > > to have the post_rel_pathlist_hook() here.
> > > > > (At least, PG-Strom will use the first hook only.)
> > > >
> > > > +1.  That seems like the best way to be consistent with the principle
> > > > that we need to have all the partial paths before generating any
> > > > Gather paths.
> > > >
> > > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > > Please check it.
> >
> > Seems reasonable to me.
>
> Also seems reasonable to me.  The extension can call
> generate_gather_paths redundantly as is but it almost doesn't
> harm, so it is acceptable even in a minor release.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>


Re: add_partial_path() may remove dominated path but still in use

От
Kohei KaiGai
Дата:
Hello,

Let me remind the thread again.
I'm waiting for the fix getting committed for a month...

2019年1月22日(火) 20:50 Kohei KaiGai <kaigai@heterodb.com>:
>
> Let me remind the thread.
> If no more comments, objections, or better ideas, please commit this fix.
>
> Thanks,
>
> 2019年1月17日(木) 18:29 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
> >
> > Hello, sorry for the absence.
> >
> > At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoYyxBgkfN_APBdxdutFMukb=P-EgGNY-NbauRcL7mGnmA@mail.gmail.com>
> > > On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > 2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
> > > > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > > > front of the generate_gather_paths?
> > > > > > If we have no use case for the second hook, here is little necessity
> > > > > > to have the post_rel_pathlist_hook() here.
> > > > > > (At least, PG-Strom will use the first hook only.)
> > > > >
> > > > > +1.  That seems like the best way to be consistent with the principle
> > > > > that we need to have all the partial paths before generating any
> > > > > Gather paths.
> > > > >
> > > > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > > > Please check it.
> > >
> > > Seems reasonable to me.
> >
> > Also seems reasonable to me.  The extension can call
> > generate_gather_paths redundantly as is but it almost doesn't
> > harm, so it is acceptable even in a minor release.
> >
> > regards.
> >
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> >
>
>
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kaigai@heterodb.com>

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Вложения

Re: add_partial_path() may remove dominated path but still in use

От
Amit Kapila
Дата:
On Wed, Feb 6, 2019 at 10:35 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
>
> Hello,
> Let me remind the thread again.
> I'm waiting for the fix getting committed for a month...
>

It seems you would also like to see this back-patched.  I am not sure
if that is a good idea as there is some risk of breaking existing
usage.  Tom, do you have any opinion on this patch?  It seems to me
you were thinking to have a separate hook for partial paths, but the
patch has solved the problem by moving the hook location.  I think
whatever is the case we should try to reach some consensus and move
forward with this patch as KaiGai-San is waiting from quite some time.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: add_partial_path() may remove dominated path but still in use

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> It seems you would also like to see this back-patched.  I am not sure
> if that is a good idea as there is some risk of breaking existing
> usage.  Tom, do you have any opinion on this patch?  It seems to me
> you were thinking to have a separate hook for partial paths, but the
> patch has solved the problem by moving the hook location.

I was expecting Haas to take point on this, but since he doesn't seem
to be doing so, I'll push it.  I don't think there's any material
risk of breaking things --- the only functionality lost is the ability to
remove or modify baserel Gather paths, which I doubt anybody is interested
in doing.  Certainly that's way less useful than the ability to add
partial paths and have them be included in Gather-building.

In a green field I'd rather have had a separate hook for adding partial
paths, but it's not clear that that really buys much of anything except
logical cleanliness ... against which it adds cost since the using
extension(s) have to figure out what's going on twice.

Also this way does have the advantage that it retroactively fixes things
for extensions that may be trying to make partial paths today.

            regards, tom lane