Обсуждение: Question about some changes in 11.3

Поиск
Список
Период
Сортировка

Question about some changes in 11.3

От
Mat Arye
Дата:
Hi,

11.3 included some change to partition table planning. Namely commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is known empty.") seems to redo all paths for partitioned tables in apply_scanjoin_target_to_paths. It clears the paths in:

```
 if (rel_is_partitioned)
        rel->pathlist = NIL 
```

Then the code rebuild the paths. However, the rebuilt append path never gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously gets thrown away and the rebuilt append path can never be influenced by this hook. Is this intended behavior? Am I missing something?

Thanks,
Mat
TimescaleDB

Re: Question about some changes in 11.3

От
Mat Arye
Дата:
On Fri, May 24, 2019 at 5:05 PM Mat Arye <mat@timescale.com> wrote:
Hi,

11.3 included some change to partition table planning. Namely commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is known empty.") seems to redo all paths for partitioned tables in apply_scanjoin_target_to_paths. It clears the paths in:

```
 if (rel_is_partitioned)
        rel->pathlist = NIL 
```

Then the code rebuild the paths. However, the rebuilt append path never gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously gets thrown away and the rebuilt append path can never be influenced by this hook. Is this intended behavior? Am I missing something?

Thanks,
Mat
TimescaleDB

I've  attached a small patch to address this discrepancy for when the set_rel_pathlist_hook is called so that's it is called for actual paths used for partitioned rels. Please let me know if I am misunderstanding how this should be handled.
 
Вложения

Re: Question about some changes in 11.3

От
Amit Langote
Дата:
Hi Mat,

On 2019/05/25 6:05, Mat Arye wrote:
> Hi,
> 
> 11.3 included some change to partition table planning. Namely
> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
> known empty.") seems to redo all paths for partitioned tables
> in apply_scanjoin_target_to_paths. It clears the paths in:
> 
> ```
>  if (rel_is_partitioned)
>         rel->pathlist = NIL
> ```
> 
> Then the code rebuild the paths. However, the rebuilt append path never
> gets the
> set_rel_pathlist_hook called. Thus, the work that hook did previously gets
> thrown away and the rebuilt append path can never be influenced by this
> hook.

By dropping the old paths like done here, the core code is simply
forgetting that set_rel_pathlist_hook may have editorialized over them,
which seems like an oversight of that commit.

Your proposal to call set_rel_pathlist_hook() after
add_paths_to_append_rel() to rebuild the Append paths sounds fine to me.

Thanks,
Amit




Re: Question about some changes in 11.3

От
Tom Lane
Дата:
Mat Arye <mat@timescale.com> writes:
> On Fri, May 24, 2019 at 5:05 PM Mat Arye <mat@timescale.com> wrote:
>> 11.3 included some change to partition table planning. Namely
>> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
>> known empty.") seems to redo all paths for partitioned tables
>> in apply_scanjoin_target_to_paths. It clears the paths in:
>> 
>> ```
>>     if (rel_is_partitioned)
>>         rel->pathlist = NIL
>> ```
>> 
>> Then the code rebuild the paths. However, the rebuilt append path never
>> gets the
>> set_rel_pathlist_hook called. Thus, the work that hook did previously gets
>> thrown away and the rebuilt append path can never be influenced by this
>> hook. Is this intended behavior? Am I missing something?

Hm.  I'd say this was already broken by the invention of
apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
still work for you, but it's not hard to envision applications of
set_rel_pathlist_hook for which it would not have worked.  The contract
for set_rel_pathlist_hook is supposed to be that it gets to editorialize
on all normal (non-Gather) paths created by the core code, and that's
no longer the case now that apply_scanjoin_target_to_paths can add more.

> I've  attached a small patch to address this discrepancy for when the
> set_rel_pathlist_hook is called so that's it is called for actual paths
> used for partitioned rels. Please let me know if I am misunderstanding how
> this should be handled.

I'm not very happy with this patch either, as it makes the situation
even more confused, not less so.  The best-case scenario is that the
set_rel_pathlist_hook runs twice and does useless work; the worst case
is that it gets confused completely by being called twice for the same
rel.  I think we need to maintain the invariant that that hook is
called exactly once per baserel.

I wonder whether we could fix matters by postponing the
set_rel_pathlist_hook call till later in the same cases where
we postpone generate_gather_paths, ie, it's the only baserel.

That would make its name pretty misleading, though.  Maybe we
should leave it alone and invent a separate hook to be called
by/after apply_scanjoin_target_to_paths?  Although I don't
know if it'd be useful to add a new hook to v11 at this point.
Extensions would have a hard time knowing if they could use it.

            regards, tom lane



Re: Question about some changes in 11.3

От
Mat Arye
Дата:
Thanks for taking a look at this Tom.

On Mon, Jun 3, 2019 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mat Arye <mat@timescale.com> writes:
> On Fri, May 24, 2019 at 5:05 PM Mat Arye <mat@timescale.com> wrote:
>> 11.3 included some change to partition table planning. Namely
>> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
>> known empty.") seems to redo all paths for partitioned tables
>> in apply_scanjoin_target_to_paths. It clears the paths in:
>>
>> ```
>>     if (rel_is_partitioned)
>>         rel->pathlist = NIL
>> ```
>>
>> Then the code rebuild the paths. However, the rebuilt append path never
>> gets the
>> set_rel_pathlist_hook called. Thus, the work that hook did previously gets
>> thrown away and the rebuilt append path can never be influenced by this
>> hook. Is this intended behavior? Am I missing something?

Hm.  I'd say this was already broken by the invention of
apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
still work for you, but it's not hard to envision applications of
set_rel_pathlist_hook for which it would not have worked.  The contract
for set_rel_pathlist_hook is supposed to be that it gets to editorialize
on all normal (non-Gather) paths created by the core code, and that's
no longer the case now that apply_scanjoin_target_to_paths can add more.

Yeah it worked for our cases because (I guess) out paths turned out to be lower cost,
but I see your point.
 

> I've  attached a small patch to address this discrepancy for when the
> set_rel_pathlist_hook is called so that's it is called for actual paths
> used for partitioned rels. Please let me know if I am misunderstanding how
> this should be handled.

I'm not very happy with this patch either, as it makes the situation
even more confused, not less so.  The best-case scenario is that the
set_rel_pathlist_hook runs twice and does useless work; the worst case
is that it gets confused completely by being called twice for the same
rel.  I think we need to maintain the invariant that that hook is
called exactly once per baserel.

Yeah getting called once per baserel is a nice invariant to have. 
 
I wonder whether we could fix matters by postponing the
set_rel_pathlist_hook call till later in the same cases where
we postpone generate_gather_paths, ie, it's the only baserel.

That would make its name pretty misleading, though. 

How would simply delaying the hook make the name misleading? I am also wondering if
using the condition `rel->reloptkind == RELOPT_BASEREL &&
bms_membership(root->all_baserels) != BMS_SINGLETON` is sufficient. 
Is it really guaranteed that `apply_scanjoin_target_to_paths` will not be called in other cases?
 
Maybe we
should leave it alone and invent a separate hook to be called
by/after apply_scanjoin_target_to_paths?  Although I don't
know if it'd be useful to add a new hook to v11 at this point.
Extensions would have a hard time knowing if they could use it.

I think for us, either approach would work. We just need a place to add/modify
some paths. FWIW, I think delaying the hook is easier to deal with on our end if it could work
since we don't have to deal with two different code paths but either is workable.

Certainly if we go with the new hook approach I think it should be added to v11 as well.
That way extensions that need the functionality can hook into it and deal with patch level
differences instead of having no way at all to get at this functionality.

I am more than happy to work on a new patch once we settle on an approach.

 

                        regards, tom lane

Re: Question about some changes in 11.3

От
Tom Lane
Дата:
Mat Arye <mat@timescale.com> writes:
> On Mon, Jun 3, 2019 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm.  I'd say this was already broken by the invention of
>> apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
>> still work for you, but it's not hard to envision applications of
>> set_rel_pathlist_hook for which it would not have worked.  The contract
>> for set_rel_pathlist_hook is supposed to be that it gets to editorialize
>> on all normal (non-Gather) paths created by the core code, and that's
>> no longer the case now that apply_scanjoin_target_to_paths can add more.
>> ...
>> I wonder whether we could fix matters by postponing the
>> set_rel_pathlist_hook call till later in the same cases where
>> we postpone generate_gather_paths, ie, it's the only baserel.

> Is it really guaranteed that `apply_scanjoin_target_to_paths` will not be
> called in other cases?

Well, apply_scanjoin_target_to_paths is called in *all* cases.  It only
throws away the original paths for partitioned rels, though.

I spent some more time looking at this, and I am afraid that my idea
of postponing set_rel_pathlist_hook into apply_scanjoin_target_to_paths
isn't going to work: there is not anyplace in that function where we
could call the hook without the API being noticeably different from
what it is at the current call site.  In particular, if we try to call
it down near the end so that it still has the property of being able
to remove any core-generated path, then there's a *big* difference for
queries involving SRFs: we've already plastered ProjectSetPath(s) atop
the original paths, and any user of the hook would have to know to do
likewise for freshly-generated paths.  That would certainly break
existing hook users.

I'm inclined to think that the safest course is to leave
set_rel_pathlist_hook as it stands, and invent a new hook that is called
in apply_scanjoin_target_to_paths just before the generate_gather_paths
call.  (Or, perhaps, just after that --- but the precedent of
set_rel_pathlist_hook suggests that "before" is more useful.)
For your use-case you'd have to get into both hooks, and they'd both have
to know that if they're dealing with a partitioned baserel that is the
only baserel in the query, the new hook is where to generate paths
rather than the old hook.  Maybe it'd be worth having the core code
export some simple test function for that, rather than having the details
of those semantics be wired into various extensions.

I think it'd be all right to put a patch done that way into the v11
branch.  It would not make anything any worse for code that uses
set_rel_pathlist_hook and is OK with the v11 behavior.  Code that
needs to use the new hook would fail to load into 11-before-11.whatever,
but that's probably better than loading and then doing the wrong thing.

            regards, tom lane