Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Дата
Msg-id CAFjFpRexx3dmVzrsocMZFL1jiEcW0HyeNb35Fv2JP4wnO50nmg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Forgot the patch set. Here it is.

On Mon, Jul 31, 2017 at 5:29 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Jul 31, 2017 at 8:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jul 14, 2017 at 3:02 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Here's revised patch set with only 0004 revised. That patch deals with
>>> creating multi-level inheritance hierarchy from multi-level partition
>>> hierarchy. The original logic of recursively calling
>>> inheritance_planner()'s guts over the inheritance hierarchy required
>>> that for every such recursion we flatten many lists created by that
>>> code. Recursion also meant that root->append_rel_list is traversed as
>>> many times as the number of partitioned partitions in the hierarchy.
>>> Instead the revised version keep the iterative shape of
>>> inheritance_planner() intact, thus naturally creating flat lists,
>>> iterates over root->append_rel_list only once and is still easy to
>>> read and maintain.
>>
>> 0001-0003 look basically OK to me, modulo some cosmetic stuff.  Regarding 0004:
>>
>> +        if (brel->reloptkind != RELOPT_BASEREL &&
>> +            brte->relkind != RELKIND_PARTITIONED_TABLE)
>>
>> I spent a lot of time staring at this code before I figured out what
>> was going on here.  We're iterating over simple_rel_array, so the
>> reloptkind must be RELOPT_OTHER_MEMBER_REL if it isn't RELOPT_BASEREL.
>> But does that guarantee that rtekind is RTE_RELATION such that
>> brte->relkind will be initialized to a value?  I'm not 100% sure.
>
> Comment in RangeTblEntry says
>  952     /*
>  953      * Fields valid for a plain relation RTE (else zero):
>  954      *
> ... clipped portion for RTE_NAMEDTUPLESTORE related comment
>
>  960     Oid         relid;          /* OID of the relation */
>  961     char        relkind;        /* relation kind (see pg_class.relkind) */
>
> This means that relkind will be 0 when rtekind != RTE_RELATION. So,
> the condition holds. But code creating an RTE somewhere which is not
> in sync with this comment would create a problem. So your suggestion
> makes sense.
>
>> I
>> think it would be clearer to write this test like this:
>>
>> Assert(IS_SIMPLE_REL(brel));
>> if (brel->reloptkind == RELOPT_OTHER_MEMBER_REL &&
>>     (brte->rtekind != RELOPT_BASEREL ||
>
> Do you mean (brte_>rtekind != RTE_RELATION)?
>
>>     brte->relkind != RELKIND_PARTITIONED_TABLE))
>>     continue;
>>
>> Note that the way you wrote the comment is says if it *is* another
>> REL, not if it's *not* a baserel; it's good if those kinds of little
>> details match between the code and the comments.
>
> I find the existing comment and code in this part of the function
> differ. The comment just above the loop on simple_rel_array[], talks
> about changing something in the child, but the very next line skips
> child relations and later a loop on append_rel_list makes changes to
> appropriate children. I guess, it's done that way to keep the code
> working even after we introduce some RELOPTKIND other than BASEREL or
> OTHER_MEMBER_REL for a simple rel. But your suggestion makes more
> sense. Changed it according to your suggestion.
>
>>
>> It is not clear to me what the motivation is for the API changes in
>> expanded_inherited_rtentry.  They don't appear to be necessary.
>
> expand_inherited_rtentry() creates AppendRelInfos for all the children
> of a given parent and collects them in a list. The list is appended to
> root->append_rel_list at the end of the function. Now that function
> needs to do this recursively. This means that for a partitioned
> partition table its children's AppendRelInfos will be added to
> root->append_rel_list before AppendRelInfo of that partitioned
> partition table. inheritance_planner() assumes that the parent's
> AppendRelInfo comes before its children in append_rel_list.This
> assumption allows it to be simplified greately, retaining its
> iterative form. My earlier patches had recursive version of
> inheritance_planner(), which is complex. I have comments in this patch
> explaining this.
>
> Adding AppendRelInfos to root->append_rel_list as and when they are
> created would keep parent AppendRelInfos before those of children. But
> that function throws away the AppendRelInfo it created when their are
> no real children i.e. in partitioned table's case when has no leaf
> partitions. So, we can't do that. Hence, I chose to change the API to
> return the list of AppendRelInfos when the given RTE has real
> children.
>
>> If
>> they are necessary, you need to do a more thorough job updating the
>> comments.  This one, in particular:
>>
>>   *      If so, add entries for all the child tables to the query's
>>   *      rangetable, and build AppendRelInfo nodes for all the child tables
>>   *      and add them to root->append_rel_list.  If not, clear the entry's
>
> Done.
>
>>
>> And the comments could maybe say something like "We return the list of
>> appinfos rather than directly appending it to append_rel_list because
>> $REASON."
>
> Done. Please check the attached version.
>
>>
>> -         * is a partitioned table.
>> +         * RTE simply duplicates the parent *partitioned* table.
>>           */
>> -        if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
>> +        if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)
>>
>> This is another case where it's hard to understand the test from the comments.
>
> The current comment says it all, but it very cryptic manner.
> 1526         /*
> 1527          * Build an AppendRelInfo for this parent and child,
> unless the child
> 1528          * RTE simply duplicates the parent *partitioned* table.
> 1529          */
>
> The comment makes sense in the context of this paragraph in the prologue
> 1364  * Note that the original RTE is considered to represent the whole
> 1365  * inheritance set.  The first of the generated RTEs is an RTE for the same
> 1366  * table, but with inh = false, to represent the parent table in its role
> 1367  * as a simple member of the inheritance set.
> 1368  *
>
> The code avoids creating AppendRelInfos for a child which represents
> the parent in its role as a simple member of inheritance set.
>
> I have reworded it as
> 1526         /*
> 1527          * Build an AppendRelInfo for this parent and child,
> unless the child
> 1528          * RTE represents the parent as a simple member of inheritance set.
> 1529          */
>
>>
>> +     * In case of multi-level inheritance hierarchy, for every child we require
>> +     * PlannerInfo of its immediate parent. Hence we save those in a an array
>>
>> Say why.  Also, need to fix "a an".
>
> Done.
>
>>
>> I'm a little bit surprised that this patch doesn't make any changes to
>> allpaths.c or relnode.c.
>
>> It looks to me like we'll generate paths for
>> the new RTEs that are being added.  Are we going to end up with
>> multiple levels of Append nodes, then?  Does the consider the way
>> consider_parallel is propagated up and down in set_append_rel_size()
>> and set_append_rel_pathlist() really work with multiple levels?  Maybe
>> this is all fine; I haven't tried to verify it in depth.
>
> This has been discussed before, but I can not locate the mail
> answering these questions. accumulate_append_subpath() called from
> add_paths_to_append_rel() takes care of flattening Merge/Append paths.
> The planner code deals with the multi-level inheritance hierarchy
> created for subqueries with set operations. There is code in relnode.c
> to build the RelOptInfos for such subqueries recursively through using
> RangeTblEntry::inh flag. So there are no changes in allpaths.c and
> relnode.c. Are you looking for some other changes?
>
>>
>> Overall I think this is a reasonable direction to go but I'm worried
>> that there may be bugs lurking -- other code that needs adjusting that
>> hasn't been found, really.
>>
>
> Planner code is already aware of such hierarchies except DMLs, which
> this patch adjusts. We have fixed issues revealed by mine and
> Rajkumar's testing.
> What kinds of things you suspect?
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: [HACKERS] 10 beta docs: different replication solutions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] pl/perl extension fails on Windows