Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
От | Ashutosh Bapat |
---|---|
Тема | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Дата | |
Msg-id | CAFjFpRexx3dmVzrsocMZFL1jiEcW0HyeNb35Fv2JP4wnO50nmg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Ответы |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
Forgot the patch set. Here it is. On Mon, Jul 31, 2017 at 5:29 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Mon, Jul 31, 2017 at 8:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jul 14, 2017 at 3:02 AM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> Here's revised patch set with only 0004 revised. That patch deals with >>> creating multi-level inheritance hierarchy from multi-level partition >>> hierarchy. The original logic of recursively calling >>> inheritance_planner()'s guts over the inheritance hierarchy required >>> that for every such recursion we flatten many lists created by that >>> code. Recursion also meant that root->append_rel_list is traversed as >>> many times as the number of partitioned partitions in the hierarchy. >>> Instead the revised version keep the iterative shape of >>> inheritance_planner() intact, thus naturally creating flat lists, >>> iterates over root->append_rel_list only once and is still easy to >>> read and maintain. >> >> 0001-0003 look basically OK to me, modulo some cosmetic stuff. Regarding 0004: >> >> + if (brel->reloptkind != RELOPT_BASEREL && >> + brte->relkind != RELKIND_PARTITIONED_TABLE) >> >> I spent a lot of time staring at this code before I figured out what >> was going on here. We're iterating over simple_rel_array, so the >> reloptkind must be RELOPT_OTHER_MEMBER_REL if it isn't RELOPT_BASEREL. >> But does that guarantee that rtekind is RTE_RELATION such that >> brte->relkind will be initialized to a value? I'm not 100% sure. > > Comment in RangeTblEntry says > 952 /* > 953 * Fields valid for a plain relation RTE (else zero): > 954 * > ... clipped portion for RTE_NAMEDTUPLESTORE related comment > > 960 Oid relid; /* OID of the relation */ > 961 char relkind; /* relation kind (see pg_class.relkind) */ > > This means that relkind will be 0 when rtekind != RTE_RELATION. So, > the condition holds. But code creating an RTE somewhere which is not > in sync with this comment would create a problem. So your suggestion > makes sense. > >> I >> think it would be clearer to write this test like this: >> >> Assert(IS_SIMPLE_REL(brel)); >> if (brel->reloptkind == RELOPT_OTHER_MEMBER_REL && >> (brte->rtekind != RELOPT_BASEREL || > > Do you mean (brte_>rtekind != RTE_RELATION)? > >> brte->relkind != RELKIND_PARTITIONED_TABLE)) >> continue; >> >> Note that the way you wrote the comment is says if it *is* another >> REL, not if it's *not* a baserel; it's good if those kinds of little >> details match between the code and the comments. > > I find the existing comment and code in this part of the function > differ. The comment just above the loop on simple_rel_array[], talks > about changing something in the child, but the very next line skips > child relations and later a loop on append_rel_list makes changes to > appropriate children. I guess, it's done that way to keep the code > working even after we introduce some RELOPTKIND other than BASEREL or > OTHER_MEMBER_REL for a simple rel. But your suggestion makes more > sense. Changed it according to your suggestion. > >> >> It is not clear to me what the motivation is for the API changes in >> expanded_inherited_rtentry. They don't appear to be necessary. > > expand_inherited_rtentry() creates AppendRelInfos for all the children > of a given parent and collects them in a list. The list is appended to > root->append_rel_list at the end of the function. Now that function > needs to do this recursively. This means that for a partitioned > partition table its children's AppendRelInfos will be added to > root->append_rel_list before AppendRelInfo of that partitioned > partition table. inheritance_planner() assumes that the parent's > AppendRelInfo comes before its children in append_rel_list.This > assumption allows it to be simplified greately, retaining its > iterative form. My earlier patches had recursive version of > inheritance_planner(), which is complex. I have comments in this patch > explaining this. > > Adding AppendRelInfos to root->append_rel_list as and when they are > created would keep parent AppendRelInfos before those of children. But > that function throws away the AppendRelInfo it created when their are > no real children i.e. in partitioned table's case when has no leaf > partitions. So, we can't do that. Hence, I chose to change the API to > return the list of AppendRelInfos when the given RTE has real > children. > >> If >> they are necessary, you need to do a more thorough job updating the >> comments. This one, in particular: >> >> * If so, add entries for all the child tables to the query's >> * rangetable, and build AppendRelInfo nodes for all the child tables >> * and add them to root->append_rel_list. If not, clear the entry's > > Done. > >> >> And the comments could maybe say something like "We return the list of >> appinfos rather than directly appending it to append_rel_list because >> $REASON." > > Done. Please check the attached version. > >> >> - * is a partitioned table. >> + * RTE simply duplicates the parent *partitioned* table. >> */ >> - if (childrte->relkind != RELKIND_PARTITIONED_TABLE) >> + if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh) >> >> This is another case where it's hard to understand the test from the comments. > > The current comment says it all, but it very cryptic manner. > 1526 /* > 1527 * Build an AppendRelInfo for this parent and child, > unless the child > 1528 * RTE simply duplicates the parent *partitioned* table. > 1529 */ > > The comment makes sense in the context of this paragraph in the prologue > 1364 * Note that the original RTE is considered to represent the whole > 1365 * inheritance set. The first of the generated RTEs is an RTE for the same > 1366 * table, but with inh = false, to represent the parent table in its role > 1367 * as a simple member of the inheritance set. > 1368 * > > The code avoids creating AppendRelInfos for a child which represents > the parent in its role as a simple member of inheritance set. > > I have reworded it as > 1526 /* > 1527 * Build an AppendRelInfo for this parent and child, > unless the child > 1528 * RTE represents the parent as a simple member of inheritance set. > 1529 */ > >> >> + * In case of multi-level inheritance hierarchy, for every child we require >> + * PlannerInfo of its immediate parent. Hence we save those in a an array >> >> Say why. Also, need to fix "a an". > > Done. > >> >> I'm a little bit surprised that this patch doesn't make any changes to >> allpaths.c or relnode.c. > >> It looks to me like we'll generate paths for >> the new RTEs that are being added. Are we going to end up with >> multiple levels of Append nodes, then? Does the consider the way >> consider_parallel is propagated up and down in set_append_rel_size() >> and set_append_rel_pathlist() really work with multiple levels? Maybe >> this is all fine; I haven't tried to verify it in depth. > > This has been discussed before, but I can not locate the mail > answering these questions. accumulate_append_subpath() called from > add_paths_to_append_rel() takes care of flattening Merge/Append paths. > The planner code deals with the multi-level inheritance hierarchy > created for subqueries with set operations. There is code in relnode.c > to build the RelOptInfos for such subqueries recursively through using > RangeTblEntry::inh flag. So there are no changes in allpaths.c and > relnode.c. Are you looking for some other changes? > >> >> Overall I think this is a reasonable direction to go but I'm worried >> that there may be bugs lurking -- other code that needs adjusting that >> hasn't been found, really. >> > > Planner code is already aware of such hierarchies except DMLs, which > this patch adjusts. We have fixed issues revealed by mine and > Rajkumar's testing. > What kinds of things you suspect? > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Merlin MoncureДата:
Сообщение: Re: [HACKERS] 10 beta docs: different replication solutions