Обсуждение: Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
Fujita-san, (sorry about the repeated email, but my previous attempt failed due to trying to send to the -hackers and -performance lists at the same time, so trying again after removing -performance) On 2019/01/08 20:07, Etsuro Fujita wrote: > (2018/12/07 20:14), Ashutosh Bapat wrote: >> On Fri, Dec 7, 2018 at 11:13 AM Ashutosh Bapat >> <ashutosh.bapat.oss@gmail.com <mailto:ashutosh.bapat.oss@gmail.com>> wrote: > >> Robert, Ashutosh, any comments on this? I'm unfamiliar with the >> partitionwise join code. > >> As the comment says it has to do with the equivalence classes being >> used during merge append. EC's are used to create pathkeys used for >> sorting. Creating a sort node which has column on the nullable side >> of an OUTER join will fail if it doesn't find corresponding >> equivalence class. You may not notice this if both the partitions >> being joined are pruned for some reason. Amit's idea to make >> partition-wise join code do this may work, but will add a similar >> overhead esp. in N-way partition-wise join once those equivalence >> classes are added. > >> I looked at the patch. The problem there is that for a given relation, >> we will add child ec member multiple times, as many times as the number >> of joins it participates in. We need to avoid that to keep ec_member >> list length in check. > > Amit-san, are you still working on this, perhaps as part of the > speeding-up-planning-with-partitions patch [1]? I had tried to continue working on it after PGConf.ASIA last month, but got distracted by something else. So, while the patch at [1] can take care of this issue as I also mentioned upthread, I was trying to come up with a solution that can be back-patched to PG 11. The patch I posted above is one such solution and as Ashutosh points out it's perhaps not the best, because it can result in potentially creating many copies of the same child EC member if we do it in joinrel.c, as the patch proposes. I will try to respond to the concerns he raised in the next week if possible. Thanks, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
Amit-san, (2019/01/09 9:30), Amit Langote wrote: > (sorry about the repeated email, but my previous attempt failed due to > trying to send to the -hackers and -performance lists at the same time, so > trying again after removing -performance) Thanks! (Actually, I also failed to send my post to those lists...) > On 2019/01/08 20:07, Etsuro Fujita wrote: >> (2018/12/07 20:14), Ashutosh Bapat wrote: >>> On Fri, Dec 7, 2018 at 11:13 AM Ashutosh Bapat >>> <ashutosh.bapat.oss@gmail.com<mailto:ashutosh.bapat.oss@gmail.com>> wrote: >> >>> Robert, Ashutosh, any comments on this? I'm unfamiliar with the >>> partitionwise join code. >> >>> As the comment says it has to do with the equivalence classes being >>> used during merge append. EC's are used to create pathkeys used for >>> sorting. Creating a sort node which has column on the nullable side >>> of an OUTER join will fail if it doesn't find corresponding >>> equivalence class. You may not notice this if both the partitions >>> being joined are pruned for some reason. Amit's idea to make >>> partition-wise join code do this may work, but will add a similar >>> overhead esp. in N-way partition-wise join once those equivalence >>> classes are added. >> >>> I looked at the patch. The problem there is that for a given relation, >>> we will add child ec member multiple times, as many times as the number >>> of joins it participates in. We need to avoid that to keep ec_member >>> list length in check. >> >> Amit-san, are you still working on this, perhaps as part of the >> speeding-up-planning-with-partitions patch [1]? > > I had tried to continue working on it after PGConf.ASIA last month, but > got distracted by something else. > > So, while the patch at [1] can take care of this issue as I also mentioned > upthread, I was trying to come up with a solution that can be back-patched > to PG 11. The patch I posted above is one such solution and as Ashutosh > points out it's perhaps not the best, because it can result in potentially > creating many copies of the same child EC member if we do it in joinrel.c, > as the patch proposes. I will try to respond to the concerns he raised in > the next week if possible. Thanks for working on this! I like your patch in general. I think one way to address Ashutosh's concerns would be to use the consider_partitionwise_join flag: originally, that was introduced for partitioned relations to show that they can be partitionwise-joined, but I think that flag could also be used for non-partitioned relations to show that they have been set up properly for partitionwise-joins, and I think by checking that flag we could avoid creating those copies for child dummy rels in try_partitionwise_join. Please find attached an updated version of the patch. I modified your version so that building tlists for child dummy rels are also postponed until after they actually participate in partitionwise-joins, to avoid that possibly-useless work as well. I haven't done any performance tests yet though. Best regards, Etsuro Fujita
Вложения
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
Fujita-san, On 2019/01/09 20:20, Etsuro Fujita wrote: > (2019/01/09 9:30), Amit Langote wrote: >> So, while the patch at [1] can take care of this issue as I also mentioned >> upthread, I was trying to come up with a solution that can be back-patched >> to PG 11. The patch I posted above is one such solution and as Ashutosh >> points out it's perhaps not the best, because it can result in potentially >> creating many copies of the same child EC member if we do it in joinrel.c, >> as the patch proposes. I will try to respond to the concerns he raised in >> the next week if possible. > > Thanks for working on this! > > I like your patch in general. I think one way to address Ashutosh's > concerns would be to use the consider_partitionwise_join flag: originally, > that was introduced for partitioned relations to show that they can be > partitionwise-joined, but I think that flag could also be used for > non-partitioned relations to show that they have been set up properly for > partitionwise-joins, and I think by checking that flag we could avoid > creating those copies for child dummy rels in try_partitionwise_join. Ah, that's an interesting idea. If I understand the original design of it correctly, consider_partitionwise_join being true for a given relation (simple or join) means that its RelOptInfo contains properties to consider it to be joined with another relation (simple or join) using partitionwise join mechanism. Partitionwise join will occur between the pair if the other relation also has relevant properties (hence its consider_partitionwise_join set to true) and properties on the two sides match. That's a loaded meaning and abusing it to mean something else can be challenged, but we can live with that if properly documented. Speaking of which: /* used by partitionwise joins: */ bool consider_partitionwise_join; /* consider partitionwise join * paths? (if partitioned rel) */ Maybe, mention here how it will be abused in back-branches for non-partitioned relations? > Please find attached an updated version of the patch. I modified your > version so that building tlists for child dummy rels are also postponed > until after they actually participate in partitionwise-joins, to avoid > that possibly-useless work as well. I haven't done any performance tests > yet though. Thanks for updating the patch. I tested your patch (test setup described below) and it has almost the same performance as my previous version: 552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also mentioned below. Thanks, Amit [1] Test setup -- create tables CREATE TABLE precio(fecha timestamp, pluid int, loccd int, plusalesprice int) PARTITION BY RANGE (fecha); SELECT format('CREATE TABLE public.precio_%s PARTITION OF public.precio (PRIMARY KEY (fecha, pluid, loccd) ) FOR VALUES FROM (''%s'')TO(''%s'')', i, a, b) FROM (SELECT '1990-01-01'::timestamp +(i||'days')::interval a, '1990-01-02'::timestamp+(i||'days')::interval b, i FROM generate_series(1,999) i)x; \gexec -- query SELECT l_variacao.fecha, l_variacao.loccd , l_variacao.pant , l_variacao.patual , max_variacao.var_max FROM (SELECT p.fecha, p.loccd, p.plusalesprice patual, da.plusalesprice pant, abs(p.plusalesprice - da.plusalesprice) as var from precio p, (SELECT p.fecha, p.plusalesprice, p.loccd from precio p WHERE p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2) da WHERE p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2 and p.loccd = da.loccd and p.fecha = da.fecha) l_variacao, (SELECT max(abs(p.plusalesprice - da.plusalesprice)) as var_max from precio p, (SELECT p.fecha, p.plusalesprice, p.loccd from precio p WHERE p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2) da WHERE p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2 and p.loccd = da.loccd and p.fecha = da.fecha) max_variacao WHERE max_variacao.var_max = l_variacao.var;
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
Amit-san, (2019/01/10 10:41), Amit Langote wrote: > On 2019/01/09 20:20, Etsuro Fujita wrote: >> I like your patch in general. I think one way to address Ashutosh's >> concerns would be to use the consider_partitionwise_join flag: originally, >> that was introduced for partitioned relations to show that they can be >> partitionwise-joined, but I think that flag could also be used for >> non-partitioned relations to show that they have been set up properly for >> partitionwise-joins, and I think by checking that flag we could avoid >> creating those copies for child dummy rels in try_partitionwise_join. > > Ah, that's an interesting idea. > > If I understand the original design of it correctly, > consider_partitionwise_join being true for a given relation (simple or > join) means that its RelOptInfo contains properties to consider it to be > joined with another relation (simple or join) using partitionwise join > mechanism. Partitionwise join will occur between the pair if the other > relation also has relevant properties (hence its > consider_partitionwise_join set to true) and properties on the two sides > match. Actually, the flag being true just means that the tlist for a given partitioned relation (simple or join) doesn't contain any whole-row Vars. And if two given partitioned relations having the flag being true have additional properties to be joined using the PWJ technique, then we try to do PWJ for those partitioned relations. (The name of the flag isn't good? If so, that would be my fault because I named that flag.) > That's a loaded meaning and abusing it to mean something else can be > challenged, but we can live with that if properly documented. Speaking of > which: > > /* used by partitionwise joins: */ > bool consider_partitionwise_join; /* consider partitionwise join > * paths? (if partitioned > rel) */ > > Maybe, mention here how it will be abused in back-branches for > non-partitioned relations? Will do. >> Please find attached an updated version of the patch. I modified your >> version so that building tlists for child dummy rels are also postponed >> until after they actually participate in partitionwise-joins, to avoid >> that possibly-useless work as well. I haven't done any performance tests >> yet though. > > Thanks for updating the patch. I tested your patch (test setup described > below) and it has almost the same performance as my previous version: > 552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also > mentioned below. Thanks for that testing! I also tested the patch with your script: 253.559 ms (vs. 85776.515 ms on HEAD vs. 206.066 ms on PG 10) Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Ashutosh Bapat
Дата:
On Thu, Jan 10, 2019 at 7:12 AM Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Fujita-san,
On 2019/01/09 20:20, Etsuro Fujita wrote:
> (2019/01/09 9:30), Amit Langote wrote:
>> So, while the patch at [1] can take care of this issue as I also mentioned
>> upthread, I was trying to come up with a solution that can be back-patched
>> to PG 11. The patch I posted above is one such solution and as Ashutosh
>> points out it's perhaps not the best, because it can result in potentially
>> creating many copies of the same child EC member if we do it in joinrel.c,
>> as the patch proposes. I will try to respond to the concerns he raised in
>> the next week if possible.
>
> Thanks for working on this!
>
> I like your patch in general. I think one way to address Ashutosh's
> concerns would be to use the consider_partitionwise_join flag: originally,
> that was introduced for partitioned relations to show that they can be
> partitionwise-joined, but I think that flag could also be used for
> non-partitioned relations to show that they have been set up properly for
> partitionwise-joins, and I think by checking that flag we could avoid
> creating those copies for child dummy rels in try_partitionwise_join.
Ah, that's an interesting idea.
If I understand the original design of it correctly,
consider_partitionwise_join being true for a given relation (simple or
join) means that its RelOptInfo contains properties to consider it to be
joined with another relation (simple or join) using partitionwise join
mechanism. Partitionwise join will occur between the pair if the other
relation also has relevant properties (hence its
consider_partitionwise_join set to true) and properties on the two sides
match.
Though this will solve a problem for performance when partition-wise join is not possible, we still have the same problem when partition-wise join is possible. And that problem really happens because our inheritance mechanism requires expression translation from parent to child everywhere. That consumes memory, eats CPU cycles and generally downgrades performance of partition related query planning. I think a better way would be to avoid these translations and use Parent var to represent a Var of the child being dealt with. That will be a massive churn on inheritance based planner code, but it will improve planning time for queries involving thousands of partitions.
--
Best Wishes,
Ashutosh Bapat
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > Though this will solve a problem for performance when partition-wise join is not possible, we still have the same problemwhen partition-wise join is possible. And that problem really happens because our inheritance mechanism requires expressiontranslation from parent to child everywhere. That consumes memory, eats CPU cycles and generally downgrades performanceof partition related query planning. I think a better way would be to avoid these translations and use Parentvar to represent a Var of the child being dealt with. That will be a massive churn on inheritance based planner code,but it will improve planning time for queries involving thousands of partitions. Yeah, it would be nice going forward to overhaul inheritance planning such that parent-to-child Var translation is not needed, especially where no pruning can occur or many partitions remain even after pruning. Thanks, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/10 21:23), Amit Langote wrote: > On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: >> Though this will solve a problem for performance when partition-wise join is not possible, we still have the same problemwhen partition-wise join is possible. And that problem really happens because our inheritance mechanism requires expressiontranslation from parent to child everywhere. That consumes memory, eats CPU cycles and generally downgrades performanceof partition related query planning. I think a better way would be to avoid these translations and use Parentvar to represent a Var of the child being dealt with. That will be a massive churn on inheritance based planner code,but it will improve planning time for queries involving thousands of partitions. > > Yeah, it would be nice going forward to overhaul inheritance planning > such that parent-to-child Var translation is not needed, especially > where no pruning can occur or many partitions remain even after > pruning. I agree on that point, but I think that's an improvement for a future release rather than a fix for the issue reported on this thread. Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
Fujita-san, On 2019/01/10 15:07, Etsuro Fujita wrote: > Amit-san, > > (2019/01/10 10:41), Amit Langote wrote: >> On 2019/01/09 20:20, Etsuro Fujita wrote: >>> I like your patch in general. I think one way to address Ashutosh's >>> concerns would be to use the consider_partitionwise_join flag: originally, >>> that was introduced for partitioned relations to show that they can be >>> partitionwise-joined, but I think that flag could also be used for >>> non-partitioned relations to show that they have been set up properly for >>> partitionwise-joins, and I think by checking that flag we could avoid >>> creating those copies for child dummy rels in try_partitionwise_join. >> >> Ah, that's an interesting idea. >> >> If I understand the original design of it correctly, >> consider_partitionwise_join being true for a given relation (simple or >> join) means that its RelOptInfo contains properties to consider it to be >> joined with another relation (simple or join) using partitionwise join >> mechanism. Partitionwise join will occur between the pair if the other >> relation also has relevant properties (hence its >> consider_partitionwise_join set to true) and properties on the two sides >> match. > > Actually, the flag being true just means that the tlist for a given > partitioned relation (simple or join) doesn't contain any whole-row Vars. > And if two given partitioned relations having the flag being true have > additional properties to be joined using the PWJ technique, then we try to > do PWJ for those partitioned relations. I see. Thanks for the explanation. > (The name of the flag isn't > good? If so, that would be my fault because I named that flag.) If it's really just to store the fact that the relation's targetlist contains expressions that partitionwise join currently cannot handle, then setting it like this in set_append_rel_size seems a bit misleading: if (enable_partitionwise_join && rel->reloptkind == RELOPT_BASEREL && rte->relkind == RELKIND_PARTITIONED_TABLE && rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL) rel->consider_partitionwise_join = true; Sorry, I wasn't around to comment on the patch which got committed in 7cfdc77023a, but checking the value of enable_partitionwise_join and other things in set_append_rel_size() to set the value of consider_partitionwise_join seems a bit odd to me. Perhaps, consider_partitionwise_join should be initially set to true for a relation (actually, to rel->part_scheme != NULL) and only set it to false if the relation's targetlist is found to contain unsupported expressions. That way, it becomes easier to think what it means imho. I think enable_partitionwise_join should only be checked in relnode.c or joinrels.c. I've attached a patch to show what I mean. Can you please take a look? If you think that this patch is a good idea, then you'll need to explicitly set consider_partitionwise_join to false for a dummy partition rel in set_append_rel_size(), because the assumption of your patch that such partition's rel's consider_partitionwise_join would be false (as initialized with the current code) would be broken by my patch. But that might be a good thing to do anyway as it will document the special case usage of consider_partitionwise_join variable more explicitly, assuming you'll be adding a comment describing why it's being set to false explicitly. >> That's a loaded meaning and abusing it to mean something else can be >> challenged, but we can live with that if properly documented. Speaking of >> which: >> >> /* used by partitionwise joins: */ >> bool consider_partitionwise_join; /* consider >> partitionwise join >> * paths? (if partitioned >> rel) */ >> >> Maybe, mention here how it will be abused in back-branches for >> non-partitioned relations? > > Will do. Thank you. >>> Please find attached an updated version of the patch. I modified your >>> version so that building tlists for child dummy rels are also postponed >>> until after they actually participate in partitionwise-joins, to avoid >>> that possibly-useless work as well. I haven't done any performance tests >>> yet though. >> >> Thanks for updating the patch. I tested your patch (test setup described >> below) and it has almost the same performance as my previous version: >> 552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also >> mentioned below. > > Thanks for that testing! > > I also tested the patch with your script: > > 253.559 ms (vs. 85776.515 ms on HEAD vs. 206.066 ms on PG 10) Oh, PG 11 doesn't appear as bad compared to PG 10 with your numbers as it did on my machine. That's good anyway. Thanks, Amit
Вложения
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
On 2019/01/11 11:21, Etsuro Fujita wrote: > (2019/01/10 21:23), Amit Langote wrote: >> On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat >> <ashutosh.bapat.oss@gmail.com> wrote: >>> Though this will solve a problem for performance when partition-wise >>> join is not possible, we still have the same problem when >>> partition-wise join is possible. And that problem really happens >>> because our inheritance mechanism requires expression translation from >>> parent to child everywhere. That consumes memory, eats CPU cycles and >>> generally downgrades performance of partition related query planning. I >>> think a better way would be to avoid these translations and use Parent >>> var to represent a Var of the child being dealt with. That will be a >>> massive churn on inheritance based planner code, but it will improve >>> planning time for queries involving thousands of partitions. >> >> Yeah, it would be nice going forward to overhaul inheritance planning >> such that parent-to-child Var translation is not needed, especially >> where no pruning can occur or many partitions remain even after >> pruning. > > I agree on that point, but I think that's an improvement for a future > release rather than a fix for the issue reported on this thread. Agreed. Improving planning performance for large number of partitions even in the absence of pruning is a good goal to pursue for future versions, as is being discussed in some other threads [1]. Thanks, Amit [1] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1FB60AE5%40G01JPEXMBYT05
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/11 13:46), Amit Langote wrote: > On 2019/01/10 15:07, Etsuro Fujita wrote: >> (The name of the flag isn't >> good? If so, that would be my fault because I named that flag.) > > If it's really just to store the fact that the relation's targetlist > contains expressions that partitionwise join currently cannot handle, then > setting it like this in set_append_rel_size seems a bit misleading: > > if (enable_partitionwise_join&& > rel->reloptkind == RELOPT_BASEREL&& > rte->relkind == RELKIND_PARTITIONED_TABLE&& > rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL) > rel->consider_partitionwise_join = true; > > Sorry, I wasn't around to comment on the patch which got committed in > 7cfdc77023a, but checking the value of enable_partitionwise_join and other > things in set_append_rel_size() to set the value of > consider_partitionwise_join seems a bit odd to me. Perhaps, > consider_partitionwise_join should be initially set to true for a relation > (actually, to rel->part_scheme != NULL) and only set it to false if the > relation's targetlist is found to contain unsupported expressions. One thing I intended in that commit was to set the flag to false for partitioned tables contained in inheritance trees where the top parent is a UNION ALL subquery, because we don't consider PWJ for those tables. Actually we wouldn't need to care about that, because we don't do PWJ for those tables regardless of what the flag is set, but I think that would make the code a bit cleaner. However, what you proposed here as-is would not keep that behavior, because rel->part_scheme is created for those tables as well (even though there would be no need IIUC). > That > way, it becomes easier to think what it means imho. May be or may not be. > I think > enable_partitionwise_join should only be checked in relnode.c or > joinrels.c. Sorry, I don't understand this. > I've attached a patch to show what I mean. Can you please > take a look? Thanks for the patch! Maybe I'm missing something, but I don't have a strong opinion about that change. I'd rather think to modify build_simple_rel so that it doesn't create rel->part_scheme if unnecessary (ie, partitioned tables contained in inheritance trees where the top parent is a UNION ALL subquery). > If you think that this patch is a good idea, then you'll need to > explicitly set consider_partitionwise_join to false for a dummy partition > rel in set_append_rel_size(), because the assumption of your patch that > such partition's rel's consider_partitionwise_join would be false (as > initialized with the current code) would be broken by my patch. But that > might be a good thing to do anyway as it will document the special case > usage of consider_partitionwise_join variable more explicitly, assuming > you'll be adding a comment describing why it's being set to false explicitly. I'm not sure we need this as part of a fix for the issue reported on this thread. I don't object to what you proposed here, but that would be rather an improvement, so I think we should leave that for another patch. >>>> Please find attached an updated version of the patch. I modified your >>>> version so that building tlists for child dummy rels are also postponed >>>> until after they actually participate in partitionwise-joins, to avoid >>>> that possibly-useless work as well. I haven't done any performance tests >>>> yet though. >>> >>> Thanks for updating the patch. I tested your patch (test setup described >>> below) and it has almost the same performance as my previous version: >>> 552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also >>> mentioned below. >> I also tested the patch with your script: >> >> 253.559 ms (vs. 85776.515 ms on HEAD vs. 206.066 ms on PG 10) > > Oh, PG 11 doesn't appear as bad compared to PG 10 with your numbers as it > did on my machine. That's good anyway. Yeah, that's a good result! Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/11 13:49), Amit Langote wrote: > On 2019/01/11 11:21, Etsuro Fujita wrote: >> (2019/01/10 21:23), Amit Langote wrote: >>> On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat >>> <ashutosh.bapat.oss@gmail.com> wrote: >>>> Though this will solve a problem for performance when partition-wise >>>> join is not possible, we still have the same problem when >>>> partition-wise join is possible. And that problem really happens >>>> because our inheritance mechanism requires expression translation from >>>> parent to child everywhere. That consumes memory, eats CPU cycles and >>>> generally downgrades performance of partition related query planning. I >>>> think a better way would be to avoid these translations and use Parent >>>> var to represent a Var of the child being dealt with. That will be a >>>> massive churn on inheritance based planner code, but it will improve >>>> planning time for queries involving thousands of partitions. >>> >>> Yeah, it would be nice going forward to overhaul inheritance planning >>> such that parent-to-child Var translation is not needed, especially >>> where no pruning can occur or many partitions remain even after >>> pruning. >> >> I agree on that point, but I think that's an improvement for a future >> release rather than a fix for the issue reported on this thread. > > Agreed. Cool! > Improving planning performance for large number of partitions > even in the absence of pruning is a good goal to pursue for future > versions, as is being discussed in some other threads [1]. Yeah, we have a lot of challenges there. Thanks for sharing the info! Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/11 13:46), Amit Langote wrote: > On 2019/01/10 15:07, Etsuro Fujita wrote: >> (2019/01/10 10:41), Amit Langote wrote: >>> That's a loaded meaning and abusing it to mean something else can be >>> challenged, but we can live with that if properly documented. Speaking of >>> which: >>> >>> /* used by partitionwise joins: */ >>> bool consider_partitionwise_join; /* consider >>> partitionwise join >>> * paths? (if partitioned >>> rel) */ >>> >>> Maybe, mention here how it will be abused in back-branches for >>> non-partitioned relations? >> >> Will do. > > Thank you. I know we don't yet reach a consensus on what to do in details to address this issue, but for the above, how about adding comments like this to set_append_rel_size(), instead of the header file: /* * If we consider partitionwise joins with the parent rel, do the same * for partitioned child rels. * * Note: here we abuse the consider_partitionwise_join flag for child * rels that are not partitioned, to tell try_partitionwise_join() * that their targetlists and EC entries have been generated. */ if (rel->consider_partitionwise_join) childrel->consider_partitionwise_join = true; ISTM that that would be more clearer than the header file. Updated patch attached, which also updated other comments a little bit. Best regards, Etsuro Fujita
Вложения
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
Fujita-san, On 2019/01/11 20:04, Etsuro Fujita wrote: > (2019/01/11 13:46), Amit Langote wrote: >> On 2019/01/10 15:07, Etsuro Fujita wrote: >>> (The name of the flag isn't >>> good? If so, that would be my fault because I named that flag.) >> >> If it's really just to store the fact that the relation's targetlist >> contains expressions that partitionwise join currently cannot handle, then >> setting it like this in set_append_rel_size seems a bit misleading: >> >> if (enable_partitionwise_join&& >> rel->reloptkind == RELOPT_BASEREL&& >> rte->relkind == RELKIND_PARTITIONED_TABLE&& >> rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL) >> rel->consider_partitionwise_join = true; >> >> Sorry, I wasn't around to comment on the patch which got committed in >> 7cfdc77023a, but checking the value of enable_partitionwise_join and other >> things in set_append_rel_size() to set the value of >> consider_partitionwise_join seems a bit odd to me. Perhaps, >> consider_partitionwise_join should be initially set to true for a relation >> (actually, to rel->part_scheme != NULL) and only set it to false if the >> relation's targetlist is found to contain unsupported expressions. > > One thing I intended in that commit was to set the flag to false for > partitioned tables contained in inheritance trees where the top parent is > a UNION ALL subquery, because we don't consider PWJ for those tables. > Actually we wouldn't need to care about that, because we don't do PWJ for > those tables regardless of what the flag is set, but I think that would > make the code a bit cleaner. Yeah, we wouldn't do partitionwise join between partitioned tables that are under UNION ALL. > However, what you proposed here as-is would > not keep that behavior, because rel->part_scheme is created for those > tables as well It'd be easy to prevent set consider_partitionwise_join to false in that case as: + rel->consider_partitionwise_join = (rel->part_scheme != NULL && + (parent == NULL || + parent->rtekind != RTE_SUBQUERY)); > (even though there would be no need IIUC). Partition pruning uses part_scheme and pruning can occur even if a partitioned table is under UNION ALL, so it *is* needed in that case. >> I think >> enable_partitionwise_join should only be checked in relnode.c or >> joinrels.c. > > Sorry, I don't understand this. What I was trying to say is that we should check the GUC close to where partitionwise join is actually implemented even though there is no such hard and fast rule. Or maybe I'm just a bit uncomfortable with setting consider_partitionwise_join based on the GUC. >> I've attached a patch to show what I mean. Can you please >> take a look? > > Thanks for the patch! Maybe I'm missing something, but I don't have a > strong opinion about that change. I'd rather think to modify > build_simple_rel so that it doesn't create rel->part_scheme if unnecessary > (ie, partitioned tables contained in inheritance trees where the top > parent is a UNION ALL subquery). As I said above, partition pruning can occur even if a partitioned table happens to be under UNION ALL. However, we *can* avoid creating part_scheme and setting other partitioning properties if *all* of enable_partition_pruning, enable_partitionwise_join, and enable_partitionwise_aggregate are turned off. >> If you think that this patch is a good idea, then you'll need to >> explicitly set consider_partitionwise_join to false for a dummy partition >> rel in set_append_rel_size(), because the assumption of your patch that >> such partition's rel's consider_partitionwise_join would be false (as >> initialized with the current code) would be broken by my patch. But that >> might be a good thing to do anyway as it will document the special case >> usage of consider_partitionwise_join variable more explicitly, assuming >> you'll be adding a comment describing why it's being set to false >> explicitly. > > I'm not sure we need this as part of a fix for the issue reported on this > thread. I don't object to what you proposed here, but that would be > rather an improvement, so I think we should leave that for another patch. Sure, no problem with committing it separately if at all. Thanks for considering. Thanks, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
On 2019/01/15 10:46, Amit Langote wrote: > On 2019/01/11 20:04, Etsuro Fujita wrote: >> One thing I intended in that commit was to set the flag to false for >> partitioned tables contained in inheritance trees where the top parent is >> a UNION ALL subquery, because we don't consider PWJ for those tables. >> Actually we wouldn't need to care about that, because we don't do PWJ for >> those tables regardless of what the flag is set, but I think that would >> make the code a bit cleaner. > > Yeah, we wouldn't do partitionwise join between partitioned tables that > are under UNION ALL. > >> However, what you proposed here as-is would >> not keep that behavior, because rel->part_scheme is created for those >> tables as well > > It'd be easy to prevent set consider_partitionwise_join to false in that > case as: Oops, I meant to say: It'd be easy to prevent setting consider_partitionwise_join in that case as: > > + rel->consider_partitionwise_join = (rel->part_scheme != NULL && > + (parent == NULL || > + parent->rtekind != RTE_SUBQUERY)); Thanks, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
Fujita-san, On 2019/01/11 21:50, Etsuro Fujita wrote: >>> (2019/01/10 10:41), Amit Langote wrote: >>>> That's a loaded meaning and abusing it to mean something else can be >>>> challenged, but we can live with that if properly documented. >>>> Speaking of >>>> which: >>>> >>>> /* used by partitionwise joins: */ >>>> bool consider_partitionwise_join; /* consider >>>> partitionwise join >>>> * paths? (if >>>> partitioned >>>> rel) */ >>>> >>>> Maybe, mention here how it will be abused in back-branches for >>>> non-partitioned relations? >>> >>> Will do. >> >> Thank you. > > I know we don't yet reach a consensus on what to do in details to address > this issue, but for the above, how about adding comments like this to > set_append_rel_size(), instead of the header file: > > /* > * If we consider partitionwise joins with the parent rel, do the > same > * for partitioned child rels. > * > * Note: here we abuse the consider_partitionwise_join flag for child > * rels that are not partitioned, to tell try_partitionwise_join() > * that their targetlists and EC entries have been generated. > */ > if (rel->consider_partitionwise_join) > childrel->consider_partitionwise_join = true; > > ISTM that that would be more clearer than the header file. Thanks for updating the patch. I tend to agree that it might be better to add such details here than in the header as it's better to keep the latter more stable. About the comment you added, I think we could clarify the note further as: Note: here we abuse the consider_partitionwise_join flag by setting it *even* for child rels that are not partitioned. In that case, we set it to tell try_partitionwise_join() that it doesn't need to generate their targetlists and EC entries as they have already been generated here, as opposed to the dummy child rels for which the flag is left set to false so that it will generate them. Maybe it's a bit wordy, but it helps get the intention across more clearly. Thanks, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Ashutosh Bapat
Дата:
I think, there's something better possible. Two partitioned relations won't use partition-wise join, if their partition schemes do not match. Partitioned relations with same partitioning scheme share PartitionScheme pointer. PartitionScheme structure should get an extra counter, maintaining a count of number of partitioned relations sharing that structure. When this counter is 1, that relation is certainly not going to participate in PWJ and thus need not have all the structure required by PWJ set up. If we use this counter coupled with enable_partitionwise_join flag, we can get rid of consider_partitionwise_join flag altogether, I think.
On Tue, Jan 15, 2019 at 8:12 AM Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Fujita-san,
On 2019/01/11 21:50, Etsuro Fujita wrote:
>>> (2019/01/10 10:41), Amit Langote wrote:
>>>> That's a loaded meaning and abusing it to mean something else can be
>>>> challenged, but we can live with that if properly documented.
>>>> Speaking of
>>>> which:
>>>>
>>>> /* used by partitionwise joins: */
>>>> bool consider_partitionwise_join; /* consider
>>>> partitionwise join
>>>> * paths? (if
>>>> partitioned
>>>> rel) */
>>>>
>>>> Maybe, mention here how it will be abused in back-branches for
>>>> non-partitioned relations?
>>>
>>> Will do.
>>
>> Thank you.
>
> I know we don't yet reach a consensus on what to do in details to address
> this issue, but for the above, how about adding comments like this to
> set_append_rel_size(), instead of the header file:
>
> /*
> * If we consider partitionwise joins with the parent rel, do the
> same
> * for partitioned child rels.
> *
> * Note: here we abuse the consider_partitionwise_join flag for child
> * rels that are not partitioned, to tell try_partitionwise_join()
> * that their targetlists and EC entries have been generated.
> */
> if (rel->consider_partitionwise_join)
> childrel->consider_partitionwise_join = true;
>
> ISTM that that would be more clearer than the header file.
Thanks for updating the patch. I tend to agree that it might be better to
add such details here than in the header as it's better to keep the latter
more stable.
About the comment you added, I think we could clarify the note further as:
Note: here we abuse the consider_partitionwise_join flag by setting it
*even* for child rels that are not partitioned. In that case, we set it
to tell try_partitionwise_join() that it doesn't need to generate their
targetlists and EC entries as they have already been generated here, as
opposed to the dummy child rels for which the flag is left set to false so
that it will generate them.
Maybe it's a bit wordy, but it helps get the intention across more clearly.
Thanks,
Amit
--
Best Wishes,
Ashutosh Bapat
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
Amit-san, (2019/01/15 10:46), Amit Langote wrote: > On 2019/01/11 20:04, Etsuro Fujita wrote: >> (2019/01/11 13:46), Amit Langote wrote: >> However, what you proposed here as-is would >> not keep that behavior, because rel->part_scheme is created for those >> tables as well >> (even though there would be no need IIUC). > > Partition pruning uses part_scheme and pruning can occur even if a > partitioned table is under UNION ALL, so it *is* needed in that case. Ah, you are right. Thanks for pointing that out! >>> I think >>> enable_partitionwise_join should only be checked in relnode.c or >>> joinrels.c. >> >> Sorry, I don't understand this. > > What I was trying to say is that we should check the GUC close to where > partitionwise join is actually implemented even though there is no such > hard and fast rule. Or maybe I'm just a bit uncomfortable with setting > consider_partitionwise_join based on the GUC. I didn't think so. Consider the consider_parallel flag. I think the way of setting it deviates from that rule already; it is set essentially based on a GUC and is set in set_base_rel_sizes() (ie, before implementing parallel paths). When adding the consider_partitionwise_join flag, I thought it would be a good idea to set consider_partitionwise_join in a similar way to consider_parallel, keeping build_simple_rel() simple. >>> I've attached a patch to show what I mean. Can you please >>> take a look? >> >> Thanks for the patch! Maybe I'm missing something, but I don't have a >> strong opinion about that change. I'd rather think to modify >> build_simple_rel so that it doesn't create rel->part_scheme if unnecessary >> (ie, partitioned tables contained in inheritance trees where the top >> parent is a UNION ALL subquery). > > As I said above, partition pruning can occur even if a partitioned table > happens to be under UNION ALL. However, we *can* avoid creating > part_scheme and setting other partitioning properties if *all* of > enable_partition_pruning, enable_partitionwise_join, and > enable_partitionwise_aggregate are turned off. Yeah, I think so. >>> If you think that this patch is a good idea, then you'll need to >>> explicitly set consider_partitionwise_join to false for a dummy partition >>> rel in set_append_rel_size(), because the assumption of your patch that >>> such partition's rel's consider_partitionwise_join would be false (as >>> initialized with the current code) would be broken by my patch. But that >>> might be a good thing to do anyway as it will document the special case >>> usage of consider_partitionwise_join variable more explicitly, assuming >>> you'll be adding a comment describing why it's being set to false >>> explicitly. >> >> I'm not sure we need this as part of a fix for the issue reported on this >> thread. I don't object to what you proposed here, but that would be >> rather an improvement, so I think we should leave that for another patch. > > Sure, no problem with committing it separately if at all. OK Thanks! Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/15 11:42), Amit Langote wrote: > On 2019/01/11 21:50, Etsuro Fujita wrote: >>>> (2019/01/10 10:41), Amit Langote wrote: >>>>> That's a loaded meaning and abusing it to mean something else can be >>>>> challenged, but we can live with that if properly documented. >>>>> Speaking of >>>>> which: >>>>> >>>>> /* used by partitionwise joins: */ >>>>> bool consider_partitionwise_join; /* consider >>>>> partitionwise join >>>>> * paths? (if >>>>> partitioned >>>>> rel) */ >>>>> >>>>> Maybe, mention here how it will be abused in back-branches for >>>>> non-partitioned relations? >> I know we don't yet reach a consensus on what to do in details to address >> this issue, but for the above, how about adding comments like this to >> set_append_rel_size(), instead of the header file: >> >> /* >> * If we consider partitionwise joins with the parent rel, do the >> same >> * for partitioned child rels. >> * >> * Note: here we abuse the consider_partitionwise_join flag for child >> * rels that are not partitioned, to tell try_partitionwise_join() >> * that their targetlists and EC entries have been generated. >> */ >> if (rel->consider_partitionwise_join) >> childrel->consider_partitionwise_join = true; >> >> ISTM that that would be more clearer than the header file. > > Thanks for updating the patch. I tend to agree that it might be better to > add such details here than in the header as it's better to keep the latter > more stable. > > About the comment you added, I think we could clarify the note further as: > > Note: here we abuse the consider_partitionwise_join flag by setting it > *even* for child rels that are not partitioned. In that case, we set it > to tell try_partitionwise_join() that it doesn't need to generate their > targetlists and EC entries as they have already been generated here, as > opposed to the dummy child rels for which the flag is left set to false so > that it will generate them. > > Maybe it's a bit wordy, but it helps get the intention across more clearly. I think that is well-worded, so +1 from me. Thanks again! Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
Hi Ashutosh, (2019/01/15 13:29), Ashutosh Bapat wrote: > I think, there's something better possible. Two partitioned relations > won't use partition-wise join, if their partition schemes do not match. > Partitioned relations with same partitioning scheme share > PartitionScheme pointer. PartitionScheme structure should get an extra > counter, maintaining a count of number of partitioned relations sharing > that structure. When this counter is 1, that relation is certainly not > going to participate in PWJ and thus need not have all the structure > required by PWJ set up. If we use this counter coupled with > enable_partitionwise_join flag, we can get rid of > consider_partitionwise_join flag altogether, I think. Interesting! That flag was introduced to disable PWJs when whole-row Vars are involved, as you know, so I think we need to first eliminate that limitation, to remove that flag. Thanks! Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/16 15:21), Ashutosh Bapat wrote: > On Wed, Jan 16, 2019 at 11:22 AM Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote: > (2019/01/15 13:29), Ashutosh Bapat wrote: > > I think, there's something better possible. Two partitioned relations > > won't use partition-wise join, if their partition schemes do not > match. > > Partitioned relations with same partitioning scheme share > > PartitionScheme pointer. PartitionScheme structure should get an > extra > > counter, maintaining a count of number of partitioned relations > sharing > > that structure. When this counter is 1, that relation is > certainly not > > going to participate in PWJ and thus need not have all the structure > > required by PWJ set up. If we use this counter coupled with > > enable_partitionwise_join flag, we can get rid of > > consider_partitionwise_join flag altogether, I think. > > Interesting! > > That flag was introduced to disable PWJs when whole-row Vars are > involved, as you know, so I think we need to first eliminate that > limitation, to remove that flag. > > For that we don't need a separate flag. Do we? AFAIR, somewhere under > try_partitionwise_join() we check whether PWJ is possible between two > relations. That involves a bunch of checks like checking whether the > relations have same bounds. Those checks should be enhanced to > incorporate existence of whole-var, I think. Yeah, that check is actually done in build_joinrel_partition_info(), which is called from build_join_rel() and build_child_join_rel() (only the latter is called from try_partitionwise_join()). That flag is used in build_joinrel_partition_info() for that check, but as you mentioned, I think it would be possible to remove that flag, probably by checking the WRV existence from the outer_rel/inner_rel's reltarget, instead of that flag. But I'm not sure we can do that efficiently without complicating the existing code including the original PWJ one. That flag doesn't make that code complicated. I thought it would be better to not complicate that code, because disabling such PWJs would be something temporary until we support them. Anyway, I think this would be a separate issue from the original one we discussed on this thread. Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/16 14:45), Etsuro Fujita wrote: > (2019/01/15 11:42), Amit Langote wrote: >> On 2019/01/11 21:50, Etsuro Fujita wrote: >>>>> (2019/01/10 10:41), Amit Langote wrote: >>>>>> That's a loaded meaning and abusing it to mean something else can be >>>>>> challenged, but we can live with that if properly documented. >>>>>> Speaking of >>>>>> which: >>>>>> >>>>>> /* used by partitionwise joins: */ >>>>>> bool consider_partitionwise_join; /* consider >>>>>> partitionwise join >>>>>> * paths? (if >>>>>> partitioned >>>>>> rel) */ >>>>>> >>>>>> Maybe, mention here how it will be abused in back-branches for >>>>>> non-partitioned relations? > >>> I know we don't yet reach a consensus on what to do in details to >>> address >>> this issue, but for the above, how about adding comments like this to >>> set_append_rel_size(), instead of the header file: >>> >>> /* >>> * If we consider partitionwise joins with the parent rel, do the >>> same >>> * for partitioned child rels. >>> * >>> * Note: here we abuse the consider_partitionwise_join flag for child >>> * rels that are not partitioned, to tell try_partitionwise_join() >>> * that their targetlists and EC entries have been generated. >>> */ >>> if (rel->consider_partitionwise_join) >>> childrel->consider_partitionwise_join = true; >>> >>> ISTM that that would be more clearer than the header file. >> >> Thanks for updating the patch. I tend to agree that it might be better to >> add such details here than in the header as it's better to keep the >> latter >> more stable. >> >> About the comment you added, I think we could clarify the note further >> as: >> >> Note: here we abuse the consider_partitionwise_join flag by setting it >> *even* for child rels that are not partitioned. In that case, we set it >> to tell try_partitionwise_join() that it doesn't need to generate their >> targetlists and EC entries as they have already been generated here, as >> opposed to the dummy child rels for which the flag is left set to >> false so >> that it will generate them. >> >> Maybe it's a bit wordy, but it helps get the intention across more >> clearly. > > I think that is well-worded, so +1 from me. I updated the patch as such and rebased it to the latest HEAD. I also added the commit message. Attached is an updated patch. Does that make sense? If there are no objections, I'll push that patch early next week. Best regards, Etsuro Fujita
Вложения
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
On Fri, Jan 18, 2019 at 9:58 PM Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I updated the patch as such and rebased it to the latest HEAD. I also > added the commit message. Attached is an updated patch. Does that make > sense? If there are no objections, I'll push that patch early next week. Thank you. Looks good to me. Regards, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/19 21:17), Amit Langote wrote: > On Fri, Jan 18, 2019 at 9:58 PM Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I updated the patch as such and rebased it to the latest HEAD. I also >> added the commit message. Attached is an updated patch. Does that make >> sense? If there are no objections, I'll push that patch early next week. > > Thank you. Looks good to me. Cool. Pushed after tweaking the commit message based on the feedback from Justin offlist and a self-review. Thanks everyone who worked on this! Best regards, Etsuro Fujita
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Amit Langote
Дата:
On 2019/01/21 17:17, Etsuro Fujita wrote: > (2019/01/19 21:17), Amit Langote wrote: >> On Fri, Jan 18, 2019 at 9:58 PM Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> I updated the patch as such and rebased it to the latest HEAD. I also >>> added the commit message. Attached is an updated patch. Does that make >>> sense? If there are no objections, I'll push that patch early next week. >> >> Thank you. Looks good to me. > > Cool. Pushed after tweaking the commit message based on the feedback from > Justin offlist and a self-review. Thank you. Regards, Amit
Re: Query with high planning time at version 11.1 compared versions10.5 and 11.0
От
Etsuro Fujita
Дата:
(2019/01/21 17:21), Amit Langote wrote: > On 2019/01/21 17:17, Etsuro Fujita wrote: >> (2019/01/19 21:17), Amit Langote wrote: >>> On Fri, Jan 18, 2019 at 9:58 PM Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> I updated the patch as such and rebased it to the latest HEAD. I also >>>> added the commit message. Attached is an updated patch. Does that make >>>> sense? If there are no objections, I'll push that patch early next week. >>> >>> Thank you. Looks good to me. >> >> Cool. Pushed after tweaking the commit message based on the feedback from >> Justin offlist and a self-review. > > Thank you. One thing to add: I forgot back-patching :(, so I did that a little while ago. Best regards, Etsuro Fujita