Re: Pathify RHS unique-ification for semijoin planning
От | Andy Fan |
---|---|
Тема | Re: Pathify RHS unique-ification for semijoin planning |
Дата | |
Msg-id | 87r00hq9kb.fsf@163.com обсуждение исходный текст |
Ответы |
Re: Pathify RHS unique-ification for semijoin planning
|
Список | pgsql-hackers |
Richard Guo <guofenglinux@gmail.com> writes: Hi, > However, in the case of sort-based implementation, > this function pays no attention to the subpath's pathkeys or the > pathkeys of the resulting output. Good finding! > > In addition to this specific issue, it seems to me that there are > other potential issues in create_unique_path(). > > * Currently, we only consider the cheapest_total_path of the RHS when > unique-ifying it. I think it is better have a check the tuple_fraction for the startup_cost factor, for some paths where the total cost is high but the required fraction is lower. > I think this may cause us to miss some optimization > opportunities. For example, there might be a path with a better sort > order that isn't the cheapest-total one. Such a path could help avoid > a sort at a higher level, potentially resulting in a cheaper overall > plan. > * In create_unique_path(), we currently rely on heuristics to decide > whether to use a hash-based or sort-based method. I think a better > approach would be to create paths for both methods and let add_path() > determine which one is better, similar to how we handle path selection > in other parts of the planner. > > Therefore, I'm thinking that maybe we could create a new RelOptInfo > for the RHS rel to represent its unique-ified version, and then > generate all worthwhile paths for it, This sounds great for me. and I think we can keep the fraction cheapest path on the new RelOptInfo as well, then all the things should be on the way. > To be concrete, I'm envisioning something like the following: > > if (bms_equal(sjinfo->syn_righthand, rel2->relids) && > - create_unique_path(root, rel2, rel2->cheapest_total_path, > - sjinfo) != NULL) > + (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL) > > ... > > - add_paths_to_joinrel(root, joinrel, rel1, rel2, > - JOIN_UNIQUE_INNER, sjinfo, > + add_paths_to_joinrel(root, joinrel, rel1, rel2_unique, > + JOIN_INNER, sjinfo, > restrictlist); > - add_paths_to_joinrel(root, joinrel, rel2, rel1, > - JOIN_UNIQUE_OUTER, sjinfo, > + add_paths_to_joinrel(root, joinrel, rel2_unique, rel1, > + JOIN_INNER, sjinfo, > restrictlist); > > In addition, by changing the join from "rel1" and "rel2" using > JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER to a join between "rel1" and > "rel2_unique" using a standard JOIN_INNER, we might be able to get > rid of the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes. if we can, +10. -- Best Regards Andy Fan
В списке pgsql-hackers по дате отправления: