Discussion: Planner creating inefficient plans on LEFT OUTER joins
Hi,

After pondering the problem for quite some time and discussing it on IRC with RhodiumToad, I thought the most sensible thing was to post it here (as RhodiumToad suggested as well).

The original (although already quite reduced) problematic query and the related plan:
http://anarazel.de/postgres/orig_query.sql
http://anarazel.de/postgres/orig_query.plan

I.e. it builds the right side of the LEFT JOIN for all elements it could possibly contain, and not only for the ones which exist on the left side. (The database is freshly VACUUM ANALYZE'd.) Perhaps I expect too much from the planner here? With this query this is not much of a problem, but the plan is the same if the inner part of the query yields some million rows (and possibly is not only …).

In order to make testing easier I tried to reproduce the problem (with help from RhodiumToad):
http://anarazel.de/postgres/create_testtables.sql

Test query:

SELECT *
FROM ab LEFT OUTER JOIN (
    bc JOIN cd ON bc.c = cd.d
) ON ab.b = bc.b
WHERE ab.a = 20000

As ab.a = 20000 occurs only once in ab, one would expect it to just do an index scan on bc for ab.b = bc.b. Unfortunately it builds the complete right side of the join first, and then selects the one element it needs...

Query plan: http://anarazel.de/postgres/testtable_query1.plan

If there is no relatively easy fix for this, any idea how to work around the problem?

Thanks,

Andres Freund

PS: Tested with 8.3.3 and 8.2.7. The problem has been the same since 8.0 though (I didn't test earlier versions).
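For readers without a PostgreSQL instance at hand, the shape of the problem query can be sketched with sqlite3. The real schema lives in create_testtables.sql above, so the toy tables and values here are my own guesses, and the parenthesized join is written as an equivalent subquery:

```python
import sqlite3

# Hypothetical miniature versions of the test tables; the actual schema is
# in create_testtables.sql, so this column layout is an assumption.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE ab (a INTEGER, b INTEGER);
CREATE TABLE bc (b INTEGER, c INTEGER);
CREATE TABLE cd (d INTEGER);
INSERT INTO ab VALUES (20000, 1), (20001, 2);
INSERT INTO bc VALUES (1, 10), (2, 20);
INSERT INTO cd VALUES (10), (30);
""")

# The parenthesized (bc JOIN cd ...) is written as a subquery here,
# which is semantically equivalent.
rows = cur.execute("""
SELECT *
FROM ab LEFT OUTER JOIN (SELECT bc.b, bc.c, cd.d
                         FROM bc JOIN cd ON bc.c = cd.d) j
  ON ab.b = j.b
WHERE ab.a = 20000
""").fetchall()
print(rows)  # -> [(20000, 1, 1, 10, 10)]
```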
Andres Freund <andres@anarazel.de> writes:
> SELECT *
> FROM
>   ab LEFT OUTER JOIN (
>     bc JOIN cd
>       ON bc.c = cd.d
>   )
>   ON ab.b = bc.b
> WHERE
>   ab.a = 20000

> As ab.a = 20000 occurs only once in ab one would expect that it just does an
> index scan on bc for ab.b = bc.b.

The only way it could do that would be by interchanging the order of the left and inner joins, ie (ab left join bc) join cd; which would change the results.

I believe it could interchange the joins if they were both LEFT or both INNER. Do you really need exactly these semantics?

regards, tom lane
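To see concretely why interchanging the joins would change the results, here is a small sqlite3 sketch (tables and data invented for the demo; SQLite is only standing in to check join semantics, not planning): when cd has no match, ab LEFT JOIN (bc JOIN cd) keeps the ab row with NULLs, while (ab LEFT JOIN bc) JOIN cd discards it.

```python
import sqlite3

# Invented toy data: bc matches ab, but cd is empty, so the inner join
# with cd finds nothing.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE ab (a INTEGER, b INTEGER);
CREATE TABLE bc (b INTEGER, c INTEGER);
CREATE TABLE cd (d INTEGER);
INSERT INTO ab VALUES (1, 100);
INSERT INTO bc VALUES (100, 5);
""")

# ab LEFT JOIN (bc JOIN cd): the ab row survives with NULLs
q1 = cur.execute("""
SELECT ab.a, ab.b, j.b, j.d
FROM ab LEFT JOIN (SELECT bc.b, cd.d
                   FROM bc JOIN cd ON bc.c = cd.d) j
  ON ab.b = j.b
""").fetchall()

# (ab LEFT JOIN bc) JOIN cd: the same row is discarded entirely
q2 = cur.execute("""
SELECT ab.a, ab.b, bc.b, cd.d
FROM ab LEFT JOIN bc ON ab.b = bc.b
JOIN cd ON bc.c = cd.d
""").fetchall()

print(q1)  # -> [(1, 100, None, None)]
print(q2)  # -> []
```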
>> SELECT * FROM ab LEFT OUTER JOIN (bc JOIN cd ON bc.c = cd.d) ON ab.b = bc.b
>> WHERE ab.a = 20000
>> As ab.a = 20000 occurs only once in ab one would expect that it just does an
>> index scan on bc for ab.b = bc.b.
>
> The only way it could do that would be by interchanging the order of the
> left and inner joins, ie (ab left join bc) join cd; which would change
> the results.

In theory, I believe this could be rewritten as:

SELECT *
FROM ab LEFT OUTER JOIN (
    SELECT bc.b FROM ab
    JOIN bc ON ab.b = bc.b
    JOIN cd ON bc.c = cd.d
    WHERE ab.a = 20000
) dummy ON ab.b = dummy.b
WHERE ab.a = 20000

...without affecting the results. If the condition ab.a = 20000 is highly selective, this is a big win.

I can predict that Tom will say that the planning time it would take to avoid this problem isn't justified by the number of queries it would improve. That's possible, but it's unfortunate that there's no way to fiddle with the knobs and get the planner to do this kind of thing when you want it to. Rewriting the query as described above is OK when you're writing the whole query from scratch, but I don't know of an easy fix for this:

CREATE VIEW xyz AS
SELECT * FROM ab LEFT OUTER JOIN (bc JOIN cd ON bc.c = cd.d) ON ab.b = bc.b

Sometimes I want to SELECT * FROM xyz ORDER BY a LIMIT 100 (to let the user browse records) and sometimes I want to SELECT * FROM xyz WHERE a = 20000 (to retrieve a single record). Neither query performs acceptably if the planner generates the entire cross-product of bc and cd and then throws most of it away, unless bc and cd are very small tables.

...Robert
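As a sanity check of the rewrite on toy data (sqlite3 again, purely for result-set comparison): note that for the result sets to be identical, the subquery has to carry the right-hand columns as well, and its inner filter should be on ab.a. Both details are my reading of what the rewrite intends; the tables and values are invented.

```python
import sqlite3

# Invented toy tables to compare the original query and the rewrite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE ab (a INTEGER, b INTEGER);
CREATE TABLE bc (b INTEGER, c INTEGER);
CREATE TABLE cd (d INTEGER);
INSERT INTO ab VALUES (20000, 1), (20001, 2);
INSERT INTO bc VALUES (1, 10), (2, 20);
INSERT INTO cd VALUES (10);
""")

original = cur.execute("""
SELECT ab.a, ab.b, j.b, j.c, j.d
FROM ab LEFT JOIN (SELECT bc.b, bc.c, cd.d
                   FROM bc JOIN cd ON bc.c = cd.d) j ON ab.b = j.b
WHERE ab.a = 20000
""").fetchall()

# The rewrite: pre-filter the inner join by joining against ab itself.
rewritten = cur.execute("""
SELECT ab.a, ab.b, dummy.b, dummy.c, dummy.d
FROM ab LEFT JOIN (SELECT bc.b, bc.c, cd.d
                   FROM ab JOIN bc ON ab.b = bc.b
                   JOIN cd ON bc.c = cd.d
                   WHERE ab.a = 20000) dummy ON ab.b = dummy.b
WHERE ab.a = 20000
""").fetchall()

print(original == rewritten, original)  # -> True [(20000, 1, 1, 10, 10)]
```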
"Robert Haas" <robertmhaas@gmail.com> writes: > I can predict that Tom will say that the planning time it would take > to avoid this problem isn't justified by the number of queries that it > would improve. Took the words right out of my mouth ;-) It would be *possible* to do this sort of thing, but what it would imply is re-planning entire join nests on the strength of assumptions that some additional constraints are available. Right now we only do that for individual inner-side relations. AFAICS extending that to sub-joins would result in an exponential increase in plan time. regards, tom lane
On Wed, 2008-06-25 at 23:34 -0400, Robert Haas wrote:
> I can predict that Tom will say that the planning time it would take
> to avoid this problem isn't justified by the number of queries that it
> would improve.
>
> That's possible, but it's unfortunate that there's no
> way to fiddle with the knobs and get the planner to do this kind of
> thing when you want it to.

I don't think we should invent a new parameter for each new optimisation. We would soon get swamped.

IMHO we should have a single parameter which indicates how much planning time we consider acceptable for this query, e.g.

    optimization_level = 2 (default), varies 1-3

Most automatic optimisation systems allow this kind of setting, whether it be a DBMS or a compiler (e.g. gcc).

We should agree on a simple framework so that each new category of optimization can be described as a level X optimisation, or discarded as never being worth the time. We do this with error messages, so why not do it with something that controls planning time?

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
simon@2ndquadrant.com (Simon Riggs) writes:
> On Wed, 2008-06-25 at 23:34 -0400, Robert Haas wrote:
>> I can predict that Tom will say that the planning time it would take
>> to avoid this problem isn't justified by the number of queries that it
>> would improve.
>
>> That's possible, but it's unfortunate that there's no
>> way to fiddle with the knobs and get the planner to do this kind of
>> thing when you want it to.
>
> I don't think we should invent a new parameter for each new
> optimisation. We would soon get swamped.
>
> IMHO we should have a single parameter which indicates how much planning
> time we consider acceptable for this query. e.g.
>
> optimization_level = 2 (default), varies 1-3
>
> Most automatic optimisation systems allow this kind of setting, whether
> it be a DBMS, or compilers (e.g. gcc).
>
> We should agree a simple framework so that each new category of
> optimization can be described as being a level X optimisation, or
> discarded as being never worth the time. We do this with error messages,
> so why not do this with something to control planning time?

Is there something "more parametric" that we could use to characterize this? That is, to attach some value that *does* have a numeric interpretation? I don't quite have a "for instance," but here are some thoughts on modelling this...

- If there is some query optimization option/node that clearly adds to planning cost in a linear (or less) fashion, then it would be meaningful to mark it as "linear", and we'd be fairly certain to validate any linear options.
- There would also be options/nodes that have a multiplicative effect on planning time.
- Thirdly, there are options/nodes (particularly when considering cases of multiple joins) where there is a polynomial/exponential effect on query planning.

I could see:

a) Evaluating which roads to consider from a "linear/multiplicative/exponential" perspective, which would look a lot like "level 1, level 2, level 3".
b) Estimating values, and, in effect, trying to model the amount of planning effort, and dropping out sets of routes that are expected to make the effort exceed [some value].

Sane? Silly?

-- 
"cbbrowne","@","linuxfinances.info" http://www3.sympatico.ca/cbbrowne/nonrdbms.html
STATED REASON DOES NOT COMPUTE WITH PROGRAMMED FACTS...
> IMHO we should have a single parameter which indicates how much planning
> time we consider acceptable for this query. e.g.
>
> optimization_level = 2 (default), varies 1-3
>
> Most automatic optimisation systems allow this kind of setting, whether
> it be a DBMS, or compilers (e.g. gcc).

It's my understanding that the philosophy of the PGDG in the past has been to avoid putting any kind of hints into the system, focusing rather on improving the planning of queries. A quick Google search turns up, for example:

http://archives.postgresql.org/pgsql-performance/2003-12/msg00181.php

Now, perhaps the thinking on this has changed, but a global knob like this strikes me as a bad idea. If Tom is right that improving the plan on queries like this would result in an exponential increase in planning time, then it's certainly important not to paint with too broad a brush. It would really be best to be able to tell the planner which specific part of the query may be susceptible to this type of optimization, because you could easily have many places in a complicated query that would need to be analyzed, and if planning time is going to be a problem then we don't want to overplan the entire query just to fix the problem in one particular spot. And we certainly don't want to do a whole bunch of other, unrelated, expensive optimizations at the same time.

If one were to add a hint, I think the hint should tell the planner: Hey, see this left join? Computing the right-hand side of this thing is going to take forever unless we get some information to help us out. So please do all of your limit and filter operations on the left-hand side first, and then, if you have any rows left, evaluate the right-hand side for just the values that matter. I.e. in the example query:

SELECT * FROM ab LEFT OUTER JOIN (bc JOIN cd ON bc.c = cd.d) ON ab.b = bc.b
WHERE ab.a = 20000

...please look up the rows in ab where ab.a = 20000.
If you find any, then make a hash table of all the values you find for b among those rows. Then, when you evaluate (bc JOIN cd ON bc.c = cd.d), you can filter bc for rows where bc.b is in the hash table.

This might not be a good query plan in the average case, but there are definitely instances where you might want to force this behavior. In fact, even if you had to do it as a nested loop (re-evaluating the bc JOIN cd clause for each possible value of b), there are still cases where it would be a big win.

Of course, the nicest thing would be for the planner to realize on its own that the right-hand side of the join is going to generate a gazillion rows and the left-hand side is going to generate one, but maybe (as Tom and the OP suggested) that is expecting too much (though I confess I don't quite see why).

...Robert
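The strategy described above can be sketched in a few lines of plain Python (table layouts invented; this mimics the proposed execution order, not anything the planner currently does):

```python
# A pure-Python sketch of the proposed strategy: filter the outer side
# first, remember its join keys in a hash table, and evaluate the inner
# join only for rows with those keys. All data is made up for the demo.
ab = [(20000, 1), (20001, 2)]        # (a, b)
bc = [(1, 10), (2, 20), (3, 30)]     # (b, c)
cd = [(10,), (30,)]                  # (d,)

outer = [(a, b) for a, b in ab if a == 20000]      # WHERE ab.a = 20000
wanted_b = {b for _, b in outer}                   # hash table of b values
cd_keys = {d for (d,) in cd}

# bc JOIN cd ON bc.c = cd.d, restricted to bc rows whose b we actually need
inner = {}
for b, c in bc:
    if b in wanted_b and c in cd_keys:
        inner.setdefault(b, []).append((b, c, c))  # (bc.b, bc.c, cd.d)

result = []
for a, b in outer:                                 # the LEFT OUTER JOIN
    for match in inner.get(b, [(None, None, None)]):
        result.append((a, b) + match)
print(result)  # -> [(20000, 1, 1, 10, 10)]
```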
On Thu, 2008-06-26 at 12:36 -0400, Robert Haas wrote:
> > IMHO we should have a single parameter which indicates how much planning
> > time we consider acceptable for this query. e.g.
> >
> > optimization_level = 2 (default), varies 1-3
> >
> > Most automatic optimisation systems allow this kind of setting, whether
> > it be a DBMS, or compilers (e.g. gcc).
>
> It's my understanding that the philosophy of the PGDG in the past has
> been to avoid putting any kind of hints into the system, focusing
> rather on improving the planning of queries.

It's not a specific hint, it's a general goal setting. Providing information to the optimizer about our goals is not a universally bad thing; telling it to force a particular plan against its better judgement probably is.

For example, gcc has exactly this kind of optimization mode. -O2 should be acceptable to us, but an option like -fsplit-ivs-in-unroller probably isn't.

> If one were to add a hint, I think the hint should tell the planner:
> Hey, see this left join?

Now that *is* a hint.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> writes:
> On Thu, 2008-06-26 at 12:36 -0400, Robert Haas wrote:
>> It's my understanding that the philosophy of the PGDG in the past has
>> been to avoid putting any kind of hints into the system, focusing
>> rather on improving the planning of queries.

> It's not a specific hint, it's a general goal setting.

Right. There are definitely places where we've made engineering judgements not to attempt a particular type of optimization because it'd be too expensive compared to the typical payoff. Simon's idea has some merit for providing a framework to deal with that type of situation. However, just adding a GUC variable isn't going to make anything happen --- we'd need some concrete plans about what we'd do with it.

regards, tom lane
On Thu, 2008-06-26 at 12:57 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Thu, 2008-06-26 at 12:36 -0400, Robert Haas wrote:
> >> It's my understanding that the philosophy of the PGDG in the past has
> >> been to avoid putting any kind of hints into the system, focusing
> >> rather on improving the planning of queries.
>
> > It's not a specific hint, it's a general goal setting.
>
> Right. There are definitely places where we've made engineering
> judgements to not attempt a particular type of optimization because it'd
> be too expensive compared to the typical payoff. Simon's idea has some
> merit for providing a framework to deal with that type of situation.
> However, just adding a GUC variable isn't going to make anything happen
> --- we'd need some concrete plans about what we'd do with it.

Well, I'm convinced the egg came first. So I figure to put the framework in place and then start reviewing things to see if they can be categorised. Plus, I want new optimizer features to be considered in the light of the new framework.

This also allows us a way of handling optimizer performance bugs: we just reclassify certain cases as costs-more solutions, rather than stripping the code out entirely.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> IMHO we should have a single parameter which indicates how much planning
> time we consider acceptable for this query. e.g.
> optimization_level = 2 (default), varies 1-3

Couldn't the planner itself make a good guess about whether it should keep trying, based on the estimated cost?

    if (the_best_plan_I_found_so_far_looks_like_itll_take_an_hour)
        keep_optimizing_for_a_few_minutes
    if (the_best_plan_I_found_so_far_looks_like_itll_take_0.01ms)
        stop_planning_and_run_with_it

Or maybe as simple as something like:

    if (time_spent_planning >= cost_of_the_best_plan_found / 10)
        stop_optimizing

If we wanted a GUC, perhaps make it that 10 in the expression above?
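The stopping rule above might look roughly like this in Python. The candidate-plan generator, the cost numbers, and the cost_ratio knob are all invented for illustration (real planning doesn't produce a stream of complete plans like this):

```python
import time

# Toy rendering of the stopping rule: keep searching for plans while the
# time spent planning is small relative to the best estimated cost so far.
# Costs are treated as seconds purely for the sake of the comparison.
def plan_with_budget(candidates, cost_ratio=10.0):
    best = None
    start = time.monotonic()
    for plan, est_cost in candidates:
        if best is None or est_cost < best[1]:
            best = (plan, est_cost)
        # stop once planning time is no longer cheap next to the best plan
        if time.monotonic() - start >= best[1] / cost_ratio:
            break
    return best

plans = [("seqscan", 5.0), ("hashjoin", 2.0), ("nestloop", 0.5)]
print(plan_with_budget(plans))  # -> ('nestloop', 0.5)
```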
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Couldn't the planner itself make a good guess if it should > keep trying based on the estimated cost? > if (the_best_plan_I_found_so_far_looks_like_itll_take_an_hour) > keep_optimizing_for_a_few_minutes > if (the_best_plan_I_found_so_far_looks_like_itll_take_0.01ms) > stop_planning_and_run_with_it You're operating on a mistaken assumption, which is that we generate a complete plan and then generate another. The places where we'd actually be doing something with an effort variable are usually dealing with small parts of plans, or even with preparatory calculations before we've got anything plan-like at all. They haven't got a sufficiently holistic view of what's happening to apply a rule like the above. regards, tom lane
Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> Couldn't the planner itself make a good guess if it should
>> keep trying based on the estimated cost?
>
>> if (the_best_plan_I_found_so_far_looks_like_itll_take_an_hour)
>> keep_optimizing_for_a_few_minutes
>> if (the_best_plan_I_found_so_far_looks_like_itll_take_0.01ms)
>> stop_planning_and_run_with_it
>
> You're operating on a mistaken assumption, which is that we generate a
> complete plan and then generate another. The places where we'd actually
> be doing something with an effort variable are usually dealing with
> small parts of plans, or even with preparatory calculations before we've
> got anything plan-like at all. They haven't got a sufficiently holistic
> view of what's happening to apply a rule like the above.

Then could the logic wait until the final plan is computed, and if that final plan looks very expensive (compared with the planning time so far), try again with the effort variable automatically increased?
Hi,

On Thursday 26 June 2008 04:36:09 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > SELECT *
> > FROM
> >   ab LEFT OUTER JOIN (
> >     bc JOIN cd
> >       ON bc.c = cd.d
> >   )
> >   ON ab.b = bc.b
> > WHERE
> >   ab.a = 20000
> >
> > As ab.a = 20000 occurs only once in ab one would expect that it just does
> > an index scan on bc for ab.b = bc.b.

There was a typo in here (ON bc.c = cd.d should be ON bc.c = cd.c):
http://anarazel.de/postgres/testtable_query4.plan

A better query plan, but still not optimal - interestingly, the query plan works out perfectly for ab.a = 10:
http://anarazel.de/postgres/testtable_query3.plan

> The only way it could do that would be by interchanging the order of the
> left and inner joins, ie (ab left join bc) join cd; which would change
> the results.

My knowledge about the implementation side of relational databases is quite limited, so my ideas may be quite flawed: the planner already recognizes that the left side of the join is quite small and the right side will be very big. Why can't it optimize the query the same way it does for an inner join, namely by doing an index lookup on bc? I don't see the fundamental problem.

> I believe it could interchange the joins if they were both LEFT or
> both INNER. Do you really need exactly these semantics?

I don't see an easy/effective way to express it: I need all data belonging to the left side of the join (proband) through a series (participation -> answer_group -> answer -> data) of inner joins, and NULL if there is no data. If there were only one such join it wouldn't be a problem - but a normal query has around 20 such LEFT JOINs.

Currently I solve this by separately inserting the data for each join into a temporary table, which is still much faster. But without the statistics the planner has, selecting a good order isn't easy. Besides, it's not very elegant. So, if somebody has a better idea...
If I can use my time to improve pg instead of working around the problem on clientside both me and my employer will be happy... Thanks, Andres
Andres Freund <andres@anarazel.de> writes: >> The only way it could do that would be by interchanging the order of the >> left and inner joins, ie (ab left join bc) join cd; which would change >> the results. > My knowledge about the implementation side of relational databases is quite > limited, so my ideas may be quite flawed: > The planner already recognizes that the left side of the join is quite small > and the right side will be very big. > Why cant it optimize the query the same way it does for a inner join, namely > doing an index lookup on bc? > I dont see the fundamental problem? The only correct join order for this query is to join bc to cd, then left-join ab to that result. Now, if we make ab the outer side of a nestloop over the lower join's result, it would indeed be theoretically possible to pass down the value of ab.b through the lower join to the scan on bc and use it to constrain the scan. The problem is that finding plans that work like this would increase the planner's runtime exponentially, compared to the current situation where we only check for indexscan constraints coming from the immediate join partner. (There might be some executor issues too, but I think those would be relatively easily solved, compared to the plan search time problem.) regards, tom lane
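The executor behaviour sketched above - a nestloop whose outer row supplies ab.b, pushed down through the lower join as a scan constraint on bc - could look like this toy Python model. The indexes are plain dicts and all data is invented; this models only the execution shape, not the plan-search problem Tom describes:

```python
# Toy model: the outer row's ab.b value constrains the "index scan" on bc,
# which feeds the bc JOIN cd; unmatched outer rows are null-extended to
# preserve the LEFT JOIN semantics. All table layouts here are invented.
ab = [(20000, 1)]                              # (a, b)
bc_index_on_b = {1: [(1, 10)], 2: [(2, 20)]}   # b -> bc rows (b, c)
cd_index_on_d = {10: [(10,)], 30: [(30,)]}     # d -> cd rows (d,)

result = []
for a, b in ab:                                # outer side of the nestloop
    matched = False
    for bcb, c in bc_index_on_b.get(b, []):    # index scan: bc.b = ab.b
        for (d,) in cd_index_on_d.get(c, []):  # bc JOIN cd ON bc.c = cd.d
            result.append((a, b, bcb, c, d))
            matched = True
    if not matched:                            # null-extend for LEFT JOIN
        result.append((a, b, None, None, None))
print(result)  # -> [(20000, 1, 1, 10, 10)]
```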
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On Thu, 2008-06-26 at 12:36 -0400, Robert Haas wrote:
>>> It's my understanding that the philosophy of the PGDG in the past has
>>> been to avoid putting any kind of hints into the system, focusing
>>> rather on improving the planning of queries.
>
>> It's not a specific hint, it's a general goal setting.
>
> Right. There are definitely places where we've made engineering
> judgements to not attempt a particular type of optimization because it'd
> be too expensive compared to the typical payoff. Simon's idea has some
> merit for providing a framework to deal with that type of situation.
> However, just adding a GUC variable isn't going to make anything happen
> --- we'd need some concrete plans about what we'd do with it.

If the planner provided some facility to control how much effort it puts into planning, we could even tune that knob automatically, with something like:

    plan = plan_query(effort=0);
    if (estimated_execution_cost > triviality_threshold)
        plan = plan_query(estimated_execution_cost * effort_by_cost_ratio);

regards, Florian Pflug
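A runnable toy of this auto-tuning loop, with plan_query replaced by an invented stand-in (the cost model, the triviality_threshold, and the effort_by_cost_ratio values are all fabrications for illustration):

```python
# Invented stand-in for the planner: pretend more planning effort finds
# proportionally cheaper plans.
def plan_query(effort):
    return {"effort": effort, "est_cost": 1000.0 / (1 + effort)}

def adaptive_plan(triviality_threshold=100.0, effort_by_cost_ratio=0.01):
    # plan cheaply first ...
    plan = plan_query(effort=0)
    # ... and replan with more effort only if the cheap plan looks expensive
    if plan["est_cost"] > triviality_threshold:
        plan = plan_query(plan["est_cost"] * effort_by_cost_ratio)
    return plan

print(adaptive_plan())  # effort raised from 0 to 10.0 on the second pass
```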