Обсуждение: from_collapse_limit vs. geqo_threshold
The docs contain the following sage advice concerning from_collapse_limit: "It is usually wise to keep this less than geqo_threshold." I've been thinking through this advice on and off for about 2 years and I still don't understand it. The point of either from_collapse_limit and geqo_threshold is to avoid exponential growth in planning time when planning large join nests. Either of them has the effect of reducing planning time at the expense of possibly getting an inferior plan. However, it's my experience that the plans that result when the from_collapse_limit kicks in are almost invariably terrible when fetching only a small number of rows. On the other hand, the few GEQO plans I've had experience with have been pretty reasonable. It appears that this statement has been in our documentation since Tom Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on January 25, 2003 (9bf97ff426de9), but I can't find any justification for it anywhere. I think we either need to justify this advice, or remove it. ...Robert
Robert, > It appears that this statement has been in our documentation since Tom > Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on > January 25, 2003 (9bf97ff426de9), but I can't find any justification > for it anywhere. I think we either need to justify this advice, or > remove it. ... trying to remember why I wrote that ... what would happen if FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD? -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
On Thu, May 21, 2009 at 12:21 AM, Josh Berkus <josh@agliodbs.com> wrote: >> It appears that this statement has been in our documentation since Tom >> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on >> January 25, 2003 (9bf97ff426de9), but I can't find any justification >> for it anywhere. I think we either need to justify this advice, or >> remove it. > > ... trying to remember why I wrote that ... what would happen if > FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD? The two variables do different things, so there's nothing particularly magical about which one is larger AFAICS. I believe that if you make from_collapse_limit larger than geqo_threshold, then GEQO might be asked to plan a query into which subqueries have been pulled up. But that's not obviously bad; the alternative is planning the subquery separately and first, which at least for the very small number of cases that I've tested seems to be quite a bit worse. Apparently before from_collapse_limit was added the behavior existed, but the thereshold was geqo_threshold/2. So someone had a reason for believing that when the join nest got too large, not pulling up subqueries was a superior coping strategy versus invoking GEQO. I just don't know what the reason is, or whether it's still valid. ...Robert
Josh Berkus <josh@agliodbs.com> writes: > Robert, >> It appears that this statement has been in our documentation since Tom >> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on >> January 25, 2003 (9bf97ff426de9), but I can't find any justification >> for it anywhere. I think we either need to justify this advice, or >> remove it. > ... trying to remember why I wrote that ... what would happen if > FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD? I think I wrote it, not you. The point of the advice is to keep subquery collapsation (hm, what's the right noun form? Need caffeine) from turning a non-GEQO query into a GEQO one, and thus subjecting you to unpredictable plans. Maybe the resulting plans would be better on average, or maybe they wouldn't, but in any case they'd be unpredictable. regards, tom lane
On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Robert, >>> It appears that this statement has been in our documentation since Tom >>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on >>> January 25, 2003 (9bf97ff426de9), but I can't find any justification >>> for it anywhere. I think we either need to justify this advice, or >>> remove it. > >> ... trying to remember why I wrote that ... what would happen if >> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD? > > I think I wrote it, not you. The point of the advice is to keep > subquery collapsation (hm, what's the right noun form? Need caffeine) > from turning a non-GEQO query into a GEQO one, and thus subjecting > you to unpredictable plans. Maybe the resulting plans would be better > on average, or maybe they wouldn't, but in any case they'd be > unpredictable. That's more or less what I figured, but my real world experience is that pulling up subqueries and using GEQO leads to plans that are random but tolerable, whereas not pulling up subqueries leads to plans that are almost uniformly bad. Actually, it works OK if really would have needed to materialize the entire subquery, but otherwise it stinks. My real unvarnished opinion on this topic is that from_collapse_limit is a loaded foot-gun waiting to go off. We might as well have an option where if the number of tables in the query exceeds a certain threshold, we'll just sequential-scan the table rather than considering the use of indices. That option would actually be better, because everyone who read the documentation would be absolutely certain that they wanted to turn that option OFF, whereas the behavior of from_collapse_limit is sufficiently complex that it isn't obvious that it's a terrible idea. ...Robert
Having just raised the statistics targets I wonder if we should look at raising these two parameters too. The experience on the lists are that when people run into either of these two limits we recommend raising them and we've never seen anyone come back complaining that their planning time goes through the roof. To benchmark this like we did for the statistics target will be tricky though. I suppose we can gather candidate queries by trolling the lists for recommendations, though we didn't always get schemas to go along with the queries. -- greg
On Thu, May 21, 2009 at 8:51 AM, Greg Stark <stark@enterprisedb.com> wrote: > Having just raised the statistics targets I wonder if we should look > at raising these two parameters too. The experience on the lists are > that when people run into either of these two limits we recommend > raising them and we've never seen anyone come back complaining that > their planning time goes through the roof. > > To benchmark this like we did for the statistics target will be tricky > though. I suppose we can gather candidate queries by trolling the > lists for recommendations, though we didn't always get schemas to go > along with the queries. I think that's a good idea, too, but it's unrelated to the question of which one should be set to the higher value. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> ... trying to remember why I wrote that ... what would happen if >>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD? >> >> I think I wrote it, not you. The point of the advice is to keep >> subquery collapsation (hm, what's the right noun form? Need caffeine) >> from turning a non-GEQO query into a GEQO one, and thus subjecting >> you to unpredictable plans. Maybe the resulting plans would be better >> on average, or maybe they wouldn't, but in any case they'd be >> unpredictable. > That's more or less what I figured, but my real world experience is > that pulling up subqueries and using GEQO leads to plans that are > random but tolerable, whereas not pulling up subqueries leads to plans > that are almost uniformly bad. I went back and looked at the CVS history to try to refresh my memory about how we got here. As best I can find, there were two steps: 1. The original commit of the ability to have subqueries at all, during 7.1 development: 2000-09-29 14:21 tgl Subselects in FROM clause, perISO syntax: FROM (SELECT ...) [AS] alias. (Don't forget that analias is required.) Viewsreimplemented as expanding tosubselect-in-FROM. Grouping, aggregates, DISTINCT in viewsactually work now (he says optimistically). No UNION support insubselects/views yet, but I have some ideas about that. Rule-related permissions checkingmoved out of rewriter and intoexecutor. INITDB REQUIRED! This introduced the ability to pull up subqueries, but with an arbitrary limit of geqo_threshold/2 on the number of relations that would be collected into a single planning problem. 2. During 7.4 development, we did this: 2003-01-25 18:10 tgl Allow the planner to collapse explicit inner JOINs together, ratherthan necessarily following the JOIN syntax to developthe queryplan. The old behavior is still available by setting GUC variableJOIN_COLLAPSE_LIMIT to 1. Also createa GUC variableFROM_COLLAPSE_LIMIT to control the similar decision about when tocollapse sub-SELECT lists into theirparent lists. (This behaviorexisted already, but the limit was always GEQO_THRESHOLD/2; nowit's separately adjustable.) The excuse for join_collapse_limit to exist at all is largely one of backwards compatibility. Up to then, we had not-infrequently suggested that people could force a desired join order by writing an explicit JOIN nest, and eliminating that escape hatch altogether didn't seem like a good idea. I think from_collapse_limit was added largely on grounds of symmetry. Now, as to why the original commit had the geqo_threshold/2 restriction: it was obviously not based on field experience with flattening, because we didn't have any. What I think it *was* based on was that GEQO sucked really badly back then, and I wanted to avoid having it kick in for queries that it had never kicked in for in previous releases. Some quick comparisons say that 7.1 in GEQO mode was about 5X slower than HEAD (despite its planning being a lot more simplistic), and tended to find considerably worse plans. Some of the significant improvements since then: 2004-01-23 18:54 tgl Revise GEQO planner to make use of some heuristic knowledge aboutSQL, namely that it's good to join where there are joinclausesrather than where there are not. Also enable it to generate bushyplans at need, so that it doesn't fail in thepresence of multipleIN clauses containing sub-joins. 2004-01-21 18:33 tgl Repair error apparently introduced in the initialcoding of GUC: the default value for geqo_effort is supposed to be40, not1. The actual 'genetic' component of the GEQO algorithmhas been practically disabled since 7.1 because of this mistake. Also, up to 7.0 there were some nasty memory leaks in the planner and especially in GEQO, because we didn't have the memory context mechanism. I think those were actually fixed as of 2000-09-29, but GEQO still had a reputation for blowing out backend memory. Now I'm still not exactly happy with GEQO, but it's surely a lot better than it was in the fall of 2000. So on the whole it does seem that the current relationships between from_collapse_limit, join_collapse_limit, and geqo_threshold are based on obsolete information and should be revisited. I don't have any data at hand to suggest specific new default values, though. regards, tom lane
On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Now I'm still not exactly happy with GEQO, but it's surely a lot better > than it was in the fall of 2000. So on the whole it does seem that the > current relationships between from_collapse_limit, join_collapse_limit, > and geqo_threshold are based on obsolete information and should be > revisited. I don't have any data at hand to suggest specific new > default values, though. For 8.4, I'd be happy to just improve the documentation. I think this sentence could just be deleted from the section on from_collapse_limit: It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">. We could put some other explanation in place of that sentence, but I'm not exactly sure what that explanation would say. I guess the point is that setting from_collapse_limit < geqo_threshold may delay GEQO planning considerably in the face of complex subqueries, because pulling up subqueries increases the size of the FROM list (I think). That could be good if you want your query plans to be more deterministic, but there's no guarantee they'll be good. Setting from_collapse_limit > geqo_threshold is basically saying that the standard planner will always have subqueries pulled up, so from_collapse_limit should be based on what the pain point will be for GEQO. I'm not sure there's a lot of point in spelling all that out, though. It more or less follows from the definition of the parameters. So, I'd be just as happy to delete the misleading hint and call it good. But I could go either way. For 8.5, it sounds like we need to do some testing to determine an appropriate set of values, but I'm not exactly sure what to test. As a practical matter, the correct level of effort depends a lot on how long the query figures to run. For OLAP queries, planning times of more than 50 ms or so start to add noticeably to the overall runtime of the query, but if the query is expected to run for several minutes, we'd presumably be happy to spend several seconds planning it, which might make it feasible to use the standard planner even for very, very big queries. I'm not 100% convinced of the value of join_collapse_limit for anything other than explicit control over the join order. I have yet to meet a PostgreSQL who thought that it was intuitive that it might matter whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3 [etc] or A, B, C, D, [etc] WHERE P1, P2, P3. I suspect there are many people who, if they knew that the latter might optimize better than the former in some circumstances, would simply always write it in the latter fashion, which makes the whole thing look a lot like a concealed foot-gun, since whether or not it actually protects you against exponential planning-time growth has a lot to do with how you happen to like to write your queries (myself, I've switched styles in the last few years). ...Robert
In the spirit of helping wrap-up 8.4 todo items... Robert Haas wrote: > On Mon, May 25, 2009 at 6:15 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Now I'm still not exactly happy with GEQO, but it's surely a lot better >> than it was in the fall of 2000. So on the whole it does seem that the >> current relationships between from_collapse_limit, join_collapse_limit, >> and geqo_threshold are based on obsolete information and should be >> revisited. I don't have any data at hand to suggest specific new >> default values, though. > > For 8.4, I'd be happy to just improve the documentation. I think this > sentence could just be deleted from the section on > from_collapse_limit: > > It is usually wise to keep this less than<xref linkend="guc-geqo-threshold">. > > We could put some other explanation in place of that sentence, but I'm > not exactly sure what that explanation would say. I guess the point > is that setting from_collapse_limit< geqo_threshold may delay GEQO > planning considerably in the face of complex subqueries, because > pulling up subqueries increases the size of the FROM list (I think). > That could be good if you want your query plans to be more > deterministic, but there's no guarantee they'll be good. Setting > from_collapse_limit> geqo_threshold is basically saying that the > standard planner will always have subqueries pulled up, so > from_collapse_limit should be based on what the pain point will be for > GEQO. My vote would be to provide some information. Suggested revision of Robert's prose: Because genetic query optimization may be triggered, increasing from_collapse_limit should be considered relative to <xref linkend="guc-geqo-threshold">. -selena -- Selena Deckelmann End Point Corporation selena@endpoint.com 503-282-2512
On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann <selena@endpoint.com> wrote: > In the spirit of helping wrap-up 8.4 todo items... > Robert Haas wrote: >> For 8.4, I'd be happy to just improve the documentation. I think this >> sentence could just be deleted from the section on >> from_collapse_limit: > My vote would be to provide some information. > > Suggested revision of Robert's prose: > > Because genetic query optimization may be triggered, increasing > from_collapse_limit should be considered relative to <xref > linkend="guc-geqo-threshold">. Here's my attempt. ...Robert
Вложения
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann <selena@endpoint.com> wrote: >> Suggested revision of Robert's prose: >> >> Because genetic query optimization may be triggered, increasing >> from_collapse_limit should be considered relative to <xref >> linkend="guc-geqo-threshold">. > Here's my attempt. I applied the attached, along with some other wordsmithing. regards, tom lane *************** *** 2252,2261 **** The planner will merge sub-queries into upper queries if the resulting <literal>FROM</literal>list would have no more than this many items. Smaller values reduce planning time but might ! yield inferior query plans. The default is eight. It is usually ! wise to keep this less than <xref linkend="guc-geqo-threshold">. For more information see <xref linkend="explicit-joins">. </para> </listitem> </varlistentry> --- 2261,2275 ---- The planner will merge sub-queries into upper queries if the resulting <literal>FROM</literal>list would have no more than this many items. Smaller values reduce planning time but might ! yield inferior query plans. The default is eight. For more information see <xref linkend="explicit-joins">. </para> + + <para> + Setting this value to <xref linkend="guc-geqo-threshold"> or more + may trigger use of the GEQO planner, resulting in nondeterministic + plans. See <xref linkend="runtime-config-query-geqo">. + </para> </listitem> </varlistentry>