Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
От | Robert Haas |
---|---|
Тема | Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a |
Дата | |
Msg-id | 603c8f070911281819v4f685cf4v7eb305f913d9b5d2@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On Fri, Nov 27, 2009 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't think there's any easy fix for this. > > Nope :-(. I was able to get rid of the specific O(N^2) behavior that > you were running into, but that just pushes the problem out a bit. Yeah. Testing a couple of the cases I was looking at last night suggests that your patch shaved off close to 30%, which isn't bad for a Friday's evening's work. My original email was really about some GEQO difficulties under 8.3.x, but I think the effort towards improving the standard planner is well-spent, because the empirical evidence suggests that everyone always prefers to use the standard planner whenever possible. > FWIW, a profile of CVS HEAD running on this query looks like > > samples % symbol name > 208189 15.9398 compare_fuzzy_path_costs > 116260 8.9014 add_path > 97509 7.4657 AllocSetAlloc > 78354 5.9991 make_canonical_pathkey > 69027 5.2850 generate_join_implied_equalities > 65153 4.9884 cost_nestloop > 59689 4.5700 bms_is_subset > 49176 3.7651 add_paths_to_joinrel > 39720 3.0411 bms_overlap > 32562 2.4931 cost_mergejoin > 30101 2.3047 pathkeys_useful_for_merging > 26409 2.0220 AllocSetFree > 24459 1.8727 MemoryContextAllocZeroAligned > 23247 1.7799 hash_search_with_hash_value > 16018 1.2264 check_list_invariants > > which is at least unsurprising. However, it eats memory fast enough > to start swapping after level 7 or so, so there is no way that the > exhaustive planner is ever going to finish this problem. Yes, that looks similar to what I've seen in the past. The thing about this is that in a case like this one, the actual join order barely matters at all because the joins are mostly equivalent. None of them change the cardinality of the output set, and while in a real application the various foo{i}, bar{i}, baz{i}, and bletch{i} would be of somewhat different sizes, it typically doesn't matter very much which one you do first. If there were a filter criteria on say bar7_id, you'd want to do the joins necessary to allow the filter to be applied as early as possible, and then the order of the rest doesn't matter. Unfortunately, it's hard to know how to formalize that. >> Playing around with this a bit, I was easily able to get 2-second >> planing times on 15 table join, 6-second planning times on a 16 table >> join and 30-second planning times on a 17 table join. This makes it >> hard to support raising the GEQO threshold, as most recently suggested >> by Greg Sabino Mullane here (and previously by me on an earlier >> thread): > > Yeah. I think GEQO or other approximate methods are the only hope for > very large join problems. We could however hope to get to the point of > being able to raise the collapse limits, rather than keeping them > below the geqo_threshold by default. In principle that should result > in better plans ... Yes. from_collapse_limit implementation is all-but-guaranteed to generate bad plans on queries like "SELECT * FROM foo_view WHERE id = 1". It would be OK to constrain the search space at any given step by considering joins to only a subset of the relations to which a join would actually be legal - what we need to avoid is anything that encourages joining relations to each other before they've been joined to foo1, which is exactly what from_collapse_limit does. If from_collapse_limit could be read to mean "once you've joined anything to a relation within bar_view, you must finish joining to everything else in bar_view before you can think about joining to anything else", it would generate tolerable plans. Of course it also wouldn't constrain the search space nearly as effectively. ...Robert
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Philip WarnerДата:
Сообщение: Re: 8.5 TODO: Add comments to output indicating version of pg_dump and of the database server