Discussion: Re: Order by (for 15 rows) adds 30 seconds to query time
Tom Lane wrote:
> That does look weird. Do we have a self-contained test case?

I've been tinkering with this and I now have a self-contained test case (SQL statements and run results attached). I've debugged through it, and things don't seem right in set_append_rel_pathlist, since childrel->rows seems to contain the total rows in each table rather than the number which meet the join conditions. That is reflected in the output from OPTIMIZER_DEBUG logging, but not in the query plan from EXPLAIN ANALYZE. I'm afraid I'm a bit stuck on getting farther. Hopefully this much will be of use to someone. If you could point out where I should have looked next, I'd be grateful. :-)

-Kevin
Attachments
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane wrote:
>> That does look weird. Do we have a self-contained test case?

> I've been tinkering with this and I now have a self-contained test
> case (SQL statements and run results attached). I've debugged through
> it and things don't seem right in set_append_rel_pathlist, since
> childrel->rows seems to contain the total rows in each table rather
> than the number which meet the join conditions.

Yeah, that is expected. Nestloop inner indexscans have a rowcount estimate that is different from that of the parent table --- the parent's rowcount is what would be applicable for another type of join, such as merge or hash, where the join condition is applied at the join node, not in the relation scan.

The problem here boils down to the fact that examine_variable punts on appendrel variables:

    else if (rte->inh)
    {
        /*
         * XXX This means the Var represents a column of an append
         * relation. Later add code to look at the member relations and
         * try to derive some kind of combined statistics?
         */
    }

This means you get a default estimate for the selectivity of the join condition, so the joinrel size estimate ends up being 0.005 * 1 * 40000. That's set long before we ever generate indexscan plans, and I don't think there's any clean way to correct the size estimate when we do.

Fixing this has been on the to-do list since forever. I don't think we'll make much progress on it until we have an explicit notion of partitioned tables. The approach contemplated in the comment, of assembling some stats on-the-fly from the stats for individual child tables, doesn't seem real practical from a planning-time standpoint. The thing that you really want to know here is that there will be only one matching id value in the whole partitioned table; and that would be something trivial to know if we understood about partitioning keys, but it's difficult to extract from independent sets of stats.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, that is expected. Nestloop inner indexscans have a rowcount
> estimate that is different from that of the parent table --- the
> parent's rowcount is what would be applicable for another type of
> join, such as merge or hash, where the join condition is applied
> at the join node not in the relation scan.
>
> The problem here boils down to the fact that examine_variable
> punts on appendrel variables:

> * XXX This means the Var represents a column of an append
> * relation. Later add code to look at the member relations and
> * try to derive some kind of combined statistics?

> This means you get a default estimate for the selectivity of the
> join condition, so the joinrel size estimate ends up being 0.005 *
> 1 * 40000. That's set long before we ever generate indexscan
> plans, and I don't think there's any clean way to correct the size
> estimate when we do.

Thanks for the explanation.

> Fixing this has been on the to-do list since forever.

Which item? I looked and couldn't find one which seems to fit. (I was hoping to find a reference to a discussion thread.)

> I don't think we'll make much progress on it until we have an
> explicit notion of partitioned tables.

I'm not clear that a generalized solution for partitioned tables would solve the production query from the OP. The OP was using table extension (inheritance) to model the properties of the data. In the actual production problem, the table structure involved, for example, materials -- some of which were containers (which had all the properties of other materials, plus some unique to containers); so the containers table extends the materials table to model that. In the problem query, he wanted to find all the materials related to some item. He was also joining to location, which might be (among other things) a waypoint or a container (both extending location). Note that a container is both a location and a material.
> The approach contemplated in the comment, of assembling some stats
> on-the-fly from the stats for individual child tables, doesn't
> seem real practical from a planning-time standpoint.

Can you give a thumbnail sketch of why that is?

> The thing that you really want to know here is that there will be
> only one matching id value in the whole partitioned table

It would seem to matter nearly as much if statistics indicated you would get five rows out of twenty million.

> it's difficult to extract from independent sets of stats.

Since we make the attempt for most intermediate results, it's not immediately clear to me why it's so hard here. Not that I'm doubting your assertion that it *is* hard; I'm just trying to see *why* it is. Perhaps the code which generates such estimates for everything else could be made available here?

The usefulness of inheritance to model data would seem to be rather limited without better optimizer support.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The approach contemplated in the comment, of assembling some stats
>> on-the-fly from the stats for individual child tables, doesn't
>> seem real practical from a planning-time standpoint.

> Can you give a thumbnail sketch of why that is?

Well, it would be expensive, and it's not even clear you can do it at all (merging histograms with overlapping bins seems like a mess, for instance).

I think we have previously discussed the idea of generating and storing ANALYZE stats for a whole inheritance tree, which'd solve the problem nicely from the planner's standpoint. I'm a bit worried about the locking implications, but if it took just a SELECT lock on the child tables it probably wouldn't be too bad --- no worse than any other SELECT on the inheritance tree.

Another thing that's hard to figure out is how autovacuum would know when to redo the stats. In a lot of common situations, the inheritance parent table is empty and never changes, so no autovac or autoanalyze would ever get launched against it.

			regards, tom lane