Thread: Re: Indexes not always used after inserts/updates/vacuum
On Thu, 28 Feb 2002 at 09:51, Tom Lane wrote:
> Reinhard Max <max@suse.de> writes:
> > I've just found a case where forcing indexscans results in much
> > higher speed.
>
> >         ->  Index Scan using foo_pkey on foo
> >             (cost=0.00..25153.18 rows=352072 width=4)
> >             (actual time=0.03..157.57 rows=38432 loops=1)
>
> The major estimation error is evidently in this indexscan. What
> statistics does pg_stats show for this table?

See attached file.

BTW, I've just done the same test on PostgreSQL 7.1 and got similar
results.

cu
	Reinhard
Reinhard Max <max@suse.de> writes:
>> The major estimation error is evidently in this indexscan. What
>> statistics does pg_stats show for this table?

> See attached file.

Okay. It looks like foo.id has a pretty strong but not perfect
descending order (the correlation statistic is -0.563276). The planner
is evidently not rating that effect strongly enough.

If you look in cost_index (see approx. lines 270-340 in
src/backend/optimizer/path/costsize.c) you'll see that it computes
access cost estimates for both the perfectly sequential case and the
perfectly uncorrelated case, and then tries to interpolate between
them. I have reasonable faith in both of the endpoint estimation
methods, but very little in the interpolation equation --- it was
chosen on the spur of the moment and hasn't really been tested.

It might be interesting to replace csquared with just
fabs(indexCorrelation) to see if the results are better. Also, if you
cared to step through the code with a debugger or add some printout
statements, we could learn what the min and max costs are that it's
interpolating between; that'd be interesting to know as well.

			regards, tom lane
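To see why that substitution matters: for any correlation whose
absolute value is below 1, squaring yields a smaller factor than
fabs(), so csquared keeps the estimate closer to the worst-case
(uncorrelated) endpoint. A minimal SQL sketch of the two factors,
using the -0.563276 correlation mentioned above plus a couple of
made-up values:

    -- Interpolation factor under each scheme; a larger factor means
    -- the estimate lands closer to the best-case (perfectly
    -- sequential) endpoint.
    SELECT c AS correlation, c * c AS csquared, abs(c) AS fabs_factor
    FROM (SELECT -1.0 AS c
          UNION SELECT -0.563276
          UNION SELECT -0.25
          UNION SELECT 0.0) AS t
    ORDER BY c;

At -0.563276 the factors come out around 0.317 versus 0.563, so the
fabs() variant pulls the estimate nearly twice as far toward the
sequential endpoint as csquared does.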
On Thu, 28 Feb 2002 at 10:15, Tom Lane wrote:
> Okay. It looks like foo.id has a pretty strong but not perfect
> descending order (the correlation statistic is -0.563276). The
> planner is evidently not rating that effect strongly enough.

Yes, that seems to be the reason. When I try

    SELECT * INTO foo2 FROM foo ORDER BY id;
    CREATE INDEX foo2_id ON foo2(id);
    VACUUM ANALYZE foo2;

and repeat the join with foo2 instead of foo, index scans are used
even when seqscans are not forbidden.

> [...]
> It might be interesting to replace csquared with just
> fabs(indexCorrelation) to see if the results are better. Also, if you
> cared to step through the code with a debugger or add some printout
> statements, we could learn what the min and max costs are that it's
> interpolating between; that'd be interesting to know as well.

OK, this is what I've changed:

    -       csquared = indexCorrelation * indexCorrelation;
    +       elog(NOTICE, "min_IO_cost = %f, max_IO_cost = %f, indexCorrelation = %f",
    +            min_IO_cost, max_IO_cost, indexCorrelation);
    +       csquared = fabs(indexCorrelation);

Are these the additional values you wanted to see?

These are the results:

    max=# EXPLAIN ANALYZE SELECT count(foo.id)
          FROM foo, bar WHERE foo.id = bar.ref2foo;
    NOTICE:  min_IO_cost = 299.000000, max_IO_cost = 1196.000000,
             indexCorrelation = -1.000000
    NOTICE:  min_IO_cost = 1.000000, max_IO_cost = 3.993322,
             indexCorrelation = -1.000000
    NOTICE:  min_IO_cost = 5880.000000, max_IO_cost = 1169154.985307,
             indexCorrelation = -0.532557
    NOTICE:  min_IO_cost = 1.000000, max_IO_cost = 3.999660,
             indexCorrelation = -0.532557
    NOTICE:  QUERY PLAN:

    Aggregate  (cost=18709.65..18709.65 rows=1 width=8)
               (actual time=7229.15..7229.15 rows=1 loops=1)
      ->  Hash Join  (cost=911.39..18613.58 rows=38431 width=8)
                     (actual time=208.23..7184.68 rows=38431 loops=1)
            ->  Seq Scan on foo  (cost=0.00..9400.72 rows=352072 width=4)
                                 (actual time=0.02..810.92 rows=352072 loops=1)
            ->  Hash  (cost=683.31..683.31 rows=38431 width=4)
                      (actual time=149.87..149.87 rows=0 loops=1)
                  ->  Seq Scan on bar  (cost=0.00..683.31 rows=38431 width=4)
                                       (actual time=0.02..83.32 rows=38431 loops=1)
    Total runtime: 7229.29 msec

    EXPLAIN
    max=# EXPLAIN ANALYZE SELECT count(foo2.id)
          FROM foo2, bar WHERE foo2.id = bar.ref2foo;
    NOTICE:  min_IO_cost = 299.000000, max_IO_cost = 1196.000000,
             indexCorrelation = -1.000000
    NOTICE:  min_IO_cost = 1.000000, max_IO_cost = 3.993322,
             indexCorrelation = -1.000000
    NOTICE:  min_IO_cost = 5741.000000, max_IO_cost = 1163366.000920,
             indexCorrelation = 1.000000
    NOTICE:  min_IO_cost = 1.000000, max_IO_cost = 3.999652,
             indexCorrelation = 1.000000
    NOTICE:  QUERY PLAN:

    Aggregate  (cost=12748.26..12748.26 rows=1 width=8)
               (actual time=687.08..687.08 rows=1 loops=1)
      ->  Merge Join  (cost=0.00..12652.18 rows=38431 width=8)
                      (actual time=0.44..633.53 rows=38431 loops=1)
            ->  Index Scan using foo2_pkey on foo2
                  (cost=0.00..10387.79 rows=352072 width=4)
                  (actual time=0.26..174.32 rows=38432 loops=1)
            ->  Index Scan using idx_bar_ref2foo on bar
                  (cost=0.00..807.74 rows=38431 width=4)
                  (actual time=0.17..180.34 rows=38431 loops=1)
    Total runtime: 687.31 msec
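Plugging the foo NOTICE values into the interpolation shows how far
apart the two variants land. A back-of-envelope sketch in plain SQL,
assuming the interpolation has the shape found in later costsize.c
sources, run_cost += max_IO_cost + f * (min_IO_cost - max_IO_cost),
and ignoring the CPU cost components:

    -- I/O cost estimate for the foo indexscan under each factor,
    -- using min_IO_cost = 5880, max_IO_cost = 1169154.99, and
    -- indexCorrelation = -0.532557 from the NOTICE lines above.
    SELECT 1169154.99 + (0.532557 * 0.532557) * (5880 - 1169154.99)
               AS io_cost_csquared,   -- about 839000
           1169154.99 + 0.532557 * (5880 - 1169154.99)
               AS io_cost_fabs;       -- about 550000

Either way the estimate stays far above the 9400-unit seqscan on foo,
which is consistent with the patched planner still preferring the hash
join in the first plan above.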
> Are these the additional values you wanted to see?

Yes, but I just noticed something else strange:

>         ->  Index Scan using foo2_pkey on foo2
>             (cost=0.00..10387.79 rows=352072 width=4)
>             (actual time=0.26..174.32 rows=38432 loops=1)

The actual rows read from this indexscan seem to be many fewer than
the number of rows in the table. What are the ranges of the id values
in tables foo and bar? I'm wondering if the merge could have stopped
far short of the end of the foo table; if so, *that* is the effect
that we are failing to model accurately.

			regards, tom lane
Hi,

On Thu, 28 Feb 2002 at 16:10, Tom Lane wrote:
> >         ->  Index Scan using foo2_pkey on foo2
> >             (cost=0.00..10387.79 rows=352072 width=4)
> >             (actual time=0.26..174.32 rows=38432 loops=1)
>
> The actual rows read from this indexscan seem to be many fewer than
> the number of rows in the table. What are the ranges of the id
> values in tables foo and bar? I'm wondering if the merge could have
> stopped far short of the end of the foo table; if so, *that* is the
> effect that we are failing to model accurately.

Good guess :)

    max=# SELECT 'bar' AS tablename, min(ref2foo), max(ref2foo),
                 count(ref2foo) FROM bar
          UNION
          SELECT 'foo', min(id), max(id), count(id) FROM foo;

     tablename |   min    |   max    | count
    -----------+----------+----------+--------
     bar       | 10000010 | 10049999 |  38431
     foo       | 10000010 | 10423442 | 352072
    (2 rows)

I'll tell my colleague (it's his test database, after all) that he
should take more realistic test data before complaining about bad
performance...

cu
	Reinhard
Reinhard Max <max@suse.de> writes:
>> The actual rows read from this indexscan seem to be many fewer than
>> the number of rows in the table. What are the ranges of the id
>> values in tables foo and bar? I'm wondering if the merge could have
>> stopped far short of the end of the foo table; if so, *that* is the
>> effect that we are failing to model accurately.

> Good guess :)

> I'll tell my colleague (it's his test database, after all) that he
> should take more realistic test data before complaining about bad
> performance...

Actually, is it unrealistic test data? After thinking about it awhile,
I concluded that this is an effect the planner could and should model.
We have statistics that will tell us the maximum values of both
variables (at least in common cases), so it's not hard to estimate
which input stream will be exhausted first and how much of the other
one will actually be read. This could make a big difference in the
cost of an indexscan-based merge.

I have committed changes for 7.3 that do this. It's probably too big a
change to risk back-patching for 7.2.1, but if you care to experiment
with CVS tip then you could try it out.

			regards, tom lane
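The numbers from the previous message bear that out: bar's join keys
stop at 10049999 while foo's ids run to 10423442, so the merge only
has to read the low end of foo. A back-of-envelope check in plain SQL,
assuming the ids are roughly uniformly distributed:

    -- Fraction of foo the merge join reads before bar is exhausted,
    -- estimated from min/max statistics versus what actually happened.
    SELECT (10049999 - 10000010)::float8
         / (10423442 - 10000010) AS estimated_fraction,  -- about 0.118
           38432::float8 / 352072 AS actual_fraction;    -- about 0.109

The close agreement suggests the early stop really can be predicted
from the column min/max statistics alone, which is what the committed
change exploits.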
Hi,

On Fri, 1 Mar 2002 at 09:37, Tom Lane wrote:
> Reinhard Max <max@suse.de> writes:
>
> > I'll tell my colleague (it's his test database, after all) that he
> > should take more realistic test data before complaining about bad
> > performance...
>
> Actually, is it unrealistic test data?

Maybe not from the database's point of view, but certainly from the
application's. It is unrealistic insofar as it doesn't match the
scenario it was meant to be a test case for very well. I think
customer IDs usually appear in a more or less strictly ascending
order, and foreign keys that reference them are likely to be
distributed rather evenly over the IDs, or at least not that strongly
biased towards one end of the ID range.

When I thought about the structure of this test data and experimented
with an ascending-ordered copy of the address table, I remembered a
feature I once saw in Informix. I think they call it "clustering" or
something like that. I don't remember the precise syntax, but it was
possible to order a table's rows physically by a given column.

Do you think it would be worth the effort to add support for such a
thing to the VACUUM command? I could imagine it improving situations
where long tables have to be joined very often. The syntax I have in
mind is something like "VACUUM foo ORDER BY id" or simply
"VACUUM foo(id)". Another way would be to enhance the DDL so that the
table itself could be told which column(s) to order by, and then a
"VACUUM ORDER" would physically re-order the tables by those
column(s).

> I have committed changes for 7.3 that do this. It's probably too
> big a change to risk back-patching for 7.2.1, but if you care to
> experiment with CVS tip then you could try it out.

Hopefully I'll find some time to have a look at it when SuSE Linux 8.0
is done...

Thanks for all your help, Tom.

Greetings from Nuremberg,

	Reinhard
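The Informix feature described here corresponds to PostgreSQL's
CLUSTER command, which physically rewrites a table in the order of one
of its indexes (in the 7.x era reportedly with notable caveats, such
as other indexes on the table not being preserved). A hypothetical
sketch against the foo table from this thread, assuming the old
"CLUSTER indexname ON tablename" syntax:

    -- Rewrite foo in foo_pkey order, then refresh statistics; the
    -- correlation on id should come out near 1.0 afterwards.
    CLUSTER foo_pkey ON foo;
    VACUUM ANALYZE foo;

Unlike the proposed "VACUUM foo ORDER BY id", this is a one-shot
reordering: later inserts and updates are not kept in order, so the
command has to be repeated as the table churns.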