best_inner_indexscan vs. reality
От | Tom Lane |
---|---|
Тема | best_inner_indexscan vs. reality |
Дата | |
Msg-id | 26479.1179780776@sss.pgh.pa.us обсуждение исходный текст |
Список | pgsql-hackers |
I looked into the curious planner behavior described in this thread: http://archives.postgresql.org/pgsql-performance/2007-05/msg00388.php and after a bit of experimentation was able to duplicate it in the regression database: regression=# explain select * from int4_tbl a where f1 in (select hundred from tenk1 b); QUERYPLAN -------------------------------------------------------------------Nested Loop IN Join (cost=0.00..30.20 rows=5 width=4) Join Filter: (a.f1 = b.hundred) -> Seq Scan on int4_tbl a (cost=0.00..1.05 rows=5 width=4) -> Seq Scan on tenk1b (cost=0.00..458.00 rows=10000 width=4) (4 rows) regression=# set enable_bitmapscan TO 0; SET regression=# explain select * from int4_tbl a where f1 in (select hundred from tenk1 b); QUERY PLAN ---------------------------------------------------------------------------------------Nested Loop IN Join (cost=0.00..13.21rows=5 width=4) -> Seq Scan on int4_tbl a (cost=0.00..1.05 rows=5 width=4) -> Index Scan using tenk1_hundredon tenk1 b (cost=0.00..242.00 rows=100 width=4) Index Cond: (b.hundred = a.f1) (4 rows) WTF? How can disabling an unrelated plan type cause the thing to find a cheaper plan than it found otherwise? After some digging, the problem can be laid at the feet of best_inner_indexscan, whose comment indeed foresees this issue: * best_inner_indexscan* Finds the best available inner indexscan for a nestloop join* with the given rel on theinside and the given outer_rel outside.* May return NULL if there are no possible inner indexscans.** We ignore orderingconsiderations (since a nestloop's inner scan's order* is uninteresting). Also, we consider only total cost whendeciding which* of two possible paths is better --- this assumes that all indexpaths have* negligible startup cost. (True today, but someday we might have to think* harder.) Therefore, there is only one dimension of comparison and so it's*sufficient to return a single "best" path. The "best" inner indexscan for this query is a bitmap scan with startup cost of 5 and total cost of 170 (beating the plain indexscan's total cost of 242). However, for this IN join that's estimated as a worse choice than the seqscan, precisely because of its startup cost. We estimate a nestloop IN-join on the assumption that we'll stop scanning after fetching one matching inner tuple; therefore, if there are lots of potentially matching inner tuples, both a seqscan and an indexscan look pretty cheap (less than 5 cost units to find the first match), while the bitmap scan loses because of its startup cost. Disabling bitmap scans allows the regular indexscan to be seen as cheapest by best_inner_indexscan, and so it survives to be chosen as the join partner. Clearly, best_inner_indexscan's API is obsolete now that bitmap scans exist. What I'm inclined to do is make it cache and return both the cheapest-total and cheapest-startup plans for any given combination of rels. (This should add only negligibly to its cost, since it's enumerating all the possible paths anyway.) To avoid useless effort in match_unsorted_outer(), it should probably consider the cheapest-startup only when doing JOIN_IN cases --- AFAICS there isn't any other case where startup cost can outweigh total cost for the inside of a nestloop. I'm not sure if it's worth considering this issue for best_appendrel_indexscan() --- in an Append situation it's not clear that startup costs of individual subplans mean much of anything. Comments? Also, should this be backpatched? Both 8.1 and 8.2 have got the issue; prior releases no, since they didn't have bitmap scans. regards, tom lane
В списке pgsql-hackers по дате отправления: