Cost estimates for parameterized paths
From | Tom Lane |
---|---|
Subject | Cost estimates for parameterized paths |
Date | |
Msg-id | 14624.1283463072@sss.pgh.pa.us |
Replies |
Re: Cost estimates for parameterized paths
(Robert Haas <robertmhaas@gmail.com>)
Re: Cost estimates for parameterized paths (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
A while back I ranted about replacing the planner's concept of inner indexscans with a more generalized notion of "parameterized paths": http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php The executor fixes for that are done, and now I'm grappling with getting the planner to do something useful with it.

The biggest problem I've run into is that a parameterized path can't really be assigned a fixed cost in the same way that a normal path can. The current implementation of cost_index() depends on knowing the size of the outer relation --- that is, the expected number of execution loops for the indexscan --- in order to account for cache effects sanely while estimating the average cost of any one inner indexscan. We know that that is an important thing to do, because the cost estimates are a lot closer to reality now that we do that than they were before; so dropping the consideration is entirely out of the question.

The planner is already cheating on this to a considerable extent, because it estimates the cost of an inner indexscan only once, using the first outer rel we try to join to. That cost is cached and reused with other potential outer-rel join partners, even though very different numbers of outer rows might be involved. This problem will get a lot worse with the types of plans that I hope the planner will be able to come up with after this fix goes in, because the indexscan might be at the bottom of a join nest. So we need a real fix, not another hack.

The best idea I can come up with at the moment is to compute "best case" and "worst case" costs for a parameterized path, corresponding to the largest and smallest numbers of loops we might expect it to be repeated for. The largest number of loops could be estimated via the cartesian product of the sizes of the other tables in the query, for example. The "worst case" cost is its cost if only repeated once.
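The best-case/worst-case idea above can be illustrated with a toy sketch in C. This is not the real cost_index() formula; the per-loop cost model here is a made-up stand-in with the right shape (repeated loops get cheaper because earlier executions warm the buffer cache), and the struct and function names are hypothetical:

```c
#include <assert.h>

/* Hypothetical per-loop cost model: the cost of one indexscan execution,
 * amortized over loop_count repetitions.  Monotonically decreasing in
 * loop_count, mimicking cache warm-up; NOT the actual cost_index() math. */
static double
per_loop_cost(double pages_fetched, double loop_count)
{
    /* with many loops, assume roughly half the page fetches hit cache */
    return 1.0 + pages_fetched * (1.0 + 1.0 / loop_count) / 2.0;
}

/* Hypothetical cost bounds for a parameterized path. */
typedef struct ParamPathCost
{
    double best_case;   /* cost assuming the most repetitions */
    double worst_case;  /* cost assuming a single execution */
} ParamPathCost;

/* Compute both bounds: worst case is one cold execution, best case
 * assumes the largest plausible loop count (e.g. the cartesian product
 * of the other rels' sizes). */
static ParamPathCost
cost_parameterized_path(double pages_fetched, double max_loops)
{
    ParamPathCost c;

    c.worst_case = per_loop_cost(pages_fetched, 1.0);
    c.best_case = per_loop_cost(pages_fetched, max_loops);
    return c;
}
```

The key property the sketch preserves is that the two bounds bracket every cost estimate the path could later receive once the actual outer rowcount is known.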
Then, during path pruning in add_path, we only discard a parameterized path if its best-case cost is worse than the worst-case cost of some otherwise comparable path. Whenever we join the parameterized path with the required outer relation, we redo the cost calculation using that rel's actual rowcount estimate in order to form a final cost estimate for the no-longer-parameterized join path.

While this looks like it would work in principle, I'm concerned that it would be unable to prune very many parameterized paths, and thus that planning time might run unacceptably long. The repeated cost calculations aren't much fun either, although we could probably cache most of that work if we're willing to throw memory at the problem.

I wonder if anyone's got a better idea ...

regards, tom lane
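The conservative pruning rule proposed in the message reduces to a simple interval comparison: a parameterized path is provably useless only when even its optimistic bound loses to a rival's pessimistic bound. A minimal self-contained sketch, with hypothetical names (this is not the actual add_path code):

```c
#include <assert.h>
#include <stdbool.h>

/* A parameterized path can only be discarded when even its best-case
 * cost (maximum loop count, warm cache) exceeds the worst-case cost
 * (single cold execution) of some otherwise comparable rival path.
 * Overlapping cost intervals mean we must keep both paths, which is
 * why this rule may prune very little in practice. */
static bool
param_path_is_dominated(double candidate_best_case, double rival_worst_case)
{
    return candidate_best_case > rival_worst_case;
}
```

Note the asymmetry: a candidate whose best case merely ties the rival's worst case survives, so the rule errs on the side of keeping paths, trading planning time for plan quality.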