Re: A costing analysis tool
From: Kevin Grittner
Subject: Re: A costing analysis tool
Date:
Msg-id: s34db8d4.059@gwmta.wicourts.gov
In response to: A costing analysis tool ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List: pgsql-hackers
Good points, Tom. (I wish my client's email software supported quoting, so that I could post replies closer to your points. Sorry 'bout that.)

I tried searching the archives, but the words I could think to search with generated so many hits that it amounted to more or less a sequential search of the archives, which is daunting. If you have any particular references, suggestions for search strings I might have missed, or even a time range when you think it was discussed, I'll gladly go looking again. I'm not out to reinvent the wheel, lever, or any other basic idea.

To cover the "database fits in RAM" situation, we could load some data, run the test cases twice, use only the info from the second run, and never flush. Then we could load more data and move on to the cases where not everything is cached. I don't think we can get huge -- these tests have to run in a reasonable amount of time -- but I hope we can load enough to cover the major scaling effects.

So far my wildest dreams have not gone beyond a few simple math operations to get to a cost estimate. Only testing will tell, but I don't think it will be significant compared to the other things going on in the planner. (Especially if I can compensate by talking you into letting me drop that ceil function, on the basis that without it we get the statistical average of the possible actual costs.) It's even possible that more accurate costing of the current alternatives will reduce the need for other, more expensive, optimizer enhancements. (That glass is half FULL, I SWEAR it!)

How do you establish that a cost estimate is completely out of line with reality except by comparing its runtime/estimate ratio with others? Unless you're saying not to look at just the summary level, in which case I totally agree -- any one subplan which has an unusual ratio in either direction needs to be examined. If you're getting at something else, please elaborate -- I don't want to miss anything.
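The per-subplan ratio check described above could be sketched roughly as follows. This is a hypothetical illustration, not part of any existing tool: it takes (label, estimated cost, actual runtime) triples, as one might collect from EXPLAIN ANALYZE output, and flags subplans whose runtime/estimate ratio deviates sharply from the median ratio in either direction. All names and the outlier threshold are made up for the example.

```python
# Hypothetical sketch: flag subplans whose runtime/estimate ratio is far
# from the median ratio across all nodes. The node labels, sample numbers,
# and the `factor` threshold are illustrative assumptions only.
from statistics import median

def flag_outliers(nodes, factor=4.0):
    """nodes: list of (label, est_cost, actual_ms) triples.
    Returns labels whose ratio differs from the median ratio by more
    than `factor` in either direction."""
    ratios = {label: actual / est for label, est, actual in nodes if est > 0}
    med = median(ratios.values())
    return [label for label, r in ratios.items()
            if r > med * factor or r < med / factor]

nodes = [
    ("Seq Scan on t1",   1000.0,  12.0),
    ("Index Scan on t2",  200.0,   2.5),
    ("Nested Loop",       150.0, 480.0),  # runtime far above its estimate
]
print(flag_outliers(nodes))  # -> ['Nested Loop']
```

Note this only ranks subplans relative to one another, which is exactly the limitation Tom raises below: an absolute calibration needs the cost model itself to be roughly right first.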
Thanks for your response.

-Kevin

>>> Tom Lane <tgl@sss.pgh.pa.us> 10/13/05 12:01 AM >>>
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Note that I'm talking about a tool strictly to check the accuracy of
> the estimated costs of plans chosen by the planner, nothing else.

We could definitely do with some infrastructure for testing this. I concur with Bruce's suggestion that you should comb the archives for previous discussions --- but if you can work on it, great!

> (2) A large database must be created for these tests, since many
> issues don't show up in small tables. The same data must be generated
> in every database, so results are comparable and reproducible.

Reproducibility is way harder than it might seem at first glance. What's worse, the obvious techniques for creating reproducible numbers amount to eliminating variables that are important in the real world. (One of which is size of database --- some people care about performance of DBs that fit comfortably in RAM...)

Realistically, the planner is never going to have complete information. We need to design planning models that generally get the right answer, but are not so complicated that they are (a) impossible to maintain or (b) take huge amounts of time to compute. (We're already getting some flak on the time the planner takes.) So there is plenty of need for engineering compromise here.

Still, you can't engineer without raw data, so I'm all for creating a tool that lets us gather real-world cost data. The only concrete suggestion I have at the moment is to not design the tool directly around "measure the ratio of real time to cost". That's only meaningful if the planner's cost model is already basically correct and you are just in need of correcting the cost multipliers. What we need for the near term is ways of quantifying cases where the cost models are just completely out of line with reality.

regards, tom lane
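One minimal way to get the reproducible test data Tom's message calls for is to drive all data generation from a fixed PRNG seed, so every database loads byte-identical rows. This is only an illustrative sketch under that assumption; the row shape and value ranges are invented for the example.

```python
# Hypothetical sketch: deterministic test-data generation via a fixed
# seed, so the same rows are produced on every machine and every run.
# Row layout and value ranges are illustrative assumptions only.
import random

def generate_rows(n, seed=42):
    rng = random.Random(seed)  # fixed seed -> identical sequence every run
    return [(i, rng.randint(1, 1000)) for i in range(n)]

# Two independent runs yield identical data, so plan costs measured
# against the loaded tables are comparable across test environments.
assert generate_rows(5) == generate_rows(5)
```

As Tom notes, this buys reproducibility at the price of realism: synthetic uniform data eliminates exactly the skew and correlation that trip up cost models in the real world, so any such generator would need deliberately non-uniform distributions as well.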