Re: A costing analysis tool
From | Greg Stark
---|---
Subject | Re: A costing analysis tool
Date |
Msg-id | 87hdbifo0t.fsf@stark.xeocode.com
In reply to | Re: A costing analysis tool (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: A costing analysis tool (Tom Lane <tgl@sss.pgh.pa.us>)
List | pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> More generally, I think that depending entirely on EXPLAIN ANALYZE
> numbers is a bad idea, because the overhead of EXPLAIN ANALYZE is both
> significant and variable depending on the plan structure. The numbers
> that I think we must capture are the top-level EXPLAIN cost and the
> actual runtime of the query (*without* EXPLAIN). Those are the things
> we would like to get to track closely. EXPLAIN ANALYZE is incredibly
> valuable as context for such numbers, but it's not the thing we actually
> wish to optimize.

If you really want to do this I think it pays to go down to the raw data that EXPLAIN is really calculating: the resulting time is a sum of the time spent in various operations. So x milliseconds were spent in sequential block fetches, y milliseconds in random-access block fetches, z milliseconds in index lookups, and so on. Of course we can't really instrument every operation and record the actual time spent in all of these things, only the total. My point is that it's a linear equation: the sum of the average times for these various operations, with factors representing their counts.

The cost that the optimizer calculates is similar. It's a linear equation, where it predicts x' sequential block reads, y' random-access block reads, z' index lookups, etc. The resulting cost is just the sum of these things.

If the optimizer didn't collapse the cost for each node into a single value and instead retained the individual parameters at each node, it could bubble those values all the way up to the surface, and then use the configuration options like random_page_cost etc. to calculate the resulting cost once at the top. But then you could gather the raw data, where each query gives you one linear equation and the actual measured time spent in the query is the sum. The resulting matrix of linear equations could then be thrown at some linear solver to tell you what your system's "real" random_page_cost, index_tuple_cost, etc. are.
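To make the linear-solver idea concrete, here's a minimal sketch in Python/NumPy. It is not PostgreSQL code: the operation categories, counts, and timings are all made-up illustrative numbers. Each row of the matrix holds the per-operation counts the planner would have predicted for one query; the right-hand side holds that query's measured runtime; a least-squares solve then recovers effective per-operation costs.

```python
# Illustrative sketch (all numbers invented): recover per-operation
# costs from query timings by least squares. Each row of A is one
# query's predicted operation counts; b is its measured runtime (ms).
import numpy as np

# counts per query: [seq_page_reads, random_page_reads, index_lookups]
A = np.array([
    [1000.0,   10.0,   50.0],
    [ 200.0,  500.0,  400.0],
    [5000.0,    0.0,    0.0],
    [ 100.0,  800.0, 1200.0],
    [ 300.0,  300.0,  300.0],
])

# "True" per-operation times (ms) used only to synthesize runtimes,
# plus a little noise to mimic measurement error.
true_costs = np.array([0.01, 0.1, 0.005])
rng = np.random.default_rng(0)
b = A @ true_costs + rng.normal(0.0, 0.1, size=A.shape[0])

# Least-squares solution of the overdetermined system A @ est ~= b
# gives the effective cost of each operation on this "system".
est, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("estimated per-operation costs (ms):", est)
```

With more queries than parameters the system is overdetermined, which is exactly what you want: the solver averages out per-query noise instead of fitting it.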
Of course the fact that these values aren't really constant across queries, and that the parameters being entered are just estimates to begin with, means there wouldn't be a perfect solution. But I bet there's research into how to come up with good compromise solutions. Hell, I bet there are even good algorithms for telling you which parameters are sketchy and are probably being poorly estimated, or don't represent real-world constant operations well.

Doing all this would mean a lot more far-reaching changes than just looking at EXPLAIN output, of course :/

--
greg
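As an aside on "which parameters are sketchy": one standard diagnostic for this kind of fit is to look at the singular values of the design matrix. This is a hedged sketch with invented numbers, not anything from PostgreSQL: if two operation counts move together across all the sampled queries, their columns are nearly collinear and no solver can separate their individual costs, which shows up as a tiny singular value.

```python
# Illustrative sketch (numbers invented): detecting parameters the
# sample of queries cannot pin down. Here random page reads (col 1)
# and index lookups (col 2) are almost proportional in every query,
# so their individual costs are not identifiable from this data.
import numpy as np

A = np.array([
    [1000.0,  100.0,  101.0],
    [ 200.0,  500.0,  498.0],
    [5000.0,   50.0,   52.0],
    [ 100.0,  800.0,  803.0],
])

u, s, vt = np.linalg.svd(A, full_matrices=False)
cond = s[0] / s[-1]
print("singular values:", s)
print("condition number: %.1f" % cond)

# The right-singular vector paired with the smallest singular value
# names the mix of parameters that is poorly determined; here it
# loads almost entirely on the two collinear columns.
print("ill-determined direction:", vt[-1])
```

A huge condition number is the solver's way of saying "collect queries that exercise these operations in different proportions before trusting their fitted costs".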