Re. [HACKERS] Some notes on optimizer cost estimates
From | Xun Cheng |
---|---|
Subject | Re. [HACKERS] Some notes on optimizer cost estimates |
Date | |
Msg-id | 200001210219.SAA22377@xp10-06.dialup.commserv.ucsb.edu |
Responses | Re: Re. [HACKERS] Some notes on optimizer cost estimates (Don Baccus <dhogaza@pacifier.com>) |
List | pgsql-hackers |
I'm very glad you brought up this cost estimate issue. Recent work in database research has argued that a more detailed disk access cost model should be used for large queries, especially joins. The traditional cost estimate considers only the number of disk pages accessed. A more detailed model, however, would consider three parameters: average seek, average latency, and average page transfer time. For an old disk, typical values are SEEK=9.5 milliseconds, LATENCY=8.3 ms, TRANSFER=2.6 ms. A sequential continuous read of a table (assuming 1000 contiguous pages) would cost (SEEK+LATENCY+1000*TRANSFER=2617.8ms), while quasi-randomly reading 200 times with 2 contiguous pages per read would cost (SEEK+200*LATENCY+400*TRANSFER=2700ms). Someone from an IBM lab re-studied the traditional ad hoc join algorithms (nested-loop, sort-merge, hash) using the detailed cost model and found some interesting results.

>I have been spending some time measuring actual runtimes for various
>sequential-scan and index-scan query plans, and have learned that the
>current Postgres optimizer's cost estimation equations are not very
>close to reality at all.

One interesting question I'd like to ask is whether this inaccuracy really affects the optimal choice made by PostgreSQL's query optimizer, and to what degree. My point is that if the optimizer estimates the cost of a sequential scan at 10 and the cost of an index scan at 20 while the actual costs are 10 vs. 40, that should be fine, because the optimizer would still choose the sequential scan, as it should.

>1. Since main-table tuples are visited in index order, we'll be hopping
>around from page to page in the table.

I'm not sure about the implementation in PostgreSQL. One thing you might be able to do is first collect all must-read page addresses from the index scan and then sort them before the actual page fetching. That would at least avoid reading the same page twice (not entirely true, depending on the context (as in a join) and the algorithm).
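As a quick sanity check of the arithmetic above, here is a minimal sketch of the three-parameter cost model (the function names are mine, not from any existing system):

```python
# Three-parameter disk access cost model (all times in milliseconds).
# Typical values for an old disk, as quoted above.
SEEK = 9.5      # average seek time
LATENCY = 8.3   # average rotational latency
TRANSFER = 2.6  # average per-page transfer time

def sequential_cost(pages):
    """Continuous read: one seek, one rotational delay, then a
    single uninterrupted transfer of all pages."""
    return SEEK + LATENCY + pages * TRANSFER

def quasi_random_cost(reads, pages_per_read):
    """Quasi-random read: one initial seek, then a rotational delay
    per read plus the transfer of each read's pages."""
    return SEEK + reads * LATENCY + reads * pages_per_read * TRANSFER

print(sequential_cost(1000))      # ~2617.8 ms, as above
print(quasi_random_cost(200, 2))  # ~2709.5 ms, i.e. roughly the 2700 ms above
```

The point of the model is that the same 400 pages cost noticeably more when read in 200 separate chunks than 1000 pages cost when read in one continuous sweep, which a page-count-only estimator cannot express.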
>The current cost estimation
>method essentially assumes that the buffer cache plus OS disk cache will
>be 100% efficient --- we will never have to read the same page of the
>main table twice in a scan, due to having discarded it between
>references. This of course is unreasonably optimistic. Worst case
>is that we'd fetch a main-table page for each selected tuple, but in
>most cases that'd be unreasonably pessimistic.

This is actually the motivation behind my earlier question about whether PostgreSQL has a raw disk facility. That would give us much more control over this caching issue. Of course, a raw disk facility is worthwhile (besides for recoverability) only if we can provide an algorithm better than the OS cache algorithm in some contexts, such as large joins.

Actually, I have another question for you, which is somehow related to this cost estimation issue. You know the difference between OLTP and OLAP. My question is whether you target PostgreSQL at both kinds of applications or just OLTP. From what I know, OLTP and OLAP differ greatly in query characteristics and thus in how they should be optimized. If PostgreSQL is targeted only at OLTP, the above cost estimation issue might not be that important. For OLAP, however, large tables and large queries are common and optimization is difficult.

xun
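The optimistic and pessimistic extremes quoted above bracket the true page-fetch count; a minimal sketch (the function name is mine) of those two bounds:

```python
def page_fetch_bounds(selected_tuples, table_pages):
    """Bracket the number of main-table page reads in an index scan.

    Optimistic bound (the current estimator's assumption): caching is
    perfect, so each needed page is read at most once, and we never
    read more pages than the table has.
    Pessimistic bound: caching never helps, so every selected tuple
    costs one page read.
    """
    best = min(selected_tuples, table_pages)
    worst = selected_tuples
    return best, worst

# A selective query: 50 tuples from a 1000-page table -- both bounds agree.
print(page_fetch_bounds(50, 1000))    # (50, 50)
# A large query: 5000 tuples from the same table -- a 5x gap to estimate within.
print(page_fetch_bounds(5000, 1000))  # (1000, 5000)
```

The gap between the two bounds grows with the number of selected tuples, which is why a realistic cache-efficiency model matters most for the large OLAP-style queries discussed below.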