Re: Is There Any Way ....
From: Kevin Grittner
Subject: Re: Is There Any Way ....
Date:
Msg-id: s3432970.018@gwmta.wicourts.gov
In reply to: Is There Any Way .... ("Lane Van Ingen" <lvaningen@esncc.com>)
List: pgsql-performance
First off, Mr. Trainor's response proves nothing about anyone or anything except Mr. Trainor.

I'm going to offer an opinion on the caching topic. I don't have any benchmarks; I'm offering a general sense of the issue based on decades of experience, so I'll give a short summary of that. I've been earning my living by working with computers since 1972, and I am the architect and primary author of a little-known database product (developed in 1984) which saw tens of thousands of installations in various vertical markets. (Last I checked, a couple of years ago, it was still being used statewide by one state government after a multi-million dollar attempt to replace it with a popular commercial database product failed.) I've installed and tuned many other database products over the years. I'm just getting to know PostgreSQL, and am pretty excited about it.

Now on to the meat of it. My experience is that a DBMS can improve performance by caching certain types of data. In the product I developed, we had a fairly small cache which used a weighting algorithm to decide what to keep (rather than relying on simple LRU). Index pages got higher weight than data pages; the higher in the index, the higher the weight. Recent access got higher weight than older access, although it took quite a while for the older access to age out entirely. This improved performance quite a bit over a generalized caching product alone.

However, there was a point of diminishing returns. My sense is that every 10% you add to a "smart" cache yields less benefit at a higher cost, so beyond a certain point, taking RAM from the general cache to expand the smart cache degrades performance. Clever programming techniques can shift the break-even point, but I very much doubt it can be eliminated entirely, unless the ratio of performance between CPU+RAM and persistent storage is much more extreme than I've ever seen.
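The weighting scheme described above can be sketched roughly as follows. This is a minimal illustration, not the original product's code: the specific weights, page kinds, and recency half-life are my own assumptions, chosen only to show the shape of the idea (index pages outrank data pages, higher index levels outrank lower ones, and older entries decay slowly rather than dropping out at the first miss).

```python
import time


class WeightedCache:
    """Evicts the lowest-weight page rather than the least recently used.

    Score = type weight (index pages above data pages; higher index
    levels weigh more) scaled by a slow recency decay, so old entries
    age out gradually instead of being discarded immediately.
    Weights and half_life are illustrative assumptions.
    """

    def __init__(self, capacity, half_life=300.0):
        self.capacity = capacity
        self.half_life = half_life  # seconds for the recency factor to halve
        self.entries = {}  # key -> (page, base_weight, last_access)

    @staticmethod
    def base_weight(kind, index_level=0):
        # Index pages outrank data pages; higher levels outrank lower.
        return 1.0 if kind == "data" else 2.0 + index_level

    def _score(self, base, last_access, now):
        age = now - last_access
        return base * 0.5 ** (age / self.half_life)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        page, base, _ = entry
        self.entries[key] = (page, base, time.monotonic())  # refresh recency
        return page

    def put(self, key, page, kind="data", index_level=0):
        now = time.monotonic()
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Evict the entry whose current score is lowest, not the
            # one accessed longest ago -- that is the whole point.
            victim = min(
                self.entries,
                key=lambda k: self._score(
                    self.entries[k][1], self.entries[k][2], now
                ),
            )
            del self.entries[victim]
        self.entries[key] = (page, self.base_weight(kind, index_level), now)
```

With this scoring, a freshly loaded data page is evicted before an upper-level index page even when the index page was touched earlier, which is the behavior the paragraph above describes.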
There is another issue, which has been raised from time to time on these lists, but not enunciated clearly enough in my view. These discussions about caching generally address maximum throughput, while there are times when it is important that certain tables can be queried very quickly, even if it hurts overall throughput.

As an example, there can be tables which are accessed as a user types in a window and tabs around from one GUI control to another. The user's perception of the application's performance will depend PRIMARILY on how quickly the GUI renders the results of these queries; if the run time for a large report goes up by 10%, they will probably not notice. This is a situation where removing RAM from a generalized cache, or one based on database internals, to create an "application specific" cache can yield big rewards.

One client has addressed this in a commercial product by defining a named cache large enough to hold these tables, and binding those tables to the cache. One side benefit is that such caches can be defined as "relaxed LRU" -- meaning that they eliminate the overhead of tracking accesses, since they can assume that data will rarely, if ever, be discarded from the cache.

It seems to me that in the PostgreSQL world, this would currently be addressed by binding the tables to a tablespace where the file system, controller, or drive(s) would cache the data, although this is somewhat less flexible than the "named cache" approach -- unless there is a file system that can layer a cache on top of a reference to some other file system's space. (And let's not forget the many OS environments in which people use PostgreSQL.) So I do see that there would be benefit in adding a feature to PostgreSQL to define caches and bind tables or indexes to them. That said, I do think that it is SMART of PostgreSQL to rely on the increasingly sophisticated file systems to provide the MAIN cache.
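The "relaxed LRU" named-cache idea can be sketched as below. The point of the sketch is the absence of bookkeeping: because the cache is sized to hold the bound tables entirely, a hit updates no counters or lists, so the hot GUI-lookup path costs only a dictionary probe. The table names and the loader callback are hypothetical, not part of any real product's API.

```python
class NamedCache:
    """A cache bound to specific tables, sized to hold them whole.

    Because nothing is expected to be evicted, there is no LRU
    tracking at all -- that is the "relaxed LRU" benefit described
    above. Table names here are illustrative.
    """

    def __init__(self, bound_tables):
        self.bound_tables = set(bound_tables)
        self.pages = {}  # (table, page_no) -> page contents

    def is_bound(self, table):
        return table in self.bound_tables

    def get(self, table, page_no, load):
        # No recency counters to update: bound tables are assumed to
        # fit, so a hit is just a dict lookup. `load` is a caller-
        # supplied function that fetches a page on the first miss.
        key = (table, page_no)
        if key not in self.pages:
            self.pages[key] = load(table, page_no)
        return self.pages[key]
```

A GUI application would bind its small lookup tables (status codes, country codes, and the like) at startup, pay one load per page, and then answer every keystroke-driven query from memory regardless of what the general-purpose cache is doing for the big reports.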
I suspect that a couple of types of smaller "smart" caches in front of this could boost performance, and it might be a significant boost. I'm not sure what the current shared memory is used for; perhaps this is already caching specific types of structures for the DBMS. I'm pretty sure that programmers of GUI apps would appreciate the named cache feature, so they could tune the database for snappy GUI response, even under heavy load.

I realize this is short on specifics -- I'm shooting for perspective. For the record, I don't consider myself particularly religious on the topic, but I do pull back a little at arguments which sound strictly academic -- I've found that most of what I've drawn from those circles has needed adjustment in solving real-world problems. (Particularly when algorithms optimize for best worst-case performance. I've found users are much happier with best typical-case performance as long as the product of worst-case performance and worst-case frequency is low.) Like many others who have posted on the topic, I am quite prepared to alter my views in the face of relevant evidence.

Feel free to laugh at the old fart who decided to sip his Bushmill's while reading through this thread and try to run with the young lions. As someone else recently requested, though, please don't point while you laugh -- that's just rude. :-)

-Kevin

>>> Ron Peacetree <rjpeace@earthlink.net> 10/04/05 10:06 PM >>>
Unfortunately, no matter what I say or do, I'm not going to please or convince anyone who has already made their minds up to the extent that they post comments like Mr Trainor's below. His response style pretty much proves my earlier point that this is presently a religious issue within the pg community.