Обсуждение: DB page cache/query performance
I am on PG 8.1.10 on RedHat Linux. Some page cache/query performance questions: Is there a way to tell whether a query is satisfied from memory cache or from disk. The only way I know of is based on the time the query takes (reported by EXPLAIN ANALYZE) -- comparing the first-time performance (after dropping the Linux page caches "echo 3 > /proc/sys/vm/drop_caches" and restarting the server) vs. subsequent invocations. Can I calculate exactly (based on PG config and OS parameters) the amount of memory available for DB page caches? Can someone walk me through the calculation or point me to a "for dummies" version. Also, how long should pages stay in the cache? (Assuming I have way more memory than the total size of all the tables/indexes.) Is there any time-based expiration (in addition to LRU-based, which in my case should never be resorted to)? All of this is prompted by watching the performance of some queries that according to my understanding should be served from page cache always (except for the first time after the server starts) -- instead they periodically exhibit execution times that look like they are not coming from cache. TIA, George
On Wed, 14 May 2008, George Pavlov wrote: > Is there a way to tell whether a query is satisfied from memory cache or > from disk. No. You can look at the hit rate statistics for the tables and indexes referenced and see how they change before and after the query, but this just tells you about whether things were found in the PostgreSQL buffer cache. The database has no idea how much additional caching is going on at the OS level below it. Most people end up running "vmstat 1" and/or "iostat 1" to see what is actually moving around on disk. If you install dstat that's a handy way to combine what would normally be split between those two. > Can I calculate exactly (based on PG config and OS parameters) the > amount of memory available for DB page caches? You can estimate it based on shared_buffers and the output from free. Example: $ free total used free shared buffers cached Mem: 2074940 1161764 913176 0 205416 530116 -/+ buffers/cache: 426232 1648708 Swap: 1992020 0 1992020 The upper limit for how much the OS might be caching for you is the 1648708 in this example. Add that to the size of shared_buffers you allocated, and you have a decent estimate for the total memory available, which is also what to set effective_cache_size to. > Also, how long should pages stay in the cache? (Assuming I have way more > memory than the total size of all the tables/indexes.) Is there any > time-based expiration (in addition to LRU-based, which in my case should > never be resorted to)? Forever, at least as far as the PostgreSQL one is concerned. They only get evicted when a new buffer needs to be allocated and there's none available. It's not quite a LRU mechanism either. If you visit http://www.westnet.com/~gsmith/content/postgresql/ there's a presentation called "Inside the PostgreSQL Buffer Cache" that goes over how that part works. You'd probably find a look inside your system using pg_buffercache informative. Some of the sample queries I give only work on 8.3, but the "Buffer contents summary, with percentages" one should be OK on 8.1. Linux doesn't do any time-based expiration that I'm available of either, it just uses a customized second-chance LRU I think. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On May 14, 2008, at 6:18 PM, Greg Smith wrote: >> Also, how long should pages stay in the cache? (Assuming I have >> way more >> memory than the total size of all the tables/indexes.) Is there any >> time-based expiration (in addition to LRU-based, which in my case >> should >> never be resorted to)? > > Forever, at least as far as the PostgreSQL one is concerned. They > only get evicted when a new buffer needs to be allocated and > there's none available. It's not quite a LRU mechanism either. If > you visit http://www.westnet.com/~gsmith/content/postgresql/ > there's a presentation called "Inside the PostgreSQL Buffer Cache" > that goes over how that part works. You'd probably find a look > inside your system using pg_buffercache informative. Some of the > sample queries I give only work on 8.3, but the "Buffer contents > summary, with percentages" one should be OK on 8.1. Hrm... don't seqscans use a separate set of buffers in 8.3? While technically those won't be evicted until needed, you're unlikely to find stuff hanging around there for terribly long... -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Вложения
On Mon, 19 May 2008, Decibel! wrote: > Hrm... don't seqscans use a separate set of buffers in 8.3? While technically > those won't be evicted until needed, you're unlikely to find stuff hanging > around there for terribly long... Not quite a separate set, and I have been remiss that I only skim over that in the presentation right now. Keep meaning to write down those details, and now that you ask that time is now. A quick read of the code suggests that in 8.3, the following rules apply: 1) If you are doing a scan where the table is larger than (shared_buffers/4) or you are doing a VACUUM, your backend gets allocated a ring list it keeps track of the buffers it has requested in. 2) The ring size is 8K * min(shared_buffers/8,32) which for any non-trivial buffer size cases is 256K. As you request pages they get added to the list of ones in the ring. Here shared_buffers is specified as in older versions, as a count of 8K buffers. 3) Once the ring is full and you circle around to a page that's already been used, if its usage count is <=1 (which means that nobody else has used it since it was put in there) that page will get evicted and then re-used rather than allocating a new one in the normal fashion. 4) If someone else is using the buffer, instead a new one is allocated the normal way and it replaces the original entry in the ring. So, yes, in 8.3 it's possible that you can have sequential scans of large tables or the VACUUM data pass through the buffer cache, but not remain in it afterwards. I didn't think George would ever run into this in the specific example he asked about because of (1). This behavior only kicks in if you're scanning a table large relative to the total shared buffer cache and that didn't seem like an issue in his case. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
> From: Greg Smith [mailto:gsmith@gregsmith.com] > Sent: Monday, May 19, 2008 9:03 PM > > So, yes, in 8.3 it's possible that you can have sequential > scans of large > tables or the VACUUM data pass through the buffer cache, but > not remain in > it afterwards. I didn't think George would ever run into this in the > specific example he asked about because of (1). This > behavior only kicks > in if you're scanning a table large relative to the total > shared buffer > cache and that didn't seem like an issue in his case. Correct -- the tables in this example were tiny, shared buffers are large, and, in any case, I am still on 8.1... George