Re: Proposal of PITR performance improvement for 8.4.
From | Simon Riggs |
---|---|
Subject | Re: Proposal of PITR performance improvement for 8.4. |
Date | |
Msg-id | 1225269154.3971.278.camel@ebony.2ndQuadrant |
In response to | Re: Proposal of PITR performance improvement for 8.4. (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Proposal of PITR performance improvement for 8.4. |
List | pgsql-hackers |

On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:

> 1. You should avoid useless posix_fadvise() calls. In the naive
> implementation, where you simply call posix_fadvise() for every page
> referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls
> per WAL record, and that's a lot of overhead. We face the same design
> question as with Greg's patch to use posix_fadvise() to prefetch index
> and bitmap scans: what should the interface to the buffer manager look
> like? The simplest approach would be a new function call like
> AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the
> page if it's not in the buffer cache, but is a no-op otherwise. But that
> means more overhead, since for every page access, we need to find the
> page twice in the buffer cache; once for the AdviseBuffer() call, and
> 2nd time for the actual ReadBuffer().

That's a much smaller overhead than waiting for an I/O. The CPU overhead
isn't really a problem if we're I/O bound.

> It would be more efficient to pin
> the buffer in the AdviseBuffer() call already, but that requires much
> more changes to the callers.

That would be hard to clean up safely, plus we'd have difficulty with
timing: is there enough buffer space to allow all the prefetched blocks
to live in cache at once? If not, this approach would cause problems.

> 2. The format of each WAL record is different, so you need a "readahead
> handler" for every resource manager, for every record type. It would be
> a lot simpler if there was a standardized way to store that information
> in the WAL records.

I would prefer a new rmgr API call that returns a list of blocks. That's
better than trying to make everything fit one pattern. If the call
doesn't exist then that rmgr won't get prefetch.

> 3. IIRC I tried to handle just a few most important WAL records at
> first, but it turned out that you really need to handle all WAL records
> (that are used at all) before you see any benefit. Otherwise, every time
> you hit a WAL record that you haven't done posix_fadvise() on, the
> recovery "stalls", and you don't need much of those to diminish the gains.
>
> Not sure how these apply to your approach, it's very different. You seem
> to handle 1. by collecting all the page references for the WAL file, and
> sorting and removing the duplicates. I wonder how much CPU time is spent
> on that?

Removing duplicates seems like it will save CPU.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
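
For illustration only, the "collect the page references, then sort and remove the duplicates" step discussed in the message could be sketched roughly as below. This is not code from the patch under discussion: the `BlockRef` struct, its fields, and the `sort_and_dedup()` helper are hypothetical names made up for this sketch, and real page references in PostgreSQL carry more identifying information than a single relation number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one page reference taken from a WAL record. */
typedef struct BlockRef
{
    uint32_t rel;    /* illustrative relation identifier (assumption) */
    uint32_t block;  /* block number within that relation */
} BlockRef;

/* Order references by relation first, then by block number. */
static int
blockref_cmp(const void *a, const void *b)
{
    const BlockRef *x = a;
    const BlockRef *y = b;

    if (x->rel != y->rel)
        return (x->rel < y->rel) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/*
 * Sort refs in place and squeeze out duplicates.
 * Returns the number of unique entries, which are left at the
 * front of the array in sorted order.
 */
static size_t
sort_and_dedup(BlockRef *refs, size_t n)
{
    size_t out = 0;  /* index of the last unique entry kept so far */

    assert(refs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(refs, n, sizeof(BlockRef), blockref_cmp);

    for (size_t i = 1; i < n; i++)
    {
        /* Keep an entry only if it differs from the last one kept. */
        if (blockref_cmp(&refs[out], &refs[i]) != 0)
            refs[++out] = refs[i];
    }
    return out + 1;
}
```

Under these assumptions, a prefetcher would issue at most one posix_fadvise() call per surviving entry, so a page touched by many WAL records in the same file costs one syscall rather than one per record. The sort itself is O(n log n) CPU work per WAL file, which is the cost being weighed against the saved syscalls.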