Spinlock performance improvement proposal
От | Tom Lane |
---|---|
Тема | Spinlock performance improvement proposal |
Дата | |
Msg-id | 1054.1001520600@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Spinlock performance improvement proposal
Re: Spinlock performance improvement proposal Re: Spinlock performance improvement proposal Re: Spinlock performance improvement proposal |
Список | pgsql-hackers |
At the just-past OSDN database conference, Bruce and I were annoyed by some benchmark results showing that Postgres performed poorly on an 8-way SMP machine. Based on past discussion, it seems likely that the culprit is the known inefficiency in our spinlock implementation. After chewing on it for awhile, we came up with an idea for a solution. The following proposal should improve performance substantially when there is contention for a lock, but it creates no portability risks because it uses the same system facilities (TAS and SysV semaphores) that we have always relied on. Also, I think it'd be fairly easy to implement --- I could probably get it done in a day. Comments anyone? regards, tom lane Plan: Replace most uses of spinlocks with "lightweight locks" (LW locks) implemented by a new lock manager. The principal remaining use of true spinlocks (TAS locks) will be to provide mutual exclusion of access to LW lock structures. Therefore, we can assume that spinlocks are never held for more than a few dozen instructions --- and never across a kernel call. It's pretty easy to rejigger the spinlock code to work well when the lock is never held for long. We just need to change the spinlock retry code so that it does a tight spin (continuous retry) for a few dozen cycles --- ideally, the total delay should be some small multiple of the max expected lock hold time. If lock still not acquired, yield the CPU via a select() call (10 msec minimum delay) and repeat. Although this looks inefficient, it doesn't matter on a uniprocessor because we expect that backends will only rarely be interrupted while holding the lock, so in practice a held lock will seldom be encountered. On SMP machines the tight spin will win since the lock will normally become available before we give up and yield the CPU. Desired properties of the LW lock manager include:* very fast fall-through when no contention for lock* waiting proc doesnot spin* support both exclusive and shared (read-only) lock modes* grant lock to waiters in arrival order (no starvation)*small lock structure to allow many LW locks to exist. Proposed contents of LW lock structure: spinlock mutex (protects LW lock state and PROC queue links)count of exclusive holders (always 0 or 1)count of shared holders(0 .. MaxBackends)queue head pointer (NULL or ptr to PROC object)queue tail pointer (could do without this to savespace) If a backend sees it must wait to acquire the lock, it adds its PROC struct to the end of the queue, releases the spinlock mutex, and then sleeps by P'ing its per-backend wait semaphore. A backend releasing the lock will check to see if any waiter should be granted the lock. If so, it will update the lock state, release the spinlock mutex, and finally V the wait semaphores of any backends that it decided should be released (which it removed from the lock's queue while holding the sema). Notice that no kernel calls need be done while holding the spinlock. Since the wait semaphore will remember a V occurring before P, there's no problem if the releaser is fast enough to release the waiter before the waiter reaches its P operation. We will need to add a few fields to PROC structures:* Flag to show whether PROC is waiting for an LW lock, and if so whetherit waits for read or write access* Additional PROC queue link field. We can't reuse the existing queue link field because it is possible for a PROC to be waiting for both a heavyweight lock and a lightweight one --- this will occur when HandleDeadLock or LockWaitCancel tries to acquire the LockMgr module's lightweight lock (formerly spinlock). It might seem that we also need to create a second wait semaphore per backend, one to wait on HW locks and one to wait on LW locks. But I believe we can get away with just one, by recognizing that a wait for an LW lock can never be interrupted by a wait for a HW lock, only vice versa. After being awoken (V'd), the LW lock manager must check to see if it was actually granted the lock (easiest way: look at own PROC struct to see if LW lock wait flag has been cleared). If not, the V must have been to grant us a HW lock --- but we still have to sleep to get the LW lock. So remember this happened, then loop back and P again. When we finally get the LW lock, if there was an extra P operation then V the semaphore once before returning. This will allow ProcSleep to exit the wait for the HW lock when we return to it. Fine points: While waiting for an LW lock, we need to show in our PROC struct whether we are waiting for read or write access. But we don't need to remember this after getting the lock; if we know we have the lock, it's easy to see by inspecting the lock whether we hold read or write access. ProcStructLock cannot be replaced by an LW lock, since a backend cannot use an LW lock until it has obtained a PROC struct and a semaphore, both of which are protected by this lock. It seems okay to use a plain spinlock for this purpose. NOTE: it's okay for SInvalLock to be an LW lock, as long as the LW mgr does not depend on accessing the SI array of PROC objects, but only chains through the PROCs themselves. Another tricky point is that some of the setup code executed by the postmaster may try to to grab/release LW locks. Here, we can probably allow a special case for MyProc=NULL. It's likely that we should never see a block under these circumstances anyway, so finding MyProc=NULL when we need to block may just be a fatal error condition. A nastier case is checkpoint processes; these expect to grab BufMgr and WAL locks. Perhaps okay for them to do plain sleeps in between attempts to grab the locks? This says that the MyProc=NULL case should release the spinlock mutex, sleep 10 msec, try again, rather than any sort of error or expectation of no conflict. Are there any cases where this represents a horrid performance loss? Checkpoint itself seems noncritical. Alternative is for checkpoint to be allowed to create a PROC struct (but not to enter it in SI list) so's it can participate normally in LW lock operations. That seems a good idea anyway, actually, so that the PROC struct's facility for releasing held LW locks at elog time will work inside the checkpointer. (But that means we need an extra sema too? Okay, but don't want an extra would-be backend to obtain the extra sema and perhaps cause a checkpoint proc to fail. So must allocate the PROC and sema for checkpoint process separately from those reserved for backends.)
В списке pgsql-hackers по дате отправления: