Re: Reworking the writing of WAL
From: Robert Haas
Subject: Re: Reworking the writing of WAL
Date:
Msg-id: CA+TgmoYR6sXfyS6gJCE-+BLpcvVDBZaO_=dObL+B+XdQBDsk1w@mail.gmail.com
In reply to: Reworking the writing of WAL (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Reworking the writing of WAL (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Fri, Aug 12, 2011 at 11:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> 1. Earlier, I suggested that the sync rep code would allow us to
> redesign the way we write WAL, using ideas from group commit. My
> proposal is that when a backend needs to flush WAL to local disk
> it will be added to a SHMQUEUE exactly the same as when we flush
> WAL to sync standby. The WALWriter will be woken by latch and then
> perform the actual work. When complete, WALWriter will wake the
> queue in order, so there is a natural group commit effect. The WAL
> queue will be protected by a new lock WALFlushRequestLock, which
> should be much less heavily contended than the way we do things
> now. Notably this approach will mean that all waiters get woken
> quickly, without having to wait for the queue of WALWriteLock
> requests to drain down, so commit will be marginally quicker. On
> almost idle systems this will give very nearly the same response
> time as having each backend write WAL directly. On busy systems
> this will give optimal efficiency by having WALWriter working in a
> very tight loop to perform the I/O instead of queuing itself to
> get the WALWriteLock with all the other backends. It will also
> allow piggybacking of commits even when WALInsertLock is not
> available.

I like the idea of putting all the backends that are waiting for
xlog flush on a SHM_QUEUE, and having a single process do the flush
and then wake them all up. That seems like a promising approach, and
should avoid quite a bit of context-switching and spinlocking that
would otherwise be necessary. However, I think it's possible that
the overhead in the single-client case might be pretty significant,
and I'm wondering whether we might be able to set things up so that
backends can flush their own WAL in the uncontended case.
What I'm imagining is something like this:

    struct
    {
        slock_t     mutex;
        XLogRecPtr  CurrentFlushLSN;
        XLogRecPtr  HighestFlushLSN;
        SHM_QUEUE   WaitersForCurrentFlush;
        SHM_QUEUE   WaitersForNextFlush;
    };

To flush, you first acquire the mutex. If the CurrentFlushLSN is not
InvalidXLogRecPtr, then there's a flush in progress, and you add
yourself to either WaitersForCurrentFlush or WaitersForNextFlush,
depending on whether your LSN is lower or higher than CurrentFlushLSN.
If you queue on WaitersForNextFlush, you advance HighestFlushLSN to
the LSN you need flushed. You then release the spinlock and sleep on
your semaphore. But if you get the mutex and find that CurrentFlushLSN
is InvalidXLogRecPtr, then you know that no flush is in progress. In
that case, you set CurrentFlushLSN to the maximum of the LSN you need
flushed and HighestFlushLSN and move all WaitersForNextFlush over to
WaitersForCurrentFlush. You then release the spinlock and perform the
flush. After doing so, you reacquire the spinlock, remove everyone
from WaitersForCurrentFlush, note whether there are any
WaitersForNextFlush, and release the spinlock. If there were any
WaitersForNextFlush, you set the WAL writer latch. You then wake up
anyone you removed from WaitersForCurrentFlush.

Every time the WAL writer latch is set, the WAL writer wakes up and
performs any needed flush, unless there's already one in progress.
This allows processes to flush their own WAL when there's no
contention, but as more contention develops the work moves to the WAL
writer, which will then run in a tight loop, as in your proposal.

> 5. And we would finally get rid of the group commit parameters.

That would be great, and I think the performance will be quite a bit
better, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company