Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
From: Heikki Linnakangas
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Msg-id: 4F610516.9040102@enterprisedb.com
In reply to: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
List: pgsql-hackers
On 12.03.2012 21:33, I wrote:
> The slowdown with > 6 clients seems to be spinlock contention. I ran
> "perf record" for a short duration during one of the ramdrive tests, and
> saw the spinlock acquisition in ReserveXLogInsertLocation() consuming
> about 80% of all CPU time.
>
> I then hacked the patch a little bit, removing the check in XLogInsert
> for fullPageWrites and forcePageWrites, as well as the check for "did a
> checkpoint just happen" (see
> http://community.enterprisedb.com/xloginsert-scale-tests/disable-fpwcheck.patch).
> My hunch was that accessing those fields causes cache line stealing,
> making the cache line containing the spinlock even more busy. That hunch
> seems to be correct; when I reran the tests with that patch, the
> performance with a high # of clients became much better. See the results
> with "xloginsert-scale-13.patch". With that change, the single-client
> case is still about 10% slower than current code, but the performance
> with > 8 clients is almost as good as with current code. Between 2-6
> clients, the patch is a win.
>
> The hack that restored the > 6 clients performance to the current level is
> not safe, of course, so I'll have to figure out a safe way to get that
> effect.

I managed to do that in a safe way, and also found a couple of other small changes that made a big difference to performance. I found out that the number of cache misses incurred while holding the spinlock matters a lot, which in hindsight isn't surprising. I aligned the XLogCtlInsert struct on a 64-byte boundary, so that the new spinlock and the fields it protects all fit on the same cache line (on boxes with a cache line size >= 64 bytes, anyway).

I also changed the logic of the insertion slots slightly, so that when a slot is reserved while holding the spinlock, it doesn't need to be immediately updated. That avoids one cache miss, as the cache line holding the slot doesn't need to be accessed while holding the spinlock.
And to reduce contention on cache lines when an insertion is finished and the insertion slot is updated, I shuffled the slots so that slots that are logically adjacent are spaced apart in memory.

When all those changes are put together, the patched version now beats or matches the current code in the RAM drive tests, except that the single-client case is still about 10% slower. I added the new test results at http://community.enterprisedb.com/xloginsert-scale-tests/, and a new version of the patch is attached.

If all of this sounds pessimistic, let me remind you that I've been testing the cases where I'm seeing regressions, so that I can fix them, and not trying to demonstrate how good this is in the best case. These tests have been with very small WAL records, with only 16 bytes of payload. Larger WAL records benefit more. I also ran one test with larger, 100-byte WAL records, and put the results up on that site.

> Also, even when the performance is as good as current code, it's
> not good to spend all the CPU time spinning on the spinlock. I didn't
> measure the CPU usage with current code, but I would expect it to be
> sleeping, not spinning, when not doing useful work.

This is still an issue.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Attachments