Re: spinlock contention

From: Robert Haas
Subject: Re: spinlock contention
Date:
Msg-id: CA+TgmobbxMh_9zjudheSWO6m8sBMb5hdZt+3ChCLuv5eztv8Ug@mail.gmail.com
In reply to: spinlock contention (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: spinlock contention; Re: spinlock contention
List: pgsql-hackers
On Thu, Jun 23, 2011 at 11:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 22, 2011 at 5:43 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jun 12, 2011, at 23:39, Robert Haas wrote:
>>> So, the majority (60%) of the excess spinning appears to be due to
>>> SInvalReadLock.  A good chunk is due to ProcArrayLock (25%).
>>
>> Hm, sizeof(LWLock) is 24 on X86-64, making sizeof(LWLockPadded) 32.
>> However, cache lines are 64 bytes large on recent Intel CPUs AFAIK,
>> so I guess that two adjacent LWLocks currently share one cache line.
>>
>> Currently, the ProcArrayLock has index 4 while SInvalReadLock has
>> index 5, so if I'm not mistaken, exactly the two locks where you saw
>> the largest contention are on the same cache line...
>>
>> Might make sense to try and see if these numbers change if you
>> either make LWLockPadded 64 bytes or arrange the locks differently...
>
> I fooled around with this a while back and saw no benefit.  It's
> possible a more careful test would turn up something, but I think the
> only real way forward here is going to be to eliminate some of that
> locking altogether.

I did some benchmarking on the 32-core system from Nate Boley, with
pgbench -n -S -c 80 -j 80.  With master+fastlock+lazyvxid, I can hit
180-200k TPS in the 32-client range, but at 80 clients throughput
starts to fall off quite a bit, dropping down to about 80k TPS.

So then, just for giggles, I inserted a "return;" statement at the top
of AcceptInvalidationMessages(), turning it into a no-op.  Performance
at 80 clients shot up to 210k TPS.  I also tried inserting an
acquire-and-release cycle on MyProc->backendLock, which was just as
good.  That seems like a pretty encouraging result, since there appear
to be several ways of reimplementing SIGetDataEntries() that would
work with a per-backend lock rather than a global one.

I did some other experiments, too.  In the hopes of finding a general
way to reduce spinlock contention, I implemented a set of "elimination
buffers", where backends that can't get a spinlock try to combine
their requests and then send a designated representative to acquire
the spinlock and take shared locks simultaneously for all group
members.  It took me a while to debug the code, and I still can't get
it to do anything other than reduce performance, which may mean that I
haven't found all the bugs yet, but I'm not feeling encouraged.  Some
poking around suggests that the problem isn't that spinlocks are
routinely contended: it seems that we nearly always get the spinlock
right off the bat.

I'm wondering if the problem is not so much that we have continuous
spinlock contention, but rather that every once in a while a process
gets time-sliced out while it holds a spinlock.  If it's an important
spinlock (like the one protecting SInvalReadLock), the system will
quickly evolve into a state where every single processor is doing
nothing but trying to get that spinlock.  Even after the hapless
lock-holder gets to run again and lets go of the lock, there's a whole
pile of other backends sitting there firing off lock xchgb in a tight
loop, and they can only get the lock one at a time, so you have
ferocious cache-line contention until the backlog clears.  Then things
are OK again for a bit, until the same thing happens to some other
backend.  This is just a theory; I might be totally wrong...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
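For concreteness, the padding change Florian suggests upthread would look
roughly like the sketch below.  The LWLock struct here is a simplified
stand-in for the real definition in src/backend/storage/lmgr/lwlock.c, and
the 64-byte figure is an assumption about the cache-line size on recent
Intel CPUs.

/*
 * Simplified stand-in for LWLock; roughly 24 bytes on x86-64, as noted
 * in the quoted mail above.
 */
typedef struct LWLock
{
    volatile unsigned char mutex;   /* spinlock protecting the fields below */
    unsigned char exclusive;        /* # of exclusive holders (0 or 1) */
    int         shared;             /* # of shared holders */
    void       *head;               /* head of wait queue (PGPROC *) */
    void       *tail;               /* tail of wait queue (PGPROC *) */
} LWLock;

#define CACHE_LINE_SIZE 64          /* assumption: recent Intel x86-64 */

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[CACHE_LINE_SIZE];  /* currently 32, so adjacent locks
                                         * share a 64-byte line */
} LWLockPadded;

With the current 32-byte padding, locks 4 (ProcArrayLock) and 5
(SInvalReadLock) land on the same 64-byte line; padding to 64 bytes gives
each lock a line to itself, at the cost of doubling the size of the shared
LWLock array.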
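The elimination-buffer idea, in outline, is something like the sketch
below.  This is only an illustration of the combining scheme, not the
actual patch: it uses C11 atomics rather than s_lock.h, invents a fixed
slot array, and ignores exclusive holders and the LWLock wait queue
entirely.

#include <stdatomic.h>
#include <stdbool.h>

#define COMBINE_SLOTS 16            /* one slot per potential waiter (made up) */

typedef struct
{
    atomic_flag mutex;              /* the contended spinlock (ATOMIC_FLAG_INIT) */
    int         shared_holders;     /* protected by mutex */
    atomic_bool waiting[COMBINE_SLOTS];  /* slot i wants a shared lock */
} CombinedLWLock;

/* Acquire a shared lock for slot "me", combining with other waiters. */
static void
lw_acquire_shared(CombinedLWLock *lock, int me)
{
    if (!atomic_flag_test_and_set(&lock->mutex))
    {
        /* Fast path: got the spinlock on the first try. */
        lock->shared_holders++;
        atomic_flag_clear(&lock->mutex);
        return;
    }

    /* Slow path: advertise the request ... */
    atomic_store(&lock->waiting[me], true);

    /* ... then wait to be granted, or become the representative. */
    while (atomic_load(&lock->waiting[me]))
    {
        if (!atomic_flag_test_and_set(&lock->mutex))
        {
            /*
             * We are the representative: one spinlock acquisition grants
             * every pending request, including our own.
             */
            for (int i = 0; i < COMBINE_SLOTS; i++)
            {
                if (atomic_load(&lock->waiting[i]))
                {
                    lock->shared_holders++;
                    atomic_store(&lock->waiting[i], false);
                }
            }
            atomic_flag_clear(&lock->mutex);
        }
    }
}

/* Release a previously granted shared lock. */
static void
lw_release_shared(CombinedLWLock *lock)
{
    while (atomic_flag_test_and_set(&lock->mutex))
        ;                           /* spin */
    lock->shared_holders--;
    atomic_flag_clear(&lock->mutex);
}

As noted above, this did not help in practice: the spinlock is almost
always free on the first attempt, so the combining slow path rarely runs,
and when it does run the waiters are still all spinning on the same cache
line.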
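The "lock xchgb" in question is the test-and-set that s_lock.h uses on
x86.  Stripped of the spin-delay and timeout logic in the real code, what
each spinning backend is doing amounts to roughly this:

/* Simplified from the x86 case in src/include/storage/s_lock.h. */
static inline int
tas(volatile char *lock)
{
    char        res = 1;

    /*
     * Atomically swap 1 into *lock and return the old value; nonzero
     * means somebody else already holds the lock.  The locked exchange
     * also bounces the cache line between every CPU that executes it.
     */
    __asm__ __volatile__(
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        : "+q"(res), "+m"(*lock)
        :
        : "memory", "cc");
    return (int) res;
}

/* The spin loop, in essence: what each stuck backend keeps doing. */
static void
spin_acquire(volatile char *lock)
{
    while (tas(lock))
        ;                           /* every failed attempt is another
                                     * contended cache-line transfer */
}

So while the hapless holder is descheduled, every waiter pays for a locked
exchange per iteration, which is where the ferocious cache-line traffic
comes from.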