Re: futex results with dbt-3
From: Tom Lane
Subject: Re: futex results with dbt-3
Date:
Msg-id: 21093.1098722706@sss.pgh.pa.us
In reply to: Re: futex results with dbt-3 (Manfred Spraul <manfred@colorfullife.com>)
Responses: Re: futex results with dbt-3 (Manfred Spraul <manfred@colorfullife.com>)
List: pgsql-performance
Manfred Spraul <manfred@colorfullife.com> writes:
> But: According to the descriptions the problem is a context switch
> storm. I don't see that cache line bouncing can cause a context switch
> storm. What causes the context switch storm?

As best I can tell, the CS storm arises because the backends get into some sort of lockstep timing that makes it far more likely than you'd expect for backend A to try to enter the bufmgr when backend B is already holding the BufMgrLock. In the profiles we were looking at back in April, it seemed that about 10% of the time was spent inside bufmgr (which is bad enough in itself), but the odds of LWLock collision were much higher than 10%, leading to many context swaps. This is not totally surprising given that they are running identical queries and so are running through loops of the same length, but still it seems like there must be some effect driving their timing to converge instead of diverge away from the point of conflict.

What I think (and here is where it's a leap of logic, because I can't prove it) is that the excessive time spent passing the spinlock cache line back and forth is exactly the factor causing that convergence. Somehow, the delay caused when a processor has to wait to get the cache line contributes to keeping the backend loops in lockstep.

It is critical to understand that the CS storm is associated with LWLock contention, not spinlock contention: what we saw was a lot of semop()s, not a lot of select()s.

> If it's the pg_usleep in s_lock, then my patch should help a lot: with
> pthread_rwlock locks, this line doesn't exist anymore.

The profiles showed that s_lock() is hardly entered at all, and the select() delay is reached even more seldom. So changes in that area will make exactly zero difference.
This is the surprising and counterintuitive thing: oprofile clearly shows that very large fractions of the CPU time are being spent at the initial TAS instructions in LWLockAcquire and LWLockRelease, and yet those TASes hardly ever fail, as proven by the fact that oprofile shows s_lock() is seldom entered. So as far as the Postgres code can tell, there isn't any contention worth mentioning for the spinlock. This is indeed the way it was designed to be, but when so much time is going to the TAS instructions, you'd think there'd be more software-visible contention for the spinlock.

It could be that I'm all wet and there is no relationship between the cache line thrashing and the seemingly excessive BufMgrLock contention. They are after all occurring at two very different levels of abstraction. But I think there is some correlation that we just don't understand yet.

regards, tom lane