Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

From: Florian Pflug
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date:
Msg-id: 246CDA37-6A93-4F60-9F48-F0B43DC06AC4@phlo.org
In response to: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Sergey Koposov <koposov@ast.cam.ac.uk>)
Responses: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Sergey Koposov <koposov@ast.cam.ac.uk>)
List: pgsql-hackers
On May 31, 2012, at 01:16, Sergey Koposov wrote:
> On Wed, 30 May 2012, Florian Pflug wrote:
>>
>> I wonder if the huge variance could be caused by non-uniform synchronization costs across different cores. That's
>> not all that unlikely, because at least some cache levels (L2 and/or L3, I think) are usually shared between all
>> cores on a single die. Thus, a cache line bouncing between cores on the same die might very well be faster than it
>> bouncing between cores on different dies.
>>
>> On linux, you can use the taskset command to explicitly assign processes to cores. The easiest way to check if
>> that makes a difference is to assign one core for each connection to the postmaster before launching your test.
>> Assuming that CPU assignments are inherited by child processes, that should then spread your backends out over
>> exactly the cores you specify.
>
> Wow, thanks! This seems to be working to some extent. I've found that distributing each thread x (0 < x < 7) to
> the cpu 1+3*x (reminder: I have HT disabled, and in total I have 4 cpus with 6 proper cores each) gives quite good
> results. And after a few runs, I seem to be getting more or less stable results for the multiple threads, with the
> performance of multithreaded runs going from 6 to 11 seconds for various threads. (Another reminder: 5-6 seconds is
> roughly the timing of my queries running in a single thread.)
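
As an aside, since the explicit core assignment turned out to matter: all taskset really does is call
sched_setaffinity(), and the affinity mask is inherited across fork(), which is why pinning a process before it
spawns the backends works. Here's a minimal sketch of that - it assumes Linux and glibc's CPU_SET macros, and is
not anything from the postgres tree:

/*
 * Sketch: pin the current process to one CPU, like "taskset -c <cpu>".
 * Children forked afterwards inherit the affinity mask.
 * Build: gcc -Wall -o pin pin.c
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    cpu_set_t set;
    int cpu;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <cpu-number>\n", argv[0]);
        return 1;
    }
    cpu = atoi(argv[1]);

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means the calling process */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
    {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pid %d now restricted to cpu %d\n", (int) getpid(), cpu);
    return 0;
}

Running taskset -c <n> <command> from the shell amounts to the same thing.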

Wait, so performance *increased* by spreading the backends out over as many dies as possible, not by using as few as
possible? That'd be exactly the opposite of what I'd have expected. (I'm assuming that cores on one die have ascending
ids on linux. If you could post the contents of /proc/cpuinfo, we could verify that.)
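
For reference, the interesting fields there are "processor", "physical id" (the package/die) and "core id". A rough
sketch that dumps just those triples - assuming the usual Linux x86 /proc/cpuinfo field names:

/*
 * Sketch: print the logical-cpu -> die -> core mapping from /proc/cpuinfo.
 * Build: gcc -Wall -o topo topo.c
 */
#include <stdio.h>

int
main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];
    int processor = -1, physical = -1, core = -1;

    if (f == NULL)
    {
        perror("/proc/cpuinfo");
        return 1;
    }

    while (fgets(line, sizeof(line), f) != NULL)
    {
        if (sscanf(line, "processor : %d", &processor) == 1)
            continue;
        if (sscanf(line, "physical id : %d", &physical) == 1)
            continue;
        if (sscanf(line, "core id : %d", &core) == 1)
            continue;
        /* a blank line ends one logical cpu's block */
        if (line[0] == '\n' && processor >= 0)
            printf("cpu %2d   die %d   core %d\n", processor, physical, core);
    }
    fclose(f);
    return 0;
}

CPUs that report the same physical id share a die; comparing that against the processor numbers would tell us whether
your 1+3*x assignment really spreads the backends across dies.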

> So to some extent one can say that the problem is partially solved (i.e. it is probably understood)

Not quite, I think. We still don't really know why there's that much spinlock contention AFAICS. But what we've
learned is that the actual spinning on a contested lock is only part of the problem. The cache-line bouncing caused by
all those lock acquisitions is the other part, and it's pretty expensive too - otherwise, moving the backends around
wouldn't have helped.
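
To illustrate that the bouncing alone is expensive, here's a toy example (again, not postgres code - it assumes GCC's
__sync builtins, pthreads, and 64-byte cache lines): two threads do atomic increments and never wait for one another,
yet they slow down considerably once their counters share a cache line. Toggle SHARED_LINE and time the two variants
externally (e.g. time ./bounce) to see the difference.

/*
 * Toy demonstration of cache-line bouncing: each thread increments its
 * own counter atomically, so no thread ever waits on a lock.
 * With SHARED_LINE = 1 both counters live in one 64-byte cache line and
 * the line ping-pongs between the cores; with 0 each gets its own line.
 * Build: gcc -O2 -pthread -Wall -o bounce bounce.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define ITERATIONS  50000000
#define SHARED_LINE 1

#if SHARED_LINE
typedef struct { volatile uint64_t v; } counter_t;
#else
typedef struct { volatile uint64_t v; char pad[64 - sizeof(uint64_t)]; } counter_t;
#endif

static counter_t counters[2] __attribute__((aligned(64)));

static void *
worker(void *arg)
{
    counter_t *c = &counters[(intptr_t) arg];
    long i;

    for (i = 0; i < ITERATIONS; i++)
        __sync_fetch_and_add(&c->v, 1);     /* atomic read-modify-write */
    return NULL;
}

int
main(void)
{
    pthread_t t0, t1;

    pthread_create(&t0, NULL, worker, (void *) 0);
    pthread_create(&t1, NULL, worker, (void *) 1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    printf("counters: %llu %llu\n",
           (unsigned long long) counters[0].v,
           (unsigned long long) counters[1].v);
    return 0;
}

Every lock acquisition pays a similar price: even an uncontended atomic read-modify-write has to pull the lock's
cache line over in exclusive mode first.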

> But the question now is whether there is a *PG* problem here or not, or is it Intel's or Linux's problem ?

Neither Intel nor Linux can do much about this, I fear. Synchronization will always be expensive, and the more so the
larger the number of cores. Linux could maybe pick a better process-to-core assignment, but it probably won't be able
to pick the optimal one for every workload. So unfortunately, this is a postgres problem I'd say.

> Because still the slowdown was caused by locking. If there were no locking there wouldn't be any problems (as
> demonstrated a while ago by just cat'ting the files in multiple threads).

Yup, we'll have to figure out a way to reduce the locking overhead. 9.2 already scales much better to a large number
of cores than previous versions did, but your test case shows that there's still room for improvement.

best regards,
Florian Pflug


