Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)
From | Andres Freund |
---|---|
Subject | Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4) |
Date | |
Msg-id | 20150708142241.GQ10242@alap3.anarazel.de |
In reply to | Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)
(Graeme <graeme.b.bell@gmail.com>)
|
List | pgsql-bugs |
On 2015-07-08 09:56:51 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > So there's an interesting "dip" between 4 and 8 clients. A perf profile
> > doesn't show any actual lock contention on master. Not that surprising,
> > there shouldn't be any exclusive locks here.
>
> What size of machine are you testing on?

2xE5520 (=> 2 sockets x 4 cores, 8 threads per socket); NUMA.

(Note that I intentionally did not fix the volatility of the function.)

> I ran Graeme's tests on a 2-socket, 4-core-per-socket, no-hyperthreading
> machine, which has separate NUMA zones for the 2 sockets. What I saw
> (after fixing the "stable" issue) was that all the 8-client and 16-client
> cases were about 8x faster than 1-client, and 2-client was generally
> within hailing distance of 2x faster, but 4-client was often noticeably
> worse than the expected 4x faster.
> I figured this was likely some weird NUMA effect, possibly compounded
> by brutally stupid scheduling on the part of my kernel. But I didn't
> have time to look closer.
>
> You might be seeing the same kind of effect, or something different.
> It's hard to tell without knowing more about your machine.

I think it's likely to be some scheduler effect. The number of CPU
migrations between 4 and 8 clients is very different:

4:
     64,599 context-switches   # 0.003 M/sec (100.00%)
        172 cpu-migrations     # 0.007 K/sec (100.00%)
        537 page-faults        # 0.023 K/sec

8:
    381,383 context-switches   # 0.002 M/sec (100.00%)
      1,279 cpu-migrations     # 0.008 K/sec (100.00%)
      3,869 page-faults        # 0.024 K/sec

16:
    514,426 context-switches   # 0.003 M/sec (100.00%)
      1,166 cpu-migrations     # 0.007 K/sec (100.00%)
      6,308 page-faults        # 0.039 K/sec

There's a pretty large increase in the number of migrations between 4
and 8 clients, but none between 8 and 16. My guess is that the kernel
tries to move processes around to idle nodes too aggressively.
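
To make the jump between client counts easier to see, the raw counters quoted above can be turned into growth factors. This is only a minimal sketch: the names `PERF_COUNTERS` and `growth` are made up for illustration, the numbers are copied from the perf stat output in this message, and it assumes the three runs are comparable (the message does not state their durations).

```python
# Counter values copied from the perf stat output quoted in this message,
# keyed by the number of pgbench clients.
PERF_COUNTERS = {
    4: {"context-switches": 64_599, "cpu-migrations": 172, "page-faults": 537},
    8: {"context-switches": 381_383, "cpu-migrations": 1_279, "page-faults": 3_869},
    16: {"context-switches": 514_426, "cpu-migrations": 1_166, "page-faults": 6_308},
}


def growth(event, before, after):
    """Return how many times larger `event` is at `after` clients than at `before`."""
    return PERF_COUNTERS[after][event] / PERF_COUNTERS[before][event]


if __name__ == "__main__":
    for event in ("context-switches", "cpu-migrations", "page-faults"):
        print(
            f"{event:17s} 4->8: x{growth(event, 4, 8):.2f}   "
            f"8->16: x{growth(event, 8, 16):.2f}"
        )
```

With these inputs, doubling the clients from 4 to 8 multiplies the migration count by more than seven, while going from 8 to 16 leaves it slightly lower, which matches the observation in the text.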

The second-by-second pgbench output is quite interesting:

    progress: 1.0 s, 22915.3 tps, lat 0.346 ms stddev 0.078
    progress: 2.0 s, 15596.8 tps, lat 0.512 ms stddev 0.185
    progress: 3.0 s, 15519.2 tps, lat 0.514 ms stddev 0.499
    progress: 4.0 s, 15535.7 tps, lat 0.512 ms stddev 0.306
    progress: 5.0 s, 15494.3 tps, lat 0.515 ms stddev 0.162

So at -j8 the first second is routinely much faster than the later ones.
Comparing perf stat for pgbench -j8 with -T 1 and -T 8:

    -T 1     46 cpu-migrations
    -T 8    534 cpu-migrations

So the number of migrations does indeed rise noticeably after the first
second...
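
The two comparisons above can be quantified with a short script. This is a sketch under stated assumptions: the helper names (`parse_tps`, `first_second_ratio`, `later_migration_rate`) are hypothetical, the sample text is the progress output quoted in this message, and the migration estimate assumes the first second of the -T 8 run behaves like the separate -T 1 run.

```python
import re

# Matches pgbench progress lines such as:
#   progress: 1.0 s, 22915.3 tps, lat 0.346 ms stddev 0.078
PROGRESS_RE = re.compile(r"progress: ([\d.]+) s, ([\d.]+) tps")

SAMPLE = """\
progress: 1.0 s, 22915.3 tps, lat 0.346 ms stddev 0.078
progress: 2.0 s, 15596.8 tps, lat 0.512 ms stddev 0.185
progress: 3.0 s, 15519.2 tps, lat 0.514 ms stddev 0.499
progress: 4.0 s, 15535.7 tps, lat 0.512 ms stddev 0.306
progress: 5.0 s, 15494.3 tps, lat 0.515 ms stddev 0.162
"""


def parse_tps(text):
    """Extract the tps value from every progress line, in order."""
    return [float(m.group(2)) for m in PROGRESS_RE.finditer(text)]


def first_second_ratio(tps):
    """Ratio of the first interval's tps to the mean tps of the remaining intervals."""
    first, rest = tps[0], tps[1:]
    return first / (sum(rest) / len(rest))


def later_migration_rate(short_run, long_run, long_seconds):
    """Average migrations per second after the first second of the long run.

    Assumes the long run's first second saw as many migrations as the
    separate one-second run did.
    """
    return (long_run - short_run) / (long_seconds - 1)


if __name__ == "__main__":
    tps = parse_tps(SAMPLE)
    print(f"first second vs. later: x{first_second_ratio(tps):.2f}")
    print(f"migrations/s after 1 s: {later_migration_rate(46, 534, 8):.1f}")
```

Under those assumptions the first second runs at roughly one and a half times the later throughput, and the migration rate after the first second is noticeably higher than the 46 seen in the one-second run.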