Re: Wierd context-switching issue on Xeon
From | Joe Conway |
---|---|
Subject | Re: Wierd context-switching issue on Xeon |
Date | |
Msg-id | 40849235.2070808@joeconway.com |
In response to | Re: Wierd context-switching issue on Xeon (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Wierd context-switching issue on Xeon (Joe Conway <mail@joeconway.com>) |
List | pgsql-performance |
Tom Lane wrote:
> Here is a test case. To set up, run the "test_setup.sql" script once;
> then launch two copies of the "test_run.sql" script. (For those of
> you with more than two CPUs, see whether you need one per CPU to make
> trouble, or whether two test_runs are enough.) Check that you get a
> nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

Check.

> In isolation, test_run.sql should do essentially no syscalls at all once
> it's past the initial ramp-up. On a machine that's functioning per
> expectations, multiple copies of test_run show a relatively low rate of
> semop() calls --- a few per second, at most --- and maybe a delaying
> select() here and there.
>
> What I actually see on Josh's client's machine is a context swap storm:
> "vmstat 1" shows CS rates around 170K/sec. strace'ing the backends
> shows a corresponding rate of semop() syscalls, with a few delaying
> select()s sprinkled in. top(1) shows system CPU percent of 25-30
> and idle CPU percent of 16-20.

Your test case works perfectly. I ran 4 concurrent psql sessions on a
quad Xeon (IBM x445, 2.8GHz, 4GB RAM), hyperthreaded. Here's what 'top'
looks like:

177 processes: 173 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   35.9%    0.0%    7.2%   0.0%     0.0%    0.0%   56.8%
           cpu00   19.6%    0.0%    4.9%   0.0%     0.0%    0.0%   75.4%
           cpu01   44.1%    0.0%    7.8%   0.0%     0.0%    0.0%   48.0%
           cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu03   32.3%    0.0%   13.7%   0.0%     0.0%    0.0%   53.9%
           cpu04   21.5%    0.0%   10.7%   0.0%     0.0%    0.0%   67.6%
           cpu05   42.1%    0.0%    9.8%   0.0%     0.0%    0.0%   48.0%
           cpu06  100.0%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu07   27.4%    0.0%   10.7%   0.0%     0.0%    0.0%   61.7%
Mem:  4123700k av, 3933896k used,  189804k free,       0k shrd,  221948k buff
                   2492124k actv,  760612k in_d,   41416k in_c
Swap: 2040244k av,    5632k used, 2034612k free                 3113272k cached

Note that cpu06 is not a postgres process.

The output of vmstat looks like this:

# vmstat 1
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff   cache   si   so    bi    bo   in     cs us sy id wa
 4  0   5632 184264 221948 3113308    0    0     0     0    0      0  0  0  0  0
 3  0   5632 184264 221948 3113308    0    0     0     0  112 211894 36  9 55  0
 5  0   5632 184264 221948 3113308    0    0     0     0  125 222071 39  8 53  0
 4  0   5632 184264 221948 3113308    0    0     0     0  110 215097 39 10 52  0
 1  0   5632 184588 221948 3113308    0    0     0    96  139 187561 35 10 55  0
 3  0   5632 184588 221948 3113308    0    0     0     0  114 241731 38 10 52  0
 3  0   5632 184920 221948 3113308    0    0     0     0  132 257168 40  9 51  0
 1  0   5632 184912 221948 3113308    0    0     0     0  114 251802 38  9 54  0

> Note the test case assumes you've got shared_buffers set to at least
> 1000; with smaller values, you may get some I/O syscalls, which will
> probably skew the results.

 shared_buffers
----------------
 16384
(1 row)

I found that killing three of the four concurrent queries dropped
context switches to about 70,000 to 100,000 per second. Two or more
sessions bring it up to 200K+.

Joe
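
For anyone who wants to put a single number on the "cs" column when repeating this test, here is a minimal sketch, not part of the original test case, that pulls the context-switch samples out of captured "vmstat 1" text and averages them. The function name context_switch_rates and the variable sample are hypothetical; the sample rows are the vmstat lines quoted in this message. The first sample is skipped because it does not reflect the interval being measured (it is all zeros in the capture here).

```python
def context_switch_rates(vmstat_output):
    """Return the 'cs' samples from captured `vmstat 1` text.

    The column is located by name from the header row, so the sketch does
    not depend on a fixed column position. The first data row is dropped.
    """
    rates = []
    cs_index = None
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "cs" in fields:
            # Header row: remember where the context-switch column sits.
            cs_index = fields.index("cs")
            continue
        if cs_index is None or len(fields) <= cs_index:
            continue
        if not all(f.isdigit() for f in fields):
            # Skip the "procs memory swap ..." banner and any other text.
            continue
        rates.append(int(fields[cs_index]))
    return rates[1:]


# Sample rows copied from the vmstat output in this message.
sample = """\
procs memory swap io system cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
 4 0 5632 184264 221948 3113308 0 0 0 0 0 0 0 0 0 0
 3 0 5632 184264 221948 3113308 0 0 0 0 112 211894 36 9 55 0
 5 0 5632 184264 221948 3113308 0 0 0 0 125 222071 39 8 53 0
 4 0 5632 184264 221948 3113308 0 0 0 0 110 215097 39 10 52 0
 1 0 5632 184588 221948 3113308 0 0 0 96 139 187561 35 10 55 0
 3 0 5632 184588 221948 3113308 0 0 0 0 114 241731 38 10 52 0
 3 0 5632 184920 221948 3113308 0 0 0 0 132 257168 40 9 51 0
 1 0 5632 184912 221948 3113308 0 0 0 0 114 251802 38 9 54 0
"""

rates = context_switch_rates(sample)
mean_rate = sum(rates) / len(rates)
print(f"{len(rates)} samples, mean {mean_rate:.0f} cs/sec, peak {max(rates)} cs/sec")
```

Feeding it the output of a run with a single session and then with two or more should make the jump described above easy to compare side by side.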