Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)

From Andres Freund
Subject Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)
Msg-id 20150708142241.GQ10242@alap3.anarazel.de
In reply to Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)  (Graeme <graeme.b.bell@gmail.com>)
List pgsql-bugs

On 2015-07-08 09:56:51 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > So there's an interesting "dip" between 4 and 8 clients. A perf profile
> > doesn't show any actual lock contention on master. Not that surprising,
> > there shouldn't be any exclusive locks here.
>
> What size of machine are you testing on?

2xE5520 (=> 2 sockets x 4 cores, 8 threads each); numa.

(note that I intentionally did not fix the volatility of the function)

> I ran Graeme's tests on a 2-socket, 4-core-per-socket, no-hyperthreading
> machine, which has separate NUMA zones for the 2 sockets.  What I saw
> (after fixing the "stable" issue) was that all the 8-client and 16-client
> cases were about 8x faster than 1-client, and 2-client was generally
> within hailing distance of 2x faster, but 4-client was often noticeably
> worse than the expected 4x faster.
>
> I figured this was likely some weird NUMA effect, possibly compounded
> by brutally stupid scheduling on the part of my kernel.  But I didn't
> have time to look closer.
>
> You might be seeing the same kind of effect, or something different.
> It's hard to tell without knowing more about your machine.

I think it's likely to be some scheduler effect. The number of cpu
migrations between 4 and 8 is very different:

4:
            64,599      context-switches          #    0.003 M/sec                    (100.00%)
               172      cpu-migrations            #    0.007 K/sec                    (100.00%)
               537      page-faults               #    0.023 K/sec
8:
           381,383      context-switches          #    0.002 M/sec                    (100.00%)
             1,279      cpu-migrations            #    0.008 K/sec                    (100.00%)
             3,869      page-faults               #    0.024 K/sec
16:
           514,426      context-switches          #    0.003 M/sec                    (100.00%)
             1,166      cpu-migrations            #    0.007 K/sec                    (100.00%)
             6,308      page-faults               #    0.039 K/sec

There's a pretty large increase in the number of migrations between 4
and 8, but none between 8 and 16.
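
To put rough numbers on that, here is a minimal sketch that only does arithmetic on the raw counts quoted above. The variable and function names are made up for illustration, and the counts are not normalized by run length or CPU time, so treat the factors as indicative only.

```python
# Counters copied from the perf stat output above, keyed by client count.
# These are raw totals; they are not normalized by runtime or CPU time.
migrations = {4: 172, 8: 1279, 16: 1166}
context_switches = {4: 64599, 8: 381383, 16: 514426}


def growth(counts, lo, hi):
    """Factor by which a counter changes going from `lo` to `hi` clients."""
    return counts[hi] / counts[lo]


mig_4_to_8 = growth(migrations, 4, 8)
mig_8_to_16 = growth(migrations, 8, 16)
cs_4_to_8 = growth(context_switches, 4, 8)
cs_8_to_16 = growth(context_switches, 8, 16)

print(f"cpu-migrations   4 -> 8: x{mig_4_to_8:.2f}, 8 -> 16: x{mig_8_to_16:.2f}")
print(f"context-switches 4 -> 8: x{cs_4_to_8:.2f}, 8 -> 16: x{cs_8_to_16:.2f}")
```

Doubling the clients from 4 to 8 multiplies the migration count by roughly 7.4, whereas going from 8 to 16 leaves it slightly lower.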

My guess is that the kernel tries to move around processes to idle nodes
too aggressively.

The second-by-second pgbench output is quite interesting:
progress: 1.0 s, 22915.3 tps, lat 0.346 ms stddev 0.078
progress: 2.0 s, 15596.8 tps, lat 0.512 ms stddev 0.185
progress: 3.0 s, 15519.2 tps, lat 0.514 ms stddev 0.499
progress: 4.0 s, 15535.7 tps, lat 0.512 ms stddev 0.306
progress: 5.0 s, 15494.3 tps, lat 0.515 ms stddev 0.162

So at -j8 the first second is routinely much faster than the later ones.
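
As a rough way to quantify that drop, the progress lines above can be parsed and the first second compared with the average of the rest. This is only a sketch: the regular expression matches the progress format shown in this mail, and the names `PROGRESS_RE` and `parse_progress` are invented for the example.

```python
import re
from statistics import mean

# Matches the pgbench progress lines as they appear above.
PROGRESS_RE = re.compile(
    r"progress: (?P<t>[\d.]+) s, (?P<tps>[\d.]+) tps, "
    r"lat (?P<lat>[\d.]+) ms stddev (?P<sd>[\d.]+)"
)

OUTPUT = """\
progress: 1.0 s, 22915.3 tps, lat 0.346 ms stddev 0.078
progress: 2.0 s, 15596.8 tps, lat 0.512 ms stddev 0.185
progress: 3.0 s, 15519.2 tps, lat 0.514 ms stddev 0.499
progress: 4.0 s, 15535.7 tps, lat 0.512 ms stddev 0.306
progress: 5.0 s, 15494.3 tps, lat 0.515 ms stddev 0.162
"""


def parse_progress(text):
    """Return a list of (elapsed_seconds, tps, latency_ms) tuples."""
    return [
        (float(m["t"]), float(m["tps"]), float(m["lat"]))
        for m in PROGRESS_RE.finditer(text)
    ]


samples = parse_progress(OUTPUT)
first_tps = samples[0][1]
later_tps = mean(tps for _, tps, _ in samples[1:])
drop = 1 - later_tps / first_tps

print(f"first second: {first_tps:.1f} tps")
print(f"seconds 2-5:  {later_tps:.1f} tps on average")
print(f"drop:         {drop:.1%}")
```

On these five samples the throughput after the first second is about a third lower than during it.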

Comparing perf stat pgbench -j8 -T 1 and -T 8:
-T 1
                46      cpu-migrations
-T 8
               534      cpu-migrations
so indeed the number of migrations rises noticeably after the first
second...
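
A back-of-the-envelope version of that comparison, as a sketch only: it assumes the first second of the -T 8 run behaves like the whole -T 1 run and that the runs last exactly 1 and 8 seconds, neither of which perf stat guarantees. The variable names are made up.

```python
# cpu-migrations reported by perf stat for the two pgbench -j8 runs above.
migrations_t1 = 46    # -T 1
migrations_t8 = 534   # -T 8

# Migration rate during the first second, taken from the -T 1 run.
rate_first_second = migrations_t1 / 1

# Assumption: the first second of the -T 8 run looks like the -T 1 run,
# so the remaining migrations are spread over the other 7 seconds.
rate_later = (migrations_t8 - migrations_t1) / (8 - 1)

print(f"first second:  {rate_first_second:.1f} migrations/s")
print(f"seconds 2-8:   {rate_later:.1f} migrations/s")
print(f"increase:      x{rate_later / rate_first_second:.2f}")
```

Under those assumptions the migration rate after the first second is roughly one and a half times the rate during it.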
