Обсуждение: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

Поиск
Список
Период
Сортировка

9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Vladimir Borodin
Дата:
Hi all.

We have found that queries through PgBouncer 1.7.2 (with transaction pooling) to local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on RHEL 6 hosts (all packages are updated to last versions). Meanwhile the problem can’t be reproduced i.e. on Ubuntu 14.04 (also fully-updated).

Here is how the results look like for 9.4, 9.5 and 9.6. All are built from latest commits on yesterday in
* REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
* REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
* master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).

All of them are build on the host where testing is done (with stock gcc versions). Sysctls, pgbouncer config and everything we found are the same, postgres configs are default, PGDATA is in tmpfs. All numbers are reproducible, they are stable between runs.

Shortly:

OS PostgreSQL version TPS Avg. latency
RHEL 6 9.4 44898 1.425 ms
RHEL 6 9.5 26199 2.443 ms
RHEL 6 9.5 43027 1.487 ms
Ubuntu 14.04 9.4 67458 0.949 ms
Ubuntu 14.04 9.5 64065 0.999 ms
Ubuntu 14.04 9.6 64350 0.995 ms

You could see that the difference between major versions on Ubuntu is not significant, but on RHEL 9.5 is 70% slower than 9.4 and 9.6.

Below are more details.

RHEL 6:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2693962
latency average: 1.425 ms
tps = 44897.461518 (including connections establishing)
tps = 44898.763258 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 1572014
latency average: 2.443 ms
tps = 26198.928627 (including connections establishing)
tps = 26199.803363 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2581645
latency average: 1.487 ms
tps = 43025.676995 (including connections establishing)
tps = 43027.038275 (excluding connections establishing)
postgres@pgload05g ~ $

Ubuntu 14.04 (the same hardware):

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4047653
latency average: 0.949 ms
tps = 67458.361515 (including connections establishing)
tps = 67459.983480 (excluding connections establishing)
postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3844084
latency average: 0.999 ms
tps = 64065.447458 (including connections establishing)
tps = 64066.943627 (excluding connections establishing)
postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3861088
latency average: 0.995 ms
tps = 64348.573126 (including connections establishing)
tps = 64350.195750 (excluding connections establishing)
postgres@pgloadpublic02:~$

In both tests (RHEL and Ubuntu) the bottleneck is performance of singe CPU core which is 100% consumed by PgBouncer. If pgbench connects to postgres directly I get the following (expected) numbers:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5432'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 10010710
latency average: 0.384 ms
tps = 166835.937859 (including connections establishing)
tps = 166849.730224 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5433'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 13373890
latency average: 0.287 ms
tps = 222888.311289 (including connections establishing)
tps = 222951.470125 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5434'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 12989816
latency average: 0.296 ms
tps = 216487.458399 (including connections establishing)
tps = 216548.069976 (excluding connections establishing)
postgres@pgload05g ~ $

Compilation options look almost the same:
# RHEL 6
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -O2
# Ubuntu
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2

Attached are a simple script to deploy the testing environment (PgBouncer should be installed) and pgbouncer config. I could provide any other needed information like backtraces or perf reports or anything else.


--
May the force be with you…

Вложения

Re: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Антон Бушмелев
Дата:
UP. repeat tests on local vm.. reults are discouraging
OS         PG     TPS AVG latency
Centos 7 9.5.3 23.711023 168.421
Centos 7 9.5.3 26.609271 150.188
Centos 7 9.5.3 25.220044 158.416
Centos 7 9.5.3 25.598977 156.047
Centos 7 9.4.8 278.572191 14.077
Centos 7 9.4.8 247.237755 16.177
Centos 7 9.4.8 240.007524 16.276
Centos 7 9.4.8 237.862238 16.596

ps: latest update on centos 7 +xfs + lates database version from repo, no pgbouncer 


On 25 May 2016, at 17:33, Vladimir Borodin <root@simply.name> wrote:

Hi all.

We have found that queries through PgBouncer 1.7.2 (with transaction pooling) to local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on RHEL 6 hosts (all packages are updated to last versions). Meanwhile the problem can’t be reproduced i.e. on Ubuntu 14.04 (also fully-updated).

Here is how the results look like for 9.4, 9.5 and 9.6. All are built from latest commits on yesterday in
* REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
* REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
* master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).

All of them are build on the host where testing is done (with stock gcc versions). Sysctls, pgbouncer config and everything we found are the same, postgres configs are default, PGDATA is in tmpfs. All numbers are reproducible, they are stable between runs.

Shortly:

OS PostgreSQL version TPS Avg. latency
RHEL 6 9.4 44898 1.425 ms
RHEL 6 9.5 26199 2.443 ms
RHEL 6 9.5 43027 1.487 ms
Ubuntu 14.04 9.4 67458 0.949 ms
Ubuntu 14.04 9.5 64065 0.999 ms
Ubuntu 14.04 9.6 64350 0.995 ms

You could see that the difference between major versions on Ubuntu is not significant, but on RHEL 9.5 is 70% slower than 9.4 and 9.6.

Below are more details.

RHEL 6:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2693962
latency average: 1.425 ms
tps = 44897.461518 (including connections establishing)
tps = 44898.763258 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 1572014
latency average: 2.443 ms
tps = 26198.928627 (including connections establishing)
tps = 26199.803363 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2581645
latency average: 1.487 ms
tps = 43025.676995 (including connections establishing)
tps = 43027.038275 (excluding connections establishing)
postgres@pgload05g ~ $

Ubuntu 14.04 (the same hardware):

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4047653
latency average: 0.949 ms
tps = 67458.361515 (including connections establishing)
tps = 67459.983480 (excluding connections establishing)
postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3844084
latency average: 0.999 ms
tps = 64065.447458 (including connections establishing)
tps = 64066.943627 (excluding connections establishing)
postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3861088
latency average: 0.995 ms
tps = 64348.573126 (including connections establishing)
tps = 64350.195750 (excluding connections establishing)
postgres@pgloadpublic02:~$

In both tests (RHEL and Ubuntu) the bottleneck is performance of singe CPU core which is 100% consumed by PgBouncer. If pgbench connects to postgres directly I get the following (expected) numbers:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5432'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 10010710
latency average: 0.384 ms
tps = 166835.937859 (including connections establishing)
tps = 166849.730224 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5433'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 13373890
latency average: 0.287 ms
tps = 222888.311289 (including connections establishing)
tps = 222951.470125 (excluding connections establishing)
postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5434'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 12989816
latency average: 0.296 ms
tps = 216487.458399 (including connections establishing)
tps = 216548.069976 (excluding connections establishing)
postgres@pgload05g ~ $

Compilation options look almost the same:
# RHEL 6
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -O2
# Ubuntu
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2

Attached are a simple script to deploy the testing environment (PgBouncer should be installed) and pgbouncer config. I could provide any other needed information like backtraces or perf reports or anything else.

<pgbouncer.ini>
<deploy.sh>

--
May the force be with you…


Вложения

Re: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Andres Freund
Дата:
Hi,

On 2016-06-02 14:18:26 +0300, Антон Бушмелев wrote:
> UP. repeat tests on local vm.. reults are discouraging
> OS             PG        TPS         AVG latency
> Centos 7     9.5.3     23.711023     168.421
> Centos 7     9.5.3     26.609271     150.188
> Centos 7     9.5.3     25.220044     158.416
> Centos 7     9.5.3     25.598977     156.047
> Centos 7     9.4.8     278.572191     14.077
> Centos 7     9.4.8     247.237755     16.177
> Centos 7     9.4.8     240.007524     16.276
> Centos 7     9.4.8     237.862238     16.596

Could you provide profiles on 9.4 and 9.5?  Which kernel did you have
enabled? Is /proc/sys/kernel/sched_autogroup_enabled 1 or 0?

Regards,

Andres


Re: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Andres Freund
Дата:
Hi Vladimir,

Thanks for these reports.

On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt

Any chance you could redo the reports with --no-children --call-graph=fractal
added? The mode that includes child overheads unfortunately makes the
output hard to interpet/compare.

> The results from pg9?_perf_report.txt are attached. Note that in all cases some events were lost, i.e.:
>
> root@pgload05g ~ # perf report -g -i pg94_all.data >/tmp/pg94_perf_report.txt
> Failed to open [vsyscall], continuing without symbols
> Warning:
> Processed 537137 events and lost 7846 chunks!

You can reduce the overhead by reducing the sampling frequency, e.g. by
specifying -F 300.

Greetings,

Andres Freund


Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Vladimir Borodin
Дата:

13 июня 2016 г., в 21:58, Vladimir Borodin <root@simply.name> написал(а):


13 июня 2016 г., в 0:51, Andres Freund <andres@anarazel.de> написал(а):

Hi Vladimir,

Thanks for these reports.

On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt

Any chance you could redo the reports with --no-children --call-graph=fractal
added? The mode that includes child overheads unfortunately makes the
output hard to interpet/compare.

Of course. Not sure if that is important but I upgraded perf for that (because --no-children option was introduced in ~3.16), so perf record and perf report were done with different perf versions.

<pg94_perf_report.txt.gz>
<pg95_perf_report.txt.gz>
<pg96_perf_report.txt.gz>

Also I’ve done the same test on same host (RHEL 6) but with 4.6 kernel/perf and writing perf data to /dev/shm for not loosing events. Perf report output is also attached but important thing is that the regression is not so significant:

root@pgload05g ~ # uname -r
4.6.0-1.el6.elrepo.x86_64
root@pgload05g ~ # cat /proc/sys/kernel/sched_autogroup_enabled
1
root@pgload05g ~ # /tmp/run.sh
RHEL 6 9.4 71634 0.893
RHEL 6 9.5 54005 1.185
RHEL 6 9.6 65550 0.976
root@pgload05g ~ # echo 0 >/proc/sys/kernel/sched_autogroup_enabled
root@pgload05g ~ # /tmp/run.sh
RHEL 6 9.4 73041 0.876
RHEL 6 9.5 60105 1.065
RHEL 6 9.6 67984 0.941
root@pgload05g ~ #

<pg96_perf_report_4.6.txt.gz>
<pg95_perf_report_4.6.txt.gz>
<pg94_perf_report_4.6.txt.gz>

Andres, is there any chance that you would find time to look at those results? Are they actually useful?




The results from pg9?_perf_report.txt are attached. Note that in all cases some events were lost, i.e.:

root@pgload05g ~ # perf report -g -i pg94_all.data >/tmp/pg94_perf_report.txt
Failed to open [vsyscall], continuing without symbols
Warning:
Processed 537137 events and lost 7846 chunks!

You can reduce the overhead by reducing the sampling frequency, e.g. by
specifying -F 300.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
May the force be with you…


--
May the force be with you…

Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Andres Freund
Дата:
On 2016-06-13 21:58:30 +0300, Vladimir Borodin wrote:
>
> > 13 июня 2016 г., в 0:51, Andres Freund <andres@anarazel.de> написал(а):
> >
> > Hi Vladimir,
> >
> > Thanks for these reports.
> >
> > On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
> >> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt
> >
> > Any chance you could redo the reports with --no-children --call-graph=fractal
> > added? The mode that includes child overheads unfortunately makes the
> > output hard to interpet/compare.
>
> Of course. Not sure if that is important but I upgraded perf for that (because --no-children option was introduced in
~3.16),so perf record and perf report were done with different perf versions. 
>
>
>
> Also I’ve done the same test on same host (RHEL 6) but with 4.6 kernel/perf and writing perf data to /dev/shm for not
loosingevents. Perf report output is also attached but important thing is that the regression is not so significant: 
>
> root@pgload05g ~ # uname -r
> 4.6.0-1.el6.elrepo.x86_64
> root@pgload05g ~ # cat /proc/sys/kernel/sched_autogroup_enabled
> 1
> root@pgload05g ~ # /tmp/run.sh
> RHEL 6        9.4        71634        0.893
> RHEL 6        9.5        54005        1.185
> RHEL 6        9.6        65550        0.976
> root@pgload05g ~ # echo 0 >/proc/sys/kernel/sched_autogroup_enabled
> root@pgload05g ~ # /tmp/run.sh
> RHEL 6        9.4        73041        0.876
> RHEL 6        9.5        60105        1.065
> RHEL 6        9.6        67984        0.941
> root@pgload05g ~ #

Hm. Have you measured how large the slowdown is if you connect via tcp
to pgbouncer, but have pgbouncer connect to postgres via unix sockets?

Andres


Re: [HACKERS] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

От
Andres Freund
Дата:
On 2016-07-04 16:30:51 +0300, Vladimir Borodin wrote:
>
> > 13 июня 2016 г., в 21:58, Vladimir Borodin <root@simply.name> написал(а):
> >
> >>
> >> 13 июня 2016 г., в 0:51, Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> написал(а):
> >>
> >> Hi Vladimir,
> >>
> >> Thanks for these reports.
> >>
> >> On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
> >>> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt
> >>
> >> Any chance you could redo the reports with --no-children --call-graph=fractal
> >> added? The mode that includes child overheads unfortunately makes the
> >> output hard to interpet/compare.
> >
> > Of course. Not sure if that is important but I upgraded perf for that (because --no-children option was introduced
in~3.16), so perf record and perf report were done with different perf versions. 
> >
> > <pg94_perf_report.txt.gz>
> > <pg95_perf_report.txt.gz>
> > <pg96_perf_report.txt.gz>
> >
> > Also I’ve done the same test on same host (RHEL 6) but with 4.6
> > kernel/perf and writing perf data to /dev/shm for not loosing
> > events. Perf report output is also attached but important thing is
> > that the regression is not so significant:

FWIW, you can instead use -F 300 or something to reduce the sampling
frequency.

> > root@pgload05g ~ # uname -r
> > 4.6.0-1.el6.elrepo.x86_64
> > root@pgload05g ~ # cat /proc/sys/kernel/sched_autogroup_enabled
> > 1
> > root@pgload05g ~ # /tmp/run.sh
> > RHEL 6        9.4        71634        0.893
> > RHEL 6        9.5        54005        1.185
> > RHEL 6        9.6        65550        0.976
> > root@pgload05g ~ # echo 0 >/proc/sys/kernel/sched_autogroup_enabled
> > root@pgload05g ~ # /tmp/run.sh
> > RHEL 6        9.4        73041        0.876
> > RHEL 6        9.5        60105        1.065
> > RHEL 6        9.6        67984        0.941
> > root@pgload05g ~ #
> >
> > <pg96_perf_report_4.6.txt.gz>
> > <pg95_perf_report_4.6.txt.gz>
> > <pg94_perf_report_4.6.txt.gz>
>
> Andres, is there any chance that you would find time to look at those results? Are they actually useful?

I don't really see anything suspicious in the profile. This looks more
like a kernel scheduler issue than a postgres bottleneck one. It seems
that somehow using nonblocking IO (started in 9.5) causes scheduling
issues when pgbouncer is also local.

Could you do perf stat -ddd -a sleep 10 or something during both runs? I
suspect that the context switch ratios will be quite different.

Andres