Re: [HACKERS] WAL logging problem in 9.4.3?
| От | Kyotaro Horiguchi |
|---|---|
| Тема | Re: [HACKERS] WAL logging problem in 9.4.3? |
| Дата | |
| Msg-id | 20191203.205146.1521643852457054060.horikyota.ntt@gmail.com обсуждение исходный текст |
| Ответ на | Re: [HACKERS] WAL logging problem in 9.4.3? (Noah Misch <noah@leadboat.com>) |
| Ответы |
Re: [HACKERS] WAL logging problem in 9.4.3?
Re: [HACKERS] WAL logging problem in 9.4.3? |
| Список | pgsql-hackers |
Hello.
At Thu, 28 Nov 2019 17:23:19 -0500, Noah Misch <noah@leadboat.com> wrote in
> On Thu, Nov 28, 2019 at 09:35:08PM +0900, Kyotaro Horiguchi wrote:
> > I measured the performance with the latest patch set.
> >
> > > 1. Determine $DDL_COUNT, a number of DDL transactions that take about one
> > > minute when done via syncs.
> > > 2. Start "pgbench -rP1 --progress-timestamp -T180 -c10 -j10".
> > > 3. Wait 10s.
> > > 4. Start one DDL backend that runs $DDL_COUNT transactions.
> > > 5. Save DDL start timestamp, DDL end timestamp, and pgbench output.
>
> If you have the raw data requested in (5), please share them here so folks
> have the option to reproduce your graphs and calculations.
Sorry, I forgot to attach the scripts. The raw data was vanished into
unstable connection and the steps was quite crude. I prioritized on
showing some numbers at the time. I revised the scripts into more
automated way and will take numbers again.
> > > 2. Start server with wal_level = replica (all other variables are not
> > > changed) then run the attached ./bench.sh
> >
> > The bench.sh attachment was missing; please attach it. Please give the output
> > of this command:
> >
> > select name, setting from pg_settings where setting <> boot_val;
(I intentionally show all the results..)
=# select name, setting from pg_settings where setting<> boot_val;
name | setting
----------------------------+--------------------
application_name | psql
archive_command | (disabled)
client_encoding | UTF8
data_directory_mode | 0700
default_text_search_config | pg_catalog.english
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
log_checkpoints | on
log_file_mode | 0600
log_timezone | Asia/Tokyo
max_stack_depth | 2048
max_wal_senders | 0
max_wal_size | 10240
server_encoding | UTF8
shared_buffers | 16384
TimeZone | Asia/Tokyo
unix_socket_permissions | 0777
wal_buffers | 512
wal_level | minimal
(23 rows)
The result for "replica" setting in the benchmark script are used as
base numbers (or the denominator of the percentages).
> > 3. Restart server with wal_level = replica then run the bench.sh
> > twice.
>
> I assume this is wal_level=minimal, not wal_level=replica.
Oops! It's wrong I ran once with replica, then twice with minimal.
Anyway, I revised the benchmarking scripts and attached them. The
parameters written in benchmain.sh were decided as ./bench2.pl 5
<count> <pages> s with wal_level=minimal server takes around 60
seconds.
I'll send the complete data tomorrow (in JST). The attached f.txt is
the result of preliminary test only with pages=100 and 250 (with HDD).
The attached files are:
benchmain.sh - main script
bench2.sh - run a benchmark with a single set of parameters
bench1.pl - behchmark client program
summarize.pl - script to summarize benchmain.sh's output
f.txt.gz - result only for pages=100, DDL count = 2200 (not 2250)
How to run:
$ /..unpatched_path../initdb -D <unpatched_datadir>
(wal_level=replica, max_wal_senders=0, log_checkpoints=yes, max_wal_size=10GB)
$ /..patched_path../initdb -D <patched_datadir>
(wal_level=minimal, max_wal_senders=0, log_checkpoints=yes, max_wal_size=10GB)
$./benchmain.sh > <result_file> # output raw data
$./summarize.pl [-v] < <result_file> # show summary
With the attached f.txt, summarize.pl gives the following output.
WAL wins with the that pages.
$ cat f.txt | ./summarize.pl
## params: wal_level=replica mode=none pages=100 count=353 scale=20
(% are relative to "before")
before: tps 262.3 (100.0%), lat 39.840 ms (100.0%) (29 samples)
during: tps 120.7 ( 46.0%), lat 112.508 ms (282.4%) (35 samples)
after: tps 106.3 ( 40.5%), lat 163.492 ms (410.4%) (86 samples)
DDL time: 34883 ms ( 100.0% relative to mode=none)
## params: wal_level=minimal mode=sync pages=100 count=353 scale=20
(% are relative to "before")
before: tps 226.3 (100.0%), lat 48.091 ms (100.0%) (29 samples)
during: tps 83.0 ( 36.7%), lat 184.942 ms (384.6%) (100 samples)
after: tps 82.6 ( 36.5%), lat 196.863 ms (409.4%) (21 samples)
DDL time: 99239 ms ( 284.5% relative to mode=none)
## params: wal_level=minimal mode=WAL pages=100 count=353 scale=20
(% are relative to "before")
before: tps 240.3 (100.0%), lat 44.686 ms (100.0%) (29 samples)
during: tps 129.6 ( 53.9%), lat 113.585 ms (254.2%) (31 samples)
after: tps 124.5 ( 51.8%), lat 141.992 ms (317.8%) (90 samples)
DDL time: 30392 ms ( 87.1% relative to mode=none)
## params: wal_level=replica mode=none pages=250 count=258 scale=20
(% are relative to "before")
before: tps 266.3 (100.0%), lat 45.884 ms (100.0%) (29 samples)
during: tps 87.9 ( 33.0%), lat 148.433 ms (323.5%) (54 samples)
after: tps 105.6 ( 39.6%), lat 153.216 ms (333.9%) (67 samples)
DDL time: 53176 ms ( 100.0% relative to mode=none)
## params: wal_level=minimal mode=sync pages=250 count=258 scale=20
(% are relative to "before")
before: tps 225.1 (100.0%), lat 47.705 ms (100.0%) (29 samples)
during: tps 93.7 ( 41.6%), lat 143.231 ms (300.2%) (83 samples)
after: tps 93.8 ( 41.7%), lat 186.097 ms (390.1%) (38 samples)
DDL time: 82104 ms ( 154.4% relative to mode=none)
## params: wal_level=minimal mode=WAL pages=250 count=258 scale=20
(% are relative to "before")
before: tps 230.2 (100.0%), lat 48.472 ms (100.0%) (29 samples)
during: tps 90.3 ( 39.2%), lat 183.365 ms (378.3%) (48 samples)
after: tps 123.9 ( 53.8%), lat 131.129 ms (270.5%) (73 samples)
DDL time: 47660 ms ( 89.6% relative to mode=none)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
#! /usr/bin/bash
PGBENCH_SCALE=20
export PATH=/home/horiguti/bin/pgsql_pendsync/bin:$PATH
function run_one_param()
{
# bench2.sh <ddl_count> <insert_pages> <method (s:sync, w:WAL, n:not-set)> <pgbench scale>
./bench2.sh $1 $2 n ${PGBENCH_SCALE} 2>&1
./bench2.sh $1 $2 s ${PGBENCH_SCALE} 2>&1
./bench2.sh $1 $2 w ${PGBENCH_SCALE} 2>&1
}
killall -9 postgres
sleep 1
#run_one_param <ddl_count> <insert_pages>
# On slow HDD
export PGDATA=/home/horiguti/storage/hdd/data/data_pendsync
run_one_param 353 100
run_one_param 258 250
run_one_param 185 500
run_one_param 118 1000
run_one_param 58 2500
run_one_param 32 5000
run_one_param 18 10000
# On M.2 SSD
export PGDATA=/home/horiguti/storage/ssd/data/data_pendsync
run_one_param 2250 100
run_one_param 1162 250
run_one_param 564 500
run_one_param 297 1000
run_one_param 123 2500
run_one_param 63 5000
run_one_param 32 10000
#! /usr/bin/bash
if [ "$PGDATA" == "" ]; then
echo "\$PGDATA should be set"
exit;
fi;
rm -r $PGDATA/*
initdb
if [ "$3" == "n" ]; then
wal_level=replica
else
wal_level=minimal
fi
cat <<EOF >> $PGDATA/postgresql.conf
wal_level=$wal_level
max_wal_senders=0
log_checkpoints=yes
max_wal_size = 10GB
EOF
binary=`which postgres`
scale=$4
echo "## params: count=$1 pages=$2 mode=$3 binary=$binary scale=$scale wal_level=$wal_level"
pg_ctl stop -m f
pg_ctl start 2>&1
pgbench -i -s ${scale}
psql -c 'checkpoint;'
((sleep 30; echo "START"; ./bench1.pl 5 $1 $2 $3; echo "END") & pgbench -rP1 --progress-timestamp -T150 -c10 -j10)
2>&1
pg_ctl stop -m i
#! /usr/bin/perl
use strict;
use IPC::Open2;
use Time::HiRes qw (gettimeofday tv_interval);
my $tupperpage = 226;
my $large_size = 100000000;
my @time = ();
sub bench {
my ($header, $nprocs, $ntups, $threshold, $ddlcount) = @_;
my @result = ();
my @rds = ();
for (my $ip = 0 ; $ip < $nprocs ; $ip++)
{
pipe(my $rd, my $wr);
$rds[$ip] = $rd;
my $pid = fork();
die "fork failed: $!\n" if ($pid < 0);
if ($pid == 0)
{
close($rd);
my $pid = open2(my $psqlrd, my $psqlwr, "psql postgres > /dev/null");
if ($threshold >= 0) {
print $psqlwr "SET wal_skip_threshold to $threshold;\n";
}
print $psqlwr "DROP TABLE IF EXISTS t$ip;";
print $psqlwr "CREATE TABLE t$ip (a int);\n";
my @st = gettimeofday();
for (my $i = 0 ; $i < $ddlcount ; $i++)
{
print $psqlwr "BEGIN;";
print $psqlwr "TRUNCATE t$ip;";
print $psqlwr "INSERT INTO t$ip (SELECT a FROM generate_series(1, $ntups) a);";
print $psqlwr "COMMIT;";
}
close($psqlwr);
waitpid($pid, 0);
print $wr $ip, " ", 1000 * tv_interval(\@st, [gettimeofday]), "\n";
exit;
}
close($wr);
}
my $rpid;
while (($rpid = wait()) == 0) {}
my $sum = 0;
for (my $ip = 0 ; $ip < $nprocs ; $ip++)
{
my $ret = readline($rds[$ip]);
die "format? $ret\n" if ($ret !~ /^([0-9]+) ([0-9.]+)$/);
$sum += $2;
}
printf "$header: procs $nprocs: time %.0f\n", $sum / $nprocs;
}
sub log10 { return log($_[0]) / log(10); }
# benchmark for wal_level = replica, the third parameter of bench
# doesn't affect
sub bench1
{
my $ddlcount = 5;
$ddlcount = $ARGV[1] if ($#ARGV > 0);
print "benchmark for wal_level = replica\n";
for (my $s = 0 ; $s <= 4 ; $s += 0.25)
{
my $ss = int(10 ** $s);
bench("size $ss", 1, $ss * $tupperpage, 0, $ddlcount);
}
}
# benchmark for wal_level = minimal.
sub bench2
{
my $ddlcount = 5;
$ddlcount = $ARGV[1] if ($#ARGV > 0);
print "benchmark for wal_level = minimal\n";
for (my $s = 0 ; $s <= 4.5 ; $s += 0.25)
{
my $ss = int(10 ** $s);
bench("size $ss: SYNC ", 1, $ss * $tupperpage, 0, $ddlcount);
bench("size $ss: WAL ", 1, $ss * $tupperpage, $large_size, $ddlcount);
}
}
# find crossing point of WAL and SYNC by bisecting
sub bench3
{
my $ddlcount = 5;
$ddlcount = $ARGV[1] if ($#ARGV > 0);
print "find crossing point of WAL and SYNC by bisecting\n";
bench("SYNC: size 0", 1, 1, 8);
bench("WAL : size 0", 1, 1, 16);
my $s = 1;
my $st = 10000;
while (1)
{
my $ts = bench("SYNC: size $s",
$tupperpage * $s, 0, $ddlcount);
my $tw = bench("WAL : size $s",
$tupperpage * $s, $large_size, $ddlcount);
if ($st < 1.0){
print "DONE\n";
exit(0);
}
if ($ts > $tw)
{
$s += $st; $st /= 2;
}
else
{
$s -= $st; $st /= 2;
}
}
}
# benchmark with multiple processes
sub bench4
{
my $ddlcount = 5;
my $nprocs = 10;
$nprocs = $ARGV[1] if ($#ARGV > 0);
$ddlcount = $ARGV[2] if ($#ARGV > 1);
print "benchmark for wal_level = minimal, $nprocs processes\n";
print "bench 4: nprocs = $nprocs, DDL count = $ddlcount\n";
for (my $s = 1.0 ; $s <= 3.5 ; $s += 0.25)
{
my $ss = int(10 ** $s);
bench("pages $ss: SYNC ", $nprocs, $ss * $tupperpage, 0, 5);
bench("pages $ss: WAL ", $nprocs, $ss * $tupperpage, $large_size, 5);
}
}
sub bench5
{
my $ddlcount = 5;
my $pages = 100;
my $mode = "s";
my $threshold = 0;
$ddlcount = $ARGV[1] if ($#ARGV > 0);
$pages = $ARGV[2] if ($#ARGV > 1);
$mode = $ARGV[3] if ($#ARGV > 2);
if ($mode eq 's') {
$threshold = 0;
} elsif ($mode eq 's') {
$threshold = $large_size;
} elsif ($mode eq 'n') {
$threshold = -1;
} elsif ($mode eq 'w') {
$threshold = $large_size;
} else {
die "mode must be s, w or n\n";
}
print "bench 5: mode = $mode, DDL count = $ddlcount, pages = $pages\n";
bench("size $pages", 1, $pages * $tupperpage, $threshold, $ddlcount);
}
bench1() if ($ARGV[0] == 1);
bench2() if ($ARGV[0] == 2);
bench3() if ($ARGV[0] == 3);
bench4() if ($ARGV[0] == 4);
bench5() if ($ARGV[0] == 5);
#! /usr/bin/perl
use strict;
use POSIX qw(floor);
my $state = 0;
my $wal_level = '';
my $pages = 0;
my $binary = '';
my $scale = 0;
my $paramcount = 0;
my $mode = '';
my $sumtps = 0;
my $sumlat = 0;
my $count = 0;
my $trig = 0;
my $title = "(undef)";
my $ddltime = 0;
my @lines = ();
my $trailstr = '';
my $verbose = 1 if ($ARGV[0] eq '-v');
my %tps = ();
my %lat = ();
my %ddltime = ();
my %modestr=("n", "none", "s", "sync", "w", "WAL");
while (<STDIN>) {
push(@lines, $_);
chomp;
next if (/^(END|START)$/);
next if (/NOTICE: /);
# print "$state: $_\n";
if ($state == 0) {
if (/^## params: count=([0-9.]+) pages=([0-9.]+) mode=(.) binary=([^ ]+) scale=([0-9.]+) wal_level=([a-z]+)/)
{
$paramcount = $1;
$pages = $2;
$mode = $3;
$binary = $4;
$scale = $5;
$wal_level = $6;
my $modestr = $modestr{$mode};
print "## params: wal_level=$wal_level mode=$modestr pages=$pages count=$paramcount scale=$scale\n";
print "(% are relative to \"before\")\n";
$state = 1;
next;
} else {
next;
}
} elsif ($state == 1) {
if (/^starting vacuum/) {
$state = 2;
}
next;
} elsif ($state == 2) {
if (/^bench.*/) {
$trig = 1;
$title = "before";
$state = 3;
}
} elsif ($state == 3) {
if (/^size ([0-9]+): procs ([0-9]+): time ([0-9]+)$/) {
$ddltime{$mode} = $3;
$trig = 1;
$title = "during";
$trailstr = '';
$state = 4;
}
} elsif ($state == 4) {
if (/^transaction type: /) {
$trig = 1;
$title = "after";
$trailstr = "# $_\n";
$state = 5;
}
} elsif ($state == 5) {
if (!/^statement latencies /) {
$trailstr .= "# $_\n";
next;
}
printf "DDL time: %6.0f ms (%6.1f%% relative to mode=%s)\n",
$ddltime{$mode},
floor(1000.0 * $ddltime{$mode} / $ddltime{n} + 0.5) / 10,
$modestr{n};
$trailstr .= "# $_\n";
$state = 6;
next;
} elsif ($state == 6) {
if (/^ {8}/) {
$trailstr.= "# $_\n";
next;
}
if ($verbose) {
print $trailstr;
}
$state = 0;
next;
}
if ($trig) {
die "count 0?\n" if ($count == 0);
$tps{$title} = $sumtps / $count;
$lat{$title} = $sumlat / $count;
printf "%6s: tps %6.1f (%5.1f%%), lat %9.3f ms (%5.1f%%) (%d samples)\n",
$title,
$tps{$title}, floor(1000.0 * $tps{$title} / $tps{before} + 0.5)/10,
$lat{$title}, floor(1000.0 * $lat{$title} / $lat{before} + 0.5)/10,
$count;
$sumtps = $sumlat = $count = 0;
$trig = 0;
next;
}
if (!/^progress: ([0-9.]+) s, ([0-9.]+) tps, lat ([0-9.]+) ms stddev ([0-9.]+|NaN)$/) {
last;
}
$sumtps += $2;
$sumlat += $3;
$count++;
}
if ($state != 0) {
print "Wrong state after EOF: state = $state\n";
print "=====================================\n";
foreach (-10 .. -1) {
printf "%d: %s", ($. + $_), $lines[$. + $_];
}
print "=====================================\n";
exit(1);
}
die "uncounted lines?\n" if ($count > 0);
Вложения
В списке pgsql-hackers по дате отправления: