I have cooled down and measured performance again.
For now, here are the corrected results in brief.
At Mon, 11 Jul 2016 19:07:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20160711.190722.145849861.horiguchi.kyotaro@lab.ntt.co.jp>
> Anyway I need some time to cool down..
I recalled that I had put in place a Makefile.custom containing
CFLAGS="-O0". Removing it gave me a saner result.
patched, -O2
table  10-run avg (ms)  stddev  runtime diff from unpatched (%)
t0              441.78    0.32      3.4
pl              201.77    0.32     13.6
pf0            6619.22   18.99    -19.7
pf1            1800.72   32.72    -78.0
---
unpatched, -O2
table  10-run avg (ms)  stddev
t0              427.21    0.42
pl              177.54    0.25
pf0            8250.42   23.29
pf1            8206.02   12.91
==========
 3% slower for local 1*seqscan (2-parallel)
14% slower for append-4*seqscan (no-parallel)
19% faster for append-4*foreignscan (all scans on one connection)
78% faster for append-4*foreignscan (each scan has a dedicated connection)
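
For reference, the runtime-diff column appears to be (patched - unpatched) / unpatched * 100, computed on the averages. Below is a minimal Python sketch under that assumption. The dictionary and function names are illustrative only and do not come from the patch or the benchmark scripts.

```python
# Averages (ms) copied from the patched and unpatched tables above.
patched = {"t0": 441.78, "pl": 201.77, "pf0": 6619.22, "pf1": 1800.72}
unpatched = {"t0": 427.21, "pl": 177.54, "pf0": 8250.42, "pf1": 8206.02}


def runtime_diff_pct(new_ms, base_ms):
    """Relative runtime change in percent; negative means the new build is faster."""
    return (new_ms - base_ms) / base_ms * 100.0


# Assumed formula: (patched - unpatched) / unpatched * 100 for each table.
diffs = {name: runtime_diff_pct(patched[name], unpatched[name]) for name in patched}

for name, diff in diffs.items():
    print(f"{name:4s} {diff:+7.2f}%")
```

The computed values agree with the reported column to within the last digit; the reported figures may have been truncated rather than rounded.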
It may be possible to optimize ExecProcNode a bit.
ExecAppend seems to need some fixing.
In addition to the above, I will try making ExecAsyncWaitForNode
reentrant, or something along those lines.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center