On 05/10/2016 03:04 PM, Kevin Grittner wrote:
> On Tue, May 10, 2016 at 3:29 AM, Kevin Grittner <kgrittn@gmail.com
> <mailto:kgrittn@gmail.com>> wrote:
>
>>> * The results are a bit noisy, but I think in general this shows
>>> that for certain cases there's a clearly measurable difference
>>> (up to 5%) between the "disabled" and "reverted" cases. This is
>>> particularly visible on the smallest data set.
>>
>> In some cases, the differences are in favor of disabled over
>> reverted.
>
> There were 75 samples each of "disabled" and "reverted" in the
> spreadsheet. Averaging them all, I see this:
>
> reverted: 290,660 TPS
> disabled: 292,014 TPS
Well, that kinda assumes it's one large group. I was wondering whether
the difference depends on some of the other factors (scale factor,
number of clients), which is why I mentioned "for certain cases".
The other problem is that averaging the differences like this overweights
the results for large client counts. It also mixes results for different
scales, which I think is an important distinction to keep.
The following table shows the differences between the disabled and
reverted cases like this:
   sum('reverted' results with N clients)
   -------------------------------------- - 1.0
   sum('disabled' results with N clients)
for each scale/client count combination. So for example 4.83% means that
with a single client on the smallest data set, the sum of the 5 runs for
reverted was about 1.0483x that of disabled.
   scale        1       16       32       64      128
   100       4.83%    2.84%    1.21%    1.16%    3.85%
   3000      1.97%    0.83%    1.78%    0.09%    7.70%
   10000    -6.94%   -5.24%  -12.98%   -3.02%   -8.78%
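The computation behind each cell of that table can be sketched in a few
lines; the TPS lists below are made-up placeholders, not the actual
benchmark numbers.

```python
# Relative difference between 'reverted' and 'disabled' for one
# scale/client combination, as described in the text:
#   sum(reverted) / sum(disabled) - 1.0
# The TPS lists here are illustrative placeholders.

def relative_difference(reverted_tps, disabled_tps):
    """Return the percentage difference of the summed TPS results."""
    return (sum(reverted_tps) / sum(disabled_tps) - 1.0) * 100.0

# e.g. 5 runs at one scale/client combination (placeholder numbers)
reverted = [29500.0, 29800.0, 30100.0, 29900.0, 30050.0]
disabled = [28400.0, 28600.0, 28900.0, 28500.0, 28700.0]
print(f"{relative_difference(reverted, disabled):+.2f}%")
```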
So on average for each scale:

   scale    revert/disable
   100            2.78%
   3000           2.47%
   10000         -7.39%
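Those per-scale figures are just the mean of each row of the previous
table; this snippet reproduces them from the quoted percentages.

```python
# Per-scale averages of the revert-vs-disable differences, computed
# from the per-client percentages quoted in the table above.
table = {
    100:   [4.83, 2.84, 1.21, 1.16, 3.85],
    3000:  [1.97, 0.83, 1.78, 0.09, 7.70],
    10000: [-6.94, -5.24, -12.98, -3.02, -8.78],
}
for scale, diffs in table.items():
    print(f"scale {scale:>5}: {sum(diffs) / len(diffs):+.2f}%")
```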
Of course, it still might be due to noise, but looking at the two tables
that seems rather unlikely.
>
> That's a 0.46% overall increase in performance with the patch,
> disabled, compared to reverting it. I'm surprised that you
> consider that to be a "clearly measurable difference". I mean, it
> was measured and it is a difference, but it seems to be well within
> the noise. Even though it is based on 150 samples, I'm not sure we
> should consider it statistically significant.
Well, luckily we're in the position that we can collect more data.
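One way to put the significance question on firmer ground would be a
Welch t-test on the two sets of per-run TPS samples. This sketch
implements the statistic directly from its definition; the sample lists
are placeholders, not the real results.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Placeholder TPS samples, not the actual benchmark results.
reverted = [290100.0, 291500.0, 290800.0, 289900.0, 291000.0]
disabled = [292300.0, 291800.0, 292500.0, 291600.0, 292900.0]
t, df = welch_t(reverted, disabled)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With more runs per combination, the same test could be applied per
scale/client cell rather than to the pooled samples.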
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services