The maximum transfer rate observed when the tar size is 128 KB is at most 0.48 GB/s. Is there a way to find out what buffer size is being used? That could explain part of the puzzle.
Secondly, taking just the minimum of two runs is hard to justify. How do we attribute the performance differences to real effects rather than noise? It would be better to run several experiments for each configuration, fit a basic linear model, and report the standard deviation. An order statistic such as min(X1, X2, ..., Xn) is generally a biased estimator, and computing the variance of a biased statistic is tricky, so the reported results could be corrupted by noise.
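To make the suggestion concrete, here is a rough sketch (in Python) of what that could look like: repeat each configuration several times, report the mean and sample standard deviation instead of the minimum, and fit a basic linear model of time against payload size. The `run_transfer` function and the size/overhead numbers below are placeholders, not the actual benchmark harness.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_transfer(size_bytes):
    """Placeholder for one timed transfer of `size_bytes` (seconds).
    Simulates a fixed overhead plus a ~0.5 GB/s link with noise so the
    sketch runs end to end; swap in the real benchmark here."""
    return 2e-4 + size_bytes / 0.5e9 + rng.normal(0, 2e-5)

sizes = np.array([32, 64, 128, 256, 512]) * 1024  # illustrative payload sizes, bytes
n_repeats = 10

means, stds = [], []
for size in sizes:
    samples = np.array([run_transfer(size) for _ in range(n_repeats)])
    means.append(samples.mean())
    stds.append(samples.std(ddof=1))  # sample std dev, rather than min(X1..Xn)

means = np.array(means)
stds = np.array(stds)

# Basic linear model: time ~ fixed_overhead + size / bandwidth
slope, intercept = np.polyfit(sizes, means, 1)
print(f"fixed overhead ~ {intercept * 1e6:.1f} us, bandwidth ~ {1 / slope / 1e9:.2f} GB/s")
for size, m, s in zip(sizes, means, stds):
    print(f"{size // 1024:>4} KB: {m * 1e6:8.1f} us +/- {s * 1e6:.1f} us")
```

Reporting the per-size standard deviations alongside the fitted overhead and bandwidth makes it much easier to judge whether a difference between two configurations exceeds the run-to-run noise.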