Re: gaussian distribution pgbench

From: Mitsumasa KONDO
Subject: Re: gaussian distribution pgbench
Date:
Msg-id: CADupcHX=WBDBEsqeQCWkGOLX=e=YM6JtL4M2cbW+aoYqh-dczQ@mail.gmail.com
In reply to: Re: gaussian distribution pgbench (Andres Freund <andres@2ndquadrant.com>)
Replies: Re: gaussian distribution pgbench (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Hi,

2014-07-04 19:05 GMT+09:00 Andres Freund <andres@2ndquadrant.com>:
> On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
> >
> > >Yea. I certainly disagree with the patch in its current state because it
> > >copies the same 15 lines several times with a two word difference.
> > >Independent of whether we want those options, I don't think that's going
> > >to fly.
> >
> > I liked a simple static string for the different variants, which means
> > replication. Factorizing out the (large) common part will mean malloc &
> > sprintf. Well, why not.
>
> It sucks from a maintenance POV. And I don't see the overhead of malloc
> being relevant here...
>
> > >>OTOH, we've almost reached a consensus on supporting gaussian
> > >>and exponential options in \setrandom. So I think that you should
> > >>separate those two features into two patches, and we should apply
> > >>the \setrandom one first. Then we can discuss whether the other patch
> > >>should be applied or not.
> >
> > >Sounds like a good plan.
> >
> > Sigh. I'll do that as it seems to be a blocker...

I still agree with Fabien-san. I cannot understand why our logical proposal is not being accepted...
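Andres's maintenance objection is that the builtin script is copied several times with a two-word difference, and Fabien notes that factoring out the common part means building the string at runtime. A minimal sketch of that factored approach, in Python for brevity (pgbench itself is C, where the same idea would be a malloc'd buffer filled with snprintf). The abridged script text, `build_script`, and `AID_LINES` are hypothetical, and the gaussian/exponential `\setrandom` argument syntax is an assumption, not the patch's confirmed form.

```python
# Hypothetical sketch of the "factor out the common part" approach.
# The script text is abridged and illustrative, and the gaussian/exponential
# \setrandom syntax below is an assumption, not the patch's confirmed form.

TEMPLATE = """\\set naccounts 100000 * :scale
{aid_line}
\\setrandom delta -5000 5000
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
END;
"""

# Only the line that draws "aid" differs between the variants.
AID_LINES = {
    "uniform": "\\setrandom aid 1 :naccounts",
    "gaussian": "\\setrandom aid 1 :naccounts gaussian {threshold}",
    "exponential": "\\setrandom aid 1 :naccounts exponential {threshold}",
}


def build_script(distribution: str, threshold: float = 0.0) -> str:
    """Return the shared script body with the distribution-specific line filled in."""
    aid_line = AID_LINES[distribution].format(threshold=threshold)
    return TEMPLATE.format(aid_line=aid_line)


if __name__ == "__main__":
    print(build_script("gaussian", 5.0))
```

Each variant then differs from the others by exactly one generated line, so a change to the shared transaction body only has to be made once.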

> I think we also need documentation about the actual mathematical
> behaviour of the randomness generators.
> > +     <para>
> > +      With the gaussian option, the larger the <replaceable>threshold</>,
> > +      the more frequently values close to the middle of the interval are drawn,
> > +      and the less frequently values close to the <replaceable>min</> and
> > +      <replaceable>max</> bounds.
> > +      In other words, the larger the <replaceable>threshold</>,
> > +      the narrower the access range around the middle.
> > +      The smaller the threshold, the smoother the access pattern
> > +      distribution. The minimum threshold is 2.0 for performance.
> > +     </para>
>
> The only way to actually understand the distribution here is to create a
> table, insert random values, and then look at the result. That's not a
> good thing.
That's right. That is why we created the command-line options: they make the parameterized distributions easy to understand.
When you want to know what a given parameter does to the distribution, you can use the command-line option as follows:

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.00000
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 5.00000
decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
highest/lowest percent of the range: 4.9% 0.0%
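The decile and highest/lowest lines in the output above can be recomputed from a closed-form model. A minimal sketch, assuming the exponential option draws a normalized position x in [0, 1) with density proportional to exp(-threshold * x), and reading "highest/lowest percent of the range" as the share of draws in the first and last 1% of the interval; `exponential_summary` is a hypothetical name. Under these assumptions the sketch prints the same figures shown above for thresholds 10 and 5, which suggests, but does not prove, that this is how the patch computes them.

```python
import math


def exponential_summary(threshold: float) -> tuple[list[float], float, float]:
    """Decile percents plus first/last 1% shares of the range.

    Assumes (hypothetically) a density proportional to exp(-threshold * x)
    for a normalized position x in [0, 1)."""
    norm = 1.0 - math.exp(-threshold)

    def mass(a: float, b: float) -> float:
        # Probability that the normalized position falls in [a, b).
        return (math.exp(-threshold * a) - math.exp(-threshold * b)) / norm

    deciles = [100.0 * mass(k / 10, (k + 1) / 10) for k in range(10)]
    return deciles, 100.0 * mass(0.0, 0.01), 100.0 * mass(0.99, 1.0)


for t in (10.0, 5.0):
    deciles, highest, lowest = exponential_summary(t)
    print("decile percents:", " ".join(f"{p:.1f}%" for p in deciles))
    print(f"highest/lowest percent of the range: {highest:.1f}% {lowest:.1f}%")
```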

If you have a better method than ours, please share it with us.
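The mathematical documentation Andres asks for can also be written down in closed form for the gaussian option. A minimal sketch, assuming (the patch text quoted here does not confirm it) that the generator draws a standard normal value, rejects it outside [-threshold, threshold), and maps the rest linearly onto [min, max]; `phi` and `gaussian_decile_percents` are hypothetical names. Each decile's share is then the normal probability mass of one tenth of the window, divided by the mass of the whole window.

```python
import math


def phi(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_decile_percents(threshold: float) -> list[float]:
    """Percent of draws falling in each tenth of [min, max].

    Assumes (hypothetically) a standard normal truncated to
    [-threshold, threshold) and mapped linearly onto the interval."""
    norm = 2.0 * phi(threshold) - 1.0
    edges = [-threshold + 2.0 * threshold * k / 10 for k in range(11)]
    return [100.0 * (phi(b) - phi(a)) / norm for a, b in zip(edges, edges[1:])]


for t in (2.0, 5.0):
    print(t, " ".join(f"{p:.1f}%" for p in gaussian_decile_percents(t)))
```

A summary line like this, printed for the gaussian option, would answer "what does threshold X mean?" without creating a table.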
 
> > The caveat that I have is that without these options there is:
> >
> > (1) no return about the actual distributions in the final summary, which
> > depend on the threshold value, and
> >
> > (2) no included means to test the feature, so the first patch is less
> > meaningful if the feature cannot be used simply and requires a custom script.
>
> I personally agree that we likely want that as an additional
> feature. Even if just because it makes the results easier to compare.
If we can have a positive and logical discussion, I will agree to the proposal to split the work into separate patches.
However, I think the hacker most opposed to this decided based on his feelings...
Actually, he did not answer our proposal about how to understand the parameterized distribution...
So I also think this is a blocker: the command-line feature is also needed.
Besides, is there another good method? Please share it with us.
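One lightweight alternative to the create-a-table approach Andres describes is to sample the generator directly and bucket the draws. A minimal sketch for the gaussian case, assuming (hypothetically, not confirmed by the patch) rejection sampling of a standard normal outside [-threshold, threshold) followed by a linear mapping onto [lo, hi]; `sample_gaussian_deciles` is a hypothetical name, and Python's `random.Random.gauss` stands in for pgbench's own generator.

```python
import random


def sample_gaussian_deciles(lo: int, hi: int, threshold: float,
                            n: int, seed: int = 0) -> list[float]:
    """Draw n integers in [lo, hi] and return the percent landing in each tenth.

    Assumed generator (hypothetical, for illustration): a standard normal
    value x is redrawn until -threshold <= x < threshold, then mapped
    linearly onto [lo, hi]."""
    rng = random.Random(seed)
    size = hi - lo + 1
    counts = [0] * 10
    for _ in range(n):
        while True:
            x = rng.gauss(0.0, 1.0)
            if -threshold <= x < threshold:
                break
        # Map [-threshold, threshold) onto lo..hi; clamp guards float rounding.
        value = min(hi, lo + int((x + threshold) / (2.0 * threshold) * size))
        counts[(value - lo) * 10 // size] += 1
    return [100.0 * c / n for c in counts]


print(" ".join(f"{p:.1f}%" for p in sample_gaussian_deciles(1, 100000, 5.0, 200000)))
```

Under this rejection scheme the fraction of raw normal draws that is accepted is 2*Phi(threshold) - 1, about 95.4% at threshold 2.0, which is presumably why the patch text sets a minimum threshold for performance.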

Best regards,
--
Mitsumasa KONDO
