Re: gaussian distribution pgbench

From Mitsumasa KONDO
Subject Re: gaussian distribution pgbench
Date
Msg-id CADupcHWUDkgKbMa1K=Z5kgVShH91ipHVJzx1+gypf09RxNzRbw@mail.gmail.com
In reply to Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: gaussian distribution pgbench  (Mitsumasa KONDO <kondo.mitsumasa@gmail.com>)
Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Re: gaussian distribution pgbench  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi

2014-03-15 15:53 GMT+09:00 Fabien COELHO <coelho@cri.ensmp.fr>:

Hello Heikki,


A couple of comments:

* There should be an explicit "\setrandom ... uniform" option too, even though you get that implicitly if you don't specify the distribution

Indeed. I agree. I suggested it, but it got lost.
OK. If we keep to the SQL grammar, what you say is right. I will add it.
 
* What exactly does the "threshold" mean? The docs informally explain that "the larger the threshold, the more frequent values close to the middle of the interval are drawn", but that's pretty vague.

There are explanations and computations as comments in the code. If it is about the documentation, I'm not sure that a very precise mathematical definition will help a lot of people, and might rather hinder understanding, so the doc focuses on an intuitive explanation instead.
Yeah, I think we had better explain only the information needed to use this feature. If we add the mathematical theory to the docs, it will be too difficult for users, and it would be wasted effort.
 

* Does min and max really make sense for gaussian and exponential distributions? For gaussian, I would expect mean and standard deviation as the parameters, not min/max/threshold.

Yes... and no:-) The aim is to draw an integer primary key from a table, so it must be in a specified range. This is approximated by drawing a double value with the expected distribution (gaussian or exponential) and project it carefully onto integers. If it is out of range, there is a loop and another value is drawn. The minimal threshold constraint (2.0) ensures that the probability of looping is low.
I think it is difficult to understand from our text alone, so I created a picture that should help you understand it.
Please see it.
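
For readers without the picture, the draw-and-retry scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch. The names `gaussian_key` and `retry_probability` are hypothetical, and reading the threshold as "how many standard deviations fit in half of the range" is an assumption made for this sketch.

```python
import math
import random


def gaussian_key(lo, hi, threshold, rng=random):
    """Draw an integer in [lo, hi], bell-shaped around the middle of the range.

    Assumption for this sketch: `threshold` is the number of standard
    deviations that half of the range covers.
    """
    if threshold < 2.0:
        raise ValueError("threshold must be at least 2.0")
    while True:
        z = rng.gauss(0.0, 1.0)          # standard normal draw (a double)
        if -threshold <= z < threshold:  # in range: keep it
            break                        # otherwise loop and draw again
    # Rescale z from [-threshold, threshold) to [0, 1), then project
    # onto the integers lo..hi.
    u = (z + threshold) / (2.0 * threshold)
    # min() guards against floating-point rounding at the upper edge.
    return min(hi, lo + int(u * (hi - lo + 1)))


def retry_probability(threshold):
    """Probability that a single standard normal draw is rejected."""
    return math.erfc(threshold / math.sqrt(2.0))
```

Under that assumption, `retry_probability(2.0)` is the two-sided tail of a normal distribution beyond two standard deviations, which is why a minimal threshold of 2.0 keeps the chance of looping low.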
 

* How about setting the variable as a float instead of integer? Would seem more natural to me. At least as an option.

Which variable? The values set by setrandom are mostly used for primary keys. We really want integers in a range.
I think he meant the threshold parameter. The threshold parameter is very sensitive, so it needs to be set as a double. I think you will agree once you see the attached picture.
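
To give a feel for that sensitivity without the picture, here is a small Python sketch. It is an illustration only, not code from the patch. It assumes the threshold is the number of standard deviations covered by half of the range and that out-of-range draws are simply redrawn, so the result follows a truncated normal distribution. The helper name `middle_share` is hypothetical, and the calculation ignores the final rounding to integers.

```python
import math


def middle_share(threshold, fraction):
    """Share of (continuous) draws landing in the central `fraction` of the range.

    Assumes a standard normal truncated to [-threshold, threshold].
    """
    inner = math.erf(fraction * threshold / math.sqrt(2.0))
    whole = math.erf(threshold / math.sqrt(2.0))
    return inner / whole


# Compare how concentrated the draws are for a few threshold values.
for t in (2.0, 2.5, 3.0, 5.0, 10.0):
    print(f"threshold={t:4.1f}  share in middle 20% of range: {middle_share(t, 0.2):.3f}")
```

Small changes in the threshold move a noticeable share of the draws into or out of the middle of the range, which an integer-only parameter could not express.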

regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Attachments
