Re: gaussian distribution pgbench

From Mitsumasa KONDO
Subject Re: gaussian distribution pgbench
Date
Msg-id CADupcHXwhX8ab6jjCVp4up8jJjpPxS8W2xedcqan1nx+yUhT1g@mail.gmail.com
In reply to Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers



2014-07-18 5:13 GMT+09:00 Fabien COELHO <coelho@cri.ensmp.fr>:

However, ISTM that it is not the purpose of pgbench documentation to be a
primer about what is an exponential or gaussian distribution, so the idea
would yet be to have a relatively compact explanation, and that the
interested but clueless reader would document h..self from wikipedia or a
text book or a friend or a math teacher (who could be a friend as well:-).

Well, I think it's a balance.  I agree that the pgbench documentation
shouldn't try to substitute for a text book or a math teacher, but I
also think that you shouldn't necessarily need to refer to a text book
or a math teacher in order to figure out how to use pgbench.  Saying
"it's complicated, so we don't have to explain it" would be a cop out;
we need to *make* it simple.  And if there's no way to do that, then
IMHO we should reject the patch in favor of some future patch that
implements something that will be easy for users to understand.

 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.00000

decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

I don't have a clue what that means.  None.

Maybe we could add in front of the decile/percent

"distribution of increasing account key values selected by pgbench:"

I still wouldn't know what that meant.  And it misses the point
anyway: if the documentation is good, this will be unnecessary.  If
the documentation is bad, a printout that tries to illustrate it by
example is not an acceptable substitute.

The decile description is quite classic when discussing statistics.
Yes, perhaps. Fabien-san and I find it hard to believe that he does not know what a decile percentage is.
However, I think the decile output needs more explanation.

For example, when the number of transactions is 10,000 (-t 10000), the range of aid is 100,000,
and --exponential is 10, the decile percents are as follows, as you know.

decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%  
 
They mean the following.
# Number of accesses per aid range (from the decile percents):
  1 to 10,000        => 6,320 times
  10,001 to 20,000   => 2,330 times
  20,001 to 30,000   => 860 times
  ...
  90,001 to 100,000  => 0 times

# Number of accesses per aid range (from the highest/lowest percent of the range):
  1 to 1,000         => 950 times
  ...
  99,001 to 100,000  => 0 times

That's all.
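
The arithmetic above can be reproduced with a short sketch. It assumes the distribution is an exponential density truncated to the key range, so that the share of draws falling in the fraction [lo, hi) of the range is (exp(-t*lo) - exp(-t*hi)) / (1 - exp(-t)) for threshold t. That formula is an assumption chosen because it matches the percentages printed above; it is not taken from the patch itself, and the function names are hypothetical.

```python
import math


def interval_share(threshold, lo, hi):
    """Share of draws landing in the fraction [lo, hi) of the key range.

    Assumes an exponential density truncated to [0, 1); this is an
    illustrative model, not code from pgbench.
    """
    norm = 1.0 - math.exp(-threshold)
    return (math.exp(-threshold * lo) - math.exp(-threshold * hi)) / norm


def decile_shares(threshold):
    """Shares for the ten equal-width slices of the key range."""
    return [interval_share(threshold, i / 10, (i + 1) / 10) for i in range(10)]


if __name__ == "__main__":
    threshold = 10.0        # --exponential=10
    transactions = 10_000   # -t 10000
    naccounts = 100_000     # range of aid

    shares = decile_shares(threshold)
    print("decile percents:", " ".join(f"{100 * s:.1f}%" for s in shares))

    # Expected number of accesses per aid range, one line per decile.
    for i, share in enumerate(shares):
        first = i * naccounts // 10 + 1
        last = (i + 1) * naccounts // 10
        print(f"aid {first:>7,} to {last:>7,} => about {transactions * share:,.0f} accesses")

    # Highest and lowest 1% of the range.
    highest = interval_share(threshold, 0.0, 0.01)
    lowest = interval_share(threshold, 0.99, 1.0)
    print(f"highest/lowest percent of the range: {100 * highest:.1f}% {100 * lowest:.1f}%")
```

The counts in the message are derived from the rounded percentages (for example 63.2% of 10,000 gives 6,320), so the unrounded shares computed here can differ from them by a few accesses.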

This information makes the distribution of access probability easy to understand, doesn't it?
Perhaps because Fabien-san and I have some background in mathematics, we take decile percentages to be common knowledge.
But if they are not common knowledge, I agree with adding this explanation to the documentation.

Best regards,
--
Mitsumasa KONDO
