Re: gaussian distribution pgbench

From Robert Haas
Subject Re: gaussian distribution pgbench
Date
Msg-id CA+TgmobrbQVL_f+-NVSJ5j1_FWoZTZ-jaS4Z0G8Q6R3HwoEW9w@mail.gmail.com
In reply to Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Wed, Jul 16, 2014 at 12:57 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Well, I think the feedback has been pretty clear, honestly.  Here's
>> what I'm unhappy about: I can't understand what these options are
>> actually doing.
>
> We can try to improve the documentation, once more!
>
> However, ISTM that it is not the purpose of pgbench documentation to be a
> primer about what is an exponential or gaussian distribution, so the idea
> would yet be to have a relatively compact explanation, and that the
> interested but clueless reader would document h..self from wikipedia or a
> text book or a friend or a math teacher (who could be a friend as well:-).

Well, I think it's a balance.  I agree that the pgbench documentation
shouldn't try to substitute for a textbook or a math teacher, but I
also think that you shouldn't necessarily need to refer to a textbook
or a math teacher in order to figure out how to use pgbench.  Saying
"it's complicated, so we don't have to explain it" would be a cop-out;
we need to *make* it simple.  And if there's no way to do that, then
IMHO we should reject the patch in favor of some future patch that
implements something that will be easy for users to understand.

>>>  [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
>>> starting vacuum...end.
>>> transaction type: Exponential distribution TPC-B (sort of)
>>> scaling factor: 1
>>> exponential threshold: 10.00000
>>>
>>> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
>>> highest/lowest percent of the range: 9.5% 0.0%
>>
>> I don't have a clue what that means.  None.
>
> Maybe we could add in front of the decile/percent
>
> "distribution of increasing account key values selected by pgbench:"

I still wouldn't know what that meant.  And it misses the point
anyway: if the documentation is good, this will be unnecessary.  If
the documentation is bad, a printout that tries to illustrate it by
example is not an acceptable substitute.
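
For readers trying to decode the "decile percents" line quoted above, here is a minimal sketch of one model that reproduces it. It assumes (this is not stated in the thread) that keys are drawn with a density proportional to exp(-threshold * x) over the key range scaled to [0, 1); the function name `exponential_decile_shares` is made up for illustration and is not part of pgbench.

```python
import math

def exponential_decile_shares(threshold, buckets=10):
    """Share of draws in each equal-width slice of the key range.

    Assumption (not confirmed by the patch): the density is proportional
    to exp(-threshold * x) on [0, 1), i.e. a truncated exponential.
    """
    norm = 1.0 - math.exp(-threshold)  # total mass before normalization
    return [
        (math.exp(-threshold * k / buckets)
         - math.exp(-threshold * (k + 1) / buckets)) / norm
        for k in range(buckets)
    ]

shares = exponential_decile_shares(10.0)
print(" ".join(f"{100 * s:.1f}%" for s in shares))
# -> 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
```

Under that assumption, each figure is the fraction of selected account keys falling in successive tenths of the key range, which is the information the printout leaves unexplained.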

>> Here is an example of an explanation that would make sense to me.
>> This is not the actual behavior of your patch, I'm quite sure, so this
>> is just an example of the *kind* of explanation that I think is
>> needed:
>
> This is more or less the approximate behavior of the patch, but for 1% of
> the range, not 50%. However I'm not sure that the current documentation is
> so bad.

I don't think it is the same behavior, because in the system I
described, a larger value indicates a flatter distribution, but in the
documentation, a smaller value indicates a flatter distribution.  That
having been said, I agree the current documentation for the
exponential distribution is not too bad.  But this part does not make
sense:

+      A crude approximation of the distribution is that the most frequent 1%
+      values are drawn <replaceable>threshold</>% of the time.
+      The closer to 0.0 the threshold, the flatter (more uniform) the access
+      distribution.

Given the first statement, I'd expect the lowest possible threshold to
be 0.01, not 0.
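
The mismatch between the "1% of values are drawn threshold% of the time" rule and a lower bound of 0 can be checked numerically. The sketch below again assumes a truncated exponential density proportional to exp(-threshold * x) on [0, 1), which is an assumption about the patch rather than something the documentation excerpt states; `top_percent_share` is a hypothetical helper name.

```python
import math

def top_percent_share(threshold, fraction=0.01):
    """Fraction of draws hitting the most frequent `fraction` of the range.

    Assumes (hypothetically) a density proportional to
    exp(-threshold * x) on [0, 1).
    """
    norm = 1.0 - math.exp(-threshold)
    return (1.0 - math.exp(-threshold * fraction)) / norm

for t in (0.001, 0.1, 1.0, 5.0, 10.0, 50.0):
    share = 100 * top_percent_share(t)
    print(f"threshold={t:>6}: top 1% of keys drawn {share:.2f}% of the time")
```

Under this model the share tends to 1% (a uniform distribution) as the threshold approaches 0, not to 0%, so the crude rule cannot hold across the whole allowed range of the parameter.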

The documentation for the Gaussian distribution is in somewhat worse
shape.  Unlike the documentation for exponential, it makes no attempt
at all to give the user a clear idea what the distribution actually
looks like.  The closest it comes is this:

+      In other worlds, the larger the <replaceable>threshold</>,
+      the narrower the access range around the middle.

But that's not really very close - there's no way for a user to judge
what impact the threshold parameter actually has except to try it.
Unlike the discussion of exponential, which contains a fairly precise
mathematical characterization of the behavior, the Gaussian stuff has
nothing except a hand-wavy explanation that a higher threshold skews
the distribution more.  (Also, the English expression is "in other
words" not "in other worlds" - but in fact the phrase has no business
in that sentence at all, because it is not reiterating the contents of
the previous sentence in different language, but rather making a new
point entirely.  And the following sentence does not start with a
capital letter, though maybe that's because it was intended to be
incorporated into this sentence somehow.)
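
As an illustration of the kind of quantitative statement the Gaussian documentation could make, here is a minimal sketch. It rests on a loudly hypothetical model, since this message does not describe the patch's implementation: a standard normal draw truncated to [-threshold, +threshold] and mapped linearly onto the key range. The helper name `gaussian_middle_share` is invented for the example.

```python
import math

def gaussian_middle_share(threshold, middle_fraction):
    """Fraction of draws landing in the central `middle_fraction` of the range.

    Hypothetical model (not taken from the patch): a standard normal
    truncated to [-threshold, +threshold], mapped linearly onto the keys.
    """
    full = math.erf(threshold / math.sqrt(2))
    inner = math.erf(middle_fraction * threshold / math.sqrt(2))
    return inner / full

for t in (2.0, 5.0, 10.0):
    share = 100 * gaussian_middle_share(t, 0.2)
    print(f"threshold={t:>4}: middle 20% of keys drawn {share:.1f}% of the time")
```

If the patch behaves like this model, a sentence of the form "about two thirds of the draws fall in the middle 1/threshold of the range" would let a user pick a threshold without experimenting.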

I think that you also need to consider which instances of the words
"gaussian" and "exponential" are referring to the option and which are
referring to the abstract mathematical concept.  When you're talking
about the option, you should use all lower-case (as you've done) but
with <literal> tags or similar.  When you're referring to the
mathematical distribution, Gaussian should be capitalized.

BTW, I agree with both Heikki's suggestion that we make these options
to setrandom only and not expose command-line options for them, and
with Andres's critique that the documentation of those options is far
too repetitive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


