Re: PATCH: pgbench - random sampling of transaction written into log

From: Robert Haas
Subject: Re: PATCH: pgbench - random sampling of transaction written into log
Msg-id: CA+TgmoYENHaLJoWDvMwusTxeUatp2Fp3Hd7h-tRj2Jc5X5u-qw@mail.gmail.com
In reply to: Re: PATCH: pgbench - random sampling of transaction written into log (Tomas Vondra <tv@fuzzy.cz>)
Replies: Re: PATCH: pgbench - random sampling of transaction written into log
List: pgsql-hackers

On Sun, Aug 26, 2012 at 1:04 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Attached is an improved patch, with a call to rand() replaced with
> getrand().
>
> I was thinking about the counter but I'm not really sure how to handle
> cases like "39%" - I'm not sure a plain (counter % 100 < 39) is a
> good sampling, because it always keeps continuous sequences of
> transactions. Maybe there's a clever way to use a counter, but let's
> stick to getrand() unless we can prove it's causing issues. Especially
> considering that a lot of data won't be written at all with low
> sampling rates.
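
The concern about a plain counter can be made concrete with a small C sketch. The helpers `counter_sample` and `longest_kept_run` are hypothetical and not part of pgbench; they only show that a `counter % 100 < pct` rule keeps blocks of consecutive transactions instead of spreading the kept ones out.

```c
#include <stdbool.h>

/* Hypothetical counter-based sampler: keeps transaction number `counter`
 * when (counter % 100) < pct, e.g. pct = 39 for "39%". */
bool counter_sample(long counter, int pct)
{
    return (counter % 100) < pct;
}

/* Longest run of consecutive kept transactions among counters 0..n-1.
 * With this rule every kept transaction sits in a block of `pct`
 * adjacent ones, which is the "continuous sequences" problem. */
long longest_kept_run(long n, int pct)
{
    long        best = 0;
    long        run = 0;

    for (long i = 0; i < n; i++)
    {
        if (counter_sample(i, pct))
        {
            run++;
            if (run > best)
                best = run;
        }
        else
            run = 0;
    }
    return best;
}
```

With pct = 39 over 1000 transactions, exactly 390 are kept, but they come as ten blocks of 39 adjacent transactions, each followed by 61 skipped ones, so the log sees bursts rather than an even spread.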

I like this patch, and I think sticking with a random number is a good
idea.  But I have two suggestions.  Number one, I think the sampling
rate should be stored as a float, not an integer, because I can easily
imagine wanting a sampling rate that is not an integer percentage -
especially, one that is less than one percent, like half a percent or
a tenth of a percent.  Also, I suggest that the command-line option
should be a long option rather than a single character option.  That
will be more mnemonic and avoid using up too many single letter
options, of which there is a limited supply.  So to sample every
hundredth result, you could do something like this:

pgbench --latency-sample-rate 0.01
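
A minimal sketch of a float-rate check, in C. The names `sample_rng`, `rng_uniform` and `sample_txn` are hypothetical, and the xorshift64* generator is a stand-in for pgbench's own getrand(), used only so the example is self-contained. A rate of 0.01 keeps each transaction independently with probability 1%, i.e. every hundredth result on average rather than exactly every hundredth.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread PRNG state; the seed must be non-zero. */
typedef struct
{
    uint64_t    s;
} sample_rng;

/* xorshift64*: returns a uniform double in [0, 1). Stand-in for getrand(). */
double rng_uniform(sample_rng *r)
{
    r->s ^= r->s >> 12;
    r->s ^= r->s << 25;
    r->s ^= r->s >> 27;
    uint64_t    x = r->s * UINT64_C(0x2545F4914F6CDD1D);

    /* keep the top 53 bits so the quotient is exact and strictly below 1 */
    return (double) (x >> 11) / 9007199254740992.0;    /* 2^53 */
}

/* Keep a transaction with probability `rate`. Because the rate is a
 * double, 0.005 (half a percent) or 0.001 (a tenth of a percent) work
 * just as well as 0.01. */
bool sample_txn(sample_rng *r, double rate)
{
    return rng_uniform(r) < rate;
}
```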

Another option I personally think would be useful is an option to
record only those latencies that are above some minimum bound, like
this:

pgbench --latency-only-if-more-than $MICROSECONDS
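
As a sketch only, both proposed options could be registered as long-only options with `getopt_long()`. The option names are the ones proposed above, not existing pgbench flags. The codes `OPT_LATENCY_SAMPLE_RATE` and `OPT_LATENCY_MIN_US`, the `log_opts` struct and `parse_log_opts` are hypothetical.

```c
#include <getopt.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical codes above the single-character range, so the options
 * are long-only and use up no short letters. */
enum
{
    OPT_LATENCY_SAMPLE_RATE = 256,
    OPT_LATENCY_MIN_US
};

typedef struct
{
    double      sample_rate;    /* fraction in (0, 1]; 1.0 logs everything */
    long long   min_latency_us; /* 0 disables the outlier filter */
} log_opts;

/* Returns 0 on success, -1 on an unknown option or an invalid value. */
int parse_log_opts(int argc, char **argv, log_opts *o)
{
    static const struct option long_options[] = {
        {"latency-sample-rate", required_argument, NULL, OPT_LATENCY_SAMPLE_RATE},
        {"latency-only-if-more-than", required_argument, NULL, OPT_LATENCY_MIN_US},
        {NULL, 0, NULL, 0}
    };
    int         c;
    char       *end;

    o->sample_rate = 1.0;
    o->min_latency_us = 0;
    optind = 1;

    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case OPT_LATENCY_SAMPLE_RATE:
                o->sample_rate = strtod(optarg, &end);
                /* the negated range test also rejects NaN */
                if (end == optarg || *end != '\0' ||
                    !(o->sample_rate > 0.0 && o->sample_rate <= 1.0))
                    return -1;
                break;
            case OPT_LATENCY_MIN_US:
                o->min_latency_us = strtoll(optarg, &end, 10);
                if (end == optarg || *end != '\0' || o->min_latency_us < 0)
                    return -1;
                break;
            default:
                return -1;      /* getopt_long has already printed a message */
        }
    }
    return 0;
}
```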

The problem with recording all the latencies is that it tends to have
a material impact on throughput.  Your patch should address that for
the case where you just want to characterize the latency, but it would
also be nice to have a way of recording the outliers.
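
Recording only the outliers comes down to a per-transaction threshold check done before any formatting or I/O. The following is a minimal sketch, assuming a hypothetical `min_latency_us` setting where 0 turns the filter off. The one-line log format is made up for illustration and is not pgbench's actual per-transaction log format.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if a transaction should be logged: either the filter is disabled
 * (min_latency_us <= 0) or the latency exceeds the bound. */
bool latency_is_outlier(long long latency_us, long long min_latency_us)
{
    return min_latency_us <= 0 || latency_us > min_latency_us;
}

/* Writes one line per slow transaction and returns how many lines were
 * written (0 or 1); fast transactions cost only the comparison. */
int log_if_slow(FILE *log, int client_id, long txn_no,
                long long latency_us, long long min_latency_us)
{
    if (!latency_is_outlier(latency_us, min_latency_us))
        return 0;
    fprintf(log, "%d %ld %lld\n", client_id, txn_no, latency_us);
    return 1;
}
```

Since skipped transactions never reach the write, the logging cost scales with the number of outliers rather than with total throughput.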

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
