too much pgbench init output

From:

Peter Eisentraut <peter_e@gmx.net>

Date:

1 September 2012, 04:00:45

When initializing a large database, pgbench writes tons of "%d tuples
done" lines.  I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

Re: too much pgbench init output

From:

Peter Eisentraut <peter_e@gmx.net>

Date:

5 September 2012, 03:31:10

Re: too much pgbench init output

From:

Peter Eisentraut <peter_e@gmx.net>

Date:

6 September 2012, 03:14:21

Re: too much pgbench init output

From:

Tom Lane <tgl@sss.pgh.pa.us>

Date:

5 September 2012, 00:35:49

Re: too much pgbench init output

From:

Alvaro Herrera <alvherre@2ndquadrant.com>

Date:

23 October 2012, 16:02:24

Re: too much pgbench init output

From:

Tatsuo Ishii <ishii@postgresql.org>

Date:

6 January 2013, 02:03:42

Re: too much pgbench init output

From:

Tatsuo Ishii <ishii@postgresql.org>

Date:

6 January 2013, 04:08:01

Re: too much pgbench init output

From:

Tatsuo Ishii <ishii@postgresql.org>

Date:

6 January 2013, 09:35:57

Re: too much pgbench init output

From:

Tatsuo Ishii <ishii@postgresql.org>

Date:

7 January 2013, 02:29:52

Re: too much pgbench init output

From:

Tom Lane <tgl@sss.pgh.pa.us>

Date:

5 September 2012, 03:44:48

Re: too much pgbench init output

From:

Peter Eisentraut <peter_e@gmx.net>

Date:

4 September 2012, 22:00:52

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

16 September 2012, 22:26:15

On 5.9.2012 06:17, Robert Haas wrote:
> On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:
>>> Actually, this whole thing seems like a solution in search of a
>>> problem to me.  We just reduced the verbosity of pgbench -i tenfold in
>>> the very recent past - I would have thought that enough to address
>>> this problem.  But maybe not.
>>
>> The problem is that
>>
>> a) It blasts out too much output and everything scrolls off the screen,
>> and
>>
>> b) There is no indication of where the end is.
>>
>> These are independent problems, and I'd be happy to address them
>> separately if there are such specific concerns attached to this.
>>
>> Speaking of tenfold, we could reduce the output frequency tenfold to
>> once every 1000000, which would alleviate this problem for a while
>> longer.
> 
> Well, I wouldn't object to displaying a percentage on each output
> line.  But I don't really like the idea of having them less frequent
> than they already are, because if you run into a situation that makes
> pgbench -i run slowly, as I occasionally do, it's marginal to tell the
> difference between "slow" and "completely hung" even with the current
> level of verbosity.
> 
> However, we could add a -q flag to run more quietly, or something like
> that.  Actually, I'd even be fine with making the default quieter,
> though we can't use -v for verbose since that's already taken.  But
> I'd like to preserve the option of getting the current amount of
> output because sometimes I need that to troubleshoot problems.
> Actually it'd be nice to even get a bit more output: say, a timestamp
> on each line, and a completion percentage... but now I'm getting
> greedy.

Hi,

I've been thinking about this a bit more, and I propose an
option that sets the "logging step", i.e. the number of items (given
either directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either an integer or a percentage. For example, if you want to print
a line every 1000 lines, you can do this

  $ pgbench -i -s 1000 --logging-step 1000 testdb

and if you want to print a line every 5%, you can do this

  $ pgbench -i -s 1000 --logging-step 5% testdb

and that's it.
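
As an illustration of how an argument of this kind could be interpreted, here is a minimal sketch in C. It is not the code from the attached patch: the function name parse_logging_step, its return convention, and the rule that a percentage is taken of the total row count are all assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not taken from the patch): convert a --logging-step
 * argument such as "1000" or "5%" into a number of rows between log lines.
 * Returns 0 on success and -1 if the argument is malformed or out of range.
 */
int
parse_logging_step(const char *arg, long total_rows, long *step_rows)
{
    char *end;
    long  value;

    assert(arg != NULL && step_rows != NULL && total_rows > 0);

    errno = 0;
    value = strtol(arg, &end, 10);
    if (end == arg || errno == ERANGE || value <= 0)
        return -1;              /* no digits, overflow, or non-positive */

    if (end[0] == '%' && end[1] == '\0')
    {
        if (value > 100)
            return -1;          /* more than 100% makes no sense */

        /* percentage of the total, but never less than one row */
        *step_rows = (long) ((long long) total_rows * value / 100);
        if (*step_rows < 1)
            *step_rows = 1;
        return 0;
    }

    if (*end != '\0')
        return -1;              /* trailing garbage such as "10x" */

    *step_rows = value;         /* plain row count */
    return 0;
}
```

With 100000 rows in total, "1000" would give a step of 1000 rows and "5%" a step of 5000 rows under these assumptions.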

Moreover, the patch adds the elapsed time and an estimate of the remaining
time to each line. So, for example, with a step of 21% you may get this:

creating tables...
21000 of 100000 tuples (21%) done (elapsed 1.56 s, remaining 5.85 s).
42000 of 100000 tuples (42%) done (elapsed 3.15 s, remaining 4.35 s).
63000 of 100000 tuples (63%) done (elapsed 4.73 s, remaining 2.78 s).
84000 of 100000 tuples (84%) done (elapsed 6.30 s, remaining 1.20 s).
100000 of 100000 tuples (100%) done (elapsed 8.17 s, remaining 0.00 s).
vacuum...
set primary keys...
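
The "remaining" figures in the sample output above are consistent, to within rounding, with a simple linear extrapolation from the average time per row so far (for instance 3.15 s * 58000 / 42000 = 4.35 s). A minimal sketch of that arithmetic follows; the function name is hypothetical and the formula is an assumption inferred from the sample, not a quote from the patch.

```c
#include <assert.h>

/*
 * Illustrative estimate of the remaining time (an assumption about the
 * arithmetic, not code from the patch): extrapolate linearly from the
 * average time per row seen so far.
 */
double
estimate_remaining_sec(double elapsed_sec, long done, long total)
{
    assert(done > 0 && done <= total);
    return elapsed_sec * (double) (total - done) / (double) done;
}
```

An estimate of this kind is only as good as the assumption that the rest of the load proceeds at the same average speed.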

Now, I've had a hard time with the patch - no matter what I do, I get an
"invalid option" error whenever I try to run it from the command line, for
some reason. But when I run it under gdb, it works just fine.

kind regards
Tomas

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

11 November 2012, 17:32:08

On 23.10.2012 18:21, Robert Haas wrote:
> On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Tomas Vondra wrote:
>>
>>> I've been thinking about this a bit more, and I propose an
>>> option that sets the "logging step", i.e. the number of items (given
>>> either directly or as a percentage) between log lines.
>>>
>>> The attached patch defines a new option "--logging-step" that accepts
>>> either an integer or a percentage. For example, if you want to print
>>> a line every 1000 lines, you can do this
>>>
>>>   $ pgbench -i -s 1000 --logging-step 1000 testdb
>>
>> I find it hard to get excited about having to specify a command line
>> argument to tweak this.  Would it work to have it emit messages
>> depending on elapsed time and log scale of tuples emitted?  So for
>> example emit the first message after 5 seconds or 100k tuples, then back
>> off until (say) 15 seconds have lapsed and 1M tuples, etc?  The idea is
>> to make it verbose enough to keep a human satisfied with what he sees,
>> but not flood the terminal with pointless updates.  (I think printing
>> the ETA might be nice as well, not sure).
> 
> I like this idea.  One of the times when the more verbose output is
> really useful is when you expect it to run fast but then it turns out
> that for some reason it runs really slow.  If you make the output too
> terse, then you end up not really knowing what's going on.  Having it
> give an update at least every 5 seconds would be a nice way to give
> the user a heads-up if things aren't going as planned, without
> cluttering the normal case.

I've prepared a patch along these lines. The attached version uses only
the elapsed time to decide when to print the log messages, so now it prints
a message every 5 seconds no matter what, along with an estimate of the
remaining time.

I've removed the config option, although it might be useful to specify
the interval?

I'm not entirely sure how the 'log scale of tuples' should work - for
example, when the 15-second limit is reached, should the interval be reset
back to the previous step (5 seconds) to give more detailed info, or
should it be kept at 15 seconds?

Tomas

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

19 November 2012, 18:38:22

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

9 December 2012, 02:41:03

On 20.11.2012 08:22, Jeevan Chalke wrote:
> Hi,
> 
> 
> On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> 
>     On 19.11.2012 11:59, Jeevan Chalke wrote:
>     > Hi,
>     >
>     > I have gone through the discussion for this patch and here is my review:
>     >
>     > The main aim of this patch is to reduce the number of log lines. It was
>     > also suggested to use an option to set the interval, but a few of us
>     > do not really agree on that. So the discussion ended at keeping a 5 sec
>     > interval between log lines.
>     >
>     > However, I see two types of users here:
>     > 1. Those who like these log lines, because they help troubleshoot
>     > slowness and the like
>     > 2. Those who do not like these log lines.
> 
>     Who likes these lines / needs them for something useful?
> 
> 
> No idea. I fall into the second category.
> 
> But from the discussion, I believe some people may need detailed (or a lot
> more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

  (1) it's rather pointless to print a message every 100k rows, as it
      usually fills the console with garbage

  (2) it's handy to have regular updates of the progress

I don't think anyone (in the thread) requires keeping the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

>     > So keeping these in mind, I would rather go for an option that controls
>     > this. People in the first category can set this option very low,
>     > whereas users in the second category can keep it high.
> 
>     So what option(s) would you expect? Something that tunes the interval
>     length or something else?
> 
> 
> Interval length.

Well, I can surely imagine something like "--interval N".

>     A switch that'd choose between the old and new behavior might be a good
>     idea, but I'd strongly vote against "automagic" heuristics. It makes the
>     behavior very difficult to predict and I really don't want to force the
>     users to wonder whether the long delay is due to general slowness of the
>     machine or some "clever" logic that causes long delays between log
>     messages.
> 
>     That's why I chose a very simple approach with a constant time interval.
>     It does what I was aiming for (fewer log lines) and it's easy to predict.
>     Sure, we could choose a different interval (or make it an option).
> 
> 
> I would prefer an option for choosing the interval, say from 1 second to
> 10 seconds.

Ummmm, why not allow an arbitrary integer? Why say 1 to 10 seconds?

> BTW, what if we print one log message every 10% (or 5%), with the time taken
> (both for the last 10% (or 5%) and cumulative), instead of every 5 seconds?
> This would give only 10 (or 20) lines per pgbench initialisation.
> And since we would show the time taken for each block, if any slowness
> happens, one can easily find the block by looking at the timings and
> troubleshoot it.
> 10% or 5% is again a debatable number, but keeping it constant
> would eliminate the need for an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not really excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

Tomas

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

17 December 2012, 00:07:46

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
    not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
    interval

I'm still not convinced there should be yet another knob for tuning the
log interval - opinions?


Tomas

On 11.12.2012 10:23, Jeevan Chalke wrote:
> 
> 
> 
> On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> 
>     On 20.11.2012 08:22, Jeevan Chalke wrote:
>     > Hi,
>     >
>     >
>     > On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>     >
>     >     On 19.11.2012 11:59, Jeevan Chalke wrote:
>     >     > Hi,
>     >     >
>     >     > I have gone through the discussion for this patch and here is my
>     >     > review:
>     >     >
>     >     > The main aim of this patch is to reduce the number of log lines.
>     >     > It was also suggested to use an option to set the interval, but
>     >     > a few of us do not really agree on that. So the discussion ended
>     >     > at keeping a 5 sec interval between log lines.
>     >     >
>     >     > However, I see two types of users here:
>     >     > 1. Those who like these log lines, because they help troubleshoot
>     >     > slowness and the like
>     >     > 2. Those who do not like these log lines.
>     >
>     >     Who likes these lines / needs them for something useful?
>     >
>     >
>     > No idea. I fall into the second category.
>     >
>     > But from the discussion, I believe some people may need detailed
>     > (or a lot more) output.
> 
>     I've read the thread again and my impression is that no one really needs
>     or likes those lines, but
> 
>       (1) it's rather pointless to print a message every 100k rows, as it
>           usually fills the console with garbage
> 
>       (2) it's handy to have regular updates of the progress
> 
>     I don't think anyone (in the thread) requires keeping the
>     current amount of log messages.
> 
>     But there might be users who actually use the current logs to do
>     something (although I can't imagine what). If we want to do this in a
>     backwards compatible way, we should probably use a new option (e.g.
>     "-q") to enable the new (less verbose) logging.
> 
>     Do we want to allow both types of logging, or shall we keep only the new
>     one? If both, which one should be the default one?
> 
> 
> Both the options are fine with me, but the default should be the current
> behaviour.
> 
> 
>     >     > So keeping these in mind, I would rather go for an option that
>     >     > controls this. People in the first category can set this option
>     >     > very low, whereas users in the second category can keep it high.
>     >
>     >     So what option(s) would you expect? Something that tunes the
>     >     interval length or something else?
>     >
>     >
>     > Interval length.
> 
>     Well, I can surely imagine something like "--interval N".
> 
> 
> +1
>  
> 
> 
>     >     A switch that'd choose between the old and new behavior might
>     >     be a good idea, but I'd strongly vote against "automagic"
>     >     heuristics. It makes the behavior very difficult to predict and
>     >     I really don't want to force the users to wonder whether the long
>     >     delay is due to general slowness of the machine or some "clever"
>     >     logic that causes long delays between log messages.
>     >
>     >     That's why I chose a very simple approach with a constant time
>     >     interval. It does what I was aiming for (fewer log lines) and
>     >     it's easy to predict.
>     >     Sure, we could choose a different interval (or make it an option).
>     >
>     >
>     > I would prefer an option for choosing the interval, say from 1
>     > second to 10 seconds.
> 
>     Ummmm, why not allow an arbitrary integer? Why say 1 to 10 seconds?
> 
> 
> Hmm.. actually, I have no issues with any number there. I just put 1..10
> because we had hard-coded it to 5. No particular reason as such.
> 
> 
>     > BTW, what if we print one log message every 10% (or 5%), with the time
>     > taken (both for the last 10% (or 5%) and cumulative), instead of every
>     > 5 seconds?
>     > This would give only 10 (or 20) lines per pgbench initialisation.
>     > And since we would show the time taken for each block, if any slowness
>     > happens, one can easily find the block by looking at the timings and
>     > troubleshoot it.
>     > 10% or 5% is again a debatable number, but keeping it constant
>     > would eliminate the need for an option.
> 
>     That's what I originally proposed in September (see the messages from
>     17/9), and Alvaro was not really excited about this.
> 
>     Attached is a patch with fixed whitespace / indentation errors etc.
>     Otherwise it's the same as the previous version.
> 
> 
> OK. Looks good now.
> 
> Any other views / suggestions are welcome.
> 
> Thanks
> 
> 
>     Tomas
> 
> 
>     --
>     Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>     To make changes to your subscription:
>     http://www.postgresql.org/mailpref/pgsql-hackers
> 
> 
> 
> 
> -- 
> Jeevan B Chalke
> Senior Software Engineer, R&D
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
> 
> Phone: +91 20 30589500
> 
> Website: www.enterprisedb.com 
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
> 
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are
> not the intended recipient or authorized to receive this for the
> intended recipient, any use, dissemination, distribution, retention,
> archiving, or copying of this communication is strictly prohibited. If
> you have received this e-mail in error, please notify the sender
> immediately by reply e-mail and delete this message.

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

19 December 2012, 21:01:33

On 19.12.2012 06:30, Jeevan Chalke wrote:
> 
> 
> 
> On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> 
>     Hi,
> 
>     attached is a new version of the patch that
> 
>     (a) converts the 'log_step_seconds' variable to a constant (and does
>         not allow changing it using a command-line option etc.)
> 
>     (b) keeps the current logging as a default
> 
>     (c) adds a "-q" switch that enables the new logging with a 5-second
>         interval
> 
>     I'm still not convinced there should be yet another knob for tuning the
>     log interval - opinions?
> 
> 
> It seems that you generated the patch on top of your earlier version, and
> because of that it does not apply cleanly to fresh sources.
> Please generate the patch against fresh sources.

Seems you're right - I've attached the proper patch against current master.

> However, I have absolutely no issues with the design. Also, code review is
> already done and it looks good to me.
> I think to move forward on this we need someone from the core team. So I am
> planning to change its status to "ready-for-committer".
> 
> Before that, please provide an updated patch for final code review.

thanks
Tomas

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

6 January 2013, 03:13:17

On 6.1.2013 03:03, Tatsuo Ishii wrote:
> As a committer, I have looked into the patch. I noticed two things:
> 
> 1) In the help you put '-q' option into "Common options" section. I
> think this should be moved to "Initialization options" section because
> the option is only applied while initializing.

Good point, moved.

> 2) Shouldn't a long option for '-q' be provided? Something like
> '--quiet-progress-logging'?

I don't think so. Currently pgbench has either a short or a long option for
each setting, not both. I believe we should stick to this and choose either
"-q" or "--quiet-logging", but not both.

> 3) No patches for docs found (doc/src/sgml/pgbench.sgml)

I've added a brief description of the "-q" option into the docs. IMHO
it's enough but feel free to add some more details.

There's one more thing I've just noticed - the original version of the
patch simply removed the old logging, but this one keeps both old and
quiet logging. But the old logging still uses this:

    fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....

while the new logging does this

    fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",

i.e. it prints additional info about elapsed/estimated time. Do we want
to keep it this way (i.e. not to mess with the old logging) or do we
want to add these new fields to the old logging too?

I suggest adding them to the old logging too, so that the log messages
are the same and the only difference is the logging frequency.
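
One way to keep the two modes in step is to build the line in a single place and let only the calling frequency differ. The sketch below assumes that approach; format_progress is a hypothetical helper written for this illustration (only the format string is the one quoted above), and whether the patch is structured this way is not stated in the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the progress line in one place so that the
 * default and the quiet mode differ only in how often they call it.
 * The format string is the one quoted above; the function itself is an
 * illustration. Returns the number of characters snprintf would write.
 */
int
format_progress(char *buf, size_t size, int done, int total,
                double elapsed_sec, double remaining_sec)
{
    assert(buf != NULL && size > 0 && total > 0);

    return snprintf(buf, size,
                    "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
                    done, total, (int) (((long long) done * 100) / total),
                    elapsed_sec, remaining_sec);
}
```

The default mode would call this every fixed number of rows and the quiet mode every 5 seconds, then write the buffer to stderr.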

Tomas

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

6 January 2013, 05:21:27

On 6.1.2013 05:07, Tatsuo Ishii wrote:
>> On 6.1.2013 03:03, Tatsuo Ishii wrote:
>>> As a committer, I have looked into the patch. I noticed two things:
>>>
>>> 1) In the help you put '-q' option into "Common options" section. I
>>> think this should be moved to "Initialization options" section because
>>> the option is only applied while initializing.
>>
>> Good point, moved.
> 
> In addition to this, I'd suggest adding a check that -q is only accepted
> together with the -i option, since without -i, -q is meaningless.

Done.
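
A check of this kind can be very small. The following sketch only illustrates the idea; the function name, its arguments and the wording of the error message are assumptions and are not taken from pgbench or from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical validation step: reject -q unless -i was also given.
 * Returns 0 if the combination is acceptable, -1 otherwise. The message
 * wording is an assumption, not the text used by pgbench.
 */
int
check_quiet_option(int is_init_mode, int use_quiet)
{
    if (use_quiet && !is_init_mode)
    {
        fprintf(stderr, "the -q option is only meaningful together with -i\n");
        return -1;
    }
    return 0;
}
```

The caller would run this once after option parsing and exit with a non-zero status if it returns -1.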

>> There's one more thing I've just noticed - the original version of the
>> patch simply removed the old logging, but this one keeps both old and
>> quiet logging. But the old logging still uses this:
>>
>>     fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....
>>
>> while the new logging does this
>>
>>     fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
>>
>> i.e. it prints additional info about elapsed/estimated time. Do we want
>> to keep it this way (i.e. not to mess with the old logging) or do we
>> want to add these new fields to the old logging too?
>>
>> I suggest adding them to the old logging too, so that the log messages
>> are the same and the only difference is the logging frequency.
> 
> If we do so, probably '-q' is not an appropriate option name any more,
> since the only difference between the old logging and the new one is that
> the former is printed every 10000 lines while the latter is printed every
> 5 seconds, which is not really "quiet". What do you think?

AFAIK the "5 second" logging is much quieter in most cases (and a bit
more verbose when the initialization gets much slower), so I think
"quiet" logging is not a bad match, although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original logging)
would be consistent with this name, because considering a single line,
the "-q" output is more verbose right now.

So I'd stick with the "-q" option and add the fields to the original
logging. But I'm not opposed to a different name, I just can't think of a
better one.

Tomas

Re: too much pgbench init output

From:

Tomas Vondra <tv@fuzzy.cz>

Date:

6 January 2013, 12:18:36

On 6.1.2013 10:35, Tatsuo Ishii wrote:
>>> If we do so, probably '-q' is not an appropriate option name any more,
>>> since the only difference between the old logging and the new one is that
>>> the former is printed every 10000 lines while the latter is printed every
>>> 5 seconds, which is not really "quiet". What do you think?
>>
>> AFAIK the "5 second" logging is much quieter in most cases (and a bit
>> more verbose when the initialization gets much slower), so I think
>> "quiet" logging is not a bad match, although maybe there's a better name.
>>
>> This change (adding the elapsed/remaining fields to the original logging)
>> would be consistent with this name, because considering a single line,
>> the "-q" output is more verbose right now.
>>
>> So I'd stick with the "-q" option and add the fields to the original
>> logging. But I'm not opposed to a different name, I just can't think of a
>> better one.
> 
> Ok, I'm with you ("-q", along with adding the elapsed/remaining
> fields to the original logging).

Great, attached is a patch that does that.

Tomas

Re: too much pgbench init output

From:

Robert Haas <robertmhaas@gmail.com>

Date:

5 September 2012, 04:17:09

Re: too much pgbench init output

From:

Robert Haas <robertmhaas@gmail.com>

Date:

23 October 2012, 16:21:36

Re: too much pgbench init output

From:

Robert Haas <robertmhaas@gmail.com>

Date:

5 September 2012, 03:14:30

Re: too much pgbench init output

From:

Robert Haas <robertmhaas@gmail.com>

Date:

1 September 2012, 10:30:13

Re: too much pgbench init output

From:

Pavan Deolasee <pavan.deolasee@gmail.com>

Date:

5 September 2012, 04:39:12

On Wed, Sep 5, 2012 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> Actually it'd be nice to even get a bit more output: say, a timestamp
> on each line, and a completion percentage... but now I'm getting
> greedy.

Maybe we need a verbosity level, so that we can print a lot less or a lot more information than we do today. That will satisfy everyone. Hopefully.

Thanks,

Pavan

Re: too much pgbench init output

From:

Jeevan Chalke <jeevan.chalke@enterprisedb.com>

Date:

19 November 2012, 11:00:17

Hi,

I have gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It was also suggested to use an option to set the interval, but a few of us do not really agree on that. So the discussion ended at keeping a 5 sec interval between log lines.

However, I see two types of users here:
1. Those who like these log lines, because they help troubleshoot slowness and the like
2. Those who do not like these log lines.

So keeping these in mind, I would rather go for an option that controls this. People in the first category can set this option very low, whereas users in the second category can keep it high.

However, assuming we settle on the 5 sec delay, here are a few comments on the attached patch:

Comments:
=========

The patch applies cleanly, with some whitespace errors. make and make install went smoothly too.
make check went smoothly as well. It should, since there are NO changes in any part of the code other than pgbench.c, and we do not have any test case for it either.

However, here are a few comments on the changes in pgbench.c

1.
Since the discussion concluded that a 5-second interval would be good enough, the author used a global int variable for that.
Given that it's just a constant, a #define would be a better choice.

2.
+        /* let's not call the timing for each row, but only each 100 rows */
Why only 100 rows? Have you done any testing to come up with the number 100? To me it seems very low. It would be good to test with 1K or even 10K.
On my machine (2.4 GHz Intel Core 2 Duo MacBook Pro, running Ubuntu in a VM with 4GB RAM, 1067 DDR3), approx. 1M rows were inserted in 5 sec. So checking every 100 rows looks like overkill.

3.
Please indent the following block to match the indentation just above it

    /* used to track elapsed time and estimate of the remaining time */
    instr_time    start, diff;
    double elapsed_sec, remaining_sec;
    int log_interval = 1;

4.
+            /* have ve reached the next interval? */
Do you mean "have WE reached..."

5.
While applying the patch, I got a few whitespace errors. But I think every patch goes through pgindent, which might take care of this.
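
To put comment 2 in numbers: at roughly 1M rows per 5 seconds, checking the clock every 100 rows means about 10,000 clock reads per log line, against 100 reads with a 10K step. The sketch below only counts those reads; count_clock_reads and the read_clock callback are hypothetical stand-ins and are not part of pgbench.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration of review comment 2 (hypothetical names): consult the clock
 * only once per `check_every` rows and count how many clock reads that costs.
 * `read_clock` stands in for whatever timing call the real loop uses and
 * may be NULL when only the count is of interest.
 */
long
count_clock_reads(long total_rows, long check_every, void (*read_clock)(void))
{
    long reads = 0;

    assert(total_rows >= 0 && check_every > 0);

    for (long row = 1; row <= total_rows; row++)
    {
        /* ... generate and send one row here ... */
        if (row % check_every == 0)
        {
            if (read_clock != NULL)
                read_clock();
            reads++;
        }
    }
    return reads;
}
```

A larger step saves clock reads at the price of a slightly later reaction when the 5-second boundary is crossed.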

Thanks

Re: too much pgbench init output

From:

Jeevan Chalke <jeevan.chalke@enterprisedb.com>

Date:

26 December 2012, 14:55:23

Looks good to me.

Will mark "Ready for Committer"

Thanks

Re: too much pgbench init output

From:

Jeevan Chalke <jeevan.chalke@enterprisedb.com>

Date:

20 November 2012, 07:23:10

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:
> Hi,
>
> I have gone through the discussion for this patch and here is my review:
>
> The main aim of this patch is to reduce the number of log lines. It was
> also suggested to add an option to set the interval, but a few of us
> do not quite agree on that. So the discussion ended at keeping a 5 sec
> interval between log lines.
>
> However, I see two types of users here:
> 1. Those who like these log lines, because they help troubleshoot
> slowness and the like
> 2. Those who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall into the second category.

But from the discussion, I believe some people may need more detailed output.

> So keeping these in mind, I would rather go for an option that controls
> this. People in the first category can set this option very low,
> whereas users in the second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

Interval length.

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log messages.

That's why I chose a very simple approach with a constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose a different interval (or make it an option).

I would prefer an option for choosing the interval, say from 1 second to 10 seconds.

BTW, what if we print one log message every 10% (or 5%) of the rows, with the time taken (for the last 10% (or 5%) and cumulative), instead of every 5 seconds?
That would give only 10 (or 20) lines per pgbench initialisation.
And since we show the time taken for each block, if any slowness occurs, one can easily find the slow block by looking at the timings and troubleshoot it.
10% or 5% is again a debatable number, but keeping it constant eliminates the need for an option.
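
To illustrate the percentage-based idea above, here is a minimal sketch in C. It is not taken from the patch: the helper names `crossed_percent_step` and `format_block_line` are made up for this example, and the caller is assumed to measure the per-block and cumulative timings itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): returns 1 when inserting row k
 * (1-based) completes another step_pct percent of the total, 0 otherwise.
 * long long keeps k * 100 from overflowing for large scale factors.
 */
static int
crossed_percent_step(long long k, long long total, int step_pct)
{
    long long   prev;
    long long   curr;

    assert(k >= 1 && total >= 1 && step_pct >= 1);
    prev = (k - 1) * 100 / total / step_pct;
    curr = k * 100 / total / step_pct;
    return curr > prev;
}

/*
 * Hypothetical helper: formats one progress line with the time spent on the
 * last block and the cumulative time, both measured by the caller.
 */
static int
format_block_line(char *buf, size_t len, long long k, long long total,
                  double block_sec, double total_sec)
{
    return snprintf(buf, len,
                    "%lld%% done (last block %.2f s, total %.2f s)",
                    k * 100 / total, block_sec, total_sec);
}
```

With a 10% step and at least ten rows to load, this yields ten lines per initialisation no matter how long each block takes. The flip side is that a stalled load prints nothing until the current block completes.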

> However, assuming we settle on a 5 sec delay, here are a few comments on
> the attached patch:
>
> Comments:
> =========
>
> The patch applies cleanly, with some whitespace errors. make and make
> install went smoothly too.
> make check passed as well, which is expected since nothing other than
> pgbench.c changes and we do not have any test case for it.
>
> However, here are a few comments on the changes in pgbench.c
>
> 1.
> Since the discussion concluded that a 5-second interval would be good
> enough, the author used a global int variable for it.
> Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

Exactly, if we end up with an option then a variable looks good. But in your current patch it is a constant, so #define should be preferred.

> 2.
> + /* let's not call the timing for each row, but only each 100 rows */
> Why only 100 rows? Have you done any testing to come up with the number
> 100? To me it seems very low. It would be good to test with 1K or even 10K.
> On my machine (2.4 GHz Intel Core 2 Duo MacBook Pro, running Ubuntu in a
> VM with 4GB RAM, 1067 DDR3), approx. 1M rows were inserted in 5 sec. So
> checking every 100 rows looks like overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...

Hmm... inserting just 100 rows/second is really slow.
Since you have already seen such behaviour, I have no objection to keeping it at 100.
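
The trade-off discussed here (read the clock only every 100 rows, print only once the 5-second interval has passed) can be sketched as follows. This is an illustration, not the patch itself: `LOG_STEP_SECONDS` and `log_interval` echo names mentioned in this thread, while `ROWS_PER_CLOCK_CHECK` and the three helper functions are hypothetical, and the elapsed time is assumed to be measured by the caller.

```c
#include <assert.h>

#define LOG_STEP_SECONDS        5   /* fixed interval between progress lines */
#define ROWS_PER_CLOCK_CHECK    100 /* read the clock only every 100 rows */

/* Returns 1 when the (1-based) row count is one at which to read the clock. */
static int
should_check_clock(long long row)
{
    return row % ROWS_PER_CLOCK_CHECK == 0;
}

/*
 * Returns 1 when elapsed_sec has reached the next logging interval and
 * advances *log_interval past any intervals that were missed; otherwise
 * returns 0 and leaves *log_interval unchanged.
 */
static int
interval_reached(double elapsed_sec, int *log_interval)
{
    if (elapsed_sec < (double) *log_interval * LOG_STEP_SECONDS)
        return 0;
    *log_interval = (int) (elapsed_sec / LOG_STEP_SECONDS) + 1;
    return 1;
}

/* Linear estimate of the remaining time from the rows inserted so far. */
static double
estimate_remaining_sec(long long row, long long total, double elapsed_sec)
{
    assert(row > 0);            /* avoid dividing by zero before any row */
    return (double) (total - row) * elapsed_sec / (double) row;
}
```

A load loop would call `should_check_clock` for every row, read the clock only when it returns 1, and print a line when `interval_reached` then returns 1. At 100 rows/second that still means a clock check every second, whereas a 10k step would delay the check to every 100 seconds.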

> 3.
> Please indent the following block to match the indentation just above it
>
> /* used to track elapsed time and estimate of the remaining time */
> instr_time start, diff;
> double elapsed_sec, remaining_sec;
> int log_interval = 1;

OK

> 4.
> + /* have ve reached the next interval? */
> Do you mean "have WE reached..."

OK

>
> 5.
> While applying the patch, I got a few whitespace errors. But I think every
> patch goes through pgindent, which should take care of this.

OK

>
> Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Sure

Tomas

Thanks
--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: too much pgbench init output

From:

Jeevan Chalke <jeevan.chalke@enterprisedb.com>

Date:

December 11, 2012, 09:24:02

On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 20.11.2012 08:22, Jeevan Chalke wrote:
> Hi,
>
>
> On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
> On 19.11.2012 11:59, Jeevan Chalke wrote:
> > Hi,
> >
> > I have gone through the discussion for this patch and here is my review:
> >
> > The main aim of this patch is to reduce the number of log lines. It was
> > also suggested to add an option to set the interval, but a few of us
> > do not quite agree on that. So the discussion ended at keeping a 5 sec
> > interval between log lines.
> >
> > However, I see two types of users here:
> > 1. Those who like these log lines, because they help troubleshoot
> > slowness and the like
> > 2. Those who do not like these log lines.
>
> Who likes these lines / needs them for something useful?
>
>
> No idea. I fall into the second category.
>
> But from the discussion, I believe some people may need more detailed
> output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

(1) it's rather pointless to print a message every 100k rows, as it
usually fills the console with garbage

(2) it's handy to have regular updates of the progress

I don't think there are people (in the thread) who need to keep the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

Both options are fine with me, but the default should be the current behaviour.

> > So keeping these in mind, I would rather go for an option that controls
> > this. People in the first category can set this option very low,
> > whereas users in the second category can keep it high.
>
> So what option(s) would you expect? Something that tunes the interval
> length or something else?
>
>
> Interval length.

Well, I can surely imagine something like "--interval N".

+1

> A switch that'd choose between the old and new behavior might be a good
> idea, but I'd strongly vote against "automagic" heuristics. It makes the
> behavior very difficult to predict and I really don't want to force the
> users to wonder whether the long delay is due to general slowness of the
> machine or some "clever" logic that causes long delays between log
> messages.
>
> That's why I chose a very simple approach with a constant time interval.
> It does what I was aiming for (less logs) and it's easy to predict.
> Sure, we could choose a different interval (or make it an option).
>
>
> I would prefer an option for choosing the interval, say from 1 second to
> 10 seconds.

Ummmm, why not allow an arbitrary integer? Why limit it to 1 to 10 seconds?

Hmm... actually, I have no issue with any number there. I just suggested 1..10 because we had hard-coded 5. No particular reason as such.
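
If the interval did become a command-line option accepting an arbitrary positive integer, its argument would need validating. The sketch below shows one way to do that with `strtol`; the helper name `parse_interval_seconds` is hypothetical, and no such option is part of the patch discussed here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses a positive number of seconds from arg.
 * Returns 0 and stores the value in *seconds on success; returns -1 for
 * empty input, trailing garbage, out-of-range values, or values below 1.
 */
static int
parse_interval_seconds(const char *arg, int *seconds)
{
    char       *end;
    long        val;

    assert(arg != NULL && seconds != NULL);

    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;              /* not a clean base-10 integer */
    if (val < 1 || val > INT_MAX)
        return -1;              /* zero, negative, or too large */

    *seconds = (int) val;
    return 0;
}
```

Accepting any positive integer keeps the option simple, as suggested above; a 1..10 limit would only add one more comparison.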

> BTW, what if we print one log message every 10% (or 5%) of the rows, with
> the time taken (for the last 10% (or 5%) and cumulative), instead of every
> 5 seconds?
> That would give only 10 (or 20) lines per pgbench initialisation.
> And since we show the time taken for each block, if any slowness occurs,
> one can easily find the slow block by looking at the timings and
> troubleshoot it.
> 10% or 5% is again a debatable number, but keeping it constant
> eliminates the need for an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not really excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

OK. Looks good now.

Any other views / suggestions are welcome.

Thanks

Tomas

Re: too much pgbench init output

From:

Jeevan Chalke <jeevan.chalke@enterprisedb.com>

Date:

December 19, 2012, 05:30:42

On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another knob for tuning the
log interval - opinions?

It seems that you generated the patch on top of your earlier version, so it does not apply cleanly to fresh sources.

Please regenerate the patch against fresh sources.

However, I have absolutely no issues with the design. The code review is already done and looks good to me.

I think we need someone from the core team to move this forward, so I am planning to change its status to "Ready for Committer".

Before that, please provide an updated patch for final code review.

Thanks

Tomas

Re: too much pgbench init output

From:

"Tomas Vondra" <tv@fuzzy.cz>

Date:

September 1, 2012, 20:10:09