Обсуждение: [PATCH] pg_basebackup: progress report max once per second
Prevent excessive progress reporting that can grow to gigabytes of output with large databases. ---src/bin/pg_basebackup/pg_basebackup.c | 21 ++++++++++++---------1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index a1e12a8..90c4683 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -45,6 +45,7 @@ bool streamwal = false;bool fastcheckpoint = false;bool writerecoveryconf = false;int standby_message_timeout = 10 * 1000; /* 10 sec = default */ +int last_progress_report = 0;/* Progress counters */static uint64 totalsize; @@ -74,7 +75,7 @@ static PQExpBuffer recoveryconfcontents = NULL;/* Function headers */static void usage(void);static voidverify_dir_is_empty_or_create(char *dirname); -static void progress_report(int tablespacenum, const char *filename); +static void progress_report(int tablespacenum, const char *filename, int force);static void ReceiveTarFile(PGconn *conn,PGresult *res, int rownum);static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum); @@ -399,12 +400,15 @@ verify_dir_is_empty_or_create(char *dirname) * is enabled, also print the current file name. */staticvoid -progress_report(int tablespacenum, const char *filename) +progress_report(int tablespacenum, const char *filename, int force){ int percent = (int) ((totaldone / 1024)* 100 / totalsize); char totaldone_str[32]; char totalsize_str[32]; + if(!showprogress || (time(NULL) == last_progress_report && !force)) return; /* Max once per second */ + last_progress_report = time(NULL); + /* * Avoid overflowing past 100% or the full size. This may make the total * size number change as we approachthe end of the backup (the estimate @@ -850,9 +854,9 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum) } } totaldone += r; - if (showprogress) - progress_report(rownum, filename); + progress_report(rownum, filename, 0); } /* while (1) */ + progress_report(rownum, filename, 1); if (copybuf != NULL) PQfreemem(copybuf); @@ -1073,8 +1077,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) disconnect_and_exit(1); } totaldone += r; - if (showprogress) - progress_report(rownum, filename); + progress_report(rownum, filename, 0); current_len_left -= r; if (current_len_left == 0&& current_padding == 0) @@ -1090,6 +1093,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) } } /* continuing data in existing file */ } /* loop over all data blocks */ + progress_report(rownum, filename, 1); if (file != NULL) { @@ -1450,8 +1454,7 @@ BaseBackup(void) tablespacecount = PQntuples(res); for (i = 0; i < PQntuples(res); i++) { - if (showprogress) - totalsize += atol(PQgetvalue(res, i, 2)); + totalsize += atol(PQgetvalue(res, i, 2)); /* * Verify tablespace directories are empty. Don't botherwith the @@ -1498,7 +1501,7 @@ BaseBackup(void) if (showprogress) { - progress_report(PQntuples(res), NULL); + progress_report(PQntuples(res), NULL, 1); fprintf(stderr, "\n"); /* Need to move to next line */ } PQclear(res); -- 1.8.4.2
Iy On Thu, Nov 14, 2013 at 3:51 AM, Mika Eloranta <mel@ohmu.fi> wrote: > Prevent excessive progress reporting that can grow to gigabytes > of output with large databases. It might be interesting to add this patch to the next commit fest where you could get a formal review: https://commitfest.postgresql.org/action/commitfest_view?id=20 Then just be sure to attach a patch file properly to your email such as people can grab and test the patch easily. -- Michael
On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: > Prevent excessive progress reporting that can grow to gigabytes > of output with large databases. Same patch as an attachment. -- Mika Eloranta Ohmu Ltd. http://www.ohmu.fi/
Вложения
On 11/14/13 10:27 AM, Mika Eloranta wrote: > On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: > >> Prevent excessive progress reporting that can grow to gigabytes >> of output with large databases. > > Same patch as an attachment. I can't comment on the usefulness of this patch, but the first hunk in progress_report does not conform to the code style guidelines of the project. Regards, Marko Tiikkaja
On Thu, Nov 14, 2013 at 6:27 PM, Mika Eloranta <mel@ohmu.fi> wrote: > On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: > >> Prevent excessive progress reporting that can grow to gigabytes >> of output with large databases. > > Same patch as an attachment. > > -- > Mika Eloranta > Ohmu Ltd. http://www.ohmu.fi/ > I got error with following scenario. $ initdb -D data -E UTF8 --no-locale /* setting the replication parameters */ $ pg_basebackup -D 2data Floating point exception LOG: could not send data to client: Broken pipe ERROR: base backup could not send data, aborting backup FATAL: connection to client lost Regards, ------- Sawada Masahiko
On Thu, Nov 14, 2013 at 10:27 AM, Mika Eloranta <mel@ohmu.fi> wrote:
On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote:Same patch as an attachment.
> Prevent excessive progress reporting that can grow to gigabytes
> of output with large databases.
Would it not make more sense to instead store the last number printed, and only print it if the percentage has changed? AIUI with this patch we still print the same thing on top of itself if it takes >1 second to get 1% further.
(Except for verbose mode - but if you're asking for verbose mode, you are *asking* to get lots of output)
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Jan 9, 2014, at 20:05, Magnus Hagander wrote: > On Thu, Nov 14, 2013 at 10:27 AM, Mika Eloranta <mel@ohmu.fi> wrote: >> On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: >> >> >> > Prevent excessive progress reporting that can grow to gigabytes >> > of output with large databases. >> >> >> Same patch as an attachment. > > Would it not make more sense to instead store the last number printed, and only print it if the percentage has changed?AIUI with this patch we still print the same thing on top of itself if it takes >1 second to get 1% further. > > (Except for verbose mode - but if you're asking for verbose mode, you are *asking* to get lots of output) (re-sent response as I used the wrong sender address previously, sorry about the dupe) Printing out progress periodically is IMHO slightly better as the interactive user can see that there is some progress (e.g. by pressing enter for a new empty console line) during a huge basebackup operation. The original problem I wanted to address was that I had a daemon watching over the basebackup process and reading its output to make sure that the basebackup is proceeding properly. It also wrote all the output to a logfile for postmortem analysis. The log file grew to a very big size (multiple gigabytes due to the progress prints). With my patch the log was only a few kilos, without any negative effects. The excessive progress reporting can also be an issue in an interactive session over a slow network (mobile), hogging both time and bandwidth. Cheers, Mika
On Thu, Jan 9, 2014, at 20:05, Magnus Hagander wrote: > On Thu, Nov 14, 2013 at 10:27 AM, Mika Eloranta <mel@ohmu.fi> wrote: >> On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: >> >> >> > Prevent excessive progress reporting that can grow to gigabytes >> > of output with large databases. >> >> >> Same patch as an attachment. > > Would it not make more sense to instead store the last number printed, and only print it if the percentage has changed?AIUI with this patch we still print the same thing on top of itself if it takes >1 second to get 1% further. > > (Except for verbose mode - but if you're asking for verbose mode, you are *asking* to get lots of output) Printing out progress periodically is IMHO slightly better as the interactive user can see that there is some progress (e.g. by pressing enter for a new empty console line) during a huge basebackup operation. The original problem I wanted to address was that I had a daemon watching over the basebackup process and reading its output to make sure that the basebackup is proceeding properly. It also wrote all the output to a logfile for postmortem analysis. The log file grew to a very big size (multiple gigabytes due to the progress prints). With my patch the log was only a few kilos, without any negative effects. The excessive progress reporting can also be an issue in an interactive session over a slow network (mobile), hogging both time and bandwidth. Cheers, Mika
On Fri, Jan 10, 2014 at 3:56 AM, Mika Eloranta <mel@ohmu.fi> wrote:
On Thu, Jan 9, 2014, at 20:05, Magnus Hagander wrote:> On Thu, Nov 14, 2013 at 10:27 AM, Mika Eloranta <mel@ohmu.fi> wrote:(re-sent response as I used the wrong sender address previously, sorry
>> On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote:
>>
>>
>>
> Prevent excessive progress reporting that can grow to gigabytes
>>
> of output with large databases.
>>
>>
>> Same patch as an attachment.
>
> Would it not make more sense to instead store the last number printed, and only print it if the percentage has changed? AIUI with this patch we still print the same thing on top of itself if it takes >1 second to get 1% further.
>
> (Except for verbose mode - but if you're asking for verbose mode, you are *asking* to get lots of output)
about the dupe)
Printing out progress periodically is IMHO slightly better as the
interactive user can see that there is some progress (e.g. by pressing
enter for a new empty console line) during a huge basebackup operation.
That's an argument I hadn't considered - but I still think it's acceptable to wait until the next percentage digit in this case.
The original problem I wanted to address was that I had a daemon
watching over the basebackup process and reading its output to make sure
that the basebackup is proceeding properly. It also wrote all the output
to a logfile for postmortem analysis. The log file grew to a very big
size (multiple gigabytes due to the progress prints). With my patch the
log was only a few kilos, without any negative effects. The excessive
progress reporting can also be an issue in an interactive session over a
slow network (mobile), hogging both time and bandwidth.
Yeah, I was guessing it was something like that. I didn't realize you'd actually monitor it through a deamon (I've just looked at the output filesize when minitoring things like that), but the remote connection was easily reproducible. I definitely agree we should do something about it.
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
18.11.2013 07:53, Sawada Masahiko kirjoitti: >> On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: >>> Prevent excessive progress reporting that can grow to gigabytes >>> of output with large databases. > > I got error with following scenario. > > $ initdb -D data -E UTF8 --no-locale > /* setting the replication parameters */ > $ pg_basebackup -D 2data > Floating point exception > LOG: could not send data to client: Broken pipe > ERROR: base backup could not send data, aborting backup > FATAL: connection to client lost Attached a rebased patch with a fix for division by zero error plus some code style issues. I also moved the patch to the current commitfest. / Oskari
Вложения
On Tue, Jan 21, 2014 at 7:17 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: > 18.11.2013 07:53, Sawada Masahiko kirjoitti: >>> >>> On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: >>>> >>>> Prevent excessive progress reporting that can grow to gigabytes >>>> of output with large databases. >> >> >> I got error with following scenario. >> >> $ initdb -D data -E UTF8 --no-locale >> /* setting the replication parameters */ >> $ pg_basebackup -D 2data >> Floating point exception >> LOG: could not send data to client: Broken pipe >> ERROR: base backup could not send data, aborting backup >> FATAL: connection to client lost > > > Attached a rebased patch with a fix for division by zero error plus some > code style issues. I also moved the patch to the current commitfest. > Thank you for updating the patch! I have reviewed it easily. I didn't get error of compile, and the patch works fine. I have one question. What does it mean the calling progress_report() which you added at end of ReceiveUnpackTarFile() and RecieveTarFile()? I could not understand necessity of this code. And the force argument of progress_report() is also same. If you would like to use 'force' option, I think that you should add the comment to source code about it. Regards, ------- Sawada Masahiko
31.01.2014 10:59, Sawada Masahiko kirjoitti: > On Tue, Jan 21, 2014 at 7:17 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: >> 18.11.2013 07:53, Sawada Masahiko kirjoitti: >>>> >>>> On 13 Nov 2013, at 20:51, Mika Eloranta <mel@ohmu.fi> wrote: >>>>> >>>>> Prevent excessive progress reporting that can grow to gigabytes >>>>> of output with large databases. >>> >>> >>> I got error with following scenario. >>> >>> $ initdb -D data -E UTF8 --no-locale >>> /* setting the replication parameters */ >>> $ pg_basebackup -D 2data >>> Floating point exception >>> LOG: could not send data to client: Broken pipe >>> ERROR: base backup could not send data, aborting backup >>> FATAL: connection to client lost >> >> >> Attached a rebased patch with a fix for division by zero error plus some >> code style issues. I also moved the patch to the current commitfest. >> > > Thank you for updating the patch! > I have reviewed it easily. > > I didn't get error of compile, and the patch works fine. > > I have one question. > What does it mean the calling progress_report() which you added at end > of ReceiveUnpackTarFile() and RecieveTarFile()? > I could not understand necessity of this code. And the force argument > of progress_report() is also same. > If you would like to use 'force' option, I think that you should add > the comment to source code about it. I think the idea in the new progress_report() call (with force == true) is to make sure that there is at least one progress_report call that actually writes the progress report. Otherwise the final report may go missing if it gets suppressed by the time-based check. The force argument as used in the new call skips that check. / Oskari
On Sat, Feb 1, 2014 at 8:29 PM, Oskari Saarenmaa <os@ohmu.fi> wrote: > 31.01.2014 10:59, Sawada Masahiko kirjoitti: > > I think the idea in the new progress_report() call (with force == true) is > to make sure that there is at least one progress_report call that actually > writes the progress report. Otherwise the final report may go missing if it > gets suppressed by the time-based check. The force argument as used in the > new call skips that check. > I understood. I have two concerns as follows. - I think that there is possible that progress_report() is called frequently ( less than 1 second). That is, progress_report() is called with force == true after progress_report was called with force == false and execute this function. - progress_report() is called even if -P option is disabled. I'm concerned about that is cause of performance degradation. Regards, ------- Sawada Masahiko
On Mon, Feb 3, 2014 at 4:14 AM, Sawada Masahiko <sawada.mshk@gmail.com> wrote:
As for the first one - I believe that's the point. progress_report should be called with force==true after it was called with it false, that's the intended design.
On Sat, Feb 1, 2014 at 8:29 PM, Oskari Saarenmaa <os@ohmu.fi> wrote:
> 31.01.2014 10:59, Sawada Masahiko kirjoitti:
>> I think the idea in the new progress_report() call (with force == true) isI understood.
> to make sure that there is at least one progress_report call that actually
> writes the progress report. Otherwise the final report may go missing if it
> gets suppressed by the time-based check. The force argument as used in the
> new call skips that check.
>
I have two concerns as follows.
- I think that there is possible that progress_report() is called
frequently ( less than 1 second).
That is, progress_report() is called with force == true after
progress_report was called with force == false and execute this
function.
- progress_report() is called even if -P option is disabled. I'm
concerned about that is cause of performance degradation.
I looked over the latest version, and the only real problem I see here is your second point, which is the calling with -P not specified. I doubt it's going to be much, but in theory I guess the call to time(NULL) many times could have an effect. I've fixed that by just moving it to after a check for showprogress.
As for the first one - I believe that's the point. progress_report should be called with force==true after it was called with it false, that's the intended design.
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/