Re: Progress reporting for pg_verify_checksums

Поиск
Список
Период
Сортировка
От Michael Banck
Тема Re: Progress reporting for pg_verify_checksums
Дата
Msg-id 20180919211103.GB23519@nighthawk.caipicrew.dd-dns.de
обсуждение исходный текст
Ответ на Re: Progress reporting for pg_verify_checksums  (Fabien COELHO <coelho@cri.ensmp.fr>)
Ответы Re: Progress reporting for pg_verify_checksums  (Fabien COELHO <coelho@cri.ensmp.fr>)
Список pgsql-hackers
Hi,

thanks for the review!

On Wed, Sep 19, 2018 at 05:17:05PM +0200, Fabien COELHO wrote:
> >>This optionally prints the progress of pg_verify_checksums via read
> >>kilobytes to the terminal with the new command line argument -P.
> >>
> >>If -P was forgotten and pg_verify_checksums operates on a large cluster,
> >>the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> >>status reporting on during runtime.
> >
> >Version 2 of the patch is attached. This is rebased to master as of
> >422952ee and changes the signal handling to be a toggle as suggested by
> >Alvaro.
> 
> Patch applies cleanly and compiles.
> 
> About tests: "make check" is okay, but ITSM that the command is not started
> once, ever, in any test:-( It is unclear whether any test triggers checksums
> anyway...

Yeah, this was discussed on another thread and some basic tap tests for
pg_verify_checksums were proposed (also by me), but this hasn't been
further addressed.

Personally, I think this would be a good thing to add to v11 even.

In any case, this particular feature might not be very easy to tap test,
but I am open to suggestions, of course.

> The time() granularity to the second makes the result awkward on small
> tests:
> 
>  8/1554552 kB (0%, 8 kB/s)
>  1040856/1554552 kB (66%, 1040856 kB/s)
>  1554552/1554552 kB (100%, 1554552 kB/s)
> 
> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
> much better precision.
> 
> The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
> 1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
> about storage which is usually measured in powers of 1,000, so I'd suggest
> to use thousands.
> 
> Another reserve I have is that on any hardware it is likely that there will
> be a lot of kilos, so maybe megas (MB) should be used instead.

The display is exactly the same as in pg_basebackup (except the
bandwith is shown as well), so right now I think it is more useful to be
consistent here.  So if we change that, pg_basebackup should be changed
as well I think.

Maybe the code could be factored out to some common file in the future.
 
> I'm wondering whether the bandwidth display should be local (from the last
> display) or global (from the start of the command), but for the last forced
> one. This is an open question.


 
> Maybe it would be nice to show elapsed time and expected completion time
> based on the current speed.

Maybe; this could be added to the factored out common code I mentioned
above.
 
> I would be in favor or having this turned on by default, and silenced with
> some option. I'm not sure other people would agree, though, so it is an open
> question as well.

If this runs in a script, it would be pretty annoying, so everybody
would have to add --no-progress so I am not convinced. Also, both
pg_basebackup and pgbench don't show progress by default.
 
> About the code:
> 
>  +               if (show_progress)
>  +                       show_progress = false;
>  +               else
>  +                       show_progress = true;
> 
> Why not a simple:
> 
>     /* invert current state */
>     show_progress = !show_progress;

Fair enough.
 
> I do not see much advantage in using intermediate string buffers for values:
> 
>  + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
>  +          total_size / 1024);
>  + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
>  +          current_size / 1024);
>  + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
>  +          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
>  + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
>  +          currentstr, totalstr, total_percent, currspeedstr);
> 
> ISTM that just one simple fprintf would be fine:
> 
>   fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
>           formulas for each slot...);

That code is adopted from pg_basebackup.c and the comment there says:

|     * Separate step to keep platform-dependent format code out of
|     * translatable strings.  And we only test for INT64_FORMAT
|     * availability in snprintf, not fprintf.

So that sounds legit to me.

> ISTM that this line overwriting trick deserves a comment:
> 
>  + if (isatty(fileno(stderr)))
>  +   fprintf(stderr, "\r");
>  + else
>  +   fprintf(stderr, "\n");

I agree a comment should be in there. But that code is also taken
verbatim from pg_basebackup.c (but this time, there's no comment there,
either).

How about this:

|     * If we are reporting to a terminal, send a carriage return so that
|     * we stay on the same line.  If not, send a newline. 

> And it runs a little amok with "-v".

Do you suggest we should make those mutually exlusive?  I agree that in
general, -P is not very useful if -v is on, but if you have a really big
table, it should still be, no?

>  + memset(&act, '\0', sizeof(act));
> 
> pg uses 0 instead of '\0' everywhere else.

Ok.
 
>   +   /* Make sure we just report at least once a second */
>   +   if ((now == last_progress_update) && !force) return;
> 
> The comments seems contradictory, I understand the code makes sure that it
> is "at most" once a second. 

I guess you're right as the identical code in pg_basebackup.c has a
comment saying "Max once per second".

> Pgbench uses "-P XXX" to strigger a progress
> report every XXX second. I'm unsure whether it would be useful to allow the
> user to change the 1 second display interval.

I think pgbench writes a new line for each report? In that case, having
it every second for longer runs might be annoying and I can see the
point in having in configurable.

In the case of pg_basebackup/pg_verify_checksums, I guess 1 second is
fine.
 
> I'm not sure why you removed one unrelated line:
> 
>   #include "storage/checksum.h"
>   #include "storage/checksum_impl.h"
> 
>  -
>   static int64 files = 0;
>   static int64 blocks = 0;
>   static int64 badblocks = 0;

Merge error on my side I guess, will fix.
 
> There is a problem in the scan_file code: The added sizeonly parameter is
> not used. It should be removed.

Right, this might have been a leftover from an earlier version of the
code, I'll check back with Bernd to make sure that was not a
porting/merge error on my side.

I've attached V3 of the patch, addressing some of your comments.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: errmsg() ending with a period
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PATCH: pgbench - option to build using ppoll() for larger connection counts