On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote:
> The way you've now changed this is that there's a function call into
> progress_report() for every block that's being read, even if there is no
> progress reporting requested. That looks like a pretty severe
> performance problem so I suggest to at least stick with checking
> showprogress before calling progress_report() and not the other way
> round.
>
> So my vote is in favour of only calling progress_report() every once in
> a while - I saw quite a speedup (or removal of slowdown) due to this in
> my tests, this was not just some unwarranted microoptimization.
Do you have some runtime numbers? If all the pages are in the OS
cache you should be able to see more impact. I have been testing with
a pgbench database at scale 300 and --check, and perf is showing me
that progress_report counts for 0.10% with or without the option of
the profile while pg_checksum_page() counts for 36%. Switching the
switch in or out of it makes the performance change lost in the noise.
I'd rather keep the check for showprogress within progress_report() so
as extra callers don't miss bypassing the report if --progress is not
enabled, still we are talking about only one caller now so the
attached does it the other way around with an assertion.
>> + fprintf(stderr, "\r");
>
> I think the isatty() check that was in our versions of the patch is
> useful here, did you check how this looks when piping the output to a
> file?
Oops, fixed. The output was strange.
> This hunk is from the performance optimization of calling
> progress_report only every 1024 blocks and could be removed if we stick
> with calling it for every block anyway (but see above).
Indeed, removed this one.
>> -static void
>> -scan_directory(const char *basedir, const char *subdir)
>> +/*
>> + * Scan the given directory for items which can be checksummed and
>> + * operate on each one of them. If "sizeonly" is true, the size of
>> + * all the items which have checksums is computed and returned back
>
> "computed" is maybe a bit strong a word here, how about "added up"?
"computed" sounds fine to me.
--
Michael