Re: Progress reporting for pg_verify_checksums

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Progress reporting for pg_verify_checksums
Дата
Msg-id 20190401061041.GA1685@paquier.xyz
обсуждение исходный текст
Ответ на Re: Progress reporting for pg_verify_checksums  (Michael Banck <michael.banck@credativ.de>)
Ответы Re: Progress reporting for pg_verify_checksums  (Michael Banck <michael.banck@credativ.de>)
Список pgsql-hackers
On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote:
> The way you've now changed this is that there's a function call into
> progress_report() for every block that's being read, even if there is no
> progress reporting requested. That looks like a pretty severe
> performance problem so I suggest to at least stick with checking
> showprogress before calling progress_report() and not the other way
> round.
>
> So my vote is in favour of only calling progress_report() every once in
> a while - I saw quite a speedup (or removal of slowdown) due to this in
> my tests, this was not just some unwarranted microoptimization.

Do you have some runtime numbers?  If all the pages are in the OS
cache you should be able to see more impact.  I have been testing with
a pgbench database at scale 300 and --check, and perf is showing me
that progress_report counts for 0.10% with or without the option of
the profile while pg_checksum_page() counts for 36%.  Switching the
switch in or out of it makes the performance change lost in the noise.
I'd rather keep the check for showprogress within progress_report() so
as extra callers don't miss bypassing the report if --progress is not
enabled, still we are talking about only one caller now so the
attached does it the other way around with an assertion.

>> +     fprintf(stderr, "\r");
>
> I think the isatty() check that was in our versions of the patch is
> useful here, did you check how this looks when piping the output to a
> file?

Oops, fixed.  The output was strange.

> This hunk is from the performance optimization of calling
> progress_report only every 1024 blocks and could be removed if we stick
> with calling it for every block anyway (but see above).

Indeed, removed this one.

>> -static void
>> -scan_directory(const char *basedir, const char *subdir)
>> +/*
>> + * Scan the given directory for items which can be checksummed and
>> + * operate on each one of them.  If "sizeonly" is true, the size of
>> + * all the items which have checksums is computed and returned back
>
> "computed" is maybe a bit strong a word here, how about "added up"?

"computed" sounds fine to me.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: New vacuum option to do only freezing
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: REINDEX CONCURRENTLY 2.0