Re: [PROPOSAL] VACUUM Progress Checker.

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [PROPOSAL] VACUUM Progress Checker.
Дата
Msg-id CAB7nPqRNw=w4mt-W+gtq0ED0KTR=B8Qgu6D+4BN3XmzFRuAgXQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PROPOSAL] VACUUM Progress Checker.  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [PROPOSAL] VACUUM Progress Checker.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: [PROPOSAL] VACUUM Progress Checker.  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Список pgsql-hackers
On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Hm, I guess progress messages would not change that much over the course
>> of a long-running command. How about we pass only the array(s) that we
>> change and NULL for others:
>>
>> With the following prototype for pgstat_report_progress:
>>
>> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
>>                            bool *message_param[], int num_message_param);
>>
>> If we just wanted to change, say scanned_heap_pages, then we report that
>> using:
>>
>> uint32_param[1] = scanned_heap_pages;
>> pgstat_report_progress(uint32_param, 3, NULL, 0);
>>
>> Also, pgstat_report_progress() should check each of its parameters for
>> NULL before iterating over to copy. So in most reports over the course of
>> a command, the message_param would be NULL and hence not copied.
>
> It's going to be *really* important that this facility provides a
> lightweight way of updating progress, so I think this whole API is
> badly designed.  VACUUM, for example, is going to want a way to update
> the individual counter for the number of pages it's scanned after
> every page.  It should not have to pass all of the other information
> that is part of a complete report.  It should just be able to say
> pgstat_report_progress_update_counter(1, pages_scanned) or something
> of this sort.  Don't marshal all of the data and then make
> pgstat_report_progress figure out what's changed.  Provide a series of
> narrow APIs where the caller tells you exactly what they want to
> update, and you update only that.  It's fine to have a
> pgstat_report_progress() function to update everything at once, for
> the use at the beginning of a command, but the incremental updates
> within the command should do something lighter-weight.

[first time looking really at the patch and catching up with this thread]

Agreed. As far as I can guess from the code, those should be as
followed to bloat pgstat message queue a minimum:

+               pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);               /*                * Loop to process each
selectedrelation.                */
 
Defining a new routine for this purpose is a bit surprising. Cannot we
just use pgstat_report_activity with a new BackendState STATE_INVACUUM
or similar if we pursue the progress tracking approach?

A couple of comments:
- The relation OID should be reported and not its name. In case of a
relation rename that would not be cool for tracking, and most users
are surely going to join with other system tables using it.
- The progress tracking facility adds a whole level of complexity for
very little gain, and IMO this should *not* be part of PgBackendStatus
because in most cases its data finishes wasted. We don't expect
backends to run frequently such progress reports, do we? My opinion on
the matter if that we should define a different collector data for
vacuum, with something like PgStat_StatVacuumEntry, then have on top
of it a couple of routines dedicated at feeding up data with it when
some work is done on a vacuum job.

In short, it seems to me that this patch needs a rework, and should be
returned with feedback. Other opinions?
-- 
Michael



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Move PinBuffer and UnpinBuffer to atomics
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Speedup twophase transactions