Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
Subject: Re: [HACKERS] Block level parallel vacuum
Date:
Msg-id: CAD21AoB+bkCQ1Eomamtq6B=z_ZFDWoKNtO-V2HX5T8kR59b-ow@mail.gmail.com
In response to: Re: [HACKERS] Block level parallel vacuum  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum
List: pgsql-hackers
On Thu, Feb 28, 2019 at 2:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Thank you. Attached the rebased patch.
>
> Here are some review comments.

Thank you for reviewing the patches!

>
> +         started by a single utility command.  Currently, the parallel
> +         utility commands that support the use of parallel workers are
> +         <command>CREATE INDEX</command> and <command>VACUUM</command>
> +         without <literal>FULL</literal> option, and only when building
> +         a B-tree index.  Parallel workers are taken from the pool of
>
> That sentence is garbled.  The end part about b-tree indexes applies
> only to CREATE INDEX, not to VACUUM, since VACUUM does not build indexes.

Fixed.

>
> +      Vacuum index and cleanup index in parallel
> +      <replaceable class="parameter">N</replaceable> background
> workers (for the detail
> +      of each vacuum phases, please refer to <xref
> linkend="vacuum-phases"/>. If the
>
> I have two problems with this.  One is that I can't understand the
> English very well. I think you mean something like: "Perform the
> 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
> N background workers," but I'm not entirely sure.  The other is that
> if that is what you mean, I don't think it's a sufficient description.
> Users need to understand whether, for example, only one worker can be
> used per index, or whether the work for a single index can be split
> across workers.
>
> +      parallel degree <replaceable class="parameter">N</replaceable>
> is omitted,
> +      then <command>VACUUM</command> decides the number of workers based on
> +      number of indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if
> this option
>
> Now this makes it sound like it's one worker per index, but you could
> be more explicit about it.

Fixed.

>
> +      is specified multile times, the last parallel degree
> +      <replaceable class="parameter">N</replaceable> is considered
> into the account.
>
> Typo, but I'd just delete this sentence altogether; the behavior if
> the option is multiply specified seems like a triviality that need not
> be documented.

Understood, removed.

>
> +    Setting a value for <literal>parallel_workers</literal> via
> +    <xref linkend="sql-altertable"/> also controls how many parallel
> +    worker processes will be requested by a <command>VACUUM</command>
> +    against the table. This setting is overwritten by setting
> +    <replaceable class="parameter">N</replaceable> of
> <literal>PARALLEL</literal>
> +    option.
>
> I wonder if we really want this behavior.  Should a setting that
> controls the degree of parallelism when scanning the table also affect
> VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> table to be parallel by default, but rather something that the user
> must explicitly request.  Happy to hear other opinions.  If we do want
> this behavior, I think this should be written differently, something
> like this: The PARALLEL N option to VACUUM takes precedence over this
> option.

For example, I can imagine a use case where a batch job runs parallel
vacuum on some tables in a maintenance window. The batch operation
would need to compute and specify the degree of parallelism every time,
based on, for instance, the number of indexes, which would be
troublesome. But if we can set the degree of parallelism for each
table, it can just do 'VACUUM (PARALLEL)'.

>
> + * parallel mode nor destories the parallel context. For updating the index
>
> Spelling.

Fixed.

>
> +/* DSM keys for parallel lazy vacuum */
> +#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)
>
> Any special reason not to use just 1, 2, 3 here?  The general
> infrastructure stuff uses high numbers to avoid conflicting with
> plan_node_id values, but end clients of the parallel infrastructure
> can generally just use small integers.

It seems I was worrying unnecessarily; changed them to 1, 2 and 3.
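
That is, the DSM keys now become just the following:

/* DSM keys for parallel lazy vacuum */
#define PARALLEL_VACUUM_KEY_SHARED        1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES   2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT    3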

>
> + bool updated; /* is the stats updated? */
>
> is -> are
>
> + * LVDeadTuples controls the dead tuple TIDs collected during heap scan.
>
> what do you mean by "controls", exactly? stores?

Fixed.

>
> + * This is allocated in a dynamic shared memory segment when parallel
> + * lazy vacuum mode, or allocated in a local memory.
>
> If this is in DSM, then max_tuples is a wart, I think.  We can't grow
> the segment at that point.  I'm suspicious that we need a better
> design here.  It looks like you gather all of the dead tuples in
> backend-local memory and then allocate an equal amount of DSM to copy
> them.  But that means that we are using twice as much memory, which
> seems pretty bad.  You'd have to do that at least momentarily no
> matter what, but it's not obvious that the backend-local copy is ever
> freed.

Hmm, the current design is simpler; only the leader process scans the
heap and saves the dead tuple TIDs to DSM. The DSM is allocated once
when lazy vacuum starts and we never need to enlarge it. Also, we can
use the same code around heap vacuuming and collecting dead tuples for
both single-process vacuum and parallel vacuum. Once index vacuuming is
completed, the leader process reinitializes the DSM and reuses it the
next time.
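
To illustrate, the shared dead tuple area is roughly the following (a
simplified sketch; the array member name is illustrative). It is sized
once, based on maintenance_work_mem, when the DSM segment is created,
so it never needs to grow afterwards:

typedef struct LVDeadTuples
{
    int             max_tuples;  /* # of slots allocated in the array */
    int             num_tuples;  /* # of slots currently in use */
    ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* dead tuple TIDs */
} LVDeadTuples;

#define SizeOfLVDeadTuples(cnt) \
    (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))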

> There's another patch kicking around to allocate memory for
> vacuum in chunks rather than preallocating the whole slab of memory at
> once; we might want to think about getting that committed first and
> then having this build on top of it.  At least we need something
> smarter than this.

Since parallel vacuum uses memory in the same manner as single-process
vacuum, memory usage doesn't get any worse. I agree that that patch is
smarter and this patch could be built on top of it, but I'm concerned
that there are two proposals on that thread and the discussion has not
been active for 8 months. I wonder if it would be worth thinking about
improving the memory allocation based on that patch after the parallel
vacuum patch gets committed.

>
> -heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
> +heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
>
> We generally avoid passing a struct by value; copying the struct can
> be expensive and having multiple shallow copies of the same data
> sometimes leads to surprising results.  I think it might be a good
> idea to propose a preliminary refactoring patch that invents
> VacuumOptions and gives it just a single 'int' member and refactors
> everything to use it, and then that can be committed first.  It should
> pass a pointer, though, not the actual struct.

Agreed. I'll split that out into a preliminary patch and propose it.
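
Something like this, I think (a rough sketch; the member name is
tentative, and the parallel patch would then add an nworkers field on
top of it):

typedef struct VacuumOptions
{
    int     flags;   /* bitmask of VACOPT_* values */
} VacuumOptions;

extern void heap_vacuum_rel(Relation onerel, VacuumOptions *options,
                            VacuumParams *params,
                            BufferAccessStrategy bstrategy);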

>
> + LVState    *lvstate;
>
> It's not clear to me why we need this new LVState thing.  What's the
> motivation for that?  If it's a good idea, could it be done as a
> separate, preparatory patch?  It seems to be responsible for a lot of
> code churn in this patch.   It also leads to strange stuff like this:

The main motivations are refactoring and improving readability, but it
was mainly for the previous version of the patch, which implemented
parallel heap vacuum. It might no longer be needed here. I'll try to
implement this without LVState. Thank you.

>
>   ereport(elevel,
> - (errmsg("scanned index \"%s\" to remove %d row versions",
> + (errmsg("scanned index \"%s\" to remove %d row versions %s",
>   RelationGetRelationName(indrel),
> - vacrelstats->num_dead_tuples),
> + dead_tuples->num_tuples,
> + IsParallelWorker() ? "by parallel vacuum worker" : ""),
>
> This doesn't seem to be great grammar, and translation guidelines
> generally discourage this sort of incremental message construction
> quite strongly.  Since the user can probably infer what happened by a
> suitable choice of log_line_prefix, I'm not totally sure this is worth
> doing in the first place, but if we're going to do it, it should
> probably have two completely separate message strings and pick between
> them using IsParallelWorker(), rather than building it up
> incrementally like this.

Fixed.

>
> +compute_parallel_workers(Relation rel, int nrequests, int nindexes)
>
> I think 'nrequets' is meant to be 'nrequested'.  It isn't the number
> of requests; it's the number of workers that were requested.

Fixed.

>
> + /* quick exit if no workers are prepared, e.g. under serializable isolation */
>
> That comment makes very little sense in this context.

Fixed.

>
> + /* Report parallel vacuum worker information */
> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
> +
> + appendStringInfo(&buf, ")");
> + ereport(elevel, (errmsg("%s", buf.data)));
>
> This is another example of incremental message construction, again
> violating translation guidelines.

Fixed.

>
> + WaitForParallelWorkersToAttach(lvstate->pcxt);
>
> Why?

Oh, that's not necessary; removed.

>
> + /*
> + * If there is already-updated result in the shared memory we use it.
> + * Otherwise we pass NULL to index AMs, meaning it's first time call,
> + * and copy the result to the shared memory segment.
> + */
>
> I'm probably missing something here, but isn't the intention that we
> only do each index once?  If so, how would there be anything there
> already?  Once from for_cleanup = false and once for for_cleanup =
> true?

We call ambulkdelete (for_cleanup = false) zero or more times for each
index and call amvacuumcleanup (for_cleanup = true) at the end. The
first time lazy vacuum calls either ambulkdelete or amvacuumcleanup it
must pass NULL to them. They return either a palloc'd
IndexBulkDeleteResult or NULL. If they return the former, lazy vacuum
must pass it back to them on the next call. In the current design,
since there is no guarantee that an index is always processed by the
same vacuum process, each vacuum process saves the result to DSM in
order to share those results among vacuum processes. The 'updated' flag
indicates that its slot is in use. So we can pass the address in DSM if
'updated' is true, and otherwise pass NULL.

>
> + if (a->options.flags != b->options.flags)
> + return false;
> + if (a->options.nworkers != b->options.nworkers)
> + return false;
>
> You could just do COMPARE_SCALAR_FIELD(options.flags);
> COMPARE_SCALAR_FIELD(options.nworkers);

Fixed.

Almost all of the comments I got have been incorporated into my local
branch, but a few of them need discussion. I'll submit the updated
version of the patch once I've addressed all of the comments.





Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

