Re: parallel vacuum comments

From: Amit Kapila
Subject: Re: parallel vacuum comments
Date:
Msg-id: CAA4eK1JnU0Fv+fgBCFuWzfmToN_oHV65w5wDg5h-ZZyxQfXnCg@mail.gmail.com
In reply to: Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
           Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I've attached updated patches.
> > >
> >
> > I have a few comments on v4-0001.
>
> Thank you for the comments!
>
> > 1.
> > In parallel_vacuum_process_all_indexes(), we can combine the two
> > checks for vacuum/cleanup at the beginning of the function
>
> Agreed.
>
> > and I think
> > it is better to keep the variable name as bulkdel or cleanup instead
> > of vacuum as that is more specific and clear.
>
> I was thinking of using the terms "bulkdel" and "cleanup" instead of
> "vacuum" and "cleanup" for the same reason. That way, we could
> probably use "bulkdel" and "cleanup" when doing index bulk-deletion
> (i.e., calling ambulkdelete) and index cleanup (i.e., calling
> amvacuumcleanup), respectively, and use "vacuum" when processing an
> index, i.e., doing either bulk-delete or cleanup, instead of just
> "processing". But we already use "vacuum" and "cleanup" in many
> places, e.g., lazy_vacuum_index() and lazy_cleanup_index(). If we want
> to use "bulkdel" instead of "vacuum", I think it would be better to
> change the terminology in vacuumlazy.c thoroughly, probably in a
> separate patch.
>

Okay.
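
On the combining part, here is a rough sketch of what I had in mind
(the status and field names are illustrative assumptions, not
necessarily what the patch should use):

    static void
    parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool bulkdel)
    {
        PVIndVacStatus  new_status;
        int             nworkers;

        /*
         * Decide the per-index status and the worker count once, up
         * front, instead of checking the bulkdel/cleanup case in two
         * separate places.
         */
        if (bulkdel)
        {
            new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
            nworkers = pvs->nindexes_parallel_bulkdel;
        }
        else
        {
            new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
            nworkers = pvs->nindexes_parallel_cleanup;
        }

        /* ... set up the shared state and launch workers ... */
    }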

> > 2. The patch seems to be calling parallel_vacuum_should_skip_index
> > thrice even before starting parallel vacuum. It includes a call to
> > find the number of blocks, which has to be performed for each index.
> > I understand it might not be too costly to call this, but it seems
> > better to remember this info as we do in the current code.
>
> Yes, we can bring the will_vacuum_parallel array back to the code.
> That way, we can remove the call to parallel_vacuum_should_skip_index()
> in parallel_vacuum_begin().
>
> > We can
> > probably set parallel_workers_can_process in parallel_vacuum_begin and
> > then update it again in parallel_vacuum_process_all_indexes. Won't
> > doing something like that be better?
>
> parallel_workers_can_process can vary depending on whether we are
> doing bulk-deletion, the first cleanup, or a second (or later)
> cleanup. What could we set parallel_workers_can_process based on in
> parallel_vacuum_begin()?
>

I was thinking of setting the results of will_vacuum_parallel in
parallel_vacuum_begin().
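
Roughly like this, as a sketch (assuming nindexes and indrels are at
hand in parallel_vacuum_begin(); the exact names are illustrative):

    /* Remember, per index, whether parallel vacuum is possible. */
    will_vacuum_parallel = (bool *) palloc0(sizeof(bool) * nindexes);

    for (int i = 0; i < nindexes; i++)
    {
        /*
         * parallel_vacuum_should_skip_index() checks the index size
         * (number of blocks), so caching its result here avoids
         * repeating that work for each index scan.
         */
        will_vacuum_parallel[i] =
            !parallel_vacuum_should_skip_index(indrels[i]);
    }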

> >
> > 3.  /*
> >   * Copy the index bulk-deletion result returned from ambulkdelete and
> > @@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
> >   * Since all vacuum workers write the bulk-deletion result at different
> >   * slots we can write them without locking.
> >   */
> > - if (shared_istat && !shared_istat->updated && istat_res != NULL)
> > + if (!pindstats->istat_updated && istat_res != NULL)
> >   {
> > - memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
> > - shared_istat->updated = true;
> > + memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
> > + pindstats->istat_updated = true;
> >
> >   /* Free the locally-allocated bulk-deletion result */
> >   pfree(istat_res);
> > -
> > - /* return the pointer to the result from shared memory */
> > - return &shared_istat->istat;
> >   }
> >
> > I think we now copy the results here both for local and parallel
> > cleanup. Isn't it better to write something about that in the
> > comments, as it is not clear from the current ones?
>
> What do you mean by "local cleanup"?
>

Clean-up performed by the leader backend.
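
Maybe something along these lines for the comment (the wording is just
a suggestion):

    /*
     * Copy the index bulk-deletion result returned from ambulkdelete
     * and amvacuumcleanup to the DSM segment.  We do this for indexes
     * processed by the leader itself as well as for those processed
     * by workers, since all index stats now live in shared memory.
     * All vacuum workers write the bulk-deletion result at different
     * slots, so we can write them without locking.
     */
    if (!pindstats->istat_updated && istat_res != NULL)
    {
        memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
        pindstats->istat_updated = true;

        /* Free the locally-allocated bulk-deletion result */
        pfree(istat_res);
    }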

> >
> > 4.
> > + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
> > + est_shared_len = MAXALIGN(sizeof(LVShared));
> >   shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
> >
> > Do we need MAXALIGN here? I think we previously required it here
> > because immediately after this we were writing index stats, but now
> > those are allocated separately. Normally, shm_toc_estimate_chunk()
> > aligns the size, but sometimes we need to do it ourselves to avoid
> > unaligned access of shared memory. I am really not sure whether we
> > require it for dead_items; do you remember why it is there in the
> > first place?
>
> Indeed. I don't remember that. Probably it's an oversight.
>

Yeah, I think so too.
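
FWIW, shm_toc_estimate_chunk() (see shm_toc.h) already rounds the
requested size up with BUFFERALIGN(), so the estimation should be fine
without the explicit MAXALIGN, i.e., something like:

    /*
     * No explicit MAXALIGN needed for the estimate itself;
     * shm_toc_estimate_chunk() BUFFERALIGNs the size internally.
     */
    est_shared_len = sizeof(LVShared);
    shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
    shm_toc_estimate_keys(&pcxt->estimator, 1);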

--
With Regards,
Amit Kapila.


