Re: parallel vacuum comments

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: parallel vacuum comments
Дата
Msg-id CAD21AoDKJpzWiztw5Tk=kFZrUh=-kB=VWUanEbkNQJrzSujzOw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached updated patches.
> >
>
> I have a few comments on v4-0001.

Thank you for the comments!

> 1.
> In parallel_vacuum_process_all_indexes(), we can combine the two
> checks for vacuum/cleanup at the beginning of the function

Agreed.

> and I think
> it is better to keep the variable name as bulkdel or cleanup instead
> of vacuum as that is more specific and clear.

I was thinking to use the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, probably we can
use “bulkdel" and “cleanup" when doing index bulk-deletion (i.g.,
calling to ambulkdelete) and index cleanup (calling to
amvacuumcleanup), respectively, and use "vacuum" when processing an
index, i.g., doing either bulk-delete or cleanup, instead of using
just "processing" . But we already use “vacuum” and “cleanup” in many
places, e.g., lazy_vacuum_index() and lazy_cleanup_index(). If we want
to use “bulkdel” instead of “vacuum”, I think it would be better to
change the terminology in vacuumlazy.c thoroughly, probably in a
separate patch.

> 2. The patch seems to be calling parallel_vacuum_should_skip_index
> thrice even before starting parallel vacuum. It has a call to find the
> number of blocks which has to be performed for each index. I
> understand it might not be too costly to call this but it seems better
> to remember this info like we are doing in the current code.

Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().

> We can
> probably set parallel_workers_can_process in parallel_vacuum_begin and
> then again update in parallel_vacuum_process_all_indexes. Won't doing
> something like that be better?

parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?

>
> 3.  /*
>   * Copy the index bulk-deletion result returned from ambulkdelete and
> @@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
>   * Since all vacuum workers write the bulk-deletion result at different
>   * slots we can write them without locking.
>   */
> - if (shared_istat && !shared_istat->updated && istat_res != NULL)
> + if (!pindstats->istat_updated && istat_res != NULL)
>   {
> - memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
> - shared_istat->updated = true;
> + memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
> + pindstats->istat_updated = true;
>
>   /* Free the locally-allocated bulk-deletion result */
>   pfree(istat_res);
> -
> - /* return the pointer to the result from shared memory */
> - return &shared_istat->istat;
>   }
>
> I think here now we copy the results both for local and parallel
> cleanup. Isn't it better to write something about that in comments as
> it is not clear from current comments?

What do you mean by "local cleanup"?

>
> 4.
> + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
> + est_shared_len = MAXALIGN(sizeof(LVShared));
>   shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
>
> Do we need MAXALIGN here? I think previously we required it here
> because immediately after this we were writing index stats but now
> those are allocated separately. Normally, shm_toc_estimate_chunk()
> aligns the size but sometimes we need to do it so as to avoid
> unaligned accessing of shared mem. I am really not sure whether we
> require it for dead_items, do you remember why it is there in the
> first place.

Indeed. I don't remember that. Probably it's an oversight.

>
> 5. The below-updated comment based on one of my previous suggestions
> seems to be missing in this version.
> + /*
> + * Not safe, if the index supports parallel cleanup conditionally,
> + * but we have already processed the index (for bulkdelete).  We do
> + * this to avoid the need to invoke workers when parallel index
> + * cleanup doesn't need to scan the index.  See the comments for
> + * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
> + * support parallel cleanup conditionally.
> + */

Added.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: pg_get_publication_tables() output duplicate relid
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Some RELKIND macro refactoring