Re: parallel vacuum comments

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: parallel vacuum comments
Дата
Msg-id CAD21AoBpZpc5--3G9p4WefkM22oZ=9hDUrc5jaR-_25XbBFacw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Wed, Nov 24, 2021 at 1:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 24, 2021 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 6:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Nov 16, 2021 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I've incorporated these comments and attached an updated patch.
> > > >
> > >
> > > Review comments:
> > > ================
> > > 1.
> > > index_can_participate_parallel_vacuum()
> > > {
> > > ..
> > > + /*
> > > + * Not safe, if the index supports parallel cleanup conditionally,
> > > + * but we have already processed the index (for bulkdelete).  See the
> > > + * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
> > > + * when indexes support parallel cleanup conditionally.
> > > + */
> > > + if (num_index_scans > 0 &&
> > > + ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
> > > ..
> > > }
> > >
> > > IIRC, we do this to avoid the need to invoke worker when parallel
> > > cleanup doesn't need to scan the index which means the work required
> > > to be performed by a worker would be minimal. If so, maybe we can
> > > write that in comments here or with
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP.
> >
> > Right. Will add the comments.
> >
> > > If the above understanding is correct then is it correct to check
> > > num_index_scans here? AFAICS, this value is incremented in
> > > parallel_vacuum_all_indexes irrespective of whether it is invoked for
> > > bulk delete or clean up. OTOH, previously, this was done based on
> > > first_time variable which was in turn set based on
> > > vacrel->num_index_scans and that is incremented in
> > > lazy_vacuum_all_indexes(both in serial and parallel case).
> >
> > You're right. That's wrong to increment num_index_scans also when
> > vacuumcleanup. It should be incremented only when bulkdelete. Perhaps,
> > the caller (i.g., table AM) should pass num_index_scans to parallel
> > vacuum code? I initially thought that ParallelVacuumState can have
> > num_index_scans and increment it only when parallel bulk-deletion. But
> > if we do that we will end up having the same thing in two places:
> > ParallelVacuumState and LVRelState. It would be clearer if we maintain
> > num_index_scan in LVRelState and pass it to parallel index vacuum when
> > calling to parallel index bulk-deletion or cleanup.
> >
>
> That sounds reasonable.
>
> > On the other hand,
> > the downside would be that there is a possibility that a table AM
> > passes the wrong num_index_scans.
> >
>
> If that happens then also there will be no problem as such because it
> will do some work via worker where it would have been done by the
> leader itself. I think it is better to have one source of information
> for this as we need to mainly consider whether bulkdelete has been
> already performed or not, it doesn't matter whether that is performed
> by leader or worker.

Agreed.

>
> >
> > >
> > > 2. The structure ParallelVacuumState contains both PVIndVacStatus and
> > > PVIndStats. Considering PVIndVacStatus is already present in
> > > PVIndStats, does ParallelVacuumState need to have both?
> >
> > "PVIndVacStatus status” of ParallelVacuumState is used by the worker
> > in the error callback function,
> > parallel_index_vacuum_error_callback(), in order to know the status of
> > the index vacuum that the worker is working on. I think that without
> > PVIndVacStatus, the worker needs to have the index of the PVIndStats
> > array in order to get the status by like
> > errinfo->indstats[idx]->status. Do you prefer to do that?
> >
>
> As mentioned in my another email to which you agreed that we need to
> re-design this callback and do it separately, I think it is better to
> consider it separately. So, we can probably remove this parameter from
> the main patch as of now.

Yes, I'll remove it from the next version patch. With that, since
parallel vacuum workers don't set errcontext we will need to do
something for that in a separate patch.

>
> > > 3. Why ParallelVacuumCtl is declared in vacuum.h? It appears to be
> > > only used in one function begin_parallel_vacuum, can't we just declare
> > > in vacuumparallel.c?
> >
> > ParallelVacuumCtl is a struct to begin the parallel vacuum and
> > therefore is expected to be passed by table AM. If we declare it in
> > vacuumparallel.c a table AM (e.g., vacuumlazy.c) cannot use it, no?
> >
> > > As it is only required for one function and it is
> > > not that the number of individual parameters will be too huge, can't
> > > we do without having that structure.
> >
> > Yes, we can do that without having that structure. I was a bit
> > concerned that there are already 7 arguments.
> >
>
> Yeah, it is better to have fewer arguments but I don't this number is
> big enough to worry. Also, I am not sure about the table AM point as
> there is no clear example in front of us which tells how any other
> table AM might use it and whether this structure is generic enough. So
> I think it might be better to use arguments for this and if we later
> find some generic use then we can replace it with structure.

Makes sense. Will fix in the next version patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Amit Langote
Дата:
Сообщение: Re: pg_get_publication_tables() output duplicate relid