Re: parallel vacuum comments

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: parallel vacuum comments
Дата
Msg-id CAA4eK1KRhGOHZN8F3LpmySQi7+kaX=x4znrY5zma8fSMMcWVrw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
On Wed, Nov 24, 2021 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 6:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 16, 2021 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I've incorporated these comments and attached an updated patch.
> > >
> >
> > Review comments:
> > ================
> > 1.
> > index_can_participate_parallel_vacuum()
> > {
> > ..
> > + /*
> > + * Not safe, if the index supports parallel cleanup conditionally,
> > + * but we have already processed the index (for bulkdelete).  See the
> > + * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
> > + * when indexes support parallel cleanup conditionally.
> > + */
> > + if (num_index_scans > 0 &&
> > + ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
> > ..
> > }
> >
> > IIRC, we do this to avoid the need to invoke worker when parallel
> > cleanup doesn't need to scan the index which means the work required
> > to be performed by a worker would be minimal. If so, maybe we can
> > write that in comments here or with
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP.
>
> Right. Will add the comments.
>
> > If the above understanding is correct then is it correct to check
> > num_index_scans here? AFAICS, this value is incremented in
> > parallel_vacuum_all_indexes irrespective of whether it is invoked for
> > bulk delete or clean up. OTOH, previously, this was done based on
> > first_time variable which was in turn set based on
> > vacrel->num_index_scans and that is incremented in
> > lazy_vacuum_all_indexes(both in serial and parallel case).
>
> You're right. That's wrong to increment num_index_scans also when
> vacuumcleanup. It should be incremented only when bulkdelete. Perhaps,
> the caller (i.g., table AM) should pass num_index_scans to parallel
> vacuum code? I initially thought that ParallelVacuumState can have
> num_index_scans and increment it only when parallel bulk-deletion. But
> if we do that we will end up having the same thing in two places:
> ParallelVacuumState and LVRelState. It would be clearer if we maintain
> num_index_scan in LVRelState and pass it to parallel index vacuum when
> calling to parallel index bulk-deletion or cleanup.
>

That sounds reasonable.

> On the other hand,
> the downside would be that there is a possibility that a table AM
> passes the wrong num_index_scans.
>

If that happens then also there will be no problem as such because it
will do some work via worker where it would have been done by the
leader itself. I think it is better to have one source of information
for this as we need to mainly consider whether bulkdelete has been
already performed or not, it doesn't matter whether that is performed
by leader or worker.

>
> >
> > 2. The structure ParallelVacuumState contains both PVIndVacStatus and
> > PVIndStats. Considering PVIndVacStatus is already present in
> > PVIndStats, does ParallelVacuumState need to have both?
>
> "PVIndVacStatus status” of ParallelVacuumState is used by the worker
> in the error callback function,
> parallel_index_vacuum_error_callback(), in order to know the status of
> the index vacuum that the worker is working on. I think that without
> PVIndVacStatus, the worker needs to have the index of the PVIndStats
> array in order to get the status by like
> errinfo->indstats[idx]->status. Do you prefer to do that?
>

As mentioned in my another email to which you agreed that we need to
re-design this callback and do it separately, I think it is better to
consider it separately. So, we can probably remove this parameter from
the main patch as of now.

> > 3. Why ParallelVacuumCtl is declared in vacuum.h? It appears to be
> > only used in one function begin_parallel_vacuum, can't we just declare
> > in vacuumparallel.c?
>
> ParallelVacuumCtl is a struct to begin the parallel vacuum and
> therefore is expected to be passed by table AM. If we declare it in
> vacuumparallel.c a table AM (e.g., vacuumlazy.c) cannot use it, no?
>
> > As it is only required for one function and it is
> > not that the number of individual parameters will be too huge, can't
> > we do without having that structure.
>
> Yes, we can do that without having that structure. I was a bit
> concerned that there are already 7 arguments.
>

Yeah, it is better to have fewer arguments but I don't this number is
big enough to worry. Also, I am not sure about the table AM point as
there is no clear example in front of us which tells how any other
table AM might use it and whether this structure is generic enough. So
I think it might be better to use arguments for this and if we later
find some generic use then we can replace it with structure.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "r.takahashi_2@fujitsu.com"
Дата:
Сообщение: RE: Implementing Incremental View Maintenance
Следующее
От: "r.takahashi_2@fujitsu.com"
Дата:
Сообщение: RE: Implementing Incremental View Maintenance