Re: parallel vacuum comments

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: parallel vacuum comments
Дата
Msg-id CAD21AoA3F+3O1M57S55w2Me4KHQy_90xbXuFXWsnhvEKMYrSGg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
RE: parallel vacuum comments  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
On Tue, Nov 9, 2021 at 9:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Nov 5, 2021 at 4:00 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
> > > On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
> > > >   But even though we have this space optimized bitmap thing, we actually need
> > > >   more memory allocated for each index, making this whole thing pointless.
> > >
> > > Right. But is better to change to use booleans?
> >
> > It seems very clearly better to me. We shouldn't use complicated stuff like
> >
> > #define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
> > #define GetSharedIndStats(s) \
> >         ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
> > #define IndStatsIsNull(s, i) \
> >         (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
> >
> > when there's reason / benefit.
> >
> >
> > > > - Imo it's pretty confusing to have functions like
> > > >   lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
> > > >   vacuum or index cleanup with parallel workers.", based on
> > > >   lps->lvshared->for_cleanup.
> > >
> > > Okay. We need to set lps->lvshared->for_cleanup to tell worker do
> > > either index vacuum or index cleanup. So it might be better to pass
> > > for_cleanup flag down to the functions in addition to setting
> > > lps->lvshared->for_cleanup.
> >
> > Yup.
> >
> >
> > > > - I don't like some of the new names introduced in 14. E.g.
> > > >   "do_parallel_processing" is way too generic.
> > >
> > > I listed the function names that probably needs to be renamed from
> > > that perspecti:
> > >
> > > * do_parallel_processing
> > > * do_serial_processing_for_unsafe_indexes
> > > * parallel_process_one_index
> > >
> > > Is there any other function that should be renamed?
> >
> > parallel_processing_is_safe().
> >
> > I don't like that there's three different naming patterns for parallel
> > things. There's do_parallel_*, there's parallel_, and there's
> > (begin|end|compute)_parallel_*.
> >
> >
> > > > - On a higher level, a lot of this actually doesn't seem to belong into
> > > >   vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
> > > >   code is heap specific.  And vacuumlazy.c is large enough without the parallel
> > > >   code.
> > >
> > > I don't come with an idea to make them more generic. Could you
> > > elaborate on that?
> >
> > To me the the job that the parallel vacuum stuff does isn't really specific to
> > heap. Any table AM supporting indexes is going to need to do something pretty
> > much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
> > is very heap specific - you're not going to be able to reuse lazy_scan_heap()
> > or such. Before the parallel vacuum stuff, the index specific code in
> > vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.
> >
> > Based on a quick look
> >   parallel_vacuum_main(), parallel_processing_is_safe(),
> >   parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
> >   compute_parallel_vacuum_workers(), parallel_process_one_index(),
> >   do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
> >   do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
> >   do_parallel_lazy_vacuum_all_indexes(),
> >
> > don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
> > file. Of course that requires a bit of work, because of the heavy reliance on
> > LVRelState, but afaict there's not really an intrinsic need to use that.
>
> Thanks for your explanation. Understood.
>
> I'll update the patch accordingly.
>

I've attached a draft patch that refactors parallel vacuum and
separates parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.

What I'm not convinced yet in this patch is that vacuum.c,
vacuumlazy.c and vacuumparallel.c depend on the data structure to
store dead tuples (now called VacDeadTuples, was LVDeadTuples). I
thought that it might be better to separate it so that a table AM can
use another type of data structure to store dead tuples. But since I
think it may bring complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum. Feedback is very
welcome.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "kuroda.hayato@fujitsu.com"
Дата:
Сообщение: RE: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Clean up build warnings of plperl with clang-12+