Re: [HACKERS] Block level parallel vacuum

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: [HACKERS] Block level parallel vacuum
Дата
Msg-id CAFiTN-vgifngq_nMs=GYpJwDWk+xTFE9skxjAN9CdVKBQLxvFg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have few small comments.
> >
> > 1.
> > logical streaming for large in-progress transactions+
> > + /* Can't perform vacuum in parallel */
> > + if (parallel_workers <= 0)
> > + {
> > + pfree(can_parallel_vacuum);
> > + return lps;
> > + }
> >
> > why are we checking parallel_workers <= 0, Function
> > compute_parallel_vacuum_workers only returns 0 or greater than 0
> > so isn't it better to just check if (parallel_workers == 0) ?
> >
>
> Why to have such an assumption about
> compute_parallel_vacuum_workers()?  The function
> compute_parallel_vacuum_workers() returns int, so such a check
> (<= 0) seems reasonable to me.

Okay so I should probably change my statement to why
compute_parallel_vacuum_workers is returning "int" instead of uint?  I
mean when this function is designed to return 0 or more worker why to
make it return int and then handle extra values on caller.  Am I
missing something, can it really return negative in some cases?

I find the below code in "compute_parallel_vacuum_workers" a bit confusing

+static int
+compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
+ bool *can_parallel_vacuum)
+{
......
+ /* The leader process takes one index */
+ nindexes_parallel--;        --> nindexes_parallel can become -1
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel == 0) .  -> Now if it is 0 then return 0 but
if its -1 then continue. seems strange no?  I think here itself we can
handle if (nindexes_parallel <= 0), that will make code cleaner.
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;



>
> > 2.
> > +/*
> > + * Macro to check if we are in a parallel vacuum.  If true, we are in the
> > + * parallel mode and the DSM segment is initialized.
> > + */
> > +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)
> >
> > (LVParallelState *) (lps) -> this typecast is not required, just (lps)
> > != NULL should be enough.
> >
>
> I think the better idea would be to just replace it PointerIsValid
> like below. I see similar usage in other places.
> #define ParallelVacuumIsActive(lps)  PointerIsValid(lps)
Make sense to me.
>
> > 3.
> >
> > + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
> > + prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
> > + pg_atomic_init_u32(&(shared->idx), 0);
> > + pg_atomic_init_u32(&(shared->cost_balance), 0);
> > + pg_atomic_init_u32(&(shared->active_nworkers), 0);
> >
> > I think it will look cleaner if we can initialize in the order they
> > are declared in structure.
> >
>
> Okay.
>
> > 4.
> > + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> > + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> > +
> > + /*
> > + * Set up shared cost balance and the number of active workers for
> > + * vacuum delay.
> > + */
> > + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> > + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
> > +
> > + /*
> > + * The number of workers can vary between bulkdelete and cleanup
> > + * phase.
> > + */
> > + ReinitializeParallelWorkers(lps->pcxt, nworkers);
> > +
> > + LaunchParallelWorkers(lps->pcxt);
> > +
> > + if (lps->pcxt->nworkers_launched > 0)
> > + {
> > + /*
> > + * Reset the local cost values for leader backend as we have
> > + * already accumulated the remaining balance of heap.
> > + */
> > + VacuumCostBalance = 0;
> > + VacuumCostBalanceLocal = 0;
> > + }
> > + else
> > + {
> > + /*
> > + * Disable shared cost balance if we are not able to launch
> > + * workers.
> > + */
> > + VacuumSharedCostBalance = NULL;
> > + VacuumActiveNWorkers = NULL;
> > + }
> > +
> >
> > I don't like the idea of first initializing the
> > VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> > uninitialize if nworkers_launched is 0.
> > I am not sure why do we need to initialize VacuumSharedCostBalance
> > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> > VacuumCostBalance);?
> > I think we can initialize it only if nworkers_launched > 0 then we can
> > get rid of the else branch completely.
> >
>
> No, we can't initialize after nworkers_launched > 0 because by that
> time some workers would have already tried to access the shared cost
> balance.  So, it needs to be done before launching the workers as is
> done in code.  We can probably add a comment.
I don't think so, VacuumSharedCostBalance is a process local which is
just pointing to the shared memory variable right?

and each process has to point it to the shared memory and that we are
already doing in parallel_vacuum_main.  So we can initialize it after
worker is launched.
Basically code will look like below

pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
 ..
ReinitializeParallelWorkers(lps->pcxt, nworkers);

LaunchParallelWorkers(lps->pcxt);

if (lps->pcxt->nworkers_launched > 0)
{
..
VacuumCostBalance = 0;
VacuumCostBalanceLocal = 0;
VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
-- remove the else part completely..

>
> >
> > + /* Carry the shared balance value to heap scan */
> > + if (VacuumSharedCostBalance)
> > + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > +
> > + if (nworkers > 0)
> > + {
> > + /* Disable shared cost balance */
> > + VacuumSharedCostBalance = NULL;
> > + VacuumActiveNWorkers = NULL;
> > + }
> >
> > Doesn't make sense to keep them as two conditions, we can combine them as below
> >
> > /* If shared costing is enable, carry the shared balance value to heap
> > scan and disable the shared costing */
> >  if (VacuumSharedCostBalance)
> > {
> >      VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> >      VacuumSharedCostBalance = NULL;
> >      VacuumActiveNWorkers = NULL;
> > }
> >
>
> makes sense to me, will change.
ok

>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [HACKERS] Block level parallel vacuum
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Unnecessary delay in streaming replication due to replay lag