Re: [HACKERS] Block level parallel vacuum
| From | Haribabu Kommi |
|---|---|
| Subject | Re: [HACKERS] Block level parallel vacuum |
| Date | |
| Msg-id | CAJrrPGdALjr9veOoiM=s7sNhm0pYo8d1GjQgwK1qn53rCkYhfQ@mail.gmail.com |
| In response to | Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | Re: [HACKERS] Block level parallel vacuum |
| List | pgsql-hackers |
On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
>
>
> + * Before starting parallel index vacuum and parallel cleanup index we launch
> + * parallel workers. All parallel workers will exit after processed all indexes
>
> parallel vacuum index and parallel cleanup index?
>
>
ISTM we use terms like "index vacuuming", "index cleanup" and "FSM
vacuuming" in vacuumlazy.c, so maybe "parallel index vacuuming" and
"parallel index cleanup" would be better?
OK.
> + /*
> + * If there is already-updated result in the shared memory we
> + * use it. Otherwise we pass NULL to index AMs and copy the
> + * result to the shared memory segment.
> + */
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
>
> I don't really see the need for a flag to distinguish the stats pointer on the
> first run from the second run. I don't see any problem with passing the stats
> directly, so that the same stats are updated on the worker side and the leader
> side. In any case, no two processes will do the index vacuum at the same time.
> Am I missing something?
>
> Even if this flag is meant to identify whether the stats have been updated
> before writing them, I don't see a need for it compared to normal vacuum.
>
Passing stats = NULL to amvacuumcleanup and ambulkdelete means that this
is the first execution. For example, btvacuumcleanup skips cleanup if
it's not NULL. In a normal vacuum we pass NULL to ambulkdelete or
amvacuumcleanup on the first call, and they store the resulting
stats in memory allocated locally. Therefore, in
parallel vacuum I think both the worker and the leader need to move them to
shared memory and mark them as updated, since different workers could
vacuum different indexes the next time.
OK, understood the point. But btbulkdelete allocates the memory whenever the
stats are NULL, so I don't see a problem with it.
The only problem is with btvacuumcleanup: when there are no dead tuples
in the table, btbulkdelete is not called and btvacuumcleanup is called
directly at the end of vacuum. In that scenario, the code flow differs
based on the stats. So why can't we use the number of dead tuples to differentiate
instead of adding another flag? This scenario is also not very frequent, so avoiding
the memcpy for normal operations would be better. It may be a small gain; I just
thought of it.
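To make the two code paths above concrete, here is a minimal, self-contained sketch of the flag-based scheme as I read it from the quoted hunk and the explanation. It is not code from the patch: SketchStats, SketchSharedSlot, sketch_bulkdelete and sketch_vacuum_one_index are hypothetical stand-ins, and the toy callback only mimics the "allocate when passed NULL" behaviour mentioned for btbulkdelete.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an index AM's result struct (hypothetical). */
typedef struct SketchStats
{
    long        tuples_removed;
} SketchStats;

/* One slot per index in the shared segment, mirroring indstats[idx]. */
typedef struct SketchSharedSlot
{
    bool        updated;    /* true once a result has been copied in */
    SketchStats stats;
} SketchSharedSlot;

/*
 * Toy bulk-delete callback (assumption): when called with NULL it
 * allocates the result in local memory, otherwise it updates in place.
 */
static SketchStats *
sketch_bulkdelete(SketchStats *stats, long ndeleted)
{
    if (stats == NULL)
    {
        stats = calloc(1, sizeof(SketchStats));
        if (stats == NULL)
            abort();
    }
    stats->tuples_removed += ndeleted;
    return stats;
}

/* Vacuum one index from whichever process (leader or worker) picked it up. */
static void
sketch_vacuum_one_index(SketchSharedSlot *slot, long ndeleted)
{
    /* Reuse the shared result if one exists; otherwise signal "first call". */
    SketchStats *in = slot->updated ? &slot->stats : NULL;
    SketchStats *out = sketch_bulkdelete(in, ndeleted);

    if (!slot->updated)
    {
        /* First pass: the result lives in local memory, so publish it. */
        memcpy(&slot->stats, out, sizeof(SketchStats));
        slot->updated = true;
        free(out);
    }
}
```

In this sketch the memcpy happens only on the first pass for each index; later passes update the shared copy in place, whichever process runs them.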
> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> + "launched %d parallel vacuum workers %s (planned: %d",
> + lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
>
> What is the difference between planned workers and requested workers? Aren't
> they the same?
The requested value is the parallel degree specified explicitly by the
user, whereas the planned value is the actual number we planned based on
the number of indexes the table has. For example, if we run 'VACUUM
(PARALLEL 3000) tbl' where tbl has 4 indexes, the requested value is 3000
and the planned value is 4. Also, if max_parallel_maintenance_workers is 2,
the planned value is 2.
OK.
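For illustration, the planned-versus-requested distinction could be sketched as below. This is not the patch's code: compute_planned_workers and its parameters are hypothetical, and the handling of an unspecified request (0) or of a request smaller than the index count is my assumption. Only the two cases from the example above (planned 4, and planned 2 when max_parallel_maintenance_workers is 2) come from the thread.

```c
/* Hypothetical helper, not taken from the patch. */
static int
min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * requested:    degree given by the user in VACUUM (PARALLEL n);
 *               0 is assumed here to mean "not specified".
 * nindexes:     number of indexes on the table.
 * max_workers:  stands in for max_parallel_maintenance_workers.
 */
static int
compute_planned_workers(int requested, int nindexes, int max_workers)
{
    int         planned = nindexes;     /* never more workers than indexes */

    if (requested > 0)
        planned = min_int(planned, requested);  /* assumption: request caps it */

    return min_int(planned, max_workers);
}
```

With a request of 3000, 4 indexes and a limit of 8, this returns 4; with the limit lowered to 2, it returns 2.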
Regards,
Haribabu Kommi
Fujitsu Australia