Re: [HACKERS] Block level parallel vacuum

From Masahiko Sawada
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CA+fd4k48uhavyuYmLj7FMz8X+i8BXAVKWmetekObvssLOvB9QQ@mail.gmail.com
In response to Re: [HACKERS] Block level parallel vacuum  (Sergei Kornilov <sk@zsrv.org>)
Responses Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [HACKERS] Block level parallel vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
>
> > I think I got your point. Your proposal is that it's more efficient if
> > we make the leader process vacuum the index that can be processed only
> > the leader process (i.e. indexes not supporting parallel index vacuum)
> > while workers are processing indexes supporting parallel index vacuum,
> > right? That way, we can process indexes in parallel as much as
> > possible.
>
> Right
>
> > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> > there are parallel-safe remaining indexes after the leader finished
> > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>
> I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing
> indexes that support the parallel index vacuum, along with parallel workers.
>
> Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before
> vacuum_or_cleanup_indexes_worker or something with similar effect.
>
> If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.

I think your idea might not work well in some cases. That is, there
are cases where it's better for the leader to participate in parallel
vacuum as a worker as soon as possible, especially if a table has many
indexes that by design don't support parallel vacuum (e.g. bulkdelete
of brin and using VACUUM_OPTION_PARALLEL_COND_CLEANUP).
Suppose the table has 3 indexes that support parallel vacuum, taking
5 sec, 10 sec and 10 sec to vacuum respectively, and 3 indexes that
don't support it, taking 2 sec each. In the current patch we launch 2
workers. With your idea, they take two of the indexes to vacuum, which
take 5 sec and 10 sec. At the same time the leader processes the 3
indexes that don't support parallel index vacuum, which takes 6 sec.
Since the leader is still busy, the worker that finishes its 5 sec
index takes the remaining index, which takes 10 sec more. The total
execution time will be 15 sec. On the other hand, if the leader
participated in parallel vacuum first, the total execution time could
be 11 sec (the leader taking 5 sec and then 2 sec * 3).
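
To double-check the arithmetic above, here is a rough sketch of the two
orderings as a greedy schedule. It is only a toy model, not code from
the patch: the function name makespan and its parameters are made up
for illustration, and it assumes that each parallel-capable index is
taken in order by whichever process becomes free first, with the leader
winning ties.

```python
import heapq

def makespan(parallel_secs, leader_only_secs, n_workers, leader_only_first):
    """Total elapsed seconds for a greedy schedule (hypothetical toy model).

    parallel_secs:     vacuum times of indexes that support parallel vacuum
    leader_only_secs:  vacuum times of indexes only the leader can process
    leader_only_first: True  -> leader handles its leader-only indexes first,
                                then joins the workers
                       False -> leader joins the workers first and handles
                                the leader-only indexes afterwards
    """
    leader_only_total = sum(leader_only_secs)
    # Heap entries are (time the process becomes free, process id).
    # Id 0 is the leader, so it wins ties against workers (an assumption).
    leader_start = leader_only_total if leader_only_first else 0
    free_at = [(leader_start, 0)]
    free_at += [(0, wid) for wid in range(1, n_workers + 1)]
    heapq.heapify(free_at)

    # Hand each parallel-capable index to the process that is free earliest.
    for secs in parallel_secs:
        t, pid = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + secs, pid))

    finish = {pid: t for t, pid in free_at}
    if not leader_only_first:
        # The leader does the leader-only indexes after its parallel work.
        finish[0] += leader_only_total
    return max(finish.values())

# The example from this mail: 5/10/10 sec parallel-capable, 3 x 2 sec not.
print(makespan([5, 10, 10], [2, 2, 2], 2, leader_only_first=True))   # 15
print(makespan([5, 10, 10], [2, 2, 2], 2, leader_only_first=False))  # 11
```

The 11 sec result depends on the leader picking up the 5 sec index; if
a worker took it instead, the leader would be left with a 10 sec index
and the total would be longer.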

It's just an example; I'm not saying your idea is bad. ISTM the idea
is good on the assumption that all indexes take the same time or take
a long time, so I'd also like to consider whether this holds in
production, and which approach is better if we can't make such an
assumption.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


