Re: parallel vacuum comments

From Masahiko Sawada
Subject Re: parallel vacuum comments
Date
Msg-id CAD21AoD4ZZPi59UH=sPa51b4-RDSqBxWoVxjvbQJjkh7_=9utA@mail.gmail.com
In response to Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: parallel vacuum comments  (Peter Geoghegan <pg@bowt.ie>)
RE: parallel vacuum comments  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > >
> >
> > > Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
> > > assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
> > > dedicated shmem area for the array of LVSharedIndStats (no more
> > > storing LVSharedIndStats entries at the end of the LVShared space in
> > > an ad-hoc, type unsafe way). There should be one array element for
> > > each and every index -- even those indexes where parallel index
> > > vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
> > > processing for "not worthwhile" indexes actually makes sense, BTW). We
> > > can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
> > > also add new per-index state fields to LVSharedIndStats itself. We
> > > could directly record the status of each index (e.g., parallel unsafe,
> > > amvacuumcleanup processing done, ambulkdelete processing done)
> > > explicitly. All code could safely subscript the LVSharedIndStats array
> > > directly, using idx style integers. That seems far more robust and
> > > consistent.
> >
> > Sounds good.
> >
> > During the development, I wrote the patch while considering using
> > fewer shared memory but it seems that it brought complexity (and
> > therefore the bug). It would not be harmful even if we allocate index
> > statistics on DSM for unsafe indexes and "not worthwhile" indexes in
> > practice.
> >
>
> If we want to allocate index stats for all indexes in DSM then why not
> consider it on the lines of buf/wal_usage means tack those via
> LVParallelState? And probably replace bitmap with an array of bools
> that indicates which indexes can be skipped by the parallel worker.
>

I've attached a draft patch. It incorporates all of Andres's comments
except the last one, which proposes moving the parallel-related code to
another file. I'd like to discuss how we should split vacuumlazy.c.
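
To make the layout discussed upthread concrete, here is a minimal,
standalone sketch of "one array element per index, with explicit
per-index state". It is not taken from the attached patch: every name
prefixed with Sketch/sketch_ is hypothetical, and num_index_tuples is
only a stand-in for the real bulk-delete result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-index status values (illustration only). */
typedef enum SketchIndStatus
{
    SKETCH_IND_INITIAL = 0,
    SKETCH_IND_NEED_BULKDELETE,
    SKETCH_IND_NEED_CLEANUP,
    SKETCH_IND_COMPLETED
} SketchIndStatus;

/* One element per index, including indexes that workers must skip. */
typedef struct SketchIndStats
{
    SketchIndStatus status;
    bool        parallel_safe;      /* may a parallel worker process it? */
    bool        stats_updated;      /* has a result been stored yet? */
    double      num_index_tuples;   /* stand-in for the real result struct */
} SketchIndStats;

/* Initialize every element so that no "is this slot NULL?" bitmap is needed. */
static void
sketch_init(SketchIndStats *stats, int nindexes, const bool *parallel_safe)
{
    for (int i = 0; i < nindexes; i++)
    {
        stats[i].status = SKETCH_IND_NEED_BULKDELETE;
        stats[i].parallel_safe = parallel_safe[i];
        stats[i].stats_updated = false;
        stats[i].num_index_tuples = 0;
    }
}

/* Plain subscripting by index number, with a bounds check in assert builds. */
static SketchIndStats *
sketch_get(SketchIndStats *stats, int nindexes, int idx)
{
    assert(idx >= 0 && idx < nindexes);
    return &stats[idx];
}

/* Count the indexes that a parallel worker is allowed to pick up. */
static int
sketch_count_worker_indexes(const SketchIndStats *stats, int nindexes)
{
    int         n = 0;

    for (int i = 0; i < nindexes; i++)
    {
        if (stats[i].parallel_safe)
            n++;
    }
    return n;
}
```

The point of the sketch is only that the leader and the workers can
share one fixed-size array and subscript it by index number; how the
array is placed in DSM is left out here.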

Regarding tests, I'd like to add a test that checks whether a vacuum
with multiple index scans (e.g., due to a small maintenance_work_mem)
works correctly. The problem is that we need roughly 200,000 garbage
tuples to trigger a second index scan even with the minimum
maintenance_work_mem, and loading that many tuples takes time.
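
For reference, the tuple count can be estimated with some simple
arithmetic. The sketch below is a simplified model, not the actual
vacuumlazy.c code: it assumes each dead tuple costs one 6-byte TID
(sizeof(ItemPointerData)) and that the minimum maintenance_work_mem is
1024kB, and it ignores the small array header and the per-table clamps.
The sketch_ names are hypothetical.

```c
#include <assert.h>

/* Assumed size of one dead-tuple TID, i.e. sizeof(ItemPointerData). */
#define SKETCH_TID_BYTES 6L

/* Approximate number of dead-tuple TIDs that fit in the given memory. */
static long
sketch_max_dead_tuples(long work_mem_kb)
{
    assert(work_mem_kb > 0);
    return (work_mem_kb * 1024L) / SKETCH_TID_BYTES;
}

/* Approximate number of index scans needed for that many dead tuples. */
static long
sketch_index_scans(long dead_tuples, long work_mem_kb)
{
    long        cap = sketch_max_dead_tuples(work_mem_kb);

    assert(dead_tuples >= 0);
    return (dead_tuples + cap - 1) / cap;   /* ceiling division */
}
```

Under these assumptions, 1024kB holds about 174,762 TIDs, so a table
needs more dead tuples than that before a second index scan happens.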

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
