Re: parallel vacuum comments

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: parallel vacuum comments
Дата
Msg-id CAD21AoB46U9SgWHSz3PZqBMDJbraMUBR7hFcMsny6tiKiuSk-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: parallel vacuum comments  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Nov 1, 2021 at 5:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > For discussion, I've written a patch only for adding some tests to
> > parallel vacuum. The test includes the reported case where small
> > indexes are not processed by the leader process as well as cases where
> > different kinds of indexes (i.g., different amparallelvacuumoptions)
> > are vacuumed or cleaned up.
>
> I started looking at this because I want to commit something like it
> alongside a fix to bug #17245.
>
> While I tend to favor relatively heavy assertions (e.g., the
> heap_page_is_all_visible() related asserts I added to
> lazy_scan_prune()), the idea of having a whole shared memory area just
> for assertions seems a bit too much, even to me. I tried to simplify
> it by just adding a new field to LVSharedIndStats, which seemed more
> natural. It took me at least 15 minutes before I realized that I was
> actually repeating exactly the same mistake that led to bug #17245 in
> the first place. I somehow forgot that
> parallel_stats_for_idx()/IndStatsIsNull() will return NULL for any
> index that has already been deemed too small to be worth processing in
> parallel. Even after all that drama!

The idea of that patch was for back branches in order to not affect
non-enable-cassert builds. That said, I agree that it's an overkill
solution.

> Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
> assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
> dedicated shmem area for the array of LVSharedIndStats (no more
> storing LVSharedIndStats entries at the end of the LVShared space in
> an ad-hoc, type unsafe way). There should be one array element for
> each and every index -- even those indexes where parallel index
> vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
> processing for "not worthwhile" indexes actually makes sense, BTW). We
> can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
> also add new per-index state fields to LVSharedIndStats itself. We
> could directly record the status of each index (e.g., parallel unsafe,
> amvacuumcleanup processing done, ambulkdelete processing done)
> explicitly. All code could safely subscript the LVSharedIndStats array
> directly, using idx style integers. That seems far more robust and
> consistent.

Sounds good.

During the development, I wrote the patch while considering using
fewer shared memory but it seems that it brought complexity (and
therefore the bug). It would not be harmful even if we allocate index
statistics on DSM for unsafe indexes and “not worthwhile" indexes in
practice. Rather, tracking bulkdelete and vacuumcleanup completion
might enable us to improve the vacuum progress reporting so that the
progress stats view shows how many indexes have been vacuumed (or
cleaned up).

Anyway, I'll write a patch accordingly.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side
Следующее
От: Sasasu
Дата:
Сообщение: Re: XTS cipher mode for cluster file encryption