Re: [HACKERS] Block level parallel vacuum

From Masahiko Sawada
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CAD21AoB=naMOoB8cgtMerh5qWAW4W9S9y7OZCFB=oq1zb_BWWQ@mail.gmail.com
In reply to Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > I see a much bigger problem with the way this patch collects the index
> > stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> > for all the index stats, in the same way, considering its size as
> > IndexBulkDeleteResult.  For the first time, it gets the stats from
> > local memory as returned by ambulkdelete/amvacuumcleanup call and then
> > copies it in shared memory space.  There onwards, it always updates
> > the stats in shared memory by pointing each index stats to that
> > memory.  In this scheme, you overlooked the point that an index AM
> > could choose to return a larger structure of which
> > IndexBulkDeleteResult is just the first field.  This generally
> > provides a way for ambulkdelete to communicate additional private data
> > to amvacuumcleanup.  We use this idea in the gist index, see how
> > gistbulkdelete and gistvacuumcleanup work. The current design won't
> > work for such cases.

Indeed, that's a very good point. Thank you for pointing it out.
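
To make the concern concrete, here is a minimal, self-contained sketch of the "larger structure whose first field is the generic result" pattern, and of what happens when a caller copies only the generic size into a fixed-size slot. All names (DemoBulkDeleteResult, DemoAmBulkDeleteResult, demo_copy_result) are hypothetical stand-ins for illustration; they are not the PostgreSQL definitions, and an ordinary local variable stands in for the DSM slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the generic result struct; NOT the real PostgreSQL type. */
typedef struct DemoBulkDeleteResult
{
    double num_index_tuples;
    double tuples_removed;
} DemoBulkDeleteResult;

/* Hypothetical AM-specific result: generic stats first, private data after. */
typedef struct DemoAmBulkDeleteResult
{
    DemoBulkDeleteResult stats; /* first field, so the pointer can be cast */
    int         empty_pages;    /* private data meant for the cleanup phase */
} DemoAmBulkDeleteResult;

/*
 * Copy nbytes of an AM result into a zeroed "shared" slot, going through a
 * pointer to the generic type as the caller would, and return the private
 * field as it appears in the slot afterwards.
 */
static int
demo_copy_result(size_t nbytes)
{
    DemoAmBulkDeleteResult local = {{100.0, 7.0}, 42};
    DemoAmBulkDeleteResult slot;
    const DemoBulkDeleteResult *generic;

    /* The cast below is only valid because stats is the first field. */
    assert(offsetof(DemoAmBulkDeleteResult, stats) == 0);
    assert(nbytes <= sizeof(slot));

    generic = (const DemoBulkDeleteResult *) &local;
    memset(&slot, 0, sizeof(slot));
    memcpy(&slot, generic, nbytes);

    return slot.empty_pages;
}
```

Calling demo_copy_result(sizeof(DemoBulkDeleteResult)) returns 0 because the private field is lost, while demo_copy_result(sizeof(DemoAmBulkDeleteResult)) returns 42. A real AM's private data may also contain pointers into backend-local memory, which a plain byte copy would not make usable in another process, so this sketch only shows the size problem.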

> >
>
> Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
> have a few observations about those which might help us to solve this
> problem for gist indexes:
> 1. Are we using memory context GistBulkDeleteResult->page_set_context?
>  It seems to me it is not being used.

Yes, I also think this memory context is not being used.

> 2. Each time we perform gistbulkdelete, we always seem to reset the
> GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
> accumulate them for the cleanup phase when the vacuum needs to call
> gistbulkdelete multiple times because the available space for
> dead tuples is filled?  It seems to me like we only use the stats from
> the very last call to gistbulkdelete.

I think you're right. gistbulkdelete scans all pages and collects all
internal pages and all empty pages, and gistvacuumcleanup then uses
them to unlink the empty pages. Currently that information accumulates
over multiple gistbulkdelete calls because the code does not switch
the memory context, but I guess the code intends to use only the
information from the very last call to gistbulkdelete.
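
As a rough illustration of the difference between "reset on every pass" and "accumulate across passes", here is a small standalone sketch. The names (DemoScanStats, demo_pass, demo_run) and the per-pass numbers are made up for the example; this is not the gist code, only the two behaviours being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-index stats carried from bulk-delete passes to cleanup. */
typedef struct DemoScanStats
{
    int         empty_pages;
    int         passes;
} DemoScanStats;

/* One bulk-delete pass; optionally resets the stats before recording. */
static void
demo_pass(DemoScanStats *stats, int empty_found, int reset_first)
{
    assert(stats != NULL);

    if (reset_first)
        memset(stats, 0, sizeof(*stats));

    stats->empty_pages += empty_found;
    stats->passes += 1;
}

/* Run three passes and return what the cleanup phase would see. */
static int
demo_run(int reset_first)
{
    DemoScanStats stats = {0, 0};
    const int   found[] = {3, 5, 2};    /* made-up values per pass */
    size_t      i;

    for (i = 0; i < sizeof(found) / sizeof(found[0]); i++)
        demo_pass(&stats, found[i], reset_first);

    return stats.empty_pages;
}
```

With reset_first set, demo_run(1) returns 2, the value from the last pass only; without it, demo_run(0) returns 10, the sum over all passes.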

> 3. Do we really need to give the responsibility of deleting empty
> pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup?  Can't we
> do it in gistbulkdelete?  I see one advantage of postponing it till the
> cleanup phase which is if somehow we can accumulate stats over
> multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> At least, the way current code works, it seems that there is no
> advantage to postpone deleting empty pages till the cleanup phase.
>

Given the current page deletion strategy of the gist index, the
advantage of postponing page deletion until the cleanup phase is that
we can delete pages in bulk in the cleanup phase, which is called at
most once. But I wonder if we could do page deletion in a way similar
to the btree index. Or, even if we keep the current strategy, I think
we can do that without passing the page information from bulkdelete to
vacuumcleanup via GistBulkDeleteResult.

> If we avoid postponing deleting empty pages till the cleanup phase,
> then we don't have the problem for gist indexes.

Yes. But given the point you raised, I guess there might be other
index AMs that use the stats returned from bulkdelete in a way similar
to the gist index (i.e. returning a larger structure of which
IndexBulkDeleteResult is just the first field). If so, parallel vacuum
still needs to deal with that, as you mentioned.

Regards,

--
Masahiko Sawada


