Re: [HACKERS] Block level parallel vacuum

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: [HACKERS] Block level parallel vacuum
Дата
Msg-id CAFiTN-up=WE3bgyzCyEguGhUY7wuHQSGZbL_FkW_heZNDwY62w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Mon, Oct 14, 2019 at 3:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > I see a much bigger problem with the way this patch collects the index
> > stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> > for all the index stats, in the same way, considering its size as
> > IndexBulkDeleteResult.  For the first time, it gets the stats from
> > local memory as returned by ambulkdelete/amvacuumcleanup call and then
> > copies it in shared memory space.  There onwards, it always updates
> > the stats in shared memory by pointing each index stats to that
> > memory.  In this scheme, you overlooked the point that an index AM
> > could choose to return a larger structure of which
> > IndexBulkDeleteResult is just the first field.  This generally
> > provides a way for ambulkdelete to communicate additional private data
> > to amvacuumcleanup.  We use this idea in the gist index, see how
> > gistbulkdelete and gistvacuumcleanup works. The current design won't
> > work for such cases.
> >
>
> Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
> have a few observations about those which might help us to solve this
> problem for gist indexes:
> 1. Are we using memory context GistBulkDeleteResult->page_set_context?
>  It seems to me it is not being used.
To me also it appears that it's not being used.

> 2. Each time we perform gistbulkdelete, we always seem to reset the
> GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
> accumulate it for the cleanup phase when the vacuum needs to call
> gistbulkdelete multiple times because the available space for
> dead-tuple is filled.  It seems to me like we only use the stats from
> the very last call to gistbulkdelete.
IIUC, it is fine to use the stats from the latest gistbulkdelete call
because we are trying to collect the information of the empty pages
while scanning the tree.  So I think it would be fine to just use the
information collected from the latest scan otherwise we will get
duplicate information.

> 3. Do we really need to give the responsibility of deleting empty
> pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> do it in gistbulkdelte?  I see one advantage of postponing it till the
> cleanup phase which is if somehow we can accumulate stats over
> multiple calls of gistbulkdelete, but I am not sure if it is feasible.
It seems that we want to use the latest result. That might be the
reason for postponing to the cleanup phase.


> At least, the way current code works, it seems that there is no
> advantage to postpone deleting empty pages till the cleanup phase.
>
> If we avoid postponing deleting empty pages till the cleanup phase,
> then we don't have the problem for gist indexes.
>
> This is not directly related to this patch, so we can discuss these
> observations in a separate thread as well, but before that, I wanted
> to check your opinion to see if this makes sense to you as this will
> help us in moving this patch forward.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: v12.0: ERROR: could not find pathkey item to sort