Re: decoupling table and index vacuum
От | Dilip Kumar |
---|---|
Тема | Re: decoupling table and index vacuum |
Дата | |
Msg-id | CAFiTN-visUO9VTz2+h224z5QeUjKhKNdSfjaCucPhYJdbzxx0g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: decoupling table and index vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Ответы |
Re: decoupling table and index vacuum
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
On Mon, Sep 27, 2021 at 8:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Hi, Here is the first WIP patch for the decoupling table and index vacuum. The first mail of the thread has already explained the complete background of why we want to do this so instead of describing that I will directly jump into explaining what these patches do. Currently, the table vacuum and index vacuum are executed as a single operation. Basically, we vacuum the table by performing hot pruning and remembering dead items in the cache and then we perform the index vacuum and perform the second pass of the heap vacuum under which we mark items unused. In this patch, we make these multiple vacuum passes as independent operations. So the idea is that we provide multiple vacuum options under that, the user can perform the independent operations i.e. "VACUUM (heap_hot_prune) tbl_name" for performing just the hot prune (first vacuum) pass, "VACUUM (heap_vacuum) tbl_name" for the second heap pass to set dead item unused for which index vacuum is done. And additionally, we are now allowing users to just perform the index vacuum i.e. "VACUUM idx_name". So under the heap_hot_prune pass, we will generate the dead tids and instead of directly performing the index vacuum we will flush those dead tids into the conveyor belt using Deadtidstore interfaces. Then in the index pass, we will read the data from the conveyor belt and perform the index vacuum and at last, in the heap_vacuum pass, we will read the data from the conveyor belt and mark all dead items unused. However, in the second pass, we can only mark those items unused which are dead, and for which all the indexes for the table are already vacuumed. So for identifying that in the pg_class entry we store the conveyor belt pageno up to which we have already done the index vacuum for the index related entry and we have already done the heap_vacuum pass for the table related entry. Additionally while doing the hot_prune pass we also check if the item is already dead and index vacuum is also done for that then we directly set it unused, for this, we use Deadtidstore interfaces. Deadtidstore provides interfaces over the conveyor belt for storing and retrieving dead tids into the conveyor belt. This module maintains a DeadTidState which keeps track of the current insertion progress i.e the first and the last conveyor belt page for the current vacuum run. And on the completion of the vacuum run, this takes care of setting the complete vacuum run bound by storing the last conveyor belt pageno of the current vacuum run into the special space of the first conveyor belt page for this run. This also provides the infrastructure to avoid adding duplicate tids into the conveyor belt. Basically, if we perform the first vacuum pass multiple times without executing the second vacuum pass then it is possible that we encounter the same dead tids in the conveyor belt so this module maintains a cache over the conveyor belt such that it only loads the data into the cache w.r.t the current block the vacuum is processing so we don't need to maintain a huge cache. Test example: CREATE TABLE t (a int); CREATE INDEX idx on t(a); INSERT INTO t VALUES (generate_series(1,1000000)); DELETE FROM t where a > 300; VACUUM (heap_hot_prune) t; VACUUM idx; "VACUUM (heap_vacuum) t; TODO: - This is just a POC patch to discuss the design idea and needs a lot of improvement and testing. - We are using a slightly different format for storing the dead tids into the conveyor belt which is explained in the patch but the traditional multi-pass vacuum is still using the same format (array of ItemPointeData), so we need to unify that format. - Performance testing. - Cleaner interfaces so that we can easily be integrated with auto vacuum, currently, this is not provided for the manual vacuum. - Add test cases. Patches can be applied on the latest conveyor belt patches[1] [1] https://www.postgresql.org/message-id/CAFiTN-sQUddO9JPiH3tz%2BvbNqRqi_pgndecy8k2yXAnO3ymqZA%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Julien RouhaudДата:
Сообщение: Re: Schema variables - new implementation for Postgres 15