Re: [HACKERS] Block level parallel vacuum WIP
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Block level parallel vacuum WIP |
Date | |
Msg-id | CAD21AoBaT0zvH85TTO3DDpARosVnKLLP8tOraQfYYZVCeV-zxQ@mail.gmail.com |
In response to | Re: [HACKERS] Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: [HACKERS] Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:
>> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> Yes, it's taking a time to update logic and measurement but it's
>>>>> coming along. Also I'm working on changing deadlock detection. Will
>>>>> post new patch and measurement result.
>>>>
>>>> I think that we should push this patch out to v11. I think there are
>>>> too many issues here to address in the limited time we have remaining
>>>> this cycle, and I believe that if we try to get them all solved in the
>>>> next few weeks we're likely to end up getting backed into some choices
>>>> by time pressure that we may later regret bitterly. Since I created
>>>> the deadlock issues that this patch is facing, I'm willing to try to
>>>> help solve them, but I think it's going to require considerable and
>>>> delicate surgery, and I don't think doing that under time pressure is
>>>> a good idea.
>>>>
>>>> From a fairness point of view, a patch that's not in reviewable shape
>>>> on March 1st should really be pushed out, and we're several days past
>>>> that.
>>>>
>>>
>>> Agreed. There are surely some rooms to discuss about the design yet,
>>> and it will take long time. it's good to push this out to CF2017-07.
>>> Thank you for the comment.
>>
>> I have marked this patch "Returned with Feedback." Of course you are
>> welcome to submit this patch to the 2017-07 CF, or whenever you feel it
>> is ready.
>
> Thank you!
>

I reconsidered the basic design of parallel lazy vacuum. The basic concept and usage of this feature are unchanged: lazy vacuum still executes with some parallel workers. In the current design, when the table has indexes, the dead tuple TIDs are shared among all vacuum workers, including the leader process.
If we share dead tuple TIDs, we need two synchronization points: one before starting vacuum and one before clearing the dead tuple TIDs. Before starting vacuum, we have to make sure that no more dead tuple TIDs will be added. Before clearing the dead tuple TIDs, we have to make sure that they are no longer in use.

For index vacuum, each index is assigned to a vacuum worker based on ParallelWorkerNumber. For example, if a table has 5 indexes and is vacuumed with 2 workers, the leader process and one vacuum worker are each assigned 2 indexes, and the other vacuum worker is assigned the remaining one.

The following steps show how parallel vacuum proceeds when the table has indexes.

1. The leader process and the workers scan the table in parallel using ParallelHeapScanDesc and collect dead tuple TIDs into shared memory.
2. Before vacuuming the table, the leader process sorts the dead tuple TIDs in physical order once all workers have finished scanning the table.
3. When vacuuming the table, the leader process and the workers reclaim garbage in the table in block-level parallel.
4. When vacuuming the indexes, each index on the table is assigned to a particular parallel worker or to the leader process. The process assigned to an index vacuums that index.
5. Before going back to scanning the table, the leader process clears the dead tuple TIDs once all workers have finished vacuuming the table and the indexes.

Attached is the latest patch, but it is still a PoC version and contains some debug code. Note that this patch still requires another patch, which moves the relation extension lock out of heavy-weight locks [1]. The parallel lazy vacuum patch can work even without the patch in [1], but vacuum could fail in some cases.

I have also attached the results of a performance evaluation. The table size is approximately 300MB (> shared_buffers), and I deleted tuples on every block before executing vacuum so that vacuum visits every block.
The server spec is:
* Intel Xeon E5620 @ 2.4GHz (8 cores)
* 32GB RAM
* ioDrive

According to the results for the table with indexes, the performance of lazy vacuum improved up to the point where the number of indexes and the parallel degree are the same. If a table has 16 indexes and is vacuumed with 16 workers, parallel vacuum is 10x faster than single-process execution. Also, according to the results for the table with no indexes, parallel vacuum is 5x faster than single-process execution at parallel degree 8. Of course, we can vacuum only for indexes.

I'm planning to work on that in PG11 and will register it in the next CF. Comments and feedback are very welcome.

[1] https://www.postgresql.org/message-id/CAD21AoAmdW7eWKi28FkXXd_4fjSdzVDpeH1xYVX7qx%3DSyqYyJA%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
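
The index assignment described in the message (each index goes to the leader or to a worker based on ParallelWorkerNumber) and the physical-order sort of dead tuple TIDs in step 2 can be illustrated with a small Python model. This is only a sketch under assumptions, not code from the patch: the round-robin rule, the function names assign_indexes and sort_dead_tuples, and the representation of a TID as a (block, offset) pair are all illustrative choices. Round-robin is used here simply because it reproduces the 5-index, 2-worker example (2, 2 and 1 indexes); the patch may map indexes to processes differently.

```python
# Hypothetical model of the scheme described in the message; the patch
# itself is C code inside PostgreSQL and may differ in every detail.

LEADER = -1  # assumption: the leader is identified by worker number -1


def assign_indexes(nindexes, nworkers):
    """Distribute index numbers round-robin over the leader and the workers.

    Returns a dict mapping a participant number (LEADER or 0..nworkers-1)
    to the list of index numbers that participant would vacuum.
    """
    participants = [LEADER] + list(range(nworkers))
    assignment = {p: [] for p in participants}
    for idx in range(nindexes):
        owner = participants[idx % len(participants)]
        assignment[owner].append(idx)
    return assignment


def sort_dead_tuples(tids):
    """Sort (block, offset) pairs into physical order: block first, then offset."""
    return sorted(tids)


if __name__ == "__main__":
    # The example from the message: 5 indexes, 2 workers plus the leader.
    for who, idxs in assign_indexes(5, 2).items():
        name = "leader" if who == LEADER else "worker %d" % who
        print(name, "->", idxs)
```

With 5 indexes and 2 workers, the leader and one worker each receive two indexes and the other worker receives one, matching the example in the message. The model also shows why the speedup for tables with indexes stops growing once the parallel degree reaches the number of indexes: any further participants receive an empty list.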