Re: Vacuum: allow usage of more than 1GB of work mem
From: Claudio Freire
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date:
Msg-id: CAGTBQpa6NFGO_6g_y_7zQx8L9GcHDSQKYdo1tGuh791z6PYgEg@mail.gmail.com
In reply to: Re: Vacuum: allow usage of more than 1GB of work mem (Claudio Freire <klaussfreire@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem (Masahiko Sawada <sawada.mshk@gmail.com>)
           Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Sep 15, 2016 at 1:16 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Wed, Sep 14, 2016 at 12:24 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Wed, Sep 14, 2016 at 12:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> I am kind of doubtful about this whole line of investigation because
>>> we're basically trying pretty hard to fix something that I'm not sure
>>> is broken. I do agree that, all other things being equal, the TID
>>> lookups will probably be faster with a bitmap than with a binary
>>> search, but maybe not if the table is large and the number of dead
>>> TIDs is small, because cache efficiency is pretty important. But even
>>> if it's always faster, does TID lookup speed even really matter to
>>> overall VACUUM performance? Claudio's early results suggest that it
>>> might, but maybe that's just a question of some optimization that
>>> hasn't been done yet.
>>
>> FYI, the reported impact was on CPU time, not runtime. There was no
>> significant difference in runtime (real time), because my test is
>> heavily I/O bound.
>>
>> I tested with a few small tables and there was no significant
>> difference either, but small tables don't stress the array lookup
>> anyway, so that's expected.
>>
>> But on the assumption that some systems may be CPU bound during vacuum
>> (particularly those able to do more than 300-400MB/s sequential I/O),
>> in those cases the increased or decreased cost of lazy_tid_reaped will
>> directly correlate with runtime. It's just none of my systems, which
>> all run on Amazon and are heavily bandwidth constrained (the fastest
>> I/O subsystem I can get my hands on does 200MB/s).
>
> Attached is the patch with the multiarray version.
>
> The tests are weird. Even comparing the best case over several runs, to
> remove the impact of concurrent activity on this host (I couldn't
> remove all background activity even when running the tests overnight;
> the distro adds tons of crons and background cleanup tasks, it would
> seem), there's only very mild CPU impact. I'd say insignificant, as
> it's well below the mean variance.

I reran the tests on a really dedicated system, with a script that
captured a lot more details about the runs. The system isn't
impressive, an i5 with a single consumer HD and 8GB RAM, but it did
the job.

These tests make more sense, so I bet it was the previous tests that
were spoiled by concurrent activity on the host.

Attached are the raw output of the test, the script used to create it,
and, just in case, the patch set used. I believe it's the same as the
last one I posted, just rebased.

In the results archive, the .vacuum prefix is the patched version with
both patch 1 and 2; .git.ref is just patch 1 (without which the
truncate takes an unholy amount of time).

Grepping the results a bit, picking an average run out of all runs on
each scale:

Timings:

Patched:
s100:  CPU: user: 3.21 s,   system: 1.54 s,   elapsed: 18.95 s.
s400:  CPU: user: 14.03 s,  system: 6.35 s,   elapsed: 107.71 s.
s4000: CPU: user: 228.17 s, system: 108.33 s, elapsed: 3017.30 s.

Unpatched:
s100:  CPU: user: 3.39 s,   system: 1.64 s,   elapsed: 18.67 s.
s400:  CPU: user: 15.39 s,  system: 7.03 s,   elapsed: 114.91 s.
s4000: CPU: user: 282.21 s, system: 105.95 s, elapsed: 3017.28 s.

Total I/O (in MB):

Patched:
s100:  R: 2.4      - W: 5862
s400:  R: 1337.4   - W: 29385.6
s4000: R: 318631   - W: 370154

Unpatched:
s100:  R: 1412.4   - W: 7644.6
s400:  R: 3180.6   - W: 36281.4
s4000: R: 330683   - W: 370391

So, in essence, CPU time wasn't adversely affected.
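For context, lazy_tid_reaped() is the callback each index scan invokes
once per index tuple to decide whether its heap TID is among the dead
ones, so with the single-array layout it amounts to one big binary
search per index tuple. The multiarray version keeps the dead TIDs in
a list of independently allocated, sorted segments, and the lookup
becomes a two-level search. Here's a minimal standalone sketch of that
idea; the types and names are illustrative, not the patch's actual
code, and it assumes segments are non-empty and sorted by their first
TID:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct DemoTid
    {
        uint32_t block;   /* heap block number */
        uint16_t offset;  /* line pointer offset within the block */
    } DemoTid;

    typedef struct TidSegment
    {
        DemoTid *tids;    /* sorted array of dead TIDs */
        int      ntids;   /* entries used in this segment */
    } TidSegment;

    typedef struct TidStore
    {
        TidSegment *segments;  /* sorted by each segment's first TID */
        int         nsegments;
    } TidStore;

    static int
    tid_cmp(const void *a, const void *b)
    {
        const DemoTid *ta = (const DemoTid *) a;
        const DemoTid *tb = (const DemoTid *) b;

        if (ta->block != tb->block)
            return (ta->block < tb->block) ? -1 : 1;
        if (ta->offset != tb->offset)
            return (ta->offset < tb->offset) ? -1 : 1;
        return 0;
    }

    /* Same role as lazy_tid_reaped(): is this heap TID dead? */
    static bool
    tid_is_dead(const TidStore *store, const DemoTid *tid)
    {
        int lo = 0, hi = store->nsegments - 1, seg = -1;

        /* Find the last segment whose first TID is <= tid. */
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;

            if (tid_cmp(&store->segments[mid].tids[0], tid) <= 0)
            {
                seg = mid;
                lo = mid + 1;
            }
            else
                hi = mid - 1;
        }
        if (seg < 0)
            return false;

        /* Then an ordinary binary search within that segment. */
        return bsearch(tid, store->segments[seg].tids,
                       store->segments[seg].ntids,
                       sizeof(DemoTid), tid_cmp) != NULL;
    }

The directory search touches only a small, cache-friendly array, while
the per-segment search behaves like the original bsearch over a much
smaller range, which is consistent with the lookup cost staying flat in
the numbers above.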
If anything, it improved by about 20% on the biggest case (scale 4000).

While total runtime didn't change much, I believe that's only because
the index is perfectly correlated (clustered?), being a pristine index,
so index scans either remove or skip full pages, never leaving things
half-way. A bloated index would probably show substantially different
behavior; I'll try to get a script that produces one, by running
pgbench for a while before the vacuum or something like that.

However, the total I/O metric already shows remarkable improvement.
This metric measures all the I/O, including the pgbench init, the
initial vacuum pgbench init does, the delete and the final vacuum. So
it's not just the I/O for the vacuum itself, but for the whole run. We
can see the patched version reading a lot less (fewer scans over the
indexes will do that), and in some cases writing less too (again, fewer
index scans may be performing fewer redundant writes when cleaning
up/reclaiming index pages).

I'll post when I get the results for the bloated case, but I believe
this already shows substantial improvement as-is.

This approach can later be improved upon by turning TID segments into
bitmaps when they're packed densely enough, but I believe this patch
represents a sensible first step before attempting that. A rough sketch
of what "densely enough" could mean follows.
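Concretely, a segment would be converted to a per-block bitmap only
when the bitmap representation is smaller than the sorted TID array.
A back-of-the-envelope check, assuming 8kB pages (where
MaxHeapTuplesPerPage is 291) and 6-byte ItemPointerData entries; the
constant and the threshold are my assumptions, not something the
attached patches implement:

    #include <stdbool.h>
    #include <stddef.h>

    #define DEMO_MAX_TUPLES_PER_PAGE 291  /* 8kB pages, heap defaults */

    /*
     * Would a per-block bitmap be smaller than the sorted TID array
     * for a segment spanning `nblocks` heap blocks with `ntids` dead
     * TIDs? Illustrative heuristic only.
     */
    static bool
    segment_should_use_bitmap(size_t ntids, size_t nblocks)
    {
        size_t array_bytes  = ntids * 6;  /* sizeof(ItemPointerData) */
        size_t bitmap_bytes =
            nblocks * ((DEMO_MAX_TUPLES_PER_PAGE + 7) / 8);

        return bitmap_bytes < array_bytes;
    }

With these numbers each block's bitmap costs 37 bytes, so the bitmap
starts winning at roughly seven dead TIDs per heap block, which is why
only densely packed segments are worth converting.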
Attachments