Discussion: Parallel Vacuum
Folks,

are there any constraints/problems/etc. with running several vacuum processes in parallel, each one vacuuming a different table? Example:

vacuumdb -d db1 -t table1 &
vacuumdb -d db1 -t table2 &
vacuumdb -d db1 -t table3 &
wait

(sorry if this was already asked, but I did not find an explicit answer in the archives)

Thanks for any input!

Rgds,
-Dimitri
Dimitri wrote:
> Folks,
>
> are there any constraints/problems/etc. with running several vacuum
> processes in parallel, each one vacuuming a different table?

No, no problem. Keep in mind that if one of them takes a very long time, the others will not be able to remove dead tuples that were killed while the long vacuum was running -- unless you are on 8.2.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thursday 22 March 2007 14:52, Alvaro Herrera wrote:
> Dimitri wrote:
> > are there any constraints/problems/etc. with running several vacuum
> > processes in parallel, each one vacuuming a different table?
>
> No, no problem. Keep in mind that if one of them takes a very long
> time, the others will not be able to remove dead tuples that were
> killed while the long vacuum was running -- unless you are on 8.2.

Yes, I'm using the latest 8.2.3 version. So, will they *really* process in parallel, or will they block each other step by step?

Rgds,
-Dimitri
Dimitri wrote:
> Yes, I'm using the latest 8.2.3 version. So, will they *really* process
> in parallel, or will they block each other step by step?

They won't block.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thursday 22 March 2007 16:12, Alvaro Herrera wrote:
> Dimitri wrote:
> > Yes, I'm using the latest 8.2.3 version. So, will they *really* process
> > in parallel, or will they block each other step by step?
>
> They won't block.

Wow! Excellent! :)

So, in this case, why not add a 'parallel' option integrated directly into the 'vacuumdb' command? In my case I have several CPUs on the server and a quite powerful storage box which is not really kept busy by a single vacuum. So, my idea is quite simple -- speed up vacuum with parallel execution (just an algorithm):

--------------------------------------------------------------------------
PLL=parallel_degree

# list all tables, biggest first
select tab_size, tabname, dbname from ... order by tab_size desc;

foreach (dbname, tabname) in result
    # throttle: at most $PLL vacuumdb processes at a time
    while (pgrep vacuumdb | wc -l) >= $PLL
        sleep 1
    end
    vacuumdb -d $dbname -t $tabname > /tmp/vac.$dbname.$tabname.log 2>&1 &
end
wait
--------------------------------------------------------------------------

biggest tables are vacuumed first, etc. But of course it would be much cooler to have something like:

vacuumdb -a -P parallel_degree

What do you think? ;)

Rgds,
-Dimitri
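For what it's worth, the algorithm above can be turned into a small portable sh script. This is only a sketch: `vacuum_in_batches`, `vac_one`, `db1` and the table names are illustrative, and for simplicity it waits for each whole batch to finish instead of keeping a sliding window of $PLL processes the way the pgrep loop does.

```shell
#!/bin/sh
# vacuum_in_batches PLL CMD TABLE...
# Runs "CMD table" once per table, at most PLL processes at a time
# (batch semantics: start PLL jobs, wait for all of them, start the
# next batch). Tables should be passed biggest-first, as in the
# algorithm above.
vacuum_in_batches() {
    pll=$1; cmd=$2; shift 2
    i=0
    for tab in "$@"; do
        "$cmd" "$tab" &         # one background worker per table
        i=$((i + 1))
        if [ "$i" -ge "$pll" ]; then
            wait                # let the whole batch finish
            i=0
        fi
    done
    wait                        # reap the final, possibly partial batch
}

# With PostgreSQL available one would define something like this
# (db1 is a hypothetical database name):
vac_one() { vacuumdb -d db1 -t "$1" > "/tmp/vac.db1.$1.log" 2>&1; }
# and run, biggest tables first:
#   vacuum_in_batches 4 vac_one big_table mid_table small_table
```

The batch variant loses a little parallelism at the end of each batch (everyone waits for the slowest table), but it avoids the pgrep race where an unrelated vacuumdb on the host would be counted against the limit.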
Dimitri wrote:
> But of course it would be much cooler to have something like:
>
> vacuumdb -a -P parallel_degree
>
> What do you think? ;)

I think our time is better spent enhancing autovacuum ... but if you feel like improving vacuumdb, be my guest. This discussion belongs on pgsql-hackers, though, and any patches you may feel like submitting for review should go to pgsql-patches.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Mar 22, 2007 at 04:55:02PM +0100, Dimitri wrote:
> In my case I have several CPUs on the server and a quite powerful storage
> box which is not really kept busy by a single vacuum. So, my idea is quite
> simple -- speed up vacuum with parallel execution (just an algorithm):

Vacuum is I/O intensive, not CPU intensive. Running more of them will probably make things slower rather than faster, unless each thing you're vacuuming has its own (separate) disks. The fact that your CPU isn't pegged while vacuuming suggests that your disk is already your bottleneck -- and doing multiple sequential scans on the same disk will definitely be slower than doing one.

Mike Stone
Mike,

you're right as long as you're using a single disk :)
Now, imagine you have more disks -- the more I/O operations you can perform, the more CPU time you need to process them :) until you fully load one CPU per 'vacuumdb' -- and only then do you stop scaling...

Also, even when the CPU is not heavily used by vacuumdb, a single process is still not able to get the maximum performance out of the storage array, simply because you need several concurrent I/O requests running in the system to reach maximum throughput. Even the filesystem can only help you so far... The more concurrent writers you have, the higher the throughput you reach (up to the real limit)...

In my case I have a small storage array capable of delivering more than 500MB/sec and around 5000 ops/sec. All my data are striped across all the array's disks. A single 'vacuumdb' process here becomes more CPU-bound than I/O-bound, as it cannot fully load the storage array... So, the more vacuum processes I start in parallel, the faster I'll finish vacuuming the database.

Best regards!
-Dimitri

On Thursday 22 March 2007 18:10, Michael Stone wrote:
> Vacuum is I/O intensive, not CPU intensive. Running more of them will
> probably make things slower rather than faster, unless each thing you're
> vacuuming has its own (separate) disks. The fact that your CPU isn't
> pegged while vacuuming suggests that your disk is already your
> bottleneck -- and doing multiple sequential scans on the same disk will
> definitely be slower than doing one.
>
> Mike Stone
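The claim that one sequential stream cannot saturate an array while several concurrent streams can is easy to check with a crude experiment along these lines. This is only a sketch: DIR, the file names and the sizes are illustrative; the sizes here are deliberately tiny, and for a real measurement you would point DIR at the striped volume, use files much larger than RAM, and time each run.

```shell
#!/bin/sh
# Compare one sequential writer against four concurrent ones.
# DIR defaults to a scratch directory; in a real test it would live on
# the storage array. With GNU dd, adding conv=fdatasync gives more
# honest timings (data actually reaches the disk before dd exits).
DIR=${DIR:-$(mktemp -d)}

# single writer: one 8MB sequential stream
dd if=/dev/zero of="$DIR/single" bs=1048576 count=8 2>/dev/null

# four concurrent writers: compare the aggregate wall-clock throughput
for i in 1 2 3 4; do
    dd if=/dev/zero of="$DIR/multi.$i" bs=1048576 count=8 2>/dev/null &
done
wait
```

On a striped array with enough spindles, the four-writer run typically finishes in well under four times the single-writer time, which is exactly the effect Dimitri is describing for parallel vacuum.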
On Thu, Mar 22, 2007 at 07:24:38PM +0100, Dimitri wrote:
> you're right as long as you're using a single disk :)
> Now, imagine you have more disks

I do have more disks. I maximize the I/O performance by dedicating different sets of disks to different tables. YMMV. I do suggest watching your I/O rates and wallclock time if you try this, to see if your aggregate is actually substantially faster than the single case. (I assume that you haven't yet gotten far enough to actually do performance testing.) You may also want to look into tuning your sequential I/O performance.

Mike Stone
On Thursday 22 March 2007 19:46, Michael Stone wrote:
> I do have more disks. I maximize the I/O performance by dedicating
> different sets of disks to different tables. YMMV. I do suggest watching
> your I/O rates and wallclock time if you try this, to see if your
> aggregate is actually substantially faster than the single case. (I
> assume that you haven't yet gotten far enough to actually do performance
> testing.) You may also want to look into tuning your sequential I/O
> performance.

Mike, specially for you :)

Parallel Vacuum Test
====================

- Database 'db_OBJ'
  PgSQL 8.2.3
  tables: object1, object2, ... object8 (all identical)
  volume: 10,000,000 rows in each table, 22GB in total

- Script: Mono Vacuum

$ cat vac_mono.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8
$

- Script: Parallel Vacuum

$ cat vac_pll.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8 &
wait
$

Test 1: Cold Clean database (already previously vacuumed)
=========================================================
Scenario:
- stop database
- flush FS cache (umount/mount)
- start database
- execute vacuum script

$ time sh vac_mono.sh
real 4m24.23s
user 0m0.00s
sys  0m0.01s

$ time sh vac_pll.sh
real 1m9.36s
user 0m0.00s
sys  0m0.01s

Test 2: Hot Dirty database (modified and not vacuumed)
======================================================
Scenario:
- stop database
- flush FS cache (umount/mount)
- start database
- execute 200,000 updates against each of the 8 'object' tables
- execute vacuum script

$ time sh vac_mono.sh
real 9m36.90s
user 0m0.00s
sys  0m0.01s

$ time sh vac_pll.sh
real 2m10.41s
user 0m0.00s
sys  0m0.02s

A 4x speed-up is obtained simply because a single vacuum process reaches at most 80MB/sec of throughput, while with 8 parallel vacuum processes I'm jumping to 360MB/sec...

And speaking of sequential I/O: while you're doing reads, the filesystem may prefetch incoming data so that by the time you request the next read, the data is already in the FS cache. However, the filesystem cannot 'pre-write' data for you -- so having more concurrent writers helps a lot! (Of course, only if your storage is configured to handle concurrent I/O :))

Well, why all this stuff?... Let's imagine that when you need more performance you buy a storage box that is 10x more powerful; will you still be able to saturate it with single-process I/O activity? No... :) To scale well you need to be able to split your work into several tasks executed in parallel. And personally, I'm very happy we can now do this with vacuum -- one of the most critical parts of PostgreSQL...

Best regards!
-Dimitri
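The "4x" figure can be recomputed directly from the wall-clock times reported above; a quick check:

```shell
#!/bin/sh
# Recompute the speed-ups from the reported wall-clock times:
# cold:  4m24.23s vs 1m9.36s;  dirty: 9m36.90s vs 2m10.41s
cold=$(awk 'BEGIN { printf "%.1f", (4*60 + 24.23) / (1*60 + 9.36) }')
dirty=$(awk 'BEGIN { printf "%.1f", (9*60 + 36.90) / (2*60 + 10.41) }')
echo "cold: ${cold}x, dirty: ${dirty}x"
# prints: cold: 3.8x, dirty: 4.4x
```

So the cold run is 3.8x faster and the dirty run 4.4x faster, consistent with the rounded "x4" claim in the message.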
On Fri, Mar 23, 2007 at 04:37:32PM +0100, Dimitri wrote:
> A 4x speed-up is obtained simply because a single vacuum process reaches
> at most 80MB/sec of throughput

I'd look at trying to improve that, it seems very low.

Mike Stone