Discussion: parallel data loading for pgbench -i
Hi,

I propose a patch for speeding up pgbench -i through multithreading. To enable this, pass -j and then the number of workers you want to use.

Here are some results I got on my laptop:

master
---
-i -s 100
done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s).
-i -s 100 --partitions=10
done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s).

patch (-j 10)
---
-i -s 100 -j 10
done in 18.64 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 5.82 s, vacuum 6.89 s, primary keys 5.93 s).
-i -s 100 -j 10 --partitions=10
done in 14.66 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 8.42 s, vacuum 1.55 s, primary keys 4.68 s).

The speedup is more significant for the partitioned use case. Because each worker creates its own partitions, all workers can use COPY FREEZE, which lowers the vacuum penalty. For the non-partitioned case the speedup is lower, but I observe it improves somewhat with larger scale factors. When parallel vacuum support is merged, this should further reduce the time.

I still need to update docs and tests, better integrate the code with its surroundings, and polish other aspects. I'd appreciate any feedback on what I have so far, though. Thanks!

Kind regards,
Mircea Cadariu
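For readers unfamiliar with the COPY FREEZE point above: FREEZE is only accepted when the target table was created or truncated in the current (sub)transaction, which each worker can satisfy on its own when it owns a separate partition. A minimal sketch of the idea (the partition name and bounds below are hypothetical, not taken from the patch):

```sql
-- COPY ... (FREEZE) requires the target table to have been created or
-- truncated in the current (sub)transaction; rows are then written
-- already frozen, so the later VACUUM has less work to do.
-- Hypothetical per-worker load of one range partition:
BEGIN;
CREATE TABLE pgbench_accounts_1
    PARTITION OF pgbench_accounts
    FOR VALUES FROM (1) TO (1000001);
COPY pgbench_accounts_1 FROM STDIN WITH (FREEZE);
COMMIT;
```

With a single shared table, only one session can meet the created-or-truncated-in-this-transaction requirement, which is why the non-partitioned case benefits less.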
Attachments
Hi Mircea,
I tested the patch on 19devel and it worked well for me.
Before applying it, -j is rejected in pgbench initialization mode as expected. After applying the patch, pgbench -i -s 100 -j 10 runs successfully and shows a clear speedup.
On my system the total runtime dropped to about 9.6s, with client-side data generation around 3.3s.
I also checked correctness after the run — row counts for pgbench_accounts, pgbench_branches, and pgbench_tellers all match the expected values.
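The expected sizes follow pgbench's documented scaling (1 branch, 10 tellers, and 100000 accounts per scale-factor unit), so for -s 100 a check could look like this sketch:

```sql
-- Row-count sanity check for scale factor 100
-- (pgbench creates scale*1 branches, scale*10 tellers, scale*100000 accounts).
SELECT (SELECT count(*) FROM pgbench_branches) = 100      AS branches_ok,
       (SELECT count(*) FROM pgbench_tellers)  = 1000     AS tellers_ok,
       (SELECT count(*) FROM pgbench_accounts) = 10000000 AS accounts_ok;
```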
Thanks for working on this, the improvement is very noticeable.
Best regards,
lakshmi
Hi Lakshmi,

On 19/01/2026 09:25, lakshmi wrote:
> Hi Mircea,
>
> I tested the patch on 19devel and it worked well for me.
>
> Before applying it, -j is rejected in pgbench initialization mode as expected. After applying the patch, pgbench -i -s 100 -j 10 runs successfully and shows a clear speedup.
>
> On my system the total runtime dropped to about 9.6s, with client-side data generation around 3.3s.
>
> I also checked correctness after the run — row counts for pgbench_accounts, pgbench_branches, and pgbench_tellers all match the expected values.
>
> Thanks for working on this, the improvement is very noticeable.
>
> Best regards,
> lakshmi
Thanks for having a look and trying it out!
FYI this is one of Tomas Vondra's patch ideas from his blog [1].
I have attached a new version which now includes docs, tests, a proposed commit message, and an attempt to fix the current CI failures (Windows).
[1] - https://vondra.me/posts/patch-idea-parallel-pgbench-i
--
Thanks,
Mircea Cadariu
Attachments
Thanks again for the updated patch.
I did some additional testing on 19devel with a larger scale factor.
For scale 100, parallel initialization with -j 10 shows a clear overall speedup and correct results, as mentioned earlier.
For scale 500, I observed that client-side data generation becomes significantly faster with parallel loading, but the total run time was slightly higher than the serial case on my system. This appears to be mainly due to a much longer vacuum phase after the parallel load.
So the parallel approach clearly improves data generation time, but the overall benefit may depend on scale and workload characteristics.
Regression tests still pass locally, and correctness checks look good.
Just sharing these observations in case they are useful for further evaluation.
Best regards,
lakshmi