Re: Parallel Apply
From: Konstantin Knizhnik
Subject: Re: Parallel Apply
Msg-id: ae5c5a41-2f68-4088-8fcc-58ed71a7f82f@garret.ru
In reply to: Re: Parallel Apply (Nisha Moond <nisha.moond412@gmail.com>)
Responses: Re: Parallel Apply
List: pgsql-hackers
On 18/08/2025 9:56 AM, Nisha Moond wrote:
> On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
>> Here is the initial POC patch for this idea.
>>
> Thank you Hou-san for the patch.
>
> I did some performance benchmarking for the patch and overall, the
> results show substantial performance improvements.
> Please find the details as follows:
>
> Source code:
> ----------------
> pgHead (572c0f1b0e) and v1-0001 patch
>
> Setup:
> ---------
> Pub --> Sub
> - Two nodes created in a pub-sub logical replication setup.
> - Both nodes have the same set of pgbench tables created with scale=300.
> - The sub node is subscribed to all the changes from the pub node's
>   pgbench tables.
>
> Workload Run:
> --------------------
> - Disable the subscription on the Sub node.
> - Run default pgbench (read-write) only on the Pub node with
>   #clients=40 and run duration=10 minutes.
> - Enable the subscription on Sub once pgbench completes, and then
>   measure the time taken in replication.
> ~~~
>
> Test-01: Measure Replication lag
> ----------------------------------------
> Observations:
> ---------------
> - Replication time improved as the number of parallel workers
>   increased with the patch.
> - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
> - With just 2 parallel workers (default), replication time was cut in
>   half, and with 8 workers it completed in ~13 minutes (3.5x faster).
> - With 16 parallel workers, achieved ~3.7x speedup over pgHead.
> - With 32 workers, performance gains plateaued slightly, likely
>   because more workers were running on the machine while the
>   parallelizable work was not large enough to show further improvements.
>
> Detailed Result:
> -----------------
> Case                    Time_taken_in_replication(sec)  rep_time_in_minutes  faster_than_head
> 1. pgHead               2760.791                        46.01318333          -
> 2. patched_#worker=2    1463.853                        24.3975              1.88 times
> 3. patched_#worker=4    1031.376                        17.1896              2.68 times
> 4. patched_#worker=8     781.007                        13.0168              3.54 times
> 5. patched_#worker=16    741.108                        12.3518              3.73 times
> 6. patched_#worker=32    787.203                        13.1201              3.51 times
> ~~~~
>
> Test-02: Measure number of transactions parallelized
> -----------------------------------------------------
> - Used a top-up patch to LOG the number of transactions applied by
>   parallel workers, applied by the leader, and that are dependent.
> - The LOG output, e.g.:
> ```
> LOG:  parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
> ```
> - parallelized_nxact: gives the number of parallelized transactions
> - dependent_nxact: gives the dependent transactions
> - leader_applied_nxact: gives the transactions applied by the leader worker
> (the required top-up v1-0002 patch is attached.)
>
> Observations:
> ----------------
> - With 4 to 8 parallel workers, ~80%-98% of transactions are parallelized.
> - As the number of workers increased, the parallelized percentage
>   increased, reaching 99.99% with 32 workers.
>
> Detailed Result:
> -----------------
> case1: #parallel_workers = 2 (default)
> #total_pgbench_txns = 24745648
> parallelized_nxact = 14439480 (58.35%)
> dependent_nxact = 16 (0.00006%)
> leader_applied_nxact = 10306153 (41.64%)
>
> case2: #parallel_workers = 4
> #total_pgbench_txns = 24776108
> parallelized_nxact = 19666593 (79.37%)
> dependent_nxact = 212 (0.0008%)
> leader_applied_nxact = 5109304 (20.62%)
>
> case3: #parallel_workers = 8
> #total_pgbench_txns = 24821333
> parallelized_nxact = 24397431 (98.29%)
> dependent_nxact = 282 (0.001%)
> leader_applied_nxact = 423621 (1.71%)
>
> case4: #parallel_workers = 16
> #total_pgbench_txns = 24938255
> parallelized_nxact = 24937754 (99.99%)
> dependent_nxact = 142 (0.0005%)
> leader_applied_nxact = 360 (0.0014%)
>
> case5: #parallel_workers = 32
> #total_pgbench_txns = 24769474
> parallelized_nxact = 24769135 (99.99%)
> dependent_nxact = 312 (0.0013%)
> leader_applied_nxact = 28 (0.0001%)
>
> ~~~~~
> The scripts used for the above tests are attached.
>
> Next, I plan to extend the testing to larger workloads by running
> pgbench for 20–30 minutes.
> We will also benchmark performance across different workload types to
> evaluate the improvements once the patch has matured further.
>
> --
> Thanks,
> Nisha

I also did some benchmarking of the proposed parallel apply patch and
compared it with my prewarming approach. Parallel apply is significantly
more efficient than prefetch (as expected).

I ran two tests (more details here):
https://www.postgresql.org/message-id/flat/84ed36b8-7d06-4945-9a6b-3826b3f999a6%40garret.ru#70b45c44814c248d3d519a762f528753

One performs random updates and the other inserts with a random key. I
stop the subscriber, apply the workload at the publisher for 100 seconds,
and then measure how long it takes the subscriber to catch up.

update test (with 8 parallel apply workers):
master: 8:30 min
prefetch: 2:05 min
parallel apply: 1:30 min

insert test (with 8 parallel apply workers):
master: 9:20 min
prefetch: 3:08 min
parallel apply: 1:54 min
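As an aside, the per-case percentages in Test-02 above can be recomputed
directly from the counters in the top-up patch's LOG line. The following
is an illustrative sketch (not part of either patch): the helper name and
the returned dict are my own invention; only the LOG line format is taken
from the message above.

```python
import re

# Matches the three counters emitted by the benchmarking top-up patch,
# e.g. "LOG:  parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600"
STATS_RE = re.compile(
    r"parallelized_nxact:\s*(\d+)\s+"
    r"dependent_nxact:\s*(\d+)\s+"
    r"leader_applied_nxact:\s*(\d+)"
)

def parse_apply_stats(log_line):
    """Extract the apply counters from one LOG line and compute the
    share of transactions handled by the parallel apply workers."""
    m = STATS_RE.search(log_line)
    if m is None:
        raise ValueError("no parallel-apply stats found in line")
    parallelized, dependent, leader = map(int, m.groups())
    total = parallelized + dependent + leader
    return {
        "parallelized_nxact": parallelized,
        "dependent_nxact": dependent,
        "leader_applied_nxact": leader,
        "total_nxact": total,
        "parallelized_pct": 100.0 * parallelized / total,
    }

# Example line quoted in the message above:
stats = parse_apply_stats(
    "LOG:  parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600"
)
print(stats["total_nxact"], round(stats["parallelized_pct"], 2))
```

Running this over the subscriber log at the end of each run gives the
parallelized/dependent/leader split reported in the detailed results.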