RE: speeding up planning with partitions
| От | Kato, Sho |
|---|---|
| Тема | RE: speeding up planning with partitions |
| Дата | |
| Msg-id | 25C1C6B2E7BE044889E4FE8643A58BA963AF581E@G01JPEXMBKW03 обсуждение исходный текст |
| Ответ на | speeding up planning with partitions (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
| Ответы |
Re: speeding up planning with partitions
|
| Список | pgsql-hackers |
Hi, Amit
Great!
With the total number of records being 6400, I benchmarked while increasing the number of partitions from 100 to 6400.
Applying three all patches, 20% performance improved for 100 partitions.
I have the same number of records for each partition, do you do the same?
Also, in my case, performance was better when not prepare.
I think these patches do not improve execute case, so we need faster runtime pruning patch[1], right?
Details of measurement conditions and results are as follows.
- base source
master(@777e6ddf17) + Speeding up Insert v8 patch[1]
- table definition(e.g. 100 partition)
create table test.accounts(aid serial, abalance int) partition by range(aid);
create table test.accounts_history(id serial, aid int, delta int, mtime timestamp without time zone) partition by
range(aid);
create table test.account_part_1 partition of test.accounts for values from (1) to (65);
create table test.account_part_2 partition of test.accounts for values from (65) to (129);
...
create table test.account_part_100 partition of test.accounts for values from (6337) to (6400);
create table test.ah_part_1 partition of test.accounts_history for values from (1) to (65);
create table test.ah_part_2 partition of test.accounts_history for values from (65) to (129);
...
create table test.ah_part_100 partition of test.accounts_history for values from (6337) to (6400);
- benchmark script
\set aid random(1, 6400)
\set delta random(-5000, 5000)
BEGIN;
UPDATE test.accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM test.accounts WHERE aid = :aid;
INSERT INTO test.accounts_history (aid, delta, mtime) VALUES (:aid, :delta, CURRENT_TIMESTAMP);
END;
- command option
pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -M prepare
pgbench -d testdb -f benchmark.pgbench -T 180 -r -n
-results
base source no prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 662.414805 | 0.357 | 0.265 | 0.421
200 | 494.478431 | 0.439 | 0.349 | 0.579
400 | 307.982089 | 0.651 | 0.558 | 0.927
800 | 191.360676 | 0.979 | 0.876 | 1.548
1600 | 75.344947 | 2.253 | 2.003 | 4.301
3200 | 30.643902 | 5.716 | 4.955 | 10.118
6400 | 16.726056 | 12.512 | 8.582 | 18.054
0001 no prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 429.816329 | 0.811 | 0.75 | 0.365
200 | 275.211531 | 1.333 | 1.248 | 0.501
400 | 160.499833 | 2.384 | 2.252 | 0.754
800 | 79.387776 | 4.935 | 4.698 | 1.468
1600 | 24.787377 | 16.593 | 15.954 | 4.302
3200 | 9.846421 | 42.96 | 42.139 | 8.848
6400 | 4.919772 | 87.43 | 83.109 | 16.56
0001 prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 245.100776 | 2.728 | 0.374 | 0.476
200 | 140.249283 | 5.116 | 0.603 | 0.686
400 | 67.608559 | 11.383 | 1.055 | 1.179
800 | 23.085806 | 35.781 | 2.585 | 2.677
1600 | 6.211247 | 141.093 | 7.096 | 6.785
3200 | 1.808214 | 508.045 | 15.741 | 13.243
6400 | 0.495635 | 1919.362 | 37.691 | 28.177
0001 + 0002 no prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 682.53091 | 0.388 | 0.35 | 0.35
200 | 469.906601 | 0.543 | 0.496 | 0.51
400 | 321.915349 | 0.78 | 0.721 | 0.752
800 | 201.620975 | 1.246 | 1.156 | 1.236
1600 | 94.438204 | 2.612 | 2.335 | 2.745
3200 | 38.292922 | 6.657 | 5.579 | 6.808
6400 | 21.48462 | 11.989 | 10.104 | 12.601
0001 + 0002 prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 591.10863 | 0.433 | 0.342 | 0.422
200 | 393.223638 | 0.625 | 0.522 | 0.614
400 | 253.672736 | 0.946 | 0.828 | 0.928
800 | 146.840262 | 1.615 | 1.448 | 1.604
1600 | 52.805593 | 4.656 | 3.811 | 4.473
3200 | 21.461606 | 11.48 | 9.56 | 10.661
6400 | 11.888232 | 22.869 | 16.841 | 18.871
0001 + 0002 + 0003 no prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 798.962029 | 0.304 | 0.267 | 0.339
200 | 577.893396 | 0.384 | 0.346 | 0.487
400 | 426.542177 | 0.472 | 0.435 | 0.717
800 | 288.616213 | 0.63 | 0.591 | 1.162
1600 | 154.247034 | 1.056 | 0.987 | 2.384
3200 | 59.711446 | 2.416 | 2.233 | 6.514
6400 | 37.109761 | 3.387 | 3.099 | 11.762
0001 + 0002 + 0003 prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+------------+----------------+----------------+----------------
100 | 662.414805 | 0.357 | 0.265 | 0.421
200 | 494.478431 | 0.439 | 0.349 | 0.579
400 | 307.982089 | 0.651 | 0.558 | 0.927
800 | 191.360676 | 0.979 | 0.876 | 1.548
1600 | 75.344947 | 2.253 | 2.003 | 4.301
3200 | 30.643902 | 5.716 | 4.955 | 10.118
6400 | 16.726056 | 12.512 | 8.582 | 18.054
Although it may not be related to this, when measured with pg11 beta2, somehow the performance was better.
11beta2 + v1-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch[3] prepared
part_num | tps_ex | update_latency | select_latency | insert_latency
----------+-------------+----------------+----------------+----------------
100 | 756.07228 | 0.942 | 0.091 | 0.123
[1] https://www.postgresql.org/message-id/CAKJS1f_QN-nmf6jCQ4gRU_8ab0zrd0ms-U%3D_Dj0KUARJiuGpOA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAKJS1f9T_32Xpb-p8cWwo5ezSfVhXviUW8QTWncP8ksPHDRK8g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz%3DGVBwvGh4a6xA%40mail.gmail.com
regards,
Sho Kato
-----Original Message-----
From: Amit Langote [mailto:Langote_Amit_f8@lab.ntt.co.jp]
Sent: Wednesday, August 29, 2018 9:06 PM
To: Pg Hackers <pgsql-hackers@postgresql.org>
Subject: speeding up planning with partitions
It is more or less well known that the planner doesn't perform well with more than a few hundred partitions even when
onlya handful of partitions are ultimately included in the plan. Situation has improved a bit in PG
11 where we replaced the older method of pruning partitions one-by-one using constraint exclusion with a much faster
methodthat finds relevant partitions by using partitioning metadata. However, we could only use it for SELECT queries,
becauseUPDATE/DELETE are handled by a completely different code path, whose structure doesn't allow it to call the new
pruningmodule's functionality. Actually, not being able to use the new pruning is not the only problem for
UPDATE/DELETE,more on which further below.
While situation improved with new pruning where it could, there are still overheads in the way planner handles
partitions. As things stand today, it will spend cycles and allocate memory for partitions even before pruning is
performed,meaning most of that effort could be for partitions that were better left untouched. Currently, planner will
lock,heap_open
*all* partitions, create range table entries and AppendRelInfos for them, and finally initialize RelOptInfos for them,
eventouching the disk file of each partition in the process, in an earlier phase of planning. All of that processing
isvain for partitions that are pruned, because they won't be included in the final plan. This problem grows worse as
thenumber of partitions grows beyond thousands, because range table grows too big.
That could be fixed by delaying all that per-partition activity to a point where pruning has already been performed, so
thatwe know the partitions to open and create planning data structures for, such as somewhere downstream to
query_planner. But before we can do that we must do something about the fact that UPDATE/DELETE won't be able to cope
withthat because the code path that currently handles the planning of UPDATE/DELETE on partitioned tables
(inheritance_plannercalled from
subquery_planner) relies on AppendRelInfos for all partitions having been initialized by an earlier planning phase.
Delayingit to query_planner would be too late, because inheritance_planner calls query_planner for each partition, not
forthe parent. That is, if query_planner, which is downstream to inheritance_planner, was in the charge of determining
whichpartitions to open, the latter wouldn't know which partitions to call the former for. :)
That would be fixed if there is no longer this ordering dependency, which is what I propose to do with the attached
patch0001. I've tried to describe how the patch manages to do that in its commit message, but I'll summarize here. As
thingsstand today, inheritance_planner modifies the query for each leaf partition to make the partition act as the
query'sresult relation instead of the original partitioned table and calls grouping_planner on the query. That means
anythingthat's joined to partitioned table looks to instead be joined to the partition and join paths are generated
likewise. Also, the resulting path's targetlist is adjusted to be suitable for the result partition. Upon studying how
thisworks, I concluded that the same result can be achieved if we call grouping_planner only once and repeat the
portionsof query_planner's and grouping_planner's processing that generate the join paths and appropriate target list,
respectively,for each partition. That way, we can rely on query_planner determining result partitions for us, which in
turnrelies on the faster partprune.c based method of pruning. That speeds things up in two ways. Faster pruning and
weno longer repeat common processing for each partition.
With 0001 in place, there is nothing that requires that partitions be opened by an earlier planning phase, so, I
proposepatch 0002, which refactors the opening and creation of planner data structures for partitions such that it is
nowperformed after pruning. However, it doesn't do anything about the fact that partitions are all still locked in the
earlierphase.
With various overheads gone thanks to 0001 and 0002, locking of all partitions via find_all_inheritos can be seen as
thesingle largest bottleneck, which 0003 tries to address. I've kept it a separate patch, because I'll need to think a
bitmore to say that it's actually to safe to defer locking to late planning, due mainly to the concern about the change
inthe order of locking from the current method. I'm attaching it here, because I also want to show the performance
improvementwe can expect with it.
I measured the gain in performance due to each patch on a modest virtual machine. Details of the measurement and
resultsfollow.
* Benchmark scripts
update.sql
update ht set a = 0 where b = 1;
select.sql
select * from ht where b = 1;
* Table:
create table ht (a int, b int) partition by hash (b) create table ht_1 partition of ht for values with (modulus N,
remainder0) ..
create table ht_N partition of ht for values with (modulus N, remainder N-1)
* Rounded tps with update.sql and select.sql against regular table (nparts = 0) and partitioned table with various
partitioncounts:
pgbench -n -T 60 -f update.sql
nparts master 0001 0002 0003
====== ====== ==== ==== ====
0 2856 2893 2862 2816
8 507 1115 1447 1872
16 260 765 1173 1892
32 119 483 922 1884
64 59 282 615 1881
128 29 153 378 1835
256 14 79 210 1803
512 5 40 113 1728
1024 2 17 57 1616
2048 0* 9 30 1471
4096 0+ 4 15 1236
8192 0= 2 7 975
* 0.46
+ 0.0064
= 0 (OOM on a virtual machine with 4GB RAM)
As can be seen here, 0001 is a big help for update queries.
pgbench -n -T 60 -f select.sql
For a select query that doesn't contain join and needs to scan only one
partition:
nparts master 0001 0002 0003
====== ====== ==== ==== ====
0 2290 2329 2319 2268
8 1058 1077 1414 1788
16 711 729 1124 1789
32 450 475 879 1773
64 265 272 603 1765
128 146 149 371 1685
256 76 77 214 1678
512 39 39 112 1636
1024 16 17 59 1525
2048 8 9 29 1416
4096 4 4 15 1195
8192 2 2 7 932
Actually, here we get almost same numbers with 0001 as with master, because 0001 changes nothing for SELECT queries.
Westart seeing improvement with 0002, the patch to delay opening partitions.
Thanks,
Amit
В списке pgsql-hackers по дате отправления: