Performance of the partitioning in the large scale

Поиск
Список
Период
Сортировка
От Kato, Sho
Тема Performance of the partitioning in the large scale
Дата
Msg-id 25C1C6B2E7BE044889E4FE8643A58BA963B1F96F@G01JPEXMBKW03
обсуждение исходный текст
Ответы Re: Performance of the partitioning in the large scale  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers

Hi,

 

Thanks to your efforts the performance of the partitioned table has improved greatly.

Since I evaluated the performance by combining the performance improvement patches proposed in PG12, I share it.

 

The purpose of this evaluation is to organize performance issues of the table which is partitioned large number like over thousand and I want to bring the partitioned table performance close to the no partitioned table performance.

 

I used pgbench.

On the premise that tables don't have data and plan_cache_mode is auto.

 

- source:

 

master(38763d6778 Date:   Thu Sep 20 15:52:39 2018 +1200) + v9 patch[1] + v5 patch[2] + v3 patch[3]

 

[1] Reduce partition tuple routing overheads

    https://commitfest.postgresql.org/19/1690/

 

[2] Revise executor's handling of range table relations

    https://commitfest.postgresql.org/19/1758/

 

[3] Speed up planning with partitions

    https://commitfest.postgresql.org/19/1778/

 

 

- table definition:

 

When 6400 items of data is inserted to parent table, the each leaf partitions have the same number of items.

For example, the following DDL is the number of leaf partitions is a hundred.

 

create table test.accounts_history(id serial, aid int, delta int, mtime timestamp without time zone) partition by range(aid);

 

create table test.account_part_1 partition of test.accounts for values from (1) to (65);

create table test.account_part_2 partition of test.accounts for values from (65) to (129);

...

create table test.account_part_100 partition of test.accounts for values from (6337) to (6400);

 

create table test.ah_part_1 partition of test.accounts_history for values from (1) to (65);

create table test.ah_part_2 partition of test.accounts_history for values from (65) to (129);

...

create table test.ah_part_100 partition of test.accounts_history for values from (6337) to (6400);

 

 

- benchmark script:

 

I make SQL which only one leaf partition is targeted in each case of SELECT/UPDATE/DELETE/INSERT.

 

\set aid random(1, 6400)

\set delta random(-5000, 5000)

 

SELECT abalance FROM test.accounts WHERE aid = :aid;

  or

INSERT INTO test.accounts_history (aid, delta, mtime) VALUES (:aid, :delta, CURRENT_TIMESTAMP);

  or

UPDATE test.accounts SET abalance = abalance + :delta WHERE aid = :aid;

  or

DELETE FROM test.accounts where aid = :aid;

 

 

- results:

 

1. simple mode results:

 

part_num is the number of partition and 0 means no partitioned case.

tps_ex is tps of excluding connections establishing.

 

Also, after pgbench, I evaluate Planning time and Execution Time using explain analyze.

plan_time_avg and execute_time_avg are average of explain analyze when executed ten times.

 

pgbench -n -T 60 -r -f select.sql

 

part_num |   tps_ex    | plan_time_avg | execute_time_avg

----------+-------------+---------------+------------------

        0 |  8285.83582 |        0.0528 |           0.0222

      100 |   5948.1711 |        0.1342 |           0.0306

      200 | 5436.438478 |          0.15 |           0.0298

      400 | 4523.867744 |         0.148 |             0.03

      800 | 3460.625739 |        0.1447 |           0.0305

     1600 | 2428.795542 |        0.1528 |           0.0303

     3200 | 1539.672214 |        0.1552 |           0.0316

     6400 |  880.965232 |        0.1704 |           0.0288

(8 rows)

 

pgbench -n -T 60 -r -f update.sql

 

part_num |   tps_ex    | plan_time_avg | execute_time_avg

----------+-------------+---------------+------------------

        0 |  7360.58261 |        0.0596 |           0.0417

      100 | 4633.880563 |        0.1564 |            0.105

      200 | 3972.737702 |         0.152 |           0.1007

      400 |  3000.23471 |        0.1594 |           0.1039

      800 | 2139.676379 |        0.1664 |           0.1055

     1600 | 1348.553673 |         0.165 |           0.1056

     3200 |   787.48559 |        0.1774 |           0.1124

     6400 |  411.575671 |        0.1823 |           0.1089

(8 rows)

 

pgbench -n -T 60 -r -f delete.sql

 

part_num |   tps_ex    | plan_time_avg | execute_time_avg

----------+-------------+---------------+------------------

        0 |  8133.84019 |         0.057 |           0.0403

      100 | 5150.452458 |        0.1398 |           0.0936

      200 |  4352.69018 |        0.1414 |           0.0964

      400 |  3298.86364 |        0.1459 |            0.099

      800 | 2245.946178 |        0.1559 |           0.1029

     1600 |  1386.92366 |        0.1591 |           0.1048

     3200 |  802.024765 |        0.1617 |           0.1042

     6400 |  407.214158 |         0.168 |           0.1087

(8 rows)

 

pgbench -n -T 60 -r -f insert.sql

 

part_num |   tps_ex    | plan_time_avg | execute_time_avg

----------+-------------+---------------+------------------

        0 | 5246.142416 |        0.0307 |           0.0601

      100 | 2190.331571 |        0.0311 |           0.3587

      200 | 1452.601752 |        0.0301 |           0.5065

      400 |  863.771879 |         0.031 |           0.7864

      800 |  482.528223 |        0.0308 |            1.361

     1600 |  254.026173 |        0.0325 |           2.4999

     3200 |   133.04952 |        0.0327 |             4.94

     6400 |   68.541207 |        0.0335 |           9.8271

(8 rows)

 

SELECT/UPDATE/DELETE case

 

Planning Time and Execution Time are almost same. but, tps decreases as the number of partitions increases. why?

 

INSERT case

 

Execution Time is longer as the number of partitions increase.

This is why find_all_inheritors locks all leaf partitions. Improvement idea was already proposed in [1] but, not discussion started yet.

 

 

2. prepared mode results:

 

part_num and tps_ex are same with simple mode results.

plan_time_avg and execute_time are the average of the 7th to 10th out of 10 times.

 

pgbench -n -M prepared -T 60 -r -f select.sql

 

part_num |   tps_ex    | plan_time_avg | execute_time_avg

----------+-------------+---------------+------------------

        0 | 10939.35898 |         0.007 |            0.021

      100 | 4705.814419 |         0.063 |            0.025

      200 | 3884.922764 |         0.067 |           0.0255

      400 | 2762.569518 |       0.06575 |           0.0255

      800 | 1826.824659 |          0.07 |          0.02625

     1600 | 1119.961247 |         0.065 |            0.021

     3200 |  588.001498 |       0.08725 |          0.02775

     6400 |  218.817738 |        0.1325 |          0.03325

(8 rows)

 

pgbench -n -M prepared -T 60 -r -f update.sql

 

part_num |    tps_ex    | plan_time_avg | execute_time_avg

----------+--------------+---------------+------------------

        0 | 10996.995001 |       0.01475 |            0.029

      100 |  3711.790226 |       0.07975 |            0.074

      200 |  2724.792289 |        0.0865 |           0.1055

      400 |  1837.348771 |       0.09175 |            0.104

      800 |  1105.378276 |        0.0925 |          0.10425

     1600 |   619.924085 |       0.09425 |           0.1115

     3200 |   289.202665 |         0.109 |          0.12475

     6400 |   110.149098 |        0.1695 |           0.1585

(8 rows)

 

pgbench -n -M prepared -T 60 -r -f delete.sql

 

part_num |    tps_ex    | plan_time_avg | execute_time_avg

----------+--------------+---------------+------------------

        0 | 11398.458657 |         0.009 |          0.02625

      100 |  3997.105459 |         0.072 |          0.08675

      200 |  3124.704202 |        0.0745 |          0.09725

      400 |  2113.749027 |         0.083 |          0.10125

      800 |  1365.142607 |         0.081 |            0.107

     1600 |   740.248148 |        0.0895 |          0.10925

     3200 |   363.054579 |       0.09525 |          0.11375

     6400 |   124.795721 |       0.14725 |            0.158

(8 rows)

 

pgbench -n -M prepared -T 60 -r -f insert.sql

 

part_num |   tps_ex    | plan_time_avg | execute_time_avg

----------+-------------+---------------+------------------

        0 |  6308.66377 |       0.01175 |          0.04225

      100 | 2476.919703 |         0.014 |          0.26375

      200 |  1569.59956 |       0.01325 |          0.38625

      400 |    912.4026 |         0.014 |           0.6055

      800 |  502.504371 |        0.0145 |          1.05725

     1600 |   259.97686 |       0.01425 |           1.9835

     3200 |  136.809811 |       0.01225 |            3.877

     6400 |   70.489791 |        0.0105 |          7.76575

(8 rows)

 

SELECT/UPDATE/DELETE case

 

As with simple mode results, tps decreases as the number of partitions increases. why?

Also, I wonder why no prepared case is faster than prepared case.

 

When I see all 10 times results of the explain analyze, I realize Planning Time of 6th explain analyze is much longer.

why does it take so long to build a new generic plan?

 

part_num | num | plan_time | execute_time

----------+-----+-----------+--------------

     6400 |   1 |     0.459 |        0.043

     6400 |   2 |     0.168 |        0.028

     6400 |   3 |     0.169 |        0.028

     6400 |   4 |     0.151 |        0.027

     6400 |   5 |     0.148 |        0.027

     6400 |   6 |   449.213 |        0.052

     6400 |   7 |     0.142 |        0.033

     6400 |   8 |      0.12 |        0.034

     6400 |   9 |     0.136 |        0.033

     6400 |  10 |     0.132 |        0.033

 

INSERT case

 

I think the cause of performance degradation is the same as simple mode.

 

 

In the case of prepared statement, even if the target of SELECT statement is one leaf partition, since LockRelationOid of AcquireExecutorLocks is executed for all leaves, the performance decreases as the number of partitions increases.

Can AcquireExecutorLocks lock only the target leaf partition?

 

I am planning to investigate using a system TAP etc. for other bottlenecks.

If you have any other convenient method, please let me know.

Also, if there is something already known as a bottleneck, please let me know.

 

regards,

 

--

Sho Kato

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Performance improvements for src/port/snprintf.c
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Performance improvements for src/port/snprintf.c