Performance regression related to parallel execution

Поиск

Список

Период

Сортировка

От	Jinho Jung
Тема	Performance regression related to parallel execution
Дата	11 марта 2019 г. 22:19:37
Msg-id	CAF6pyKfRchfdLSvFOBJ4U96AhEFsHbm-gGcrQrjXdqPC4q=Hpg@mail.gmail.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

Hello,

We noticed that the following SQL queries are running 3 times slower on the latest version of PostgreSQL. Here’s the time taken to execute them on older (v9.5.16) and newer versions (v11.2) of PostgreSQL (in milliseconds):

+-----------------------+--------+---------+---------+-----------+

+-----------------------+--------+---------+---------+-----------+

| Query 1 (v9.5.16) | 88 | 937 | 4721 | 27241 |

| Query 1 (v11.2) | 288 | 2822 | 13838 | 85081 |

+-----------------------+--------+---------+---------+-----------+

| Query 2 (v9.5.16) | 39 | X | X | X |

| Query 2 (v11.2) | 80 | X | X | X |

+-----------------------+--------+---------+---------+-----------+

For each regression, we share:

1) the associated query,

2) the commit that activated it,

3) our high-level analysis, and

4) query execution plans in old and new versions of PostgreSQL.

All these regressions are observed on the latest version. (v11.2 and v9.5.16)

We found several other regression related to this commit:

77cd477 (Enable parallel query by default.)

* You can download the queries at:

https://gts3.org/~/jjung/tpcc/case3.tar.gz

* You can reproduce the result by using the same setup that we described

before (As Andrew mentioned before, we increased default work_mem to 128MB):

https://www.postgresql.org/message-id/BN6PR07MB3409922471073F2B619A8CA4EE640%40BN6PR07MB3409.namprd07.prod.outlook.com

###### Query 1:

select

ref_0.ol_delivery_d as c1

from

public.order_line as ref_0

where EXISTS (

select

ref_1.i_im_id as c0

from

public.item as ref_1

where ref_0.ol_d_id <= ref_1.i_im_id

)

- Commit: 77cd477 (Enable parallel query by default.)

- Our analysis: We believe that this regression is due to parallel queries being enabled by default. Surprisingly, we found that even on a larger TPC-C database (scale factor of 50, roughly 4GB), parallel scan is still slower than the non-parallel one in the old version, when the query is not returning any tuples.

- Query Execution Plans

[OLD version]

Nested Loop Semi Join (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)

Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)

-> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)

-> Materialize (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)

-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)

Planning time: 0.290 ms

Execution time: 27241.239 ms

[NEW version]

Gather (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Nested Loop Semi Join (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)

Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)

-> Parallel Seq Scan on order_line ref_0 (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)

-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.319 ms

Execution Time: 85081.158 ms

##### QUERY 2

select

ref_0.c_id as c0

from

public.customer as ref_0

where EXISTS (

select

ref_0.c_city as c0

from

public.order_line as ref_1

left join public.new_order as ref_2

on (ref_1.ol_supply_w_id = ref_2.no_w_id)

where (ref_1.ol_delivery_d > ref_0.c_since)

)

- Our analysis : There seems to a problem with parallel execution logic. It is unclear why only one worker is launched, since the default behavior of PostgreSQL is to launch at least two workers. We found that this query does not run to completion within a day when we increase the size of the DB. (e.g., from scale factor 1 to scale factor 10/50/300).

- Commit: 16be2fd (Make dsa_allocate interface more like MemoryContextAlloc)

- Query execution plans:

[Old version]

Nested Loop Semi Join (cost=224.43..910152608046.08 rows=10000 width=4) (actual time=3.681..37.842 rows=30000 loops=1)

Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)

-> Seq Scan on customer ref_0 (cost=0.00..2569.00 rows=30000 width=12) (actual time=0.020..22.582 rows=30000 loops=1)

-> Materialize (cost=224.43..48521498.97 rows=2406886812 width=8) (actual time=0.000..0.000 rows=1 loops=30000)

-> Hash Left Join (cost=224.43..27085162.91 rows=2406886812 width=8) (actual time=3.652..3.652 rows=1 loops=1)

Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)

-> Seq Scan on order_line ref_1 (cost=0.00..6711.48 rows=300148 width=12) (actual time=0.016..0.016 rows=1 loops=1)

-> Hash (cost=124.19..124.19 rows=8019 width=4) (actual time=3.569..3.569 rows=8019 loops=1)

Buckets: 8192 Batches: 1 Memory Usage: 346kB

-> Seq Scan on new_order ref_2 (cost=0.00..124.19 rows=8019 width=4) (actual time=0.020..1.770 rows=8019 loops=1)

Planning time: 0.927 ms

Execution time: 39.166 ms

[New version]

Gather (cost=1224.43..672617098015.10 rows=10000 width=4) (actual time=3.792..78.702 rows=30000 loops=1)

Workers Planned: 1

Workers Launched: 1

-> Nested Loop Semi Join (cost=224.43..672617096015.10 rows=5882 width=4) (actual time=4.383..65.544 rows=15000 loops=2)

Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)

-> Parallel Seq Scan on customer ref_0 (cost=0.00..2445.47 rows=17647 width=12) (actual time=0.029..13.054 rows=15000 loops=2)

-> Hash Left Join (cost=224.43..27085162.91 rows=2406886812 width=8) (actual time=0.002..0.002 rows=1 loops=30000)

Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)

-> Seq Scan on order_line ref_1 (cost=0.00..6711.48 rows=300148 width=12) (actual time=0.002..0.002 rows=1 loops=30000)

-> Hash (cost=124.19..124.19 rows=8019 width=4) (actual time=4.094..4.094 rows=8019 loops=2)

Buckets: 8192 Batches: 1 Memory Usage: 346kB

-> Seq Scan on new_order ref_2 (cost=0.00..124.19 rows=8019 width=4) (actual time=0.025..1.995 rows=8019 loops=2)

Planning Time: 1.154 ms

Execution Time: 80.217 ms

В списке pgsql-performance по дате отправления:

Предыдущее

От: tayeb merabti
Дата: 11 марта 2019 г., 17:14:50
Сообщение: Wired execution in my benchmark using postgresql 9.6

Следующее

От: Mariel Cherkassky
Дата: 12 марта 2019 г., 10:58:32
Сообщение: Re: ERROR: found xmin from before relfrozenxid

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Performance regression related to parallel execution

Предыдущее

Следующее