Need advice: Parallel query execution introduces performance regression

Поиск

Список

Период

Сортировка

От	Jinho Jung
Тема	Need advice: Parallel query execution introduces performance regression
Дата	29 марта 2019 г. 19:06:42
Msg-id	CAF6pyKfsYkkwk3-uCzcxXQTvY0SRp3SF=dFmwJXnP8vzVpXNjw@mail.gmail.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

We noticed that the following SQL query runs 3 times slower on the latest version of PostgreSQL with parallel query execution. It would be helpful if someone could shed light on why this is happening.

Here’s the time taken to execute them on older (v9.5.16) and newer versions (v11.2) of PostgreSQL (in milliseconds):

+-------------+--------+---------+---------+-----------+

+-------------+--------+---------+---------+-----------+

| v9.5.16 | 88 | 937 | 4721 | 27241 |

| v11.2 | 288 | 2822 | 13838 | 85081 |

+-------------+--------+---------+---------+-----------+

We have shared the following details below:

1) the associated query,

2) the commit that activated it,

3) our high-level analysis,

4) query execution plans in old and new versions of PostgreSQL, and

5) information on reproducing these regressions.

### QUERY

select

ref_0.ol_delivery_d as c1

from

public.order_line as ref_0

where EXISTS (

select

ref_1.i_im_id as c0

from

public.item as ref_1

where ref_0.ol_d_id <= ref_1.i_im_id

)

### COMMIT

77cd477 (Enable parallel query by default.)

We found several other queries exhibiting performance regression related to this commit.

### HIGH-LEVEL ANALYSIS

We believe that this regression is due to parallel queries being enabled by default. Surprisingly, we found that even on a larger TPC-C database (scale factor of 50, roughly 4GB of size), parallel scan is still slower than the non-parallel execution plan in the old version, when the query is not returning any tuples.

### QUERY EXECUTION PLANS

[OLD version]

Nested Loop Semi Join (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)

Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)

-> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)

-> Materialize (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)

-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)

Planning time: 0.290 ms

Execution time: 27241.239 ms

[NEW version]

Gather (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Nested Loop Semi Join (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)

Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)

-> Parallel Seq Scan on order_line ref_0 (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)

-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.319 ms

Execution Time: 85081.158 ms

### REPRODUCING REGRESSION

* The queries can be downloaded here:

https://gts3.org/~/jjung/tpcc/case4.tar.gz

* You can reproduce these results by using the setup described in:

https://www.postgresql.org/message-id/BN6PR07MB3409922471073F2B619A8CA4EE640%40BN6PR07MB3409.namprd07.prod.outlook.com

Thanks for the pointers!

Best regards,

Jinho Jung

В списке pgsql-performance по дате отправления:

Предыдущее

От: Andreas Kretschmer
Дата: 29 марта 2019 г., 19:00:19
Сообщение: Re: endless quere when upsert with ON CONFLICT clause

Следующее

От: "tank.zhang"
Дата: 01 апреля 2019 г., 12:45:14
Сообщение: Postgresql Sort cost Poor performance?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Need advice: Parallel query execution introduces performance regression

Предыдущее

Следующее