Re: Performance regressions found using sqlfuzz

From: Andres Freund
Subject: Re: Performance regressions found using sqlfuzz
Date:
Msg-id: 20190215164434.twk54vfogbpxqujm@alap3.anarazel.de
In reply to: Re: Performance regressions found using sqlfuzz  ("Jung, Jinho" <jinho.jung@gatech.edu>)
Responses: Re: Performance regressions found using sqlfuzz  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
Hi,

On 2019-02-14 17:27:40 +0000, Jung, Jinho wrote:
> ####### QUERY 2:
> 
>   select distinct
>     ref_0.i_im_id as c0,
>     ref_1.ol_dist_info as c1
>   from
>       public.item as ref_0 right join
>       public.order_line as ref_1
>       on (ref_0.i_id = 5)
> 
> - Commit: 84f9a35 (Improve estimate of distinct values in estimate_num_groups())
> 
> - Our analysis: We believe that this regression is related to the new logic for estimating the number of distinct values in the optimizer. This is affecting even queries with point lookups (ref_0.i_id = 5) in the TPC-C benchmark.
> 
> - Query Execution Plans
> 
>   [OLD version]
>   HashAggregate  (cost=11972.38..12266.20 rows=29382 width=29) (actual time=233.543..324.973 rows=300144 loops=1)
>     Group Key: ref_0.i_im_id, ref_1.ol_dist_info
>     ->  Nested Loop Left Join  (cost=0.29..10471.64 rows=300148 width=29) (actual time=0.012..114.955 rows=300148 loops=1)
>           ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=25) (actual time=0.004..25.061 rows=300148 loops=1)
>           ->  Materialize  (cost=0.29..8.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=300148)
>                 ->  Index Scan using item_pkey on item ref_0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
>                       Index Cond: (i_id = 10)
>   Planning time: 0.267 ms
>   Execution time: 338.027 ms
> 
> 
>   [NEW version]
>   Unique  (cost=44960.08..47211.19 rows=300148 width=29) (actual time=646.545..885.502 rows=300144 loops=1)
>     ->  Sort  (cost=44960.08..45710.45 rows=300148 width=29) (actual time=646.544..838.933 rows=300148 loops=1)
>           Sort Key: ref_0.i_im_id, ref_1.ol_dist_info
>           Sort Method: external merge  Disk: 11480kB
>           ->  Nested Loop Left Join  (cost=0.29..10471.64 rows=300148 width=29) (actual time=0.016..111.889 rows=300148 loops=1)
>                 ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=25) (actual time=0.004..24.612 rows=300148 loops=1)
>                 ->  Materialize  (cost=0.29..8.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=300148)
>                       ->  Index Scan using item_pkey on item ref_0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
>                             Index Cond: (i_id = 10)
>   Planning Time: 0.341 ms
>   Execution Time: 896.662 ms

This seems perfectly alright - the old version used more memory than
work_mem actually should have allowed. I'd bet you get the performance
back if you set work_mem to a bigger value.
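
As a quick sketch of that experiment (the 64MB figure is just an
illustrative value for this dataset, not a recommendation):

  -- Raise work_mem for this session so the sort/aggregate fits in memory,
  -- then compare the plan and timing against the ones reported above.
  SET work_mem = '64MB';

  EXPLAIN (ANALYZE, BUFFERS)
  select distinct
    ref_0.i_im_id as c0,
    ref_1.ol_dist_info as c1
  from
    public.item as ref_0 right join
    public.order_line as ref_1
    on (ref_0.i_id = 5);

  RESET work_mem;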


> ####### QUERY 3:
> 
>   select
>     cast(ref_1.ol_i_id as int4) as c0
>   from
>     public.stock as ref_0
>       left join public.order_line as ref_1
>       on (ref_1.ol_number is not null)
>   where ref_1.ol_number is null
> 
> - Commit: 77cd477 (Enable parallel query by default.)
> 
> - Our analysis: We believe that this regression is due to parallel queries being enabled by default. Surprisingly, we found that even on a larger TPC-C database (scale factor of 50, roughly 4GB), parallel scan is still slower than the non-parallel one in the old version, when the query is not returning any tuples.
> 
> - Query Execution Plans
> 
>   [OLD version]
>   Nested Loop Anti Join  (cost=0.00..18006.81 rows=1 width=4) (actual time=28.689..28.689 rows=0 loops=1)
>     ->  Seq Scan on stock ref_0  (cost=0.00..5340.00 rows=100000 width=0) (actual time=0.028..15.722 rows=100000 loops=1)
>     ->  Materialize  (cost=0.00..9385.22 rows=300148 width=4) (actual time=0.000..0.000 rows=1 loops=100000)
>           ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=4) (actual time=0.004..0.004 rows=1 loops=1)
>                 Filter: (ol_number IS NOT NULL)
>   Planning time: 0.198 ms
>   Execution time: 28.745 ms
> 
>   [NEW version]
>   Gather  (cost=1000.00..15164.93 rows=1 width=4) (actual time=91.022..92.634 rows=0 loops=1)
>     Workers Planned: 2
>     Workers Launched: 2
>     ->  Nested Loop Anti Join  (cost=0.00..14164.83 rows=1 width=4) (actual time=88.889..88.889 rows=0 loops=3)
>           ->  Parallel Seq Scan on stock ref_0  (cost=0.00..4756.67 rows=41667 width=0) (actual time=0.025..7.331 rows=33333 loops=3)
>           ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=4) (actual time=0.002..0.002 rows=1 loops=100000)
>                 Filter: (ol_number IS NOT NULL)
>   Planning Time: 0.258 ms
>   Execution Time: 92.699 ms

I'm not particularly bothered - this is a pretty small difference. Most
of the time here is likely spent starting the workers, the cost of which
is hard to predict/model accurately.
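
If you want to separate the worker-startup overhead from the rest, one
simple comparison is to disable parallelism for the session and re-run the
same query (a sketch, using the query from your report):

  -- Force a non-parallel plan to compare against the Gather plan above.
  SET max_parallel_workers_per_gather = 0;

  EXPLAIN (ANALYZE)
  select
    cast(ref_1.ol_i_id as int4) as c0
  from
    public.stock as ref_0
      left join public.order_line as ref_1
      on (ref_1.ol_number is not null)
  where ref_1.ol_number is null;

  RESET max_parallel_workers_per_gather;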


> ####### QUERY 4:
> 
>   select
>     ref_0.s_dist_06 as c0
>   from
>     public.stock as ref_0
>   where (ref_0.s_w_id < cast(least(0, 1) as int8))
> 
> - Commit: 5edc63b (Account for the effect of lossy pages when costing bitmap scans)
> 
> - Our analysis: We believe that this regression has to do with two factors: 1) conditional expressions (e.g., LEAST or NULLIF) are not reduced to constants, unlike string functions (e.g., CHAR_LENGTH); 2) a change in the cost estimation function for bitmap scans. Execution time grows by 3 orders of magnitude. We note that this regression is only observed on large databases (e.g., scale factor of 50).
> 
> - Query Execution Plans
> 
>   [OLD version]
>   Bitmap Heap Scan on stock ref_0  (cost=31201.11..273173.13 rows=1666668 width=25) (actual time=0.005..0.005 rows=0 loops=1)
>     Recheck Cond: (s_w_id < (LEAST(0, 1))::bigint)
>     ->  Bitmap Index Scan on stock_pkey  (cost=0.00..30784.44 rows=1666668 width=0) (actual time=0.005..0.005 rows=0 loops=1)
>           Index Cond: (s_w_id < (LEAST(0, 1))::bigint)
>   Planning time: 0.228 ms
>   Execution time: 0.107 ms
> 
>   [NEW version]
>   Seq Scan on stock ref_0  (cost=0.00..304469.17 rows=1666613 width=25) (actual time=716.397..716.397 rows=0 loops=1)
>     Filter: (s_w_id < (LEAST(0, 1))::bigint)
>     Rows Removed by Filter: 5000000
>   Planning Time: 0.221 ms
>   Execution Time: 716.513 ms


Hm. The primary problem here is that the estimates both before and
after are really bad. So I don't think the commit you point out here is
really to blame.  I'm not that bothered by the query not being great,
given the weird construction with LEAST(), but we probably could fix
that pretty easily.
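
One way to confirm that the bad estimate comes from the unreduced LEAST()
expression rather than from the bitmap costing change is to substitute the
constant by hand and compare the plans (a sketch against your schema):

  -- LEAST(0, 1) is not folded to a constant at plan time, so the planner
  -- uses a default selectivity for "s_w_id < <expr>".  Writing the literal
  -- directly lets it see that essentially no rows qualify.
  EXPLAIN (ANALYZE)
  select ref_0.s_dist_06 as c0
  from public.stock as ref_0
  where ref_0.s_w_id < cast(0 as int8);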


> ####### QUERY 1:
> 
>   select
>     ref_0.o_d_id as c0
>   from
>     public.oorder as ref_0
>   where EXISTS (
>     select
>       1
>     from
>       (select distinct
>           ref_0.o_entry_d as c0,
>           ref_1.c_credit as c1
>        from
>           public.customer as ref_1
>        where (false)
>       ) as subq_1
>   );
> 
> - Commit: bf6c614 (Do execGrouping.c via expression eval machinery, take two)
> 
> - Our analysis: We are not sure about the root cause of this regression. This might have to do with grouping logic.
> 
> - Query Execution Plans
> 
>   [OLD version]
>   Seq Scan on oorder ref_0  (cost=0.00..77184338.54 rows=15022 width=4) (actual time=34.173..34.173 rows=0 loops=1)
>     Filter: (SubPlan 1)
>     Rows Removed by Filter: 30044
>     SubPlan 1
>       ->  Subquery Scan on subq_1  (cost=2569.01..2569.03 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=30044)
>             ->  HashAggregate  (cost=2569.01..2569.02 rows=1 width=3) (actual time=0.000..0.000 rows=0 loops=30044)
>                   Group Key: ref_0.o_entry_d, ref_1.c_credit
>                   ->  Result  (cost=0.00..2569.00 rows=1 width=3) (actual time=0.000..0.000 rows=0 loops=30044)
>                         One-Time Filter: false
>                         ->  Seq Scan on customer ref_1  (cost=0.00..2569.00 rows=1 width=3) (never executed)
>   Planning time: 0.325 ms
>   Execution time: 34.234 ms
> 
>   [NEW version]
>   Seq Scan on oorder ref_0  (cost=0.00..1152.32 rows=15022 width=4) (actual time=74.799..74.799 rows=0 loops=1)
>     Filter: (SubPlan 1)
>     Rows Removed by Filter: 30044
>     SubPlan 1
>       ->  Subquery Scan on subq_1  (cost=0.00..0.02 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=30044)
>             ->  HashAggregate  (cost=0.00..0.01 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=30044)
>                   Group Key: ref_0.o_entry_d, c_credit
>                   ->  Result  (cost=0.00..0.00 rows=0 width=20) (actual time=0.000..0.000 rows=0 loops=30044)
>                         One-Time Filter: false
>   Planning Time: 0.350 ms
>   Execution Time: 79.237 ms

I think that might be fixed in the latest point release. I screwed up
and made resets of tuple hash tables (and there are 30044 of those here)
more expensive; that's fixed in the latest minor release.
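
To check whether the server you are testing already includes that fix, it
should be enough to compare the reported minor version against the latest
point release for your branch, e.g.:

  -- Shows the exact version (including the minor release) the server runs.
  SELECT version();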


Greetings,

Andres Freund

