Обсуждение: [PERFORM] Execution plan analysis
Dear all
Someone help me analyze the execution plans below, is the query 12 of
TPC-H benchmark [1].
I need to find out why the query without index runs faster (7 times)
than with index, although the costs are smaller (see table).
I have other cases that happened in the same situation. The server
parameters have been set with PGTUNE. I use postgresql version 9.6.4
on Debian 8 OS with 4 GB memory.
Query|Index(yes/no) |Time Spend |Cost Total
===================================
12 Yes 00:08:58 2710805.51
12 No 00:01:42 3365996.34
----------------- Explain Analyze Query 12 WITH INDEX
----------------------------
Sort (cost=2710805.51..2710805.51 rows=1 width=27) (actual
time=537713.672..537713.672 rows=2 loops=1)
Sort Key: lineitem.l_shipmode
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=2710805.47..2710805.50 rows=1 width=27)
(actual time=537713.597..537713.598 rows=2 loops=1)
-> Merge Join (cost=1994471.69..2708777.28 rows=270426
width=27) (actual time=510717.977..536818.802 rows=311208 loops=1)
Merge Cond: (orders.o_orderkey = lineitem.l_orderkey)
-> Index Scan using orders_pkey on orders
(cost=0.00..672772.57 rows=15000045 width=20) (actual
time=0.019..20898.325 rows=14999972 loops=1)
-> Sort (cost=1994455.40..1995131.47
rows=270426 width=19) (actual time=510690.114..510915.678 rows=311208
loops=1)
Sort Key: lineitem.l_orderkey
Sort Method: external sort Disk: 11568kB
-> Bitmap Heap Scan on
lineitem (cost=336295.10..1970056.39 rows=270426 width=19) (actual
time=419620.817..509685.421 rows=311208 loops=1)
Recheck Cond:
(l_shipmode = ANY (_{TRUCK,AIR}_::bpchar[]))
Filter:
((l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND
(l_receiptdate >= _1997-01-01_::date) AND (l_receiptdate < _1998-01-01
00:00:00_::timestamp without time zone))
-> Bitmap
Index Scan on idx_l_shipmodelineitem000 (cost=0.00..336227.49
rows=15942635 width=0) (actual time=419437.172..419437.172
rows=17133713 loops=1)
Index
Cond: (l_shipmode = ANY (_{TRUCK,AIR}_::bpchar[]))
Total runtime: 537728.848 ms
----------------- Explain Analyze Query 12 WITHOUT INDEX
----------------------------
Sort (cost=3365996.33..3365996.34 rows=1 width=27) (actual
time=101850.883..101850.884 rows=2 loops=1)
Sort Key: lineitem.l_shipmode Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=3365996.30..3365996.32 rows=1 width=27)
(actual time=101850.798..101850.800 rows=2 loops=1)
-> Merge Join (cost=2649608.28..3363936.68 rows=274616
width=27) (actual time=75497.181..100938.830 rows=311208 loops=1)
Merge Cond: (orders.o_orderkey = lineitem.l_orderkey)
-> Index Scan using orders_pkey on orders
(cost=0.00..672771.90 rows=15000000 width=20) (actual
time=0.020..20272.828 rows=14999972 loops=1)
-> Sort (cost=2649545.68..2650232.22
rows=274616 width=19) (actual time=75364.450..75618.772 rows=311208
loops=1)
Sort Key: lineitem.l_orderkey
Sort Method: external sort
Disk: 11568kB
-> Seq Scan on lineitem
(cost=0.00..2624738.17 rows=274616 width=19) (actual
time=0.839..74391.087 rows=311208 loops=1)
Filter: ((l_shipmode
= ANY (_{TRUCK,AIR}_::bpchar[])) AND (l_commitdate < l_receiptdate)
AND (l_shipdate < l_commitdate) AND (l_receiptdate >=
_1997-01-01_::date) AND (l_receiptdate < _1998-01-01
00:00:00_::timestamp without time zone))
Total runtime:
101865.253 ms
-=========------ SQL query 12 ----------------------
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('TRUCK', 'AIR')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1997-01-01'
and l_receiptdate < date '1997-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode
2017-08-25 5:31 GMT-03:00 Neto pr <netopr9@gmail.com>:
> Dear all
>
> Someone help me analyze the execution plans below, is the query 12 of
> TPC-H benchmark [1].
> I need to find out why the query without index runs faster (7 times)
> than with index, although the costs are smaller (see table).
> I have other cases that happened in the same situation. The server
> parameters have been set with PGTUNE. I use postgresql version 9.6.4
> on Debian 8 OS with 4 GB memory.
>
> Query|Index(yes/no) |Time Spend |Cost Total
> ===================================
> 12 Yes 00:08:58 2710805.51
> 12 No 00:01:42 3365996.34
>
>
> ----------------- Explain Analyze Query 12 WITH INDEX
> ----------------------------
> Sort (cost=2710805.51..2710805.51 rows=1 width=27) (actual
> time=537713.672..537713.672 rows=2 loops=1)
> Sort Key: lineitem.l_shipmode
> Sort Method: quicksort Memory: 25kB
> -> HashAggregate (cost=2710805.47..2710805.50 rows=1 width=27)
> (actual time=537713.597..537713.598 rows=2 loops=1)
> -> Merge Join (cost=1994471.69..2708777.28 rows=270426
> width=27) (actual time=510717.977..536818.802 rows=311208 loops=1)
> Merge Cond: (orders.o_orderkey = lineitem.l_orderkey)
> -> Index Scan using orders_pkey on orders
> (cost=0.00..672772.57 rows=15000045 width=20) (actual
> time=0.019..20898.325 rows=14999972 loops=1)
> -> Sort (cost=1994455.40..1995131.47
> rows=270426 width=19) (actual time=510690.114..510915.678 rows=311208
> loops=1)
> Sort Key: lineitem.l_orderkey
> Sort Method: external sort Disk: 11568kB
> -> Bitmap Heap Scan on
> lineitem (cost=336295.10..1970056.39 rows=270426 width=19) (actual
> time=419620.817..509685.421 rows=311208 loops=1)
> Recheck Cond:
> (l_shipmode = ANY (_{TRUCK,AIR}_::bpchar[]))
> Filter:
> ((l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND
> (l_receiptdate >= _1997-01-01_::date) AND (l_receiptdate < _1998-01-01
> 00:00:00_::timestamp without time zone))
> -> Bitmap
> Index Scan on idx_l_shipmodelineitem000 (cost=0.00..336227.49
> rows=15942635 width=0) (actual time=419437.172..419437.172
> rows=17133713 loops=1)
> Index
> Cond: (l_shipmode = ANY (_{TRUCK,AIR}_::bpchar[]))
>
> Total runtime: 537728.848 ms
>
>
> ----------------- Explain Analyze Query 12 WITHOUT INDEX
> ----------------------------
> Sort (cost=3365996.33..3365996.34 rows=1 width=27) (actual
> time=101850.883..101850.884 rows=2 loops=1)
> Sort Key: lineitem.l_shipmode Sort Method: quicksort Memory: 25kB
> -> HashAggregate (cost=3365996.30..3365996.32 rows=1 width=27)
> (actual time=101850.798..101850.800 rows=2 loops=1)
> -> Merge Join (cost=2649608.28..3363936.68 rows=274616
> width=27) (actual time=75497.181..100938.830 rows=311208 loops=1)
> Merge Cond: (orders.o_orderkey = lineitem.l_orderkey)
> -> Index Scan using orders_pkey on orders
> (cost=0.00..672771.90 rows=15000000 width=20) (actual
> time=0.020..20272.828 rows=14999972 loops=1)
> -> Sort (cost=2649545.68..2650232.22
> rows=274616 width=19) (actual time=75364.450..75618.772 rows=311208
> loops=1)
> Sort Key: lineitem.l_orderkey
> Sort Method: external sort
> Disk: 11568kB
> -> Seq Scan on lineitem
> (cost=0.00..2624738.17 rows=274616 width=19) (actual
> time=0.839..74391.087 rows=311208 loops=1)
> Filter: ((l_shipmode
> = ANY (_{TRUCK,AIR}_::bpchar[])) AND (l_commitdate < l_receiptdate)
> AND (l_shipdate < l_commitdate) AND (l_receiptdate >=
> _1997-01-01_::date) AND (l_receiptdate < _1998-01-01
> 00:00:00_::timestamp without time zone))
> Total runtime:
> 101865.253 ms
>
> -=========------ SQL query 12 ----------------------
> select
> l_shipmode,
> sum(case
> when o_orderpriority = '1-URGENT'
> or o_orderpriority = '2-HIGH'
> then 1
> else 0
> end) as high_line_count,
> sum(case
> when o_orderpriority <> '1-URGENT'
> and o_orderpriority <> '2-HIGH'
> then 1
> else 0
> end) as low_line_count
> from
> orders,
> lineitem
> where
> o_orderkey = l_orderkey
> and l_shipmode in ('TRUCK', 'AIR')
> and l_commitdate < l_receiptdate
> and l_shipdate < l_commitdate
> and l_receiptdate >= date '1997-01-01'
> and l_receiptdate < date '1997-01-01' + interval '1' year
> group by
> l_shipmode
> order by
> l_shipmode
Complementing the question I'm using a server HP proliant Ml110-G9:
Processador: (1) Intel Xeon E5-1603v3 (2.8GHz/4-core/10MB/140W)
Memória RAM: 4GB DDR4
Disco Rígido: SATA 1TB 7.2K rpm LFF
More specifications
here:https://www.hpe.com/us/en/product-catalog/servers/proliant-servers/pip.specifications.hpe-proliant-ml110-gen9-server.7796454.html
154/5000
See Below parameters presents in postgresql.conf. You would indicate
which value for example: cpu_index_tuple_cost and other CPU_*, based
on this
Server.
#seq_page_cost = 1.0
#random_page_cost = 4.0
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.005
#cpu_operator_cost = 0.0025
shared_buffers = 1GB
effective_cache_size = 3GB
work_mem = 26214kB
maintenance_work_mem = 512MB
checkpoint_segments = 128
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
Best Regards
Neto Br
Hi,
So looking at the plans, essentially the only part that is different is
the scan node at the very bottom - in one case it's a sequential scan,
in the other case (the slow one) it's the bitmap index scan.
Essentially it's this:
-> Seq Scan on lineitem
(cost=0.00..2624738.17 ...)
(actual time=0.839..74391.087 ...)
vs. this:
-> Bitmap Heap Scan on lineitem
(cost=336295.10..1970056.39 ...)
(actual time=419620.817..509685.421 ...)
-> Bitmap Index Scan on idx_l_shipmodelineitem000
(cost=0.00..336227.49 ...)
(actual time=419437.172..419437.172 ...)
All the nodes are the same and perform about the same in both cases, so
you can ignore them. This difference it the the root cause you need to
investigate.
The question is why is the sequential scan so much faster than bitmap
index scan? Ideally, the bitmap heap scan should scan the index (in a
mostly sequential way), build a bitmap, and then read just the matching
part of the table (sequentially, by skipping some of the pages).
Now, there are a few reasons why this might not work that well.
Perhaps the table fits into RAM, but table + index does not. That would
make the sequential scan much faster than the index path. Not sure if
this is the case, as you haven't mentioned which TPC-H scale are you
testing, but you only have 4GB of RAM which if fairly low.
Another bit is prefetching - with sequential scans, the OS is able to
prefetch the next bit of data automatically (read-ahead). With bitmap
index scans that's not the case, producing a lot of individual
synchronous I/O requests. See if increasing effective_cache_size (from
default 1 to 16 or 32) helps.
Try generating the plans with EXPLAIN (ANALYZE, BUFFERS), that should
tell us more about how many blocks are found in shared buffers, etc.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services