Обсуждение: excessive performance difference
We're running a query on psql7.2b3, in which planner is unable to find
correct way to solve.
This is only informative message
If we run query19.old it takes more than 7 hours with this plan
Aggregate (cost=17310310.04..17310310.04 rows=1 width=116)
-> Nested Loop (cost=0.00..17310310.00 rows=15 width=116)
-> Seq Scan on part (cost=0.00..140.00 rows=2000 width=36)
-> Seq Scan on lineitem (cost=0.00..2325.79 rows=60279
width=80)
If we use query19.sql it takes 6 secs and plan is
Nested Loop (cost=9770.00..9770.02 rows=1 width=96) (actual
time=6538.94..6539.09 rows=1 loops=1)
-> Nested Loop (cost=6513.40..6513.41 rows=1 width=64) (actual
time=4472.32..4472.41 rows=1 loops=1)
-> Subquery Scan resultado (cost=3256.82..3256.82 rows=1
width=30) (actual time=2283.08..2283.11 rows=1 loops=1)
-> Aggregate (cost=3256.82..3256.82 rows=1 width=30)
(actual time=2283.03..2283.04 rows=1 loops=1)
-> Hash Join (cost=175.00..3256.81 rows=1
width=30) (actual time=688.18..2282.70 rows=1 loops=1)
-> Seq Scan on lineitem (cost=0.00..3079.28
rows=506 width=26) (actual time=6.29..2217.74 rows=492 loops=1)
-> Hash (cost=175.00..175.00 rows=1 width=4)
(actual time=55.20..55.20 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..175.00
rows=1 width=4) (actual time=42.12..55.11 rows=2 loops=1)
-> Subquery Scan resultado2 (cost=3256.59..3256.59 rows=1
width=30) (actual time=2189.15..2189.18 rows=1 loops=1)
-> Aggregate (cost=3256.59..3256.59 rows=1 width=30)
(actual time=2189.10..2189.11 rows=1 loops=1)
-> Hash Join (cost=175.00..3256.58 rows=1
width=30) (actual time=2188.99..2188.99 rows=0 loops=1)
-> Seq Scan on lineitem (cost=0.00..3079.28
rows=460 width=26) (actual time=2.53..2124.05 rows=519 loops=1)
-> Hash (cost=175.00..175.00 rows=1 width=4)
(actual time=54.97..54.97 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..175.00
rows=1 width=4) (actual time=21.05..54.85 rows=3 loops=1)
-> Subquery Scan resultado3 (cost=3256.59..3256.59 rows=1 width=30)
(actual time=2066.41..2066.44 rows=1 loops=1)
-> Aggregate (cost=3256.59..3256.59 rows=1 width=30) (actual
time=2066.36..2066.37 rows=1 loops=1)
-> Hash Join (cost=175.01..3256.59 rows=1 width=30)
(actual time=2066.24..2066.24 rows=0 loops=1)
-> Seq Scan on lineitem (cost=0.00..3079.28
rows=460 width=26) (actual time=2.25..2001.36 rows=483 loops=1)
-> Hash (cost=175.00..175.00 rows=2 width=4)
(actual time=55.48..55.48 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..175.00
rows=2 width=4) (actual time=17.21..55.34 rows=3 loops=1)
explain
SELECT
sum(lineitem.extendedprice*(1-lineitem.discount))AS revenue
FROM
part,
lineitem
WHERE
(
part.partkey=lineitem.partkey
AND part.brand='Brand#12'
AND part.container IN('SM CASE','SM BOX','SM PACK','SM PKG')
AND lineitem.quantity>=1 AND lineitem.quantity<=1+10
AND part.size BETWEEN 1 AND 5
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
)
OR
(
part.partkey=lineitem.partkey
AND part.brand='Brand#23'
AND part.container IN('MED BAG','MED BOX','MED PACK','MED PKG')
AND lineitem.quantity>=10 AND lineitem.quantity<=10+10
AND part.size BETWEEN 1 AND 10
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
)
OR
(
part.partkey=lineitem.partkey
AND part.brand='Brand#34'
AND part.container IN('LG CASE','LG BOX','LG PACK','LG PKG')
AND lineitem.quantity>=20 AND lineitem.quantity<=20+10
AND part.size BETWEEN 1 AND 15
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
);
SELECT
(suma+suma1+suma2)AS revenue
FROM
(SELECT
CASE
WHEN sum(extendedprice*(1-discount))>0 THEN sum(extendedprice*(1-discount))
ELSE 0
END AS suma
FROM
part,
lineitem
WHERE
(
part.partkey=lineitem.partkey
AND part.brand='Brand#12'
AND part.container IN('SM CASE','SM BOX','SM PACK','SM PKG')
AND lineitem.quantity>=1 AND lineitem.quantity<=1+10
AND part.size BETWEEN 1 AND 5
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
))AS resultado,
(SELECT
CASE
WHEN sum(extendedprice*(1-discount))>0 THEN sum(extendedprice*(1-discount))
ELSE 0
END AS suma1
FROM
part,
lineitem
WHERE
(
part.partkey=lineitem.partkey
AND part.brand='Brand#23'
AND part.container IN('MED BAG','MED BOX','MED PACK','MED PKG')
AND lineitem.quantity>=10 AND lineitem.quantity<=20
AND part.size BETWEEN 1 AND 10
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
))AS resultado2,
(SELECT
CASE
WHEN sum(extendedprice*(1-discount))>0 THEN sum(extendedprice*(1-discount))
ELSE 0
END AS suma2
FROM
part,
lineitem
WHERE
(
part.partkey=lineitem.partkey
AND part.brand='Brand#34'
AND part.container IN('LG CASE','LG BOX','LG PACK','LG PKG')
AND lineitem.quantity>=20 AND lineitem.quantity<=30
AND part.size BETWEEN 1 AND 15
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
))AS resultado3;
On Thu, 13 Dec 2001 13:35:16 +0100
Luis Amigo <lamigo@atc.unican.es> wrote:
> We're running a query on psql7.2b3, in which planner is unable to find
> correct way to solve.
> This is only informative message
> If we run query19.old it takes more than 7 hours with this plan
>
> Aggregate (cost=17310310.04..17310310.04 rows=1 width=116)
> -> Nested Loop (cost=0.00..17310310.00 rows=15 width=116)
> -> Seq Scan on part (cost=0.00..140.00 rows=2000 width=36)
> -> Seq Scan on lineitem (cost=0.00..2325.79 rows=60279
> width=80)
>
> If we use query19.sql it takes 6 secs and plan is
>
> Nested Loop (cost=9770.00..9770.02 rows=1 width=96) (actual
> time=6538.94..6539.09 rows=1 loops=1)
I found that the query, query19.old, has no hints to run the planner
effectively. I would think you need to change a bit like this:
explain
SELECT
sum(lineitem.extendedprice*(1-lineitem.discount))AS revenue
FROM
part,
lineitem
WHERE
part.partkey=lineitem.partkey -- change
AND
((
part.brand='Brand#12'
AND part.container IN('SM CASE','SM BOX','SM PACK','SM PKG')
AND lineitem.quantity>=1 AND lineitem.quantity<=1+10
AND part.size BETWEEN 1 AND 5
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
)
OR
(
part.brand='Brand#23'
AND part.container IN('MED BAG','MED BOX','MED PACK','MED PKG')
AND lineitem.quantity>=10 AND lineitem.quantity<=10+10
AND part.size BETWEEN 1 AND 10
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
)
OR
(
part.brand='Brand#34'
AND part.container IN('LG CASE','LG BOX','LG PACK','LG PKG')
AND lineitem.quantity>=20 AND lineitem.quantity<=20+10
AND part.size BETWEEN 1 AND 15
AND lineitem.shipmode IN ('AIR','AIR REG')
AND lineitem.shipinstruct='DELIVER IN PERSON'
));
NOTICE: QUERY PLAN:
Aggregate (cost=344.66..344.66 rows=1 width=72)
-> Merge Join (cost=139.66..344.66 rows=1 width=72)
-> Sort (cost=69.83..69.83 rows=1000 width=32)
-> Seq Scan on part (cost=0.00..20.00 rows=1000 width=32)
-> Sort (cost=69.83..69.83 rows=1000 width=40)
-> Seq Scan on lineitem (cost=0.00..20.00 rows=1000 width=40)
AFAICT, you probably might use indices to remove the two sorts
from the previous result of the query plan.
create index idx_part_partkey on part (partkey);
create index idx_lineitem_partkey on lineitem (partkey);
NOTICE: QUERY PLAN:
Aggregate (cost=323.01..323.01 rows=1 width=72)
-> Merge Join (cost=0.00..323.01 rows=1 width=72)
-> Index Scan using idx_part_partkey on part
(cost=0.00..59.00 rows=1000 width=32)
-> Index Scan using idx_lineitem_partkey on lineitem
(cost=0.00..59.00 rows=1000 width=40)
Regards,
Masaru Sugawara
Masaru Sugawara wrote:
>
> I found that the query, query19.old, has no hints to run the planner
> effectively. I would think you need to change a bit like this:
Thank you for your answer Masaru, the only thing I wanted to communicate is that
query19.old works perfectly with pay bases, and someone could get stucked with the
same problem. What you are getting outside is same condition that is inside of
all, partkey is primary key, so it is not neccesary to create index.
query19.sql gives better performance than yours, it is pseudo intra-paralellism,
if anyone is interested here are the results
tpch=# \e /disco3/lamigo/tpch/consultas/query19.old
NOTICE: QUERY PLAN:
Aggregate (cost=6073.03..6073.03 rows=1 width=116) (actual time=6865.81..6865.81
rows=1 loops=1)
-> Hash Join (cost=145.00..6073.03 rows=1 width=116) (actual
time=2057.06..6865.58 rows=1 loops=1)
-> Seq Scan on lineitem (cost=0.00..2326.05 rows=60305 width=80) (actual
time=0.38..3244.77 rows=60279 loops=1)
-> Hash (cost=140.00..140.00 rows=2000 width=36) (actual
time=132.55..132.55 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..140.00 rows=2000 width=36) (actual
time=13.66..98.65 rows=2000 loops=1)
Total runtime: 6867.08 msec
EXPLAIN
tpch=# \e /disco3/lamigo/tpch/consultas/query19.sql
NOTICE: QUERY PLAN:
Nested Loop (cost=9771.17..9771.19 rows=1 width=96) (actual time=6539.58..6539.74
rows=1 loops=1)
-> Nested Loop (cost=6513.97..6513.98 rows=1 width=64) (actual
time=4254.88..4254.98 rows=1 loops=1)
-> Subquery Scan resultado2 (cost=3257.16..3257.16 rows=1 width=30)
(actual time=2185.23..2185.27 rows=1 loops=1)
-> Aggregate (cost=3257.16..3257.16 rows=1 width=30) (actual
time=2185.19..2185.20 rows=1 loops=1)
-> Hash Join (cost=175.00..3257.16 rows=1 width=30) (actual
time=2185.07..2185.07 rows=0 loops=1)
-> Seq Scan on lineitem (cost=0.00..3079.86 rows=458
width=26) (actual time=2.50..2119.31 rows=519 loops=1)
-> Hash (cost=175.00..175.00 rows=1 width=4) (actual
time=55.40..55.40 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..175.00 rows=1
width=4) (actual time=21.11..55.27 rows=3 loops=1)
-> Subquery Scan resultado3 (cost=3256.81..3256.81 rows=1 width=30)
(actual time=2069.55..2069.59 rows=1 loops=1)
-> Aggregate (cost=3256.81..3256.81 rows=1 width=30) (actual
time=2069.51..2069.51 rows=1 loops=1)
-> Hash Join (cost=175.01..3256.81 rows=1 width=30) (actual
time=2069.39..2069.39 rows=0 loops=1)
-> Seq Scan on lineitem (cost=0.00..3079.86 rows=387
width=26) (actual time=2.24..2004.59 rows=483 loops=1)
-> Hash (cost=175.00..175.00 rows=2 width=4) (actual
time=55.00..55.00 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..175.00 rows=2
width=4) (actual time=17.20..54.87 rows=3 loops=1)
-> Subquery Scan resultado (cost=3257.19..3257.19 rows=1 width=30) (actual
time=2284.52..2284.55 rows=1 loops=1)
-> Aggregate (cost=3257.19..3257.19 rows=1 width=30) (actual
time=2284.47..2284.48 rows=1 loops=1)
-> Hash Join (cost=175.00..3257.19 rows=1 width=30) (actual
time=688.98..2284.13 rows=1 loops=1)
-> Seq Scan on lineitem (cost=0.00..3079.86 rows=464
width=26) (actual time=6.31..2219.45 rows=492 loops=1)
-> Hash (cost=175.00..175.00 rows=1 width=4) (actual
time=54.99..54.99 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..175.00 rows=1 width=4)
(actual time=42.02..54.89 rows=2 loops=1)
Total runtime: 6541.95 msec
EXPLAIN
tpch=# create index index_partkey on lineitem (partkey int4_ops);
CREATE
tpch=# analyze;
ANALYZE
tpch=# \e /disco3/lamigo/tpch/consultas/query19.old
ANALYZE
tpch=# \i /disco3/lamigo/tpch/consultas/query19.old
psql:/disco3/lamigo/tpch/consultas/query19.old:35: NOTICE: QUERY PLAN:
Aggregate (cost=6183.50..6183.50 rows=1 width=116) (actual time=6816.22..6816.22
rows=1 loops=1)
-> Hash Join (cost=145.00..6183.50 rows=1 width=116) (actual
time=2013.52..6815.98 rows=1 loops=1)
-> Seq Scan on lineitem (cost=0.00..2326.05 rows=60305 width=80) (actual
time=0.38..3220.05 rows=60279 loops=1)
-> Hash (cost=140.00..140.00 rows=2000 width=36) (actual
time=108.44..108.44 rows=0 loops=1)
-> Seq Scan on part (cost=0.00..140.00 rows=2000 width=36) (actual
time=1.02..76.31 rows=2000 loops=1)
Total runtime: 6817.15 msec
Thank you and regards.
Luis Amigo
Universidad de Cantabria