Re: [HACKERS] please help on query
| От | Luis Alberto Amigo Navarro |
|---|---|
| Тема | Re: [HACKERS] please help on query |
| Дата | |
| Msg-id | 012101c22998$8bbe5670$cab990c1@atc.unican.es обсуждение исходный текст |
| Ответ на | Re: [HACKERS] please help on query (Manfred Koizar <mkoi-pg@aon.at>) |
| Ответы |
Re: [HACKERS] please help on query
|
| Список | pgsql-sql |
> The cost is now only 1141741215.35 compared to 2777810917708.17
> before; this is an improvement factor of more than 2000. So what's
> your problem? ;-)
>
> Servus
> Manfred
>
In fact planner is estimating incredibly badly, it took only 833msecs now
runs perfectly
I'm going to keep on asking about another query:
SELECTcustomer.name,customer.custkey,orders.orderkey,orders.orderdate,orders.totalprice,sum(lineitem.quantity)
FROMcustomer,orders,lineitem
WHEREexists( SELECT lineitem.orderkey FROM lineitem WHERE lineitem.orderkey=orders.orderkey GROUP BY
lineitem.orderkeyHAVING sum(lineitem.quantity)>300 )AND customer.custkey=orders.custkeyAND
orders.orderkey=lineitem.orderkey
GROUP BYcustomer.name,customer.custkey,orders.orderkey,orders.orderdate,orders.totalprice
ORDER BYorders.totalprice DESC,orders.orderdate;
NOTICE: QUERY PLAN:
Sort (cost=26923941.97..26923941.97 rows=300061 width=66) -> Aggregate (cost=26851634.86..26896644.05 rows=300061
width=66) -> Group (cost=26851634.86..26889142.52 rows=3000612 width=66) -> Sort
(cost=26851634.86..26851634.86rows=3000612
width=66) -> Hash Join (cost=26107574.81..26457309.10
rows=3000612 width=66) -> Seq Scan on lineitem (cost=0.00..222208.25
rows=6001225 width=8) -> Hash (cost=26105699.81..26105699.81
rows=750000 width=58) -> Hash Join (cost=7431.00..26105699.81
rows=750000 width=58) -> Seq Scan on orders
(cost=0.00..26083268.81 rows=750000 width=25) SubPlan
-> Aggregate
(cost=0.00..17.35 rows=1 width=8) -> Group
(cost=0.00..17.34 rows=5 width=8) -> Index Scan
using lineitem_pkey on lineitem (cost=0.00..17.33 rows=5 width=8) -> Hash
(cost=7056.00..7056.00
rows=150000 width=33) -> Seq Scan on customer
(cost=0.00..7056.00 rows=150000 width=33)
again:
orders 1500000 tuples
lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
Customer 150000 tuples
select attname,n_distinct,correlation from pg_stats where
tablename='lineitem'; attname | n_distinct | correlation
---------------+------------+-------------orderkey | -0.199847 | 1partkey | 196448 |
0.0223377suppkey | 9658 | -0.00822751linenumber | 7 | 0.17274quantity | 50 |
0.0150153extendedprice| 25651 | -0.00790245discount | 11 | 0.103761tax | 9 |
0.0993771returnflag | 3 | 0.391434linestatus | 2 | 0.509791shipdate | 2440 |
0.0072777commitdate | 2497 | 0.00698162receiptdate | 2416 | 0.00726686shipinstruct | 4 |
0.241511shipmode | 7 | 0.138432comment | 275488 | 0.0188006
(16 rows)
select attname,n_distinct,correlation from pg_stats where
tablename='orders'; attname | n_distinct | correlation
---------------+------------+-------------orderkey | -1 | -0.999925custkey | 76309 |
0.00590596orderstatus | 3 | 0.451991totalprice | -1 | -0.00768806orderdate | 2431 |
-0.0211354orderpriority| 5 | 0.182489clerk | 1009 | 0.00546939shippriority | 1 |
1comment | -0.750125 | -0.0123887
Customer attname | n_distinct | correlation
------------+------------+-------------custkey | -1 | 1name | -1 |
1address | -1 | -0.00510274nationkey | 25 | 0.0170533phone | -1 | -0.0227816acctbal
| -0.83444 | -0.00220958mktsegment | 5 | 0.205013comment | -1 | 0.0327827
This query takes 12 minutes to run and returns about 50 customers.
lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
restrictive
May someone help on improving performance?
Again thanks in advance
Regards
В списке pgsql-sql по дате отправления: