query optimization help

Поиск
Список
Период
Сортировка
От sarlav kumar
Тема query optimization help
Дата
Msg-id 20050114143930.13466.qmail@web51307.mail.yahoo.com
обсуждение исходный текст
Список pgsql-novice
Hi All,
 
I have the following query to generate a report grouped by "states".
 
SELECT distinct upper(cd.state) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d left JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year',d.time)= 2004 GROUP BY myst
ate ORDER BY mystate;

 mystate | total_amount | total_fee
---------+--------------+-----------
         |         3695 |         0
 AR      |         3000 |         0
 AZ      |         1399 |         0
 CA      |       113100 |      6242
 FL      |       121191 |      9796
 GA      |     34826876 |    478888
 GEORGIA |        57990 |   &nbs p;  3500
 IEIE    |       114000 |      4849
 MD      |        20000 |      1158
 MI      |       906447 |         0
 NY      |         8000 |       600
 PA      |         6200 |       375
 SC      |        25000 |       600
 TN      |      1443681 |      1124
         |        13300 |         0
(15 rows)
If you notice, my problem in this query is that the records for GA, GEORGIA appear separately. But what I want to do is  to have them combined to a single entry with their values summed up . Initially we had accepted both formats as input for the state field. Also, there are some invalid entries for the state field (like the "IEIE" and null values), which appear because the input for state was not validated initially. These entries have to be eliminated from the report.This query did not take a long time to complete, but did not meet the needs for the report.
 
So, the query was rewritten to the following query which takes nearly 7-8 mins to complete on our test database:
 
SELECT (SELECT DISTINCT pc.state FROM postalcode pc WHERE UPPER(cd.state) IN (pc.state, pc.state_code)) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year', d.time) = 2004 GROUP BY mystate ORDER BY mystate;
    mystate     | total_amount | total_fee
----------------+--------------+-----------
 ARIZONA        |         1399 |         0
 ARKANSAS       |         3000 |         0
 CALIFORNIA     |       113100 |      6242
 FLORIDA        |       121191 |      9796
 GEORGIA        |     34884866 |    482388
 MARYLAND       |        20000 |      1158
 MICHIGAN       |       906447 |         0
 NEW YORK       |         8000 |       600
 PENNSYLVANIA   |         6200 |       375
 SOUTH CAROLINA |        25000 |       600
 TENNESSEE      |      1443681 |      1124
                |       130995 |      4849
 
Here is the explain analyze of this query:
  QUERY PLAN                                                                                                     
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1226.57..1226.58 rows=1 width=38) (actual time=362355.58..362372.09 rows=12 loops=1)
   ->  Group  (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.54..362367.73 rows=2197 loops=1)
         ->  Sort  (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.53..362356.96 rows=2197 loops=1)
               Sort Key: (subplan)
               ->  Nested Loop  (cost=0.00..1226.56 rows=1 width=38) (actual time=166.11..362321.46 rows=2197 loops=1)
                     ->  Nested Loop  (cost=0.00..1220.53 rows=1 width=26) (actual time=1.68..361.32 rows=2115 loops=1)
                           ->  Seq Scan on customerdata cd  (cost=0.00..274.32 ro ws=31 width=10) (actual time=0.04..29.87 rows=3303 loops=1)
                                 Filter: (lower((country)::text) = 'us'::text)
                           ->  Index Scan using data_uid_idx on data d  (cost=0.00..30.08 rows=1 width=16) (actual time=0.04..0.09 rows=1 loops=3303)
                                 Index Cond: (d.uid = "outer".uid)
                                 Filter: (((what = 26) OR (what = 0) OR (what = 15)) AND ((flags = 1) OR (flags = 9) OR (flags = 10) OR (flags = 12)) AND (date_part('year'::text, "time") = 2004::double precision))
                     ->  Index Scan using merchant_purchase_data_idx on merchant_purchase mp  (cost=0.00..6.01 rows=1 width=12) (actual time=0.05..0.05 rows=1 loops=2115)
                           Index Cond: ("outer".id = mp.data_id)
                     SubPlan
                       ->  Unique  (cost=2237.12..2243.22 rows=122 width=13) (actual time=161.25..164.68 rows=1 loops=2197)
                             ->  Sort  (cost=2237.12..2240.17 rows=1220 width=13) (actual time=161.21..161.88 rows=1033 loops=2197)
                                   Sort Key: state
                                   ->  Seq Scan on postalcode pc  (cost=0.00..2174.56 rows=1220 width=13) (actual time=35.79..148.33 rows=1033 loops=2197)
                                         Filter: ((upper(($0)::text) = (state)::text) OR (upper(($0)::text) = (state_code)::text))
 Total runtime: 362372.57 msec
 
 
The postalcode table is used in the query to validate the states and to combine the entries like GA and GEORGIA.
\d postalcode
                                    Table "public.postalcode"
   Column   |         Type          |                         Modifiers                         
------------+-----------------------+------------------------------------------------------------
 id         | integer               | not null default nextval('public.postalcode_id_seq'::text)
 country    | character(2)          |
 state      | character varying(30) |
 zipcode    | character varying(20) |
 city       | character varying(50) |
 city_alias | character varying(20) |
 state_code | character varying(2)  |
Indexes: postalcode_country_key unique btree (country, state_code, zipcode),
         postalcode_state_code_idx btree (state_code),
         postalcode_state_idx btree (state)
 
The postalcode table has 70328 rows!
 
Can some one please help me optimize this query?
 
Thanks,
Saranya


Do you Yahoo!?
Yahoo! Mail - You care about security. So do we.

В списке pgsql-novice по дате отправления:

Предыдущее
От: John DeSoi
Дата:
Сообщение: Re: postgres and pgamind3 for windows
Следующее
От: sarlav kumar
Дата:
Сообщение: Re: [PERFORM] query optimization help