Re: slow group by query

Поиск
Список
Период
Сортировка
От Ellen Cyran
Тема Re: slow group by query
Дата
Msg-id 5.1.1.6.0.20021119124548.01afcb88@wolf.csuohio.edu
обсуждение исходный текст
Ответ на Re: slow group by query  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
That's a whole lot faster. The query on 40 msa_codes that took
7 minutes, now only takes 10 seconds.
Thanks a lot.


At 11:54 AM 11/19/2002 -0500, Tom Lane wrote:
>Ellen Cyran <ellen@urban.csuohio.edu> writes:
> > Here is the explain analyze:
>
> > Group  (cost=637.18..696.52 rows=593 width=22) (actual 
> time=982.67..67581.85 rows=435 loops=1)
> >    ->  Sort  (cost=637.18..637.18 rows=5934 width=22) (actual 
> time=833.27..844.78 rows=6571 loops=1)
>
>Well, we don't have to read any further than that to see that all the
>time is being spent in the final Group step --- and since grouping 6500
>rows is surely not taking very long, the true cost must be in the
>evaluation of the SELECT's output targetlist (which the estimated costs
>ignore, since the planner can't do much to change it).  In other words,
>what's hurting you are those subselects in the SELECT list.  You're
>doing two thousand separate subselects (435*5) --- evidently they take
>about 30 msec apiece, which isn't that bad by itself, but it adds up.
>
>What you've basically got here is what Joe Celko calls a "crosstab
>by subqueries" (_SQL For Smarties_, section 23.6.3).  You might want
>to buy his book, which shows several other ways to do crosstab queries,
>but the one that seems most directly useful is to GROUP BY and use
>aggregate functions to extract the values you want in each crosstab
>column:
>
>SELECT msa_code, sic, own, ind_div,
>max(case when year = '1975' then emp else null end) as emp1975,
>max(case when year = '1976' then emp else null end) as emp1976,
>max(case when year = '1977' then emp else null end) as emp1977,
>max(case when year = '1978' then emp else null end) as emp1978,
>max(case when year = '1990' then emp else null end) as emp1990
>FROM tbl_bls_msa
>WHERE msa_code in ('1680','1640','0040','0120','0080')
>GROUP BY msa_code, sic, ind_div, own ;
>
>If I understand your data schema correctly, only one row in a
>msa_code, sic, ind_div, own group will have a particular year
>value, so the case/max structure extracts that value, or gives
>NULL if there's no such row.  (MIN would have worked as well;
>with a numeric field you could use SUM too.)
>
>                         regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster



В списке pgsql-sql по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: Proposal of hierarchical queries, a la Oracle
Следующее
От: Chris Gamache
Дата:
Сообщение: Using VIEW to simplify code...