Обсуждение: problem with subqueries
Hi
Any help appreciated - I have spent 2 days trying to get this query to
work!
I have an orders database and a customer database.
The orders database has a date field for each order.
Because I want to obtain a monthly breakdown, I created a view called
monthcustomer as this select:
select orders.ord_date, customer.cname,date_part('month',orders.ord_date) AS "month", date_part('year',orders.ord_date)
AS"year", orders.number_of_items;
Each month will have multiple numbers of items, so to get a monthly
breakdown I tried this:
select distinct year,month, (select sum(monthcustomer.number_of_items) from monthcustomer where
monthcustomer.year=m.yearand monthcustomer.month=m.month) as NumPotsfrom monthcustomer m;
This goes off and never comes back - CPU is hitting the top of the
chart! I have to ^C to interrupt it, as it runs for ages (I've left
this run for 10-20 minutes and it is still running).
I have indexes on the columns involved from the original tables.
Any help appreciated.
Pete
PS: Using pgsql 7.2-70 from Suse distribution.
pete@phillipsfamily.freeserve.co.uk writes:
> select distinct year,month,
> (select sum(monthcustomer.number_of_items) from monthcustomer where
> monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots
> from monthcustomer m;
> This goes off and never comes back -
No surprise, considering the sub-select is going to be evaluated
separately for every row of monthcustomer --- and then most of those
evaluations will be thrown away by the DISTINCT :-(
A straightforward way of reducing the redundant computations would be
to do the DISTINCT first:
select year,month, (select sum(monthcustomer.number_of_items) from monthcustomer where monthcustomer.year=m.year and
monthcustomer.month=m.month)as NumPotsfrom (select distinct year, month from monthcustomer) as m;
But it appears to me that you are reinventing the wheel. Isn't this
query the equivalent of a grouped aggregation --- viz,
select year, month, sum(number_of_items) as NumPotsfrom monthcustomergroup by year, month
regards, tom lane
On Sat, 5 Oct 2002 pete@phillipsfamily.freeserve.co.uk wrote:
> Because I want to obtain a monthly breakdown, I created a view called
> monthcustomer as this select:
>
> select orders.ord_date, customer.cname,
> date_part('month',orders.ord_date) AS "month",
> date_part('year',orders.ord_date) AS "year",
> orders.number_of_items;
>
> Each month will have multiple numbers of items, so to get a monthly
> breakdown I tried this:
>
> select distinct year,month,
> (select sum(monthcustomer.number_of_items) from monthcustomer where
> monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots
> from monthcustomer m;
>
> This goes off and never comes back - CPU is hitting the top of the
> chart! I have to ^C to interrupt it, as it runs for ages (I've left
That's going to run that inner select once for every row in monthcustomer
probably.
Would
select year, month, sum(monthcustomer.number_of_items) as NumPots frmo
monthcustomer m group by year, month;
have the same effect, get the sum of the items for each year/month
combination along with which year and month?
Hi guys. Thanks for the rapid replies so far. To answer some of the questions: >you did not indicate an explicit join - or even a "from" clause for that >matter- in the example of your create view statement. My original post was a simplified version. Here is the actual view creating statement: create view monthord as select ord_date, extract (month from ord_date)as month, extract (year from ord_date) as year,r_region,number_of_itemsfrom orders,customer where ccode = codenum; >But it appears to me that you are reinventing the wheel. Isn't this >query the equivalent of a grouped aggregation Yes - but again I was simplifying - I want to run a sub query for each region, so I get output like this: year month Reg1 Reg2 Reg3 Reg4 ----- ----- ---- ---- ----- ---- 1999 Jan 20 45 10 27 1999 Feb 30 43 18 37 ... 2002 Oct 7 89 60 17 The subquery I have tried to run is actually this (there is probably a way to do this all in SQL, but at present I would like to just understand why my subqueries take so long). -----start of script --------------- #!/bin/sh # prepare a query as shell variables # set up timeframe SDATE=`date --date '2 years ago' +'%Y-%m-01'` EDATE=`date --date 'next month' +'%Y-%m-01'` # use the QUERY env variable to build the full query QUERY="select distinct year,month, " for reg in `psql -U postgres -d product_db -c "select distinct r_code from regionlist order by r_code;"` do QUERY="$QUERY (select sum(monthcustomer.number_of_items) from monthcustomer where monthcustomer.year=m.year and monthcustomer.month=m.monthand r_region like '$reg') as $reg," done # now add a total column QUERY="$QUERY (select sum(monthcustomer.number_of_items) from monthcustomer where ord_date >= '$SDATE' and ord_date < '$EDATE'and monthcustomer.year=m.year and monthcustomer.month=m.month) as TOTAL from monthcustomer m;" # execute the query psql -U postgres -d newmaggot -c "$QUERY" -----end of script --------------- As you can see, I get all the regions, and loop through them building up a bunch of subqueries as I go along. Hopefully this will answer some of your questions. I will try some of the suggestions later on today (such as placing the distinct clause at the end) but first I have to go and get some decorating done :-( (yes, that sound you can hear is the crack of my wife's whip!). Regards, Pete -- I do whatever the Voices tell me to do
On Sun, 6 Oct 2002 pete@phillipsfamily.freeserve.co.uk wrote: > Hi guys. Thanks for the rapid replies so far. > > To answer some of the questions: > > >you did not indicate an explicit join - or even a "from" clause for that > >matter- in the example of your create view statement. > > My original post was a simplified version. Here is the actual view > creating statement: > > create view monthord as select ord_date, extract (month from ord_date) > as month, extract (year from ord_date) as year,r_region, > number_of_items from orders,customer where ccode = codenum; > > > >But it appears to me that you are reinventing the wheel. Isn't this > >query the equivalent of a grouped aggregation > > Yes - but again I was simplifying - I want to run a sub query for each > region, so I get output like this: > > year month Reg1 Reg2 Reg3 Reg4 > ----- ----- ---- ---- ----- ---- > 1999 Jan 20 45 10 27 > 1999 Feb 30 43 18 37 > ... > 2002 Oct 7 89 60 17 > > The subquery I have tried to run is actually this (there is probably a > way to do this all in SQL, but at present I would like to just > understand why my subqueries take so long). Well, you're running <n> subqueries for each row in monthcustomer because the distinct happens afterwards in your query. So if you've got 4 regions and 1 total and 100,000 rows in monthcustomer, you're looking at something on the order of 500,000 subqueries. Doing the distinct before that step should lower the number to ((#year/month combinations) * (#regions+1)). In any case, you may be better off with one of: a) Doing something programatic to turn a result set like:year|month|region|value1999|Jan |1 |201999|Jan |2 |45... into the form you want. The above can be gotten by group by probably and would require no subqueries. b) Keeping a summary table that you update via triggers. Thisrequires a bit of work to get the triggers, but it probablymakesthe query faster.