Re: query optimization question

Поиск
Список
Период
Сортировка
От terry@greatgulfhomes.com
Тема Re: query optimization question
Дата
Msg-id 002401c2859c$589a6120$2766f30a@development.greatgulfhomes.com
обсуждение исходный текст
Ответ на query optimization question  (<terry@ashtonwoodshomes.com>)
Список pgsql-sql
No offence taken, however it is incorrect, my SQL is pretty good.  I
received no other responses...  And I later realized the solution to my
question:

(EXPERTS READ ON: If anyone can show me how to use a group by or otherwise
optimize I would be grateful)

This subquery:    SELECT  project_id, marketing_name,            (SELECT count(lots.lot_id) AS lot_count
FROMdeficiency_table AS dt, lots, deficiency_status AS ds             WHERE dt.lot_id = lots.lot_id                AND
lots.division_id= proj.division_id                AND lots.project_id = proj.project_id                AND
dt.deficiency_status_id= ds.deficiency_status_id                AND ds.is_outstanding
#PreserveSingleQuotes(variables.base_query)#           ) AS def_count,
 
Actually does return a deficiency count, where there could be more then 1
deficiency per lot.  In order to get my lot_count, (number of lots with 1 or
more deficiencies) I just needed to add a DISTINCT clause in my count()
aggregate, ie  SELECT count(DISTINCT lots.lot_id)...  I forgot one could do
that:                (SELECT count(DISTINCT lots.lot_id) AS lot_count                 FROM deficiency_table AS dt,
lots,deficiency_status AS ds                 WHERE dt.lot_id = lots.lot_id                    AND lots.division_id =
proj.division_id                   AND lots.project_id = proj.project_id                    AND dt.days_old_start_date
>=#CreateODBCDate(DateAdd("d", -
 
int(ListLast(variables.aging_breakdown_list, ",")), now() ))#                    AND dt.deficiency_status_id =
ds.deficiency_status_id                   AND ds.is_outstanding
#PreserveSingleQuotes(variables.base_query)#               ) AS
lot_count_greater_#ListLast(variables.aging_breakdown_list,",")#,
 
Note the #PreserveSingleQuotes(variables.base_query)# is dynamic code that
further selects deficiencies by various criteria, eg just for a particular
supplier.

This query is actually dynamic, if all I had to do was the above 2 clauses
then I most certainly COULD do a group by.

However, for the total deficiencies I am then splitting up the total into
aging groups, eg <30, 30-60, 60-90, and >90 days old.  The query for that
looks like the below.  But before I paste it in, I would like to optimize
it, if I could do so with a group by clause I most certainly would, but I
don't see how I can BECAUSE OF THE AGING BREAKDOWN:
SELECT  project_id, marketing_name,    (SELECT count(lots.lot_id) AS lot_count     FROM deficiency_table AS dt, lots,
deficiency_statusAS ds     WHERE dt.lot_id = lots.lot_id        AND lots.division_id = proj.division_id        AND
lots.project_id= proj.project_id        AND dt.deficiency_status_id = ds.deficiency_status_id        AND
ds.is_outstanding       AND dt.assigned_supplier_id = '101690'    ) AS def_count,
 
        (SELECT count(lots.lot_id) AS lot_count         FROM deficiency_table AS dt, lots, deficiency_status AS ds
  WHERE dt.lot_id = lots.lot_id            AND lots.division_id = proj.division_id            AND lots.project_id =
proj.project_id
            AND dt.days_old_start_date < {d '2002-10-07'}            AND dt.deficiency_status_id =
ds.deficiency_status_id           AND ds.is_outstanding        AND dt.assigned_supplier_id = '101690'        ) AS
def_count_less_30,
            (SELECT count(lots.lot_id) AS lot_count             FROM deficiency_table AS dt, lots, deficiency_status AS
ds            WHERE dt.lot_id = lots.lot_id                AND lots.division_id = proj.division_id                AND
lots.project_id= proj.project_id
 
                AND dt.days_old_start_date >= {d '2002-10-07'}                AND dt.days_old_start_date < {d
'2002-09-07'}               AND dt.deficiency_status_id = ds.deficiency_status_id                AND ds.is_outstanding
     AND dt.assigned_supplier_id = '101690'            ) AS def_count_30_60,
 
            (SELECT count(lots.lot_id) AS lot_count             FROM deficiency_table AS dt, lots, deficiency_status AS
ds            WHERE dt.lot_id = lots.lot_id                AND lots.division_id = proj.division_id                AND
lots.project_id= proj.project_id
 
                AND dt.days_old_start_date >= {d '2002-09-07'}                AND dt.days_old_start_date < {d
'2002-08-08'}               AND dt.deficiency_status_id = ds.deficiency_status_id                AND ds.is_outstanding
     AND dt.assigned_supplier_id = '101690'            ) AS def_count_60_90,
 

        (SELECT count(lots.lot_id) AS lot_count         FROM deficiency_table AS dt, lots, deficiency_status AS ds
  WHERE dt.lot_id = lots.lot_id            AND lots.division_id = proj.division_id            AND lots.project_id =
proj.project_id
            AND dt.days_old_start_date >= {d '2002-08-08'}            AND dt.deficiency_status_id =
ds.deficiency_status_id           AND ds.is_outstanding        AND dt.assigned_supplier_id = '101690'        ) AS
def_count_greater_90,
        (SELECT count(DISTINCT lots.lot_id) AS lot_count         FROM deficiency_table AS dt, lots, deficiency_status
ASds         WHERE dt.lot_id = lots.lot_id            AND lots.division_id = proj.division_id            AND
lots.project_id= proj.project_id
 
            AND dt.days_old_start_date < {d '2002-10-07'}            AND dt.deficiency_status_id =
ds.deficiency_status_id           AND ds.is_outstanding        AND dt.assigned_supplier_id = '101690'        ) AS
lot_count_less_30,
            (SELECT count(DISTINCT lots.lot_id) AS lot_count             FROM deficiency_table AS dt, lots,
deficiency_statusAS ds             WHERE dt.lot_id = lots.lot_id                AND lots.division_id = proj.division_id
              AND lots.project_id = proj.project_id
 
                AND dt.days_old_start_date >= {d '2002-10-07'}                AND dt.days_old_start_date < {d
'2002-09-07'}               AND dt.deficiency_status_id = ds.deficiency_status_id                AND ds.is_outstanding
     AND dt.assigned_supplier_id = '101690'            ) AS lot_count_30_60,
 
            (SELECT count(DISTINCT lots.lot_id) AS lot_count             FROM deficiency_table AS dt, lots,
deficiency_statusAS ds             WHERE dt.lot_id = lots.lot_id                AND lots.division_id = proj.division_id
              AND lots.project_id = proj.project_id
 
                AND dt.days_old_start_date >= {d '2002-09-07'}                AND dt.days_old_start_date < {d
'2002-08-08'}               AND dt.deficiency_status_id = ds.deficiency_status_id                AND ds.is_outstanding
     AND dt.assigned_supplier_id = '101690'            ) AS lot_count_60_90,
 

        (SELECT count(DISTINCT lots.lot_id) AS lot_count         FROM deficiency_table AS dt, lots, deficiency_status
ASds         WHERE dt.lot_id = lots.lot_id            AND lots.division_id = proj.division_id            AND
lots.project_id= proj.project_id
 
            AND dt.days_old_start_date >= {d '2002-08-08'}            AND dt.deficiency_status_id =
ds.deficiency_status_id           AND ds.is_outstanding        AND dt.assigned_supplier_id = '101690'        ) AS
lot_count_greater_90,
    (SELECT count(DISTINCT lots.lot_id) AS lot_count     FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHEREdt.lot_id = lots.lot_id        AND lots.division_id = proj.division_id        AND lots.project_id =
proj.project_id       AND dt.deficiency_status_id = ds.deficiency_status_id        AND ds.is_outstanding        AND
dt.assigned_supplier_id= '101690'    ) AS lot_count
 
FROM    projects AS proj
WHERE   proj.division_id = 'GGH'AND NOT EXISTS (SELECT 1 FROM menu_group_projects WHERE menu_code = 'WA'
AND division_id = proj.division_id AND project_id = proj.project_id AND
status = 'I')
ORDER BY proj.project_id

If anyone can see a way to do a group by to do this, then I will be happy to
hear about it, because currently the resultset has to do a separate
(sequential or index) scan of the deficiencies table.  The only way I can
see to do a group by would be to break out the aging categories into
separate queries, but that wins me nothing because each query then does its
own scan...

The expected simplified output of this query looks like this:
Project    <30     30-60     >=60    lot total    <30    30-60    >=60    def total
X    1    2    1    4    5    10    5    20    (if X had 4 lots, each of 5 deficiencies)
Y    1    1    0    2    3    3    0    6    (each has eg 3 deficiencies in project Y)

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com



> -----Original Message-----
> From: ch@rodos.fzk.de [mailto:ch@rodos.fzk.de]
> Sent: Wednesday, November 06, 2002 4:54 AM
> To: terry@ashtonwoodshomes.com
> Subject: Re: [SQL] query optimization question
>
>
>
> Dear Terry,
> When I was reading the objective of your query, I expected at
> least one
> GROUP BY clause within. I do not intend to be offensive - not at all,
> but your query very much looks like you're lacking in basic SQL
> knowledge
> (did you receive any other reply?).
> The clause
>  WHERE ...
>  AND division_id = proj.division_id AND project_id =
> proj.project_id ...
>
> is leading to a JOIN of your projects table to itself.
> I'm pretty sure that's the main reason why the query is slow.
> As I understand your database table design, there are relations about
> divisions, projects, lots, and deficiencies of lots. And you are
> running a master database for all of them.
> I've tried to write two queries (see below) to retrieve the
> information
> you want (BTW I think your first subquery counts the total number of
> lots within the project but not the total number of deficiencies).
> Both queries may still run slow because three tables have to be joined
> (Please try them within the 'psql' interactive terminal first).
> Also, they may not work at all (I could not verify them as I did not
> know about your CREATE TABLE statements and did not have data to put
> in).
> I'm willing to help, so if it's not working this information would be
> very useful to me. I am no SQL guru, so I cannot see any way to put
> these
> two into one. But this looks like an interesting task, maybe we should
> put this topic to the list again as soon as we make the
> single ones run.
>
> Probably, you'll need to create several indexes to speed up.
>
> -- for each project, the total number of deficiencies
> SELECT   p.project_id, p.marketing_name, COUNT(d.lot_id) AS def_count
> FROM     projects AS p,
>          lots AS l LEFT JOIN deficiency_table AS d
> ON       ( d.lot_id = l.lot_id )
> WHERE    l.division_id = p.division_id
> AND       p.division_id = '#variables.local_division_id#'
> GROUP BY p.project_id, p.marketing_name ;
>
> -- for each project, the total number of lots with 1 or more
> deficiencies
> SELECT   p.project_id, p.marketing_name, COUNT(l.lot_id) AS
> def_lot_count
> FROM     projects AS p,
>          lots AS l LEFT JOIN deficiency_table AS d
> ON       ( d.lot_id = l.lot_id )
> WHERE    l.division_id = p.division_id
> AND       p.division_id = '#variables.local_division_id#'
> GROUP BY p.project_id, p.marketing_name HAVING COUNT(d.lot_id) > 0 ;
>
> Once again, no offence intended, but I recommend to read a book on SQL
> soon.
>
> Regards, Christoph
>
> >
> > The query below is slow because both the lots table and the
> deficiency_table
> > table have thousands of records.  Can anyone tell me how to do the
> second
> > subselect (lot_count) by some method of a join instead of a sub -
> subselect
> > OR any other method I can use to optimize this query to make it
> faster?
> >
> > The objective of the query is:  Tell me for each project, the total
> number
> > of deficiencies in the project, and the total number of
> lots with 1 or
> more
> > deficiencies in the project.
> >
> > SELECT  project_id, marketing_name,
> >   (SELECT COUNT(lots.lot_id) AS lot_count
> >    FROM deficiency_table AS dt, lots
> >    WHERE dt.lot_id = lots.lot_id
> >       AND lots.division_id = proj.division_id
> >       AND lots.project_id = proj.project_id
> >   ) AS def_count,
> >   (SELECT COUNT(lots.lot_id) AS lot_counter
> >    FROM lots
> >    WHERE  lots.division_id = proj.division_id
> >     AND lots.project_id = proj.project_id
> >    AND EXISTS (SELECT 1 FROM deficiency_table AS dt WHERE
> dt.lot_id =
> > lots.lot_id)
> >   ) AS lot_count
> > FROM    projects AS proj
> > WHERE   proj.division_id = '#variables.local_division_id#'
> >  AND NOT EXISTS (SELECT 1 FROM menu_group_projects WHERE menu_code =
> 'WA'
> > AND division_id = proj.division_id AND project_id = proj.project_id
> AND
> > status = 'I')
> > ORDER BY proj.project_id
> >
> > Thanks in advance
> >
> > Terry Fielder
> > Network Engineer
> > Great Gulf Homes / Ashton Woods Homes
> > terry@greatgulfhomes.com
> >
>



В списке pgsql-sql по дате отправления:

Предыдущее
От: Richard Huxton
Дата:
Сообщение: Re: Generating a cross tab (pivot table)
Следующее
От: "Prime Ho"
Дата:
Сообщение: how to get the source table & field name of a view field