Обсуждение: SELECT Aggregate

Поиск
Список
Период
Сортировка

SELECT Aggregate

От
"Phillip Smith"
Дата:

Hi all,

I have two tables which are storing all our sales orders / invoices as below. sales_order.trans_no and soh_product.soh_num are the common columns. This is PostgreSQL 8.1.4 (ie, the latest release)

 

We have some issues that I’ve been able to identify using this SELECT:

SELECT      trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            ord_type,

            ord_status,

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

FROM        sales_orders

WHERE       (trans_no Like '8%' AND order_number Like '8%')

 OR         (trans_no Like '9%' AND order_number Like '8%')

 OR         (trans_no Like '8%' AND order_number Like '9%')

 OR         (trans_no Like '9%' AND order_number Like '9%')

 AND        (warehouse='M')

 AND        (date_placed > (current_date + ('12 months ago'::interval)))

ORDER BY trans_no DESC

 

But I want to add in a wholesale value of each order – SUM(soh_product.sell_price) – How would be best to do this? Would it be easiest to create a function to accept the trans_no then do a SELECT on soh_product and return that value?

 

Thanks,

-p

 

I’ve tried to do this but Postgres complains about having to include all the other columns in either an aggregate or the GROUP BY.

SELECT      trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            ord_type,

            ord_status,

            SUM(soh_product.sell_price),

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

FROM        sales_orders, soh_product

WHERE       (trans_no Like '8%' AND order_number Like '8%')

 OR         (trans_no Like '9%' AND order_number Like '8%')

 OR         (trans_no Like '8%' AND order_number Like '9%')

 OR         (trans_no Like '9%' AND order_number Like '9%')

 AND        (warehouse='M')

 AND        (sales_orders.trans_no = soh_product.soh_num)

 AND        (date_placed > (current_date + ('12 months ago'::interval)))

GROUP BY soh_product.soh_num

ORDER BY trans_no DESC

 

CREATE TABLE sales_orders

(

  trans_no varchar(6) NOT NULL,

  customer varchar(6),

  date_placed date,

  date_complete date,

  date_printed date,

  ord_type varchar(1),

  ord_status varchar(1),

  discount float8,

  customer_reference text,

  warehouse varchar(3),

  salesman varchar(3),

  username text,

  ordered_value float8 DEFAULT 0,

  supplied_value float8 DEFAULT 0,

  ordered_qty int8,

  supplied_qty int8 DEFAULT 0,

  frieght float8 DEFAULT 0,

  delivery_instructions text,

  parent_order varchar(6),

  child_order varchar(6),

  apply_to_order varchar(6),

  fo_release date,

  order_number varchar(6),

  orig_fo_number varchar(6),

  CONSTRAINT soh_pkey PRIMARY KEY (trans_no)

)

CREATE TABLE soh_product

(

  soh_num varchar(6) NOT NULL,

  prod_code varchar(6) NOT NULL,

  qty_ordered numeric(8),

  qty_supplied numeric(8),

  cost_price numeric(10,2),

  sell_price numeric(10,2),

  sales_tax numeric(10,2),

  discount numeric(10,2),

  cost_gl varchar(5),

  if_committed varchar(1)

)


*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to the addressee. If you are not the addressee indicated in this message or responsible for delivery of the message to such person, you may not copy or deliver this message to anyone, and you should destroy it and kindly notify the sender by reply email.

Information in this message that does not relate to the official business of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta. Weatherbeeta, its employees, contractors or associates shall not be liable for direct, indirect or consequential loss arising from transmission of this message or any attachments

Re: SELECT Aggregate

От
"Aaron Bono"
Дата:
I would recommend against using a function.  If you are selecting a large number of rows, the function will run for each row returned and will have to do a select for each row.  So if you get 1000 rows returned from your query, you will end up with 1001 select statements for your one query.

Assuming trans_no is your primary key (or at least unique) then a group by on all columns in the select EXCEPT sale_price should do the trick:

SELECT      trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            ord_type,

            ord_status,

            SUM(soh_product.sell_price),

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

FROM        sales_orders, soh_product

WHERE       (trans_no Like '8%' AND order_number Like '8%')

 OR         (trans_no Like '9%' AND order_number Like '8%')

 OR         (trans_no Like '8%' AND order_number Like '9%')

 OR         (trans_no Like '9%' AND order_number Like '9%')

 AND        (warehouse='M')

  AND        (sales_orders.trans_no = soh_product.soh_num)

 AND        (date_placed > (current_date + ('12 months ago'::interval)))

GROUP BY    trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            ord_type,

            ord_status,

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

ORDER BY trans_no DESC

On 6/28/06, Phillip Smith < phillips@weatherbeeta.com.au> wrote:

Hi all,

I have two tables which are storing all our sales orders / invoices as below. sales_order.trans_no and soh_product.soh_num are the common columns. This is PostgreSQL 8.1.4 (ie, the latest release)

 

We have some issues that I've been able to identify using this SELECT:

SELECT      trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            ord_type,

            ord_status,

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

FROM        sales_orders

WHERE       (trans_no Like '8%' AND order_number Like '8%')

 OR         (trans_no Like '9%' AND order_number Like '8%')

 OR         (trans_no Like '8%' AND order_number Like '9%')

 OR         (trans_no Like '9%' AND order_number Like '9%')

 AND        (warehouse='M')

 AND        (date_placed > (current_date + ('12 months ago'::interval)))

ORDER BY trans_no DESC

 

But I want to add in a wholesale value of each order – SUM(soh_product.sell_price) – How would be best to do this? Would it be easiest to create a function to accept the trans_no then do a SELECT on soh_product and return that value?

 

Thanks,

-p

 

I've tried to do this but Postgres complains about having to include all the other columns in either an aggregate or the GROUP BY.

SELECT      trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            ord_type,

            ord_status,

            SUM(soh_product.sell_price),

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

FROM        sales_orders, soh_product

WHERE       (trans_no Like '8%' AND order_number Like '8%')

 OR         (trans_no Like '9%' AND order_number Like '8%')

 OR         (trans_no Like '8%' AND order_number Like '9%')

 OR         (trans_no Like '9%' AND order_number Like '9%')

 AND        (warehouse='M')

  AND        (sales_orders.trans_no = soh_product.soh_num)

 AND        (date_placed > (current_date + ('12 months ago'::interval)))

GROUP BY soh_product.soh_num

ORDER BY trans_no DESC

 

CREATE TABLE sales_orders

(

  trans_no varchar(6) NOT NULL,

  customer varchar(6),

  date_placed date,

  date_complete date,

  date_printed date,

  ord_type varchar(1),

  ord_status varchar(1),

  discount float8,

  customer_reference text,

  warehouse varchar(3),

  salesman varchar(3),

  username text,

  ordered_value float8 DEFAULT 0,

  supplied_value float8 DEFAULT 0,

  ordered_qty int8,

  supplied_qty int8 DEFAULT 0,

  frieght float8 DEFAULT 0,

  delivery_instructions text,

  parent_order varchar(6),

  child_order varchar(6),

  apply_to_order varchar(6),

  fo_release date,

  order_number varchar(6),

  orig_fo_number varchar(6),

  CONSTRAINT soh_pkey PRIMARY KEY (trans_no)

)

CREATE TABLE soh_product

(

  soh_num varchar(6) NOT NULL,

  prod_code varchar(6) NOT NULL,

  qty_ordered numeric(8),

  qty_supplied numeric(8),

  cost_price numeric(10,2),

  sell_price numeric(10,2),

  sales_tax numeric(10,2),

  discount numeric(10,2),

  cost_gl varchar(5),

  if_committed varchar(1)

)

Re: SELECT Aggregate

От
Richard Broersma Jr
Дата:
> SELECT      trans_no,
>             customer,
>             date_placed,
>             date_complete,
>             date_printed,
>             ord_type,
>             ord_status,             select (                        SUM(sell_price)                        from
soh_product                       where sales_orders.trans_no = soh_product.soh_num                     ) as
transact_sum,
>             customer_reference,
>             salesman,
>             parent_order,
>             child_order,
>             order_number
> FROM        sales_orders
> WHERE       (trans_no Like '8%' AND order_number Like '8%')
>  OR         (trans_no Like '9%' AND order_number Like '8%')
>  OR         (trans_no Like '8%' AND order_number Like '9%')
>  OR         (trans_no Like '9%' AND order_number Like '9%')
>  AND        (warehouse='M')
>  AND        (date_placed > (current_date + ('12 months ago'::interval)))
> ORDER BY trans_no DESC


I am pretty new to SQL.  But while reading a book written by an author recommended on this list,I
can suggest a possible solution that I've seen.  It might work for your problem. Of course, I
haven't tested anything like this and don't know if PostgreSQL supports it.

Just be sure that trans_no is unique in the returned query.

Regards,

Richard Broersma Jr.


Re: SELECT Aggregate

От
Richard Broersma Jr
Дата:
> > SELECT      trans_no,
> >             customer,
> >             date_placed,
> >             date_complete,
> >             date_printed,
> >             ord_type,
> >             ord_status,               (select  SUM(sell_price)   -- this syntax working for me. see below
    from    soh_product                where   sales_orders.trans_no = soh_product.soh_num                ) as
transact_sum,
> >             customer_reference,
> >             salesman,
> >             parent_order,
> >             child_order,
> >             order_number
> > FROM        sales_orders
> > WHERE       (trans_no Like '8%' AND order_number Like '8%')
> >  OR         (trans_no Like '9%' AND order_number Like '8%')
> >  OR         (trans_no Like '8%' AND order_number Like '9%')
> >  OR         (trans_no Like '9%' AND order_number Like '9%')
> >  AND        (warehouse='M')
> >  AND        (date_placed > (current_date + ('12 months ago'::interval)))
> > ORDER BY trans_no DESC
> 
> 
> I am pretty new to SQL.  But while reading a book written by an author recommended on this
> list,I
> can suggest a possible solution that I've seen.  It might work for your problem. Of course, I
> haven't tested anything like this and don't know if PostgreSQL supports it.
> 
> Just be sure that trans_no is unique in the returned query.

select       f1.fiscalyear,
       (select f2.startdate        from   fiscalyeartable2 as f2        where  f1.fiscalyear = f2.fiscalyear       ) as
start2date,
       f1.enddate

from       fiscalyeartable1 as f1;
fiscalyear | start2date |  enddate
------------+------------+------------      1995 | 1994-10-01 | 1995-09-30      1996 | 1995-10-01 | 1996-08-30
1997| 1996-10-01 | 1997-09-30      1998 | 1997-10-01 | 1998-09-30
 
                    QUERY PLAN                                    
--------------------------------------------------------------------------Seq Scan on fiscalyeartable1 f1
(cost=0.00..6.83rows=1 width=6)                                  (actual time=0.044..0.067 rows=4 loops=1)  SubPlan
-> Index Scan using fiscalyeartable2_pkey on fiscalyeartable2 f2                                   (cost=0.00..5.82
rows=1width=4)                                  (actual time=0.008..0.009 rows=1 loops=4)          Index Cond: ($0 =
fiscalyear)Totalruntime: 0.138 ms
 
(5 rows)

it works,  and check out the nifty query plan.

Regards,

Richard Broersma Jr.


Re: SELECT Aggregate

От
"Aaron Bono"
Дата:
I am not familiar enough with how postgres optimizes the queries but won't this end up with

total number of queries run on DB = 1 query + 1 query/row in first query

What would be more efficient on a large database - a query like Richard submitted (subquery in the select) or one like I submitted (join the two tables and then do a group by)?  My guess is it depends on the % of records returned out of the sales_orders table, the smaller the % the better Richard's query would perform, the higher the % the better the join would run.

The database I am working with aren't big enough yet to warrant spending a lot of time researching this but if someone with more experience knows what is best I would love to hear about it.

Thanks,
Aaron Bono

On 6/29/06, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> SELECT      trans_no,
>             customer,
>             date_placed,
>             date_complete,
>             date_printed,
>             ord_type,
>             ord_status,
              select (
                         SUM(sell_price)
                         from soh_product
                         where sales_orders.trans_no = soh_product.soh_num
                      ) as transact_sum,
>             customer_reference,
>             salesman,
>             parent_order,
>             child_order,
>             order_number
> FROM        sales_orders
> WHERE       (trans_no Like '8%' AND order_number Like '8%')
>  OR         (trans_no Like '9%' AND order_number Like '8%')
>  OR         (trans_no Like '8%' AND order_number Like '9%')
>  OR         (trans_no Like '9%' AND order_number Like '9%')
>  AND        (warehouse='M')
>  AND        (date_placed > (current_date + ('12 months ago'::interval)))
> ORDER BY trans_no DESC

Re: SELECT Aggregate

От
"Phillip Smith"
Дата:

I’ve tried Aaron’s suggestion of the GROUP BY and I don’t know much about it, but it ran for around 17 hours and still going (it had a dedicated Dual Xeon 3.0GHz box under RHEL4 running it!)

 

I’ll give Richard’s suggestion a try and see if that comes up any better. Like I said yesterday, this might just be too much for Postgres and I’ll need to summarize it in the export from our live system and add a new column before I import it to the sales_orders table

 

Cheers,

-p

 

-----Original Message-----
From: aaron.bono@gmail.com [mailto:aaron.bono@gmail.com] On Behalf Of Aaron Bono
Sent:
Friday, 30 June 2006 01:25
To: Richard Broersma Jr
Cc: Phillip Smith; pgsql-sql@postgresql.org
Subject: Re: [SQL] SELECT Aggregate

 

I am not familiar enough with how postgres optimizes the queries but won't this end up with

total number of queries run on DB = 1 query + 1 query/row in first query

What would be more efficient on a large database - a query like Richard submitted (subquery in the select) or one like I submitted (join the two tables and then do a group by)?  My guess is it depends on the % of records returned out of the sales_orders table, the smaller the % the better Richard's query would perform, the higher the % the better the join would run.

The database I am working with aren't big enough yet to warrant spending a lot of time researching this but if someone with more experience knows what is best I would love to hear about it.

Thanks,
Aaron Bono

On 6/29/06, Richard Broersma Jr <rabroersma@yahoo.com> wrote:

> SELECT      trans_no,
>             customer,
>             date_placed,
>             date_complete,
>             date_printed,
>             ord_type,
>             ord_status,
              select (
                         SUM(sell_price)
                         from soh_product
                         where sales_orders.trans_no = soh_product.soh_num
                      ) as transact_sum,
>             customer_reference,
>             salesman,
>             parent_order,
>             child_order,
>             order_number
> FROM        sales_orders
> WHERE       (trans_no Like '8%' AND order_number Like '8%')
>  OR         (trans_no Like '9%' AND order_number Like '8%')
>  OR         (trans_no Like '8%' AND order_number Like '9%')
>  OR         (trans_no Like '9%' AND order_number Like '9%')
>  AND        (warehouse='M')
>  AND        (date_placed > (current_date + ('12 months ago'::interval)))
> ORDER BY trans_no DESC


*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to the addressee. If you are not the addressee indicated in this message or responsible for delivery of the message to such person, you may not copy or deliver this message to anyone, and you should destroy it and kindly notify the sender by reply email.

Information in this message that does not relate to the official business of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta. Weatherbeeta, its employees, contractors or associates shall not be liable for direct, indirect or consequential loss arising from transmission of this message or any attachments

Re: SELECT Aggregate

От
Richard Broersma Jr
Дата:
> I've tried Aaron's suggestion of the GROUP BY and I don't know much about
> it, but it ran for around 17 hours and still going (it had a dedicated Dual
> Xeon 3.0GHz box under RHEL4 running it!)

Maybe, this query that you are trying to run is a good candidate for a "Materialize View".
http://archives.postgresql.org/pgsql-performance/2004-02/msg00279.php

Also before you run your query you might want to see the explain plan is.  Perhap it is using a
sequencial scan in a place where an index can improve query preformance.




Re: SELECT Aggregate

От
"Phillip Smith"
Дата:

Well whatdyaknow?? Being a Postgres newbie I hadn't even played with indexes yet. They're awesome!!

 

Using Richard's suggestion of the Sub-Select in the COLUMN list, combined with adding some indexes, I can now return this in under 5 seconds!

 

I’ve included the new SELECT query, as well as the definitions of the indexes below for anyone who’s interested.

 

Thanks guys!

 

QUERY:

SELECT      trans_no,

            customer,

            date_placed,

            date_complete,

            date_printed,

            (SELECT  SUM(sell_price)

                 FROM   soh_product

                 WHERE  sales_orders.trans_no = soh_product.soh_num

                 ) AS wholesale,

            ord_type,

            ord_status,

            customer_reference,

            salesman,

            parent_order,

            child_order,

            order_number

FROM        sales_orders

WHERE       (trans_no Like '8%' AND order_number Like '8%')

 OR         (trans_no Like '9%' AND order_number Like '8%')

 OR         (trans_no Like '8%' AND order_number Like '9%')

 OR         (trans_no Like '9%' AND order_number Like '9%')

 AND        warehouse='M'

 AND        date_placed > (current_date + ('12 months ago'::interval))

ORDER BY    trans_no DESC

 

INDEXES:

CREATE INDEX sales_orders_customer

  ON sales_orders

  USING btree

  (customer);

 

CREATE INDEX sales_orders_orderno

  ON sales_orders

  USING btree

  (order_number);

 

CREATE INDEX sales_orders_customer

  ON sales_orders

  USING btree

  (customer);

 

CREATE INDEX soh_product_prodcode

  ON soh_product

  USING btree

  (prod_code);

 

CREATE INDEX soh_product_transno

  ON soh_product

  USING btree

  (soh_num);

 

 

-----Original Message-----
From: Richard Broersma Jr [mailto:rabroersma@yahoo.com]
Sent: Friday, 30 June 2006 10:51
To: Phillip Smith; pgsql-sql@postgresql.org
Subject: Re: [SQL] SELECT Aggregate

 

> I've tried Aaron's suggestion of the GROUP BY and I don't know much about

> it, but it ran for around 17 hours and still going (it had a dedicated Dual

> Xeon 3.0GHz box under RHEL4 running it!)

 

Maybe, this query that you are trying to run is a good candidate for a "Materialize View".

http://archives.postgresql.org/pgsql-performance/2004-02/msg00279.php

 

Also before you run your query you might want to see the explain plan is.  Perhap it is using a

sequencial scan in a place where an index can improve query preformance.

 

 


*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to the addressee. If you are not the addressee indicated in this message or responsible for delivery of the message to such person, you may not copy or deliver this message to anyone, and you should destroy it and kindly notify the sender by reply email.

Information in this message that does not relate to the official business of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta. Weatherbeeta, its employees, contractors or associates shall not be liable for direct, indirect or consequential loss arising from transmission of this message or any attachments

Re: SELECT Aggregate

От
Richard Broersma Jr
Дата:
> Well whatdyaknow?? Being a Postgres newbie I hadn't even played with indexes
> yet. They're awesome!!
> Using Richard's suggestion of the Sub-Select in the COLUMN list, combined
> with adding some indexes, I can now return this in under 5 seconds!

Also, another way to improve preformance will be to analyze the affected tables.  Analyze will
ensure that the query planner has accurate statics by which it will use in picking fastest
possible query.

If you ever plan on updating or deleting records.  You will also need to vacuum the table.  And an
additional measure of maintance would be to re-index the database.

All of this is listing in the postgresql manual.  If you really want to ensure the best possible
speeds, it will be an important step to take.

Regards,

Richard Broersma Jr.