Обсуждение: Help refining/eliminating recursive selects

Поиск
Список
Период
Сортировка

Help refining/eliminating recursive selects

От
Edmund Bacon
Дата:
I have the following table:

create table test (   id        serial primary key,   product   integer,   tx_date   date,   quantity  integer)

with the following data:id | product |  tx_date   | quantity 
----+---------+------------+---------- 1 |       1 | 2004-01-01 |       10 2 |       2 | 2004-01-01 |        8 3 |
3 | 2004-01-01 |        7 4 |       4 | 2004-01-01 |       12 5 |       1 | 2004-01-15 |        9 6 |       2 |
2004-01-15|       12 7 |       3 | 2004-01-15 |        8 8 |       5 | 2004-01-07 |       15
 


what I want to do is to find the most recent record for each product in
the table.

The  only ways I seem to be able to achieve this is by  one of the 
following

1) A self join: 
   SELECT * FROM test   JOIN (SELECT product, max(tx_date) AS tx_date   FROM test    GROUP BY product) x
USING(product,tx_date);
 

2) A correlated subquery:    SELECT * FROM test t   WHERE tx_date =( SELECT max(tx_date) FROM test         WHERE
product= t.product);
 
or

3) a two-part select:    SELECT product, max(tx_date) AS tx_date  INTO TEMP TABLE t_prod_date  FROM test  GROUP BY
product;
  SELECT  * FROM test     JOIN t_prod_date     USING(product, tx_date);


I can't help but feel like I'm missing something simple that would do
what I want and not mean I need to scan the table multiple times. 
Is there a better way?

In trying to answer some questions in advance:

The two-part select _IS_ comparitively slow on the above dataset. In my
actual situation, I have about 300 possible products and over 20,00
records to sort through.  In that case the overhead of creating the temp
table is easily overcome by reducing the search space.  In my case the
two-part select runs in about 2/3 the time the self-join.  Note that we
are currently talking about .5 and .3 seconds, but the dataset is
growing.

The correlated-subquery on the large dataset is horribly slow,
comparitively speaking, at about 8 seconds.  I'm trying various index
approaches, (ANALYZING after adding/dropping an index) with no luck,
yet.  Maybe I just haven't hit on the right combination of fields to
index on?

If anybody can either (a) point me in a better direction, or (b) confirm my approach,I would greatly appreciate it

Thanks very much.  

-- 
Edmund Bacon <ebacon@onesystem.com>



Re: Help refining/eliminating recursive selects

От
Stephan Szabo
Дата:
On Thu, 19 Feb 2004, Edmund Bacon wrote:

>
> I have the following table:
>
> create table test (
>     id        serial primary key,
>     product   integer,
>     tx_date   date,
>     quantity  integer)
>
> with the following data:
>  id | product |  tx_date   | quantity
> ----+---------+------------+----------
>   1 |       1 | 2004-01-01 |       10
>   2 |       2 | 2004-01-01 |        8
>   3 |       3 | 2004-01-01 |        7
>   4 |       4 | 2004-01-01 |       12
>   5 |       1 | 2004-01-15 |        9
>   6 |       2 | 2004-01-15 |       12
>   7 |       3 | 2004-01-15 |        8
>   8 |       5 | 2004-01-07 |       15
>
>
> what I want to do is to find the most recent record for each product in
> the table.

If you don't mind using a PostgreSQL extension, I think distinct on
might help you.

Maybe something like the following:select distinct on (product) * from test order by product desc, tx_date
desc;

This might be helped by an index on (product, tx_date).



Re: Help refining/eliminating recursive selects

От
"Yudie"
Дата:
What about something like this:

select * from test order by tx_date desc limit 1


Yudie



----- Original Message ----- 
From: "Edmund Bacon" <ebacon@onesystem.com>
To: <pgsql-sql@postgresql.org>
Sent: Thursday, February 19, 2004 5:09 PM
Subject: [SQL] Help refining/eliminating recursive selects



I have the following table:

create table test (   id        serial primary key,   product   integer,   tx_date   date,   quantity  integer)

with the following data:id | product |  tx_date   | quantity 
----+---------+------------+---------- 1 |       1 | 2004-01-01 |       10 2 |       2 | 2004-01-01 |        8 3 |
3 | 2004-01-01 |        7 4 |       4 | 2004-01-01 |       12 5 |       1 | 2004-01-15 |        9 6 |       2 |
2004-01-15|       12 7 |       3 | 2004-01-15 |        8 8 |       5 | 2004-01-07 |       15
 


what I want to do is to find the most recent record for each product in
the table.

The  only ways I seem to be able to achieve this is by  one of the 
following

1) A self join: 
   SELECT * FROM test   JOIN (SELECT product, max(tx_date) AS tx_date  FROM test   GROUP BY product) x   USING(product,
tx_date);

2) A correlated subquery:    SELECT * FROM test t   WHERE tx_date =
( SELECT max(tx_date) FROM test         WHERE product = t.product);
or

3) a two-part select:    SELECT product, max(tx_date) AS tx_date  INTO TEMP TABLE t_prod_date  FROM test  GROUP BY
product;
  SELECT  * FROM test     JOIN t_prod_date     USING(product, tx_date);


I can't help but feel like I'm missing something simple that would do
what I want and not mean I need to scan the table multiple times. 
Is there a better way?

In trying to answer some questions in advance:

The two-part select _IS_ comparitively slow on the above dataset. In my
actual situation, I have about 300 possible products and over 20,00
records to sort through.  In that case the overhead of creating the temp
table is easily overcome by reducing the search space.  In my case the
two-part select runs in about 2/3 the time the self-join.  Note that we
are currently talking about .5 and .3 seconds, but the dataset is
growing.

The correlated-subquery on the large dataset is horribly slow,
comparitively speaking, at about 8 seconds.  I'm trying various index
approaches, (ANALYZING after adding/dropping an index) with no luck,
yet.  Maybe I just haven't hit on the right combination of fields to
index on?

If anybody can either 
(a) point me in a better direction, or 
(b) confirm my approach,I would greatly appreciate it

Thanks very much.  

-- 
Edmund Bacon <ebacon@onesystem.com>


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate     subscribe-nomail command to
majordomo@postgresql.orgso that your     message can get through to the mailing list cleanly