Обсуждение: How can I speed up with query?
I would like to know if there is a more efficient way to perform a query.
I have a table (see below) with an ID, a date, and a value.  The date
specifies when the entry was added.  I want to query the table to determine
what the value was on a specific day.
I have the following query, which works quite well but it is very slow (the
real table I'm using has 200000 records)... I'm sure the slowness is due to
the subselect... does anybody know any way to make this query faster?
Thanks in advance,
Scott
My current query:
select a.* from sample a where (id,date) in (select a.id,max(date) from
sample where date<='<<the date>>' and id=a.id) order by id;
So with the data below, filling in '2003-02-01' for <<the date>> gives:
id |    date    | value
----+------------+-------
1 | 2003-02-01 |    12
2 | 2003-02-01 |     9
3 | 2003-02-01 |     2
4 | 2003-02-01 |    11
And filling in '2003-02-04' for <<the date>> gives:
id |    date    | value
----+------------+-------
1 | 2003-02-04 |    21
2 | 2003-02-01 |     9
3 | 2003-02-01 |     2
4 | 2003-02-03 |    12
Here is the table layout and the sample data I'm using:
       Table "sample"
Column |  Type   | Modifiers
--------+---------+-----------
id     | integer | not null
date   | date    | not null
value  | integer | not null
Primary key: sample_pkey
id |    date    | value
----+------------+-------
1 | 2003-02-01 |    12
1 | 2003-02-02 |    16
1 | 2003-02-04 |    21
2 | 2003-02-01 |     9
3 | 2003-02-01 |     2
4 | 2003-02-01 |    11
4 | 2003-02-03 |    12
(7 rows)
			
		How bout select a.* from sample a where exists (select id from sample where date <= '<<thedate>>' and id = a.id order by date desc limit 1) order by id Is that what your after? Seems like you could do this w/o the subquery.. but Im not sure exactly what you want. Thanks Chad ----- Original Message ----- From: "Scott Morrison" <smorrison@navtechinc.com> To: <pgsql-novice@postgresql.org> Sent: Wednesday, February 05, 2003 11:59 AM Subject: [NOVICE] How can I speed up with query? > I would like to know if there is a more efficient way to perform a query. > > I have a table (see below) with an ID, a date, and a value. The date > specifies when the entry was added. I want to query the table to determine > what the value was on a specific day. > > I have the following query, which works quite well but it is very slow (the > real table I'm using has 200000 records)... I'm sure the slowness is due to > the subselect... does anybody know any way to make this query faster? > > Thanks in advance, > Scott > > My current query: > select a.* from sample a where (id,date) in (select a.id,max(date) from > sample where date<='<<the date>>' and id=a.id) order by id; > > So with the data below, filling in '2003-02-01' for <<the date>> gives: > id | date | value > ----+------------+------- > 1 | 2003-02-01 | 12 > 2 | 2003-02-01 | 9 > 3 | 2003-02-01 | 2 > 4 | 2003-02-01 | 11 > > And filling in '2003-02-04' for <<the date>> gives: > id | date | value > ----+------------+------- > 1 | 2003-02-04 | 21 > 2 | 2003-02-01 | 9 > 3 | 2003-02-01 | 2 > 4 | 2003-02-03 | 12 > > Here is the table layout and the sample data I'm using: > > Table "sample" > Column | Type | Modifiers > --------+---------+----------- > id | integer | not null > date | date | not null > value | integer | not null > Primary key: sample_pkey > > id | date | value > ----+------------+------- > 1 | 2003-02-01 | 12 > 1 | 2003-02-02 | 16 > 1 | 2003-02-04 | 21 > 2 | 2003-02-01 | 9 > 3 | 2003-02-01 | 2 > 4 | 2003-02-01 | 11 > 4 | 2003-02-03 | 12 > (7 rows) > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
> I have a table (see below) with an ID, a date, and a value.  The date
> specifies when the entry was added.  I want to query the table to determine
> what the value was on a specific day.
> ...
> select a.* from sample a where (id,date) in (select a.id,max(date) from
> sample where date<='<<the date>>' and id=a.id) order by id;
Actually, your query as written wants to find the latest value added before or
on a certain date. A subtle difference, but it makes a difference. Some
quick notes before I jump into my shot at this:
* Avoid the max() function - it is not fully optimized yet. Instead use
SELECT foo FROM table ORDER BY foo DESC LIMIT 1
(and use ASC to replace the min() function)
* Try not to use keywords such as "date" for your column names.
* Always run VACUUM ANALYZE and create an index: in this case, on the "date" column
* If you can't match on a single column (as in the id,date from your original
query, use the oid)
(The explain analyze below is for a 200,000 row table with 20 distinct ids and a
time period of abot a month.)
VACUUM ANALYZE sample;
CREATE INDEX sample_date on sample(date);
EXPLAIN ANALYZE
SELECT id, date, value FROM sample a WHERE date <='2003-01-01' AND oid =
(SELECT oid FROM sample WHERE id = a.id AND DATE <='2003-01-01' ORDER BY date DESC LIMIT 1)
ORDER BY id;
                                                                    QUERY PLAN
- --------------------------------------------------------------------------------------------------
 Sort  (cost=1712.13..1712.14) (actual time=5292.35..5292.38 rows=20 loops=1)
   Sort Key: id
   ->  Index Scan using sample_date on sample a  (cost=0.00..1712.12) (actual time=5258.10..5292.22 rows=20 loops=1)
         Index Cond: (date <= '2003-01-01'::date)
         Filter: (oid = (subplan))
         SubPlan
           ->  Limit  (cost=0.00..81.53) (actual time=0.71..0.78 rows=1 loops=6532)
                 ->  Index Scan Backward using sample_date on sample  (cost=0.00..81.53) (actual time=0.70..0.78 rows=2
loops=6532)
                       Index Cond: (date <= '2003-01-01'::date)
                       Filter: (id = $0)
 Total runtime: 5292.50 msec
- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302051538
-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html
iD8DBQE+QXnMvJuQZxSWSsgRAhmDAKDwMmf0GvhnVFKeiDPnVolx3wOO1gCgsasJ
3t3LCOa6Q5uOCJpawodJO54=
=dGj1
-----END PGP SIGNATURE-----
			
		On Thu, 2003-02-06 at 07:59, Scott Morrison wrote:
>
> My current query:
> select a.* from sample a where (id,date) in (select a.id,max(date) from
> sample where date<='<<the date>>' and id=a.id) order by id;
max(date) will require a scan on the recordset in the subselect.  You
might be better to use 'ORDER BY date DESC LIMIT 1' which will use an
index on date if one exists.
SELECT * FROM sample a WHERE (id, date)
  IN (SELECT a.id, date FROM sample WHERE id=a.id ORDER BY date DESC
LIMIT 1)
When you had 20,000 records you would definitely be wanting an index on
date and an index on id.
I think that the LIMIT clause on subselects is only implemented in
recent PostgreSQL however - 7.2 on, I believe.
To examine the query plans that PostgreSQL comes up with, and help you
choose a better structure for your query, you should use "EXPLAIN <<the
query>>" .
Regards,
                    Andrew.
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for nothing with http://survey.net.nz/
---------------------------------------------------------------------