Обсуждение: Query performance question
Hi,
I'm having trouble with an obviously simple query that just doesn't
perform quite good IMO.
I have two tables:
httplog=# \d hits
Table "hits"
Attribute | Type | Modifier
-------------+-----------+-----------------------------------------------
id | integer | not null default nextval('hits_id_seq'::text)
page_id | integer | not null
referrer_id | integer | not null
ip_addr | inet | not null
time | timestamp | not null
domain_id | integer | not null
Index: hits_pkey
httplog=# \d referrer
Table "referrer"
Attribute | Type | Modifier
-----------+--------------+----------
id | integer |
url | varchar(300) |
Index: referrer_pkey
These are part of an HTTP-log database. Table 'hits' has about 7000
rows, table 'referrer' has about 350 rows. Now I want to know what the top
ten referrers are, and I issue this query:
SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
GROUP BY url ORDER BY count DESC LIMIT 10;
And this query takes almost five seconds to complete, and that seems to me
quite slow. The database is running on a Duron 700 with 256 MB RAM and
ATA-66 disks. Operating system is FreeBSD-4.2.
Are there some tuning knobs I could turn?
regards,
le
--
Lukas Ertl eMail: l.ertl@univie.ac.at
WWW-Redaktion Tel.: (+43 1) 4277-14073
Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140
der Universität Wien
On Wed, 28 Mar 2001, Lukas Ertl wrote: There are two things you should do first: 1) VACUUM ANALYZE hits; 2) VACUUM ANALYZE referrer; If it's still running slowly, try EXPLAINing your SELECT to us. Also, it's been my experience that LIMIT's slow down a query, though I could be dead wrong. > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id > GROUP BY url ORDER BY count DESC LIMIT 10; > > And this query takes almost five seconds to complete, and that seems to me > quite slow. The database is running on a Duron 700 with 256 MB RAM and > ATA-66 disks. Operating system is FreeBSD-4.2. > > Are there some tuning knobs I could turn? -- Dave
On Wed, 28 Mar 2001, David Olbersen wrote:
> There are two things you should do first:
>
> 1) VACUUM ANALYZE hits;
> 2) VACUUM ANALYZE referrer;
I did that before, too.
> If it's still running slowly, try EXPLAINing your SELECT to us.
Ok, VACUUMing the tables didn't increase performance. This is the EXPLAIN
output:
httplog=# EXPLAIN SELECT count(*), url FROM hits, referrer WHERE
referrer.id = referrer_id GROUP BY url ORDER BY count DESC LIMIT 10;
NOTICE: QUERY PLAN:
Sort (cost=912.81..912.81 rows=678 width=20)
-> Aggregate (cost=846.98..880.90 rows=678 width=20)
-> Group (cost=846.98..863.94 rows=6785 width=20)
-> Sort (cost=846.98..846.98 rows=6785 width=20)
-> Hash Join (cost=10.46..415.17 rows=6785 width=20)
-> Seq Scan on hits (cost=0.00..128.85
rows=6785 width=4)
-> Hash (cost=9.57..9.57 rows=357 width=16)
-> Seq Scan on referrer (cost=0.00..9.57
rows=357 width=16)
EXPLAIN
I hope someone can explain this to me :)
Thanks for your quick answer,
regards,
le
--
Lukas Ertl eMail: l.ertl@univie.ac.at
WWW-Redaktion Tel.: (+43 1) 4277-14073
Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140
der Universität Wien
On Thu, 29 Mar 2001, Vijay Deval wrote: > URL is a very large field. If an extra field is created which gives a > neumeric id to the url, run the query on the number, and then get the desired > output I have to admit, that I quite don't understand what you want me to do here... I already have that numeric id: > > httplog=# \d referrer > > Table "referrer" > > Attribute | Type | Modifier > > -----------+--------------+---------- > > id | integer | > > url | varchar(300) | > > Index: referrer_pkey If I say: SELECT count(*), referrer.id FROM hits, referrer WHERE referrer.id = referrer_id GROUP BY referrer.id ORDER BY count DESC LIMIT 10; it is really fast and I get the IDs of the referrer-urls but how would I get the URLs then? regards, le -- Lukas Ertl eMail: l.ertl@univie.ac.at WWW-Redaktion Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien
URL is a very large field. If an extra field is created which gives a
neumeric id to the url, run the query on the number, and then get the desired
output
Vijay
Lukas Ertl wrote:
> Hi,
>
> I'm having trouble with an obviously simple query that just doesn't
> perform quite good IMO.
>
> I have two tables:
>
> httplog=# \d hits
> Table "hits"
> Attribute | Type | Modifier
> -------------+-----------+-----------------------------------------------
> id | integer | not null default nextval('hits_id_seq'::text)
> page_id | integer | not null
> referrer_id | integer | not null
> ip_addr | inet | not null
> time | timestamp | not null
> domain_id | integer | not null
> Index: hits_pkey
>
> httplog=# \d referrer
> Table "referrer"
> Attribute | Type | Modifier
> -----------+--------------+----------
> id | integer |
> url | varchar(300) |
> Index: referrer_pkey
>
> These are part of an HTTP-log database. Table 'hits' has about 7000
> rows, table 'referrer' has about 350 rows. Now I want to know what the top
> ten referrers are, and I issue this query:
>
> SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> GROUP BY url ORDER BY count DESC LIMIT 10;
>
>
i'd suggest either using an MD5 hash or some sort of CRC
definitely a hash though
On Thu, 29 Mar 2001, Vijay Deval wrote:
> URL is a very large field. If an extra field is created which gives a
> neumeric id to the url, run the query on the number, and then get the desired
> output
>
> Vijay
>
> Lukas Ertl wrote:
>
> > Hi,
> >
> > I'm having trouble with an obviously simple query that just doesn't
> > perform quite good IMO.
> >
> > I have two tables:
> >
> > httplog=# \d hits
> > Table "hits"
> > Attribute | Type | Modifier
> > -------------+-----------+-----------------------------------------------
> > id | integer | not null default nextval('hits_id_seq'::text)
> > page_id | integer | not null
> > referrer_id | integer | not null
> > ip_addr | inet | not null
> > time | timestamp | not null
> > domain_id | integer | not null
> > Index: hits_pkey
> >
> > httplog=# \d referrer
> > Table "referrer"
> > Attribute | Type | Modifier
> > -----------+--------------+----------
> > id | integer |
> > url | varchar(300) |
> > Index: referrer_pkey
> >
> > These are part of an HTTP-log database. Table 'hits' has about 7000
> > rows, table 'referrer' has about 350 rows. Now I want to know what the top
> > ten referrers are, and I issue this query:
> >
> > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> > GROUP BY url ORDER BY count DESC LIMIT 10;
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
-----------------------------------------------------------------------------
david@backpack.com BackPack Software, Inc. www.backpack.com
+1 651.645.7550 voice "Life is an Adventure.
+1 651.645.9798 fax Don't forget your BackPack!"
-----------------------------------------------------------------------------
Thanks for all of your suggestions, I found a way to speed up the queries
using VIEWs:
--
CREATE VIEW hits_per_referrer AS SELECT count(hits.id) AS anzahl,
hits.referrer_id FROM hits GROUP BY hits.referrer_id;
SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id =
referrer.id ORDER BY anzahl DESC LIMIT 10;
--
This is really _a lot_ faster, as this EXPLAIN statement proves:
--
httplog=# EXPLAIN SELECT anzahl, url FROM hits_per_referrer, referrer
WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10;
NOTICE: QUERY PLAN:
Sort (cost=254.66..254.66 rows=1000 width=24)
-> Aggregate (cost=69.83..204.83 rows=1000 width=24)
-> Group (cost=69.83..179.83 rows=10000 width=24)
-> Merge Join (cost=69.83..154.83 rows=10000 width=24)
-> Index Scan using referrer_pkey on referrer
(cost=0.00..60.00 rows=1000 width=16)
-> Sort (cost=69.83..69.83 rows=1000 width=8)
-> Seq Scan on hits (cost=0.00..20.00
rows=1000 width=8)
EXPLAIN
--
Thanks again,
regards,
le
On Thu, 29 Mar 2001, D. Duccini wrote:
>
> i'd suggest either using an MD5 hash or some sort of CRC
>
> definitely a hash though
>
>
> On Thu, 29 Mar 2001, Vijay Deval wrote:
>
> > URL is a very large field. If an extra field is created which gives a
> > neumeric id to the url, run the query on the number, and then get the desired
> > output
> >
> > Vijay
> >
> > Lukas Ertl wrote:
> >
> > > Hi,
> > >
> > > I'm having trouble with an obviously simple query that just doesn't
> > > perform quite good IMO.
> > >
> > > I have two tables:
> > >
> > > httplog=# \d hits
> > > Table "hits"
> > > Attribute | Type | Modifier
> > > -------------+-----------+-----------------------------------------------
> > > id | integer | not null default nextval('hits_id_seq'::text)
> > > page_id | integer | not null
> > > referrer_id | integer | not null
> > > ip_addr | inet | not null
> > > time | timestamp | not null
> > > domain_id | integer | not null
> > > Index: hits_pkey
> > >
> > > httplog=# \d referrer
> > > Table "referrer"
> > > Attribute | Type | Modifier
> > > -----------+--------------+----------
> > > id | integer |
> > > url | varchar(300) |
> > > Index: referrer_pkey
> > >
> > > These are part of an HTTP-log database. Table 'hits' has about 7000
> > > rows, table 'referrer' has about 350 rows. Now I want to know what the top
> > > ten referrers are, and I issue this query:
> > >
> > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> > > GROUP BY url ORDER BY count DESC LIMIT 10;
--
Lukas Ertl eMail: l.ertl@univie.ac.at
WWW-Redaktion Tel.: (+43 1) 4277-14073
Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140
der Universität Wien
As the tables are linked by WHERE hits.referrer_id =referrer.id just adding to select referrer.url should display url I did not see right away that referrer.id was unique. Hence the suggestion to add a unique ID for the tuples. Vijay Lukas Ertl wrote: > On Thu, 29 Mar 2001, Vijay Deval wrote: > > > URL is a very large field. If an extra field is created which gives a > > neumeric id to the url, run the query on the number, and then get the desired > > output > > I have to admit, that I quite don't understand what you want me to do > here... I already have that numeric id: > > > > httplog=# \d referrer > > > Table "referrer" > > > Attribute | Type | Modifier > > > -----------+--------------+---------- > > > id | integer | > > > url | varchar(300) | > > > Index: referrer_pkey > > If I say: > > SELECT count(*), referrer.id FROM hits, referrer WHERE referrer.id = > referrer_id GROUP BY referrer.id ORDER BY count DESC LIMIT 10; > > it is really fast and I get the IDs of the referrer-urls but how would I > get the URLs then? > > regards, > le > > -- > Lukas Ertl eMail: l.ertl@univie.ac.at > WWW-Redaktion Tel.: (+43 1) 4277-14073 > Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 > der Universität Wien