Re: [Tuning questions..]
От | Michael T. Halligan |
---|---|
Тема | Re: [Tuning questions..] |
Дата | |
Msg-id | 3C21284F.5070407@echo.com обсуждение исходный текст |
Ответ на | Re: [Tuning questions..] (Bojan Belovic <bbelovic@usa.net>) |
Список | pgsql-admin |
Thanks, that sped things up a bit, from 7.6 sec. to about 5.5 sec. However the plan still includes a sequential scan on ssa_candidate: Aggregate (cost=12161.11..12161.11 rows=1 width=35) -> Merge Join (cost=11611.57..12111.12 rows=19996 width=35) -> Sort (cost=11488.27..11488.27 rows=99805 width=24) -> Seq Scan on ssa_candidate sc (cost=0.00..3201.05 rows=99805 width=24) -> Sort (cost=123.30..123.30 rows=31 width=11) -> Index Scan using station_subgenre_pk on station_subgenre ss (cost=0.00..122.53 rows=31 width=11) If we run the same query on Oracle (modified slightly for syntax): SELECT count(DISTINCT song_id) AS X FROM ssa_candidate SC, station_subgenre SS WHERE SC.style_id = SS.style_id AND SS.station_id = with the same data, it runs almost instaneously. Oracle's plan uses a hash join: SELECT STATEMENT Optimizer=CHOOSE (Cost=63 Card=1 Bytes=15) SORT (GROUP BY) HASH JOIN (Cost=63 Card=8943 Bytes=134145) INDEX (RANGE SCAN) OF STATION_SUBGENRE_PK (UNIQUE) (Cost=3 Card=12 Bytes=84) TABLE ACCESS (FULL) OF SSA_CANDIDATE (Cost=59 Card=60364 Bytes=482912) Is there some way to convince PostgreSQL to use a hash join? Bojan Belovic wrote: >Not sure about tuning, but it seems to me that this query would be much more >effective if it's rewritten like this (especially if style_id columns on both >tables are indexed): > >SELECT count(DISTINCT song_id) AS X >FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id = SS.style_id >WHERE SS.station_id = 48 > >Please correct me if I'm wrong. Also, Michael, if you give this one a try, >could you send me the query plan, I'm just curious. > >SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre >WHERE > >SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre >WHERE "Michael T. Halligan" <michael@echo.com> wrote: > >>Hi.. I seem to be running into a bottle neck on a query, and I'm not >>sure what the bottleneck is . >>The machine is a dual-processor p3 750 with 2 gigs of (pc100) memory, >>and 3 72 gig disks setup >>in raid 5. Right now i'm just testing our db for speed (we're porting >>from oracle) .. later on >>We're looking @ a quad xeon 700 with 16 gigs of ram & 10 drives in >>hardware raid 5. >> >>We've tuned the queries a bit, added some indices, and we got this query >>down from about 15 minutes >>to 7.6 seconds.. but it just seems like we should be able to get this >>query down to under a second on >>this box.. It's running the latest suse, with 2.4.16 kernel, reiserfs, >>postgres 7.2b3. I've tried many different combinations >>of buffers, stat collection space, sort space, etc. none of them really >>effect performance.. >> >>When I run this particular query, the only resource that seems to change >>is one of the processors gets up to >>about 99% usage.. I've tried setting postgres to use up to 1.6 gigs of >>memory, but the postmaster never seems >>to get above about 700megs.. it's not swapping at all, though the >>contact switching seems to get a bit high (peaking >>at 150) .. >> >>The query sorts through about 80k rows.. here's the query >>-------------------------------------------------- >>SELECT count(*) FROM ( >> SELECT DISTINCT song_id FROM ssa_candidate WHERE >>style_id IN ( >> SELECT style_id FROM station_subgenre WHERE >>station_id = 48 >> ) >> ) AS X; >>-------------------------------------------------- >>and the query plan : >>-------------------------------------------------- >>NOTICE: QUERY PLAN: >> >>Aggregate (cost=12236300.87..12236300.87 rows=1 width=13) >> -> Subquery Scan x (cost=12236163.64..12236288.40 rows=4990 width=13) >> -> Unique (cost=12236163.64..12236288.40 rows=4990 width=13)SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre >>WHERE >> -> Sort (cost=12236163.64..12236163.64 rows=49902 width=13) >> -> Seq Scan on ssa_candidate >>(cost=0.00..12232269.54 rows=49902 width=13) >> SubPlan >> -> Materialize (cost=122.53..122.53 >>rows=31 width=11) >> -> Index Scan using >>station_subgenre_pk on station_subgenre (cost=0.00..122.53 rows=31 >>width=11) >> >>EXPLAIN >>-------------------------------------------------- >> >> >>If anybody has any ideas, I'd be really appreciative.. >> >> >> >> >> >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 2: you can get off all lists at once with the unregister command >> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >> > > >____________________________________________________________________ >Get free e-mail and a permanent address at http://www.amexmail.com/?A=1 > >---------------------------(end of broadcast)--------------------------- >TIP 3: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to majordomo@postgresql.org so that your >message can get through to the mailing list cleanly >
В списке pgsql-admin по дате отправления: