Re: [Tuning questions..]

Поиск
Список
Период
Сортировка
От Michael T. Halligan
Тема Re: [Tuning questions..]
Дата
Msg-id 3C21284F.5070407@echo.com
обсуждение исходный текст
Ответ на Re: [Tuning questions..]  (Bojan Belovic <bbelovic@usa.net>)
Список pgsql-admin
Thanks, that sped things up a bit, from 7.6 sec. to about 5.5 sec.  However
the plan still includes a sequential scan on ssa_candidate:

Aggregate  (cost=12161.11..12161.11 rows=1 width=35)
   -> Merge Join  (cost=11611.57..12111.12 rows=19996 width=35)
         -> Sort  (cost=11488.27..11488.27 rows=99805 width=24)
               -> Seq Scan on ssa_candidate sc  (cost=0.00..3201.05
rows=99805 width=24)
         -> Sort  (cost=123.30..123.30 rows=31 width=11)
               -> Index Scan using station_subgenre_pk on station_subgenre
ss  (cost=0.00..122.53 rows=31 width=11)


If we run the same query on Oracle (modified slightly for syntax):

SELECT count(DISTINCT song_id) AS X
FROM ssa_candidate SC,
  station_subgenre SS
WHERE SC.style_id = SS.style_id
   AND SS.station_id =

with the same data, it runs almost instaneously.  Oracle's plan uses a hash
join:

SELECT STATEMENT Optimizer=CHOOSE (Cost=63 Card=1 Bytes=15)
   SORT (GROUP BY)
     HASH JOIN (Cost=63 Card=8943 Bytes=134145)
       INDEX (RANGE SCAN) OF STATION_SUBGENRE_PK (UNIQUE) (Cost=3 Card=12
Bytes=84)
       TABLE ACCESS (FULL) OF SSA_CANDIDATE (Cost=59 Card=60364
Bytes=482912)

Is there some way to convince PostgreSQL to use a hash join?

Bojan Belovic wrote:

 >Not sure about tuning, but it seems to me that this query would be
much more
 >effective if it's rewritten like this (especially if style_id columns
on both
 >tables are indexed):
 >
 >SELECT count(DISTINCT song_id) AS X
 >FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id =
SS.style_id
 >WHERE SS.station_id = 48
 >
 >Please correct me if I'm wrong. Also, Michael, if you give this one a try,
 >could you send me the query plan, I'm just curious.
 >
 >SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
 >WHERE
 >
 >SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
 >WHERE "Michael T. Halligan" <michael@echo.com> wrote:
 >
 >>Hi.. I seem to be running into a bottle neck on a query, and I'm not
 >>sure what the bottleneck is .
 >>The machine is a dual-processor p3 750 with 2 gigs of (pc100) memory,
 >>and 3 72 gig disks setup
 >>in raid 5. Right now i'm just testing our db for speed (we're porting
 >>from oracle) .. later on
 >>We're looking @ a quad xeon 700 with 16 gigs of ram & 10 drives in
 >>hardware raid 5.
 >>
 >>We've tuned the queries a bit, added some indices, and we got this query
 >>down from about 15 minutes
 >>to 7.6 seconds.. but it just seems like we should be able to get this
 >>query down to under a second on
 >>this box..  It's running the latest suse, with 2.4.16 kernel, reiserfs,
 >>postgres 7.2b3. I've tried many different combinations
 >>of buffers, stat collection space, sort space, etc. none of them really
 >>effect performance..
 >>
 >>When I run this particular query, the only resource that seems to change
 >>is one of the processors gets up to
 >>about 99% usage.. I've tried setting postgres to use up to 1.6 gigs of
 >>memory, but the postmaster never seems
 >>to get above about 700megs.. it's not swapping at all, though the
 >>contact switching seems to get a bit high (peaking
 >>at 150) ..
 >>
 >>The query sorts through about 80k rows.. here's the query
 >>--------------------------------------------------
 >>SELECT count(*) FROM (
 >>                  SELECT DISTINCT song_id FROM ssa_candidate WHERE
 >>style_id IN (
 >>                             SELECT style_id FROM station_subgenre WHERE
 >>station_id = 48
 >>                                            )
 >>                            ) AS X;
 >>--------------------------------------------------
 >>and the query plan :
 >>--------------------------------------------------
 >>NOTICE:  QUERY PLAN:
 >>
 >>Aggregate  (cost=12236300.87..12236300.87 rows=1 width=13)
 >>  ->  Subquery Scan x  (cost=12236163.64..12236288.40 rows=4990 width=13)
 >>        ->  Unique  (cost=12236163.64..12236288.40 rows=4990
width=13)SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
 >>WHERE
 >>              ->  Sort  (cost=12236163.64..12236163.64 rows=49902
width=13)
 >>                    ->  Seq Scan on ssa_candidate
 >>(cost=0.00..12232269.54 rows=49902 width=13)
 >>                          SubPlan
 >>                            ->  Materialize  (cost=122.53..122.53
 >>rows=31 width=11)
 >>                                  ->  Index Scan using
 >>station_subgenre_pk on station_subgenre  (cost=0.00..122.53 rows=31
 >>width=11)
 >>
 >>EXPLAIN
 >>--------------------------------------------------
 >>
 >>
 >>If anybody has any ideas, I'd be really appreciative..
 >>
 >>
 >>
 >>
 >>
 >>
 >>
 >>---------------------------(end of broadcast)---------------------------
 >>TIP 2: you can get off all lists at once with the unregister command
 >>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 >>
 >
 >
 >____________________________________________________________________
 >Get free e-mail and a permanent address at http://www.amexmail.com/?A=1
 >
 >---------------------------(end of broadcast)---------------------------
 >TIP 3: if posting/reading through Usenet, please send an appropriate
 >subscribe-nomail command to majordomo@postgresql.org so that your
 >message can get through to the mailing list cleanly
 >





В списке pgsql-admin по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Tuning questions..
Следующее
От: Bojan Belovic
Дата:
Сообщение: Re: Tuning questions..