Обсуждение: Re: [Tuning questions..]

Поиск
Список
Период
Сортировка

Re: [Tuning questions..]

От
Bojan Belovic
Дата:
Not sure about tuning, but it seems to me that this query would be much more
effective if it's rewritten like this (especially if style_id columns on both
tables are indexed):

SELECT count(DISTINCT song_id) AS X
FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id = SS.style_id
WHERE SS.station_id = 48

Please correct me if I'm wrong. Also, Michael, if you give this one a try,
could you send me the query plan, I'm just curious.



"Michael T. Halligan" <michael@echo.com> wrote:
> Hi.. I seem to be running into a bottle neck on a query, and I'm not
> sure what the bottleneck is .
> The machine is a dual-processor p3 750 with 2 gigs of (pc100) memory,
> and 3 72 gig disks setup
> in raid 5. Right now i'm just testing our db for speed (we're porting
> from oracle) .. later on
> We're looking @ a quad xeon 700 with 16 gigs of ram & 10 drives in
> hardware raid 5.
>
> We've tuned the queries a bit, added some indices, and we got this query
> down from about 15 minutes
> to 7.6 seconds.. but it just seems like we should be able to get this
> query down to under a second on
> this box..  It's running the latest suse, with 2.4.16 kernel, reiserfs,
> postgres 7.2b3. I've tried many different combinations
> of buffers, stat collection space, sort space, etc. none of them really
> effect performance..
>
> When I run this particular query, the only resource that seems to change
> is one of the processors gets up to
> about 99% usage.. I've tried setting postgres to use up to 1.6 gigs of
> memory, but the postmaster never seems
> to get above about 700megs.. it's not swapping at all, though the
> contact switching seems to get a bit high (peaking
> at 150) ..
>
> The query sorts through about 80k rows.. here's the query
> --------------------------------------------------
> SELECT count(*) FROM (
>                   SELECT DISTINCT song_id FROM ssa_candidate WHERE
> style_id IN (
>                              SELECT style_id FROM station_subgenre WHERE
> station_id = 48
>                                             )
>                             ) AS X;
> --------------------------------------------------
> and the query plan :
> --------------------------------------------------
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=12236300.87..12236300.87 rows=1 width=13)
>   ->  Subquery Scan x  (cost=12236163.64..12236288.40 rows=4990 width=13)
>         ->  Unique  (cost=12236163.64..12236288.40 rows=4990 width=13)
>               ->  Sort  (cost=12236163.64..12236163.64 rows=49902 width=13)
>                     ->  Seq Scan on ssa_candidate
> (cost=0.00..12232269.54 rows=49902 width=13)
>                           SubPlan
>                             ->  Materialize  (cost=122.53..122.53
> rows=31 width=11)
>                                   ->  Index Scan using
> station_subgenre_pk on station_subgenre  (cost=0.00..122.53 rows=31
> width=11)
>
> EXPLAIN
> --------------------------------------------------
>
>
> If anybody has any ideas, I'd be really appreciative..
>
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


____________________________________________________________________
Get free e-mail and a permanent address at http://www.amexmail.com/?A=1

Re: [Tuning questions..]

От
"Michael T. Halligan"
Дата:
Thanks, that sped things up a bit, from 7.6 sec. to about 5.5 sec.  However
the plan still includes a sequential scan on ssa_candidate:

Aggregate  (cost=12161.11..12161.11 rows=1 width=35)
   -> Merge Join  (cost=11611.57..12111.12 rows=19996 width=35)
         -> Sort  (cost=11488.27..11488.27 rows=99805 width=24)
               -> Seq Scan on ssa_candidate sc  (cost=0.00..3201.05
rows=99805 width=24)
         -> Sort  (cost=123.30..123.30 rows=31 width=11)
               -> Index Scan using station_subgenre_pk on station_subgenre
ss  (cost=0.00..122.53 rows=31 width=11)


If we run the same query on Oracle (modified slightly for syntax):

SELECT count(DISTINCT song_id) AS X
FROM ssa_candidate SC,
  station_subgenre SS
WHERE SC.style_id = SS.style_id
   AND SS.station_id =

with the same data, it runs almost instaneously.  Oracle's plan uses a hash
join:

SELECT STATEMENT Optimizer=CHOOSE (Cost=63 Card=1 Bytes=15)
   SORT (GROUP BY)
     HASH JOIN (Cost=63 Card=8943 Bytes=134145)
       INDEX (RANGE SCAN) OF STATION_SUBGENRE_PK (UNIQUE) (Cost=3 Card=12
Bytes=84)
       TABLE ACCESS (FULL) OF SSA_CANDIDATE (Cost=59 Card=60364
Bytes=482912)

Is there some way to convince PostgreSQL to use a hash join?

Bojan Belovic wrote:

 >Not sure about tuning, but it seems to me that this query would be
much more
 >effective if it's rewritten like this (especially if style_id columns
on both
 >tables are indexed):
 >
 >SELECT count(DISTINCT song_id) AS X
 >FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id =
SS.style_id
 >WHERE SS.station_id = 48
 >
 >Please correct me if I'm wrong. Also, Michael, if you give this one a try,
 >could you send me the query plan, I'm just curious.
 >
 >SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
 >WHERE
 >
 >SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
 >WHERE "Michael T. Halligan" <michael@echo.com> wrote:
 >
 >>Hi.. I seem to be running into a bottle neck on a query, and I'm not
 >>sure what the bottleneck is .
 >>The machine is a dual-processor p3 750 with 2 gigs of (pc100) memory,
 >>and 3 72 gig disks setup
 >>in raid 5. Right now i'm just testing our db for speed (we're porting
 >>from oracle) .. later on
 >>We're looking @ a quad xeon 700 with 16 gigs of ram & 10 drives in
 >>hardware raid 5.
 >>
 >>We've tuned the queries a bit, added some indices, and we got this query
 >>down from about 15 minutes
 >>to 7.6 seconds.. but it just seems like we should be able to get this
 >>query down to under a second on
 >>this box..  It's running the latest suse, with 2.4.16 kernel, reiserfs,
 >>postgres 7.2b3. I've tried many different combinations
 >>of buffers, stat collection space, sort space, etc. none of them really
 >>effect performance..
 >>
 >>When I run this particular query, the only resource that seems to change
 >>is one of the processors gets up to
 >>about 99% usage.. I've tried setting postgres to use up to 1.6 gigs of
 >>memory, but the postmaster never seems
 >>to get above about 700megs.. it's not swapping at all, though the
 >>contact switching seems to get a bit high (peaking
 >>at 150) ..
 >>
 >>The query sorts through about 80k rows.. here's the query
 >>--------------------------------------------------
 >>SELECT count(*) FROM (
 >>                  SELECT DISTINCT song_id FROM ssa_candidate WHERE
 >>style_id IN (
 >>                             SELECT style_id FROM station_subgenre WHERE
 >>station_id = 48
 >>                                            )
 >>                            ) AS X;
 >>--------------------------------------------------
 >>and the query plan :
 >>--------------------------------------------------
 >>NOTICE:  QUERY PLAN:
 >>
 >>Aggregate  (cost=12236300.87..12236300.87 rows=1 width=13)
 >>  ->  Subquery Scan x  (cost=12236163.64..12236288.40 rows=4990 width=13)
 >>        ->  Unique  (cost=12236163.64..12236288.40 rows=4990
width=13)SELECT count(DISTINCT song_id) FROM ssa_candidate, station_subgenre
 >>WHERE
 >>              ->  Sort  (cost=12236163.64..12236163.64 rows=49902
width=13)
 >>                    ->  Seq Scan on ssa_candidate
 >>(cost=0.00..12232269.54 rows=49902 width=13)
 >>                          SubPlan
 >>                            ->  Materialize  (cost=122.53..122.53
 >>rows=31 width=11)
 >>                                  ->  Index Scan using
 >>station_subgenre_pk on station_subgenre  (cost=0.00..122.53 rows=31
 >>width=11)
 >>
 >>EXPLAIN
 >>--------------------------------------------------
 >>
 >>
 >>If anybody has any ideas, I'd be really appreciative..
 >>
 >>
 >>
 >>
 >>
 >>
 >>
 >>---------------------------(end of broadcast)---------------------------
 >>TIP 2: you can get off all lists at once with the unregister command
 >>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 >>
 >
 >
 >____________________________________________________________________
 >Get free e-mail and a permanent address at http://www.amexmail.com/?A=1
 >
 >---------------------------(end of broadcast)---------------------------
 >TIP 3: if posting/reading through Usenet, please send an appropriate
 >subscribe-nomail command to majordomo@postgresql.org so that your
 >message can get through to the mailing list cleanly
 >