Thread: Query running slower than same on Oracle

Query running slower than same on Oracle

From: "Sailer, Denis (YBUSA-CDR)"
Date:
We are evaluating PostgreSQL for a typical data warehouse application.  I
have 3 tables below that are part of a Star schema design.  The query listed
below runs in 16 seconds on Oracle 9.2 and 3+ minutes on PostgreSQL 7.3.3.
Here are the details.

I'm wondering what else can be done to tune this type of query.  Is 3
minutes reasonable given the amount of data that is loaded into the 3
tables?  Is there anyone else who has made comparisons between Oracle and
PostgreSQL?

----------------------------------------------------------------------------
Oracle 9.2 is running on a Windows 2000 server, 600MHz PIII, 512MB RAM
    Shared Pool      48MB
    Buffer Cache     98MB
    Large Pool        8MB
    Java Pool        32MB
                =========
    Total SGA       186MB

----------------------------------------------------------------------------
PostgreSQL is running on Red Hat Linux 7.2, 733MHz PIII processor, 383MB RAM.
    shared_buffers = 12384        (96 MB)
    sort_mem = 16384
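
Most of the runtime in the plans below goes into sorting the million-plus
joined rows, so sort_mem can also be raised for a single session before
running the query.  A minimal sketch (the value is in KB, and 65536 = 64MB
is only an illustration that must fit in free RAM):

    set sort_mem = 65536;    -- session-local; reverts when the connection closes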

----------------------------------------------------------------------------

explain analyze
select fiscalyearquarter, description, sum(amount_quantity)
from time t, revenue r, statistic s
where t.fiscalyear = 2002
  and r.timekey = t.timekey
  and r.statisticskey = s.statisticskey
group by fiscalyearquarter, description;

                                  QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=124685.74..127078.87 rows=23931 width=48) (actual time=170682.53..189640.85 rows=8 loops=1)
   ->  Group  (cost=124685.74..126480.59 rows=239313 width=48) (actual time=169508.49..185478.90 rows=1082454 loops=1)
         ->  Sort  (cost=124685.74..125284.02 rows=239313 width=48) (actual time=169508.47..171853.03 rows=1082454 loops=1)
               Sort Key: t.fiscalyearquarter, s.description
               ->  Hash Join  (cost=6.46..94784.90 rows=239313 width=48) (actual time=140.20..47685.46 rows=1082454 loops=1)
                     Hash Cond: ("outer".statisticskey = "inner".statisticskey)
                     ->  Hash Join  (cost=5.43..90595.90 rows=239313 width=32) (actual time=139.96..39672.76 rows=1082454 loops=1)
                           Hash Cond: ("outer".timekey = "inner".timekey)
                           ->  Seq Scan on revenue r  (cost=0.00..68454.04 rows=3829004 width=17) (actual time=0.01..26336.95 rows=3829004 loops=1)
                           ->  Hash  (cost=5.40..5.40 rows=12 width=15) (actual time=0.79..0.79 rows=0 loops=1)
                                 ->  Seq Scan on "time" t  (cost=0.00..5.40 rows=12 width=15) (actual time=0.36..0.75 rows=12 loops=1)
                                       Filter: (fiscalyear = 2002::numeric)
                     ->  Hash  (cost=1.02..1.02 rows=2 width=16) (actual time=0.04..0.04 rows=0 loops=1)
                           ->  Seq Scan on statistic s  (cost=0.00..1.02 rows=2 width=16) (actual time=0.02..0.03 rows=2 loops=1)
 Total runtime: 195409.79 msec


This gives you an idea of the size of each table in the query:
----------------------------------------------------------------------------

pubnet=# vacuum analyze verbose revenue;
INFO:  --Relation dw.revenue--
INFO:  Pages 30164: Changed 0, Empty 0; Tup 3829004: Vac 0, Keep 0, UnUsed 17.
        Total CPU 1.87s/0.73u sec elapsed 9.97 sec.
INFO:  Analyzing dw.revenue
VACUUM
pubnet=# vacuum analyze verbose statistic;
INFO:  --Relation dw.statistic--
INFO:  Pages 1: Changed 0, Empty 0; Tup 2: Vac 0, Keep 0, UnUsed 1.
        Total CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  Analyzing dw.statistic
VACUUM
pubnet=# vacuum analyze verbose time;
INFO:  --Relation dw.time--
INFO:  Pages 3: Changed 0, Empty 0; Tup 192: Vac 0, Keep 0, UnUsed 33.
        Total CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  Analyzing dw.time
VACUUM
pubnet=#


I tried disabling hash joins to see what might happen.  This causes the
optimizer to use a merge join instead, and the timings are worse.
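
For reference, hash joins can be switched off for the current session with
the planner knob below; the same explain analyze was then re-run:

    set enable_hashjoin = off;    -- planner falls back to merge or nested-loop joins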

Here is the plan for that:


                                  QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=665570.44..667963.57 rows=23931 width=48) (actual time=362121.97..381081.18 rows=8 loops=1)
   ->  Group  (cost=665570.44..667365.29 rows=239313 width=48) (actual time=360948.51..376904.14 rows=1082454 loops=1)
         ->  Sort  (cost=665570.44..666168.72 rows=239313 width=48) (actual time=360948.48..363285.85 rows=1082454 loops=1)
               Sort Key: t.fiscalyearquarter, s.description
               ->  Merge Join  (cost=631481.61..635669.60 rows=239313 width=48) (actual time=263257.77..276625.27 rows=1082454 loops=1)
                     Merge Cond: ("outer".statisticskey = "inner".statisticskey)
                     ->  Sort  (cost=631480.58..632078.86 rows=239313 width=32) (actual time=260561.38..264151.04 rows=1082454 loops=1)
                           Sort Key: r.statisticskey
                           ->  Merge Join  (cost=587963.25..610099.74 rows=239313 width=32) (actual time=217380.88..231958.36 rows=1082454 loops=1)
                                 Merge Cond: ("outer".timekey = "inner".timekey)
                                 ->  Sort  (cost=5.62..5.65 rows=12 width=15) (actual time=14.90..14.92 rows=12 loops=1)
                                       Sort Key: t.timekey
                                       ->  Seq Scan on "time" t  (cost=0.00..5.40 rows=12 width=15) (actual time=13.47..14.83 rows=12 loops=1)
                                             Filter: (fiscalyear = 2002::numeric)
                                 ->  Sort  (cost=587957.63..597530.14 rows=3829004 width=17) (actual time=214776.92..224634.94 rows=1455997 loops=1)
                                       Sort Key: r.timekey
                                       ->  Seq Scan on revenue r  (cost=0.00..68454.04 rows=3829004 width=17) (actual time=1.33..31014.95 rows=3829004 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=16) (actual time=2696.35..3765.93 rows=541228 loops=1)
                           Sort Key: s.statisticskey
                           ->  Seq Scan on statistic s  (cost=0.00..1.02 rows=2 width=16) (actual time=19.50..19.52 rows=2 loops=1)
 Total runtime: 385939.85 msec


The query plan in Oracle looks like this:
----------------------------------------------------------------------------
SORT GROUP BY
    HASH JOIN
        MERGE JOIN CARTESIAN
            TABLE ACCESS FULL            DBA_ADMIN.STATISTIC
            BUFFER SORT
                TABLE ACCESS FULL        DBA_ADMIN.TIME
        TABLE ACCESS FULL                DBA_ADMIN.REVENUE


Re: Query running slower than same on Oracle

From: Tom Lane
Date:
"Sailer, Denis (YBUSA-CDR)" <Denis.Sailer@Yellowbook.com> writes:
> We are evaluating PostgreSQL for a typical data warehouse application.  I
> have 3 tables below that are part of a Star schema design.  The query listed
> below runs in 16 seconds on Oracle 9.2 and 3+ minutes on PostgreSQL 7.3.3.
> Here are the details.

The majority of the runtime seems to be going into the sort step.  There
is not much to be done about this in 7.3, but 7.4 should use a hashed
aggregation approach for this query, which'd eliminate the sort step and
hopefully reduce the time a great deal.  Since you're only doing
evaluation at this point, it might be worth your while to try out CVS
tip ...
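
Roughly, hashed aggregation collapses the Sort/Group/Aggregate chain into a
single HashAggregate over the join output, so the plan shape would look
something like this (a sketch of the expected shape, not actual 7.4 output):

 HashAggregate
   ->  Hash Join
         Hash Cond: ("outer".statisticskey = "inner".statisticskey)
         ->  Hash Join
               Hash Cond: ("outer".timekey = "inner".timekey)
               ->  Seq Scan on revenue r
               ->  Hash
                     ->  Seq Scan on "time" t
                           Filter: (fiscalyear = 2002::numeric)
         ->  Hash
               ->  Seq Scan on statistic s

With only 8 result groups the hash table stays tiny, and the join output
(about 48 of the 195 seconds in the first plan) never has to be sorted at all.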

            regards, tom lane

Re: Query running slower than same on Oracle

From: Josh Berkus
Date:
Denis,

> I'm wondering what else can be done to tune this type of query.  Is 3
> minutes reasonable given the amount of data that is loaded into the 3
> tables?  Is there anyone else who has made comparisons between Oracle and
> PostgreSQL?

We will probably be a bit slower on aggregates than Oracle is, for reasons
discussed on this list ad nauseam.  However, it also looks from the query
plans like you forgot to index your foreign keys.
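
For the revenue fact table that would mean something like the following (a
sketch; the index names are invented, and whether the planner actually uses
the indexes depends on how selective the joins are):

    create index revenue_timekey_idx on revenue (timekey);
    create index revenue_statisticskey_idx on revenue (statisticskey);
    analyze revenue;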

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco