Обсуждение: Major differences between oracle and postgres performance - what can I do ?

От:
Gary Cowell
Дата:

I'm trying to migrate an application from an Oracle
backend to PostgreSQL and have a performance question.

The hardware for the database is the same, a SunFire
v120, 2x73GB U2W SCSI  disks, 1GB RAM, 650MHz US-IIe
CPU. Running Solaris 8.

The table in question has 541741 rows. Under Oracle,
the query ' select distinct version from vers where
version is not null '  returns 534 rows in 6.14
seconds, with an execution plan showing a table scan
of vers followed by a sort.

The explain output on postgres shows the same
execution with a scan on vers and a sort but the query
time is 78.6 seconds.

The explain output from PostgreSQL is:
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=117865.77..120574.48 rows=142
width=132)
   ->  Sort  (cost=117865.77..119220.13 rows=541741
width=132)
         Sort Key: "version"
         ->  Seq Scan on vers  (cost=0.00..21367.41
rows=541741 width=132)
               Filter: ("version" IS NOT NULL)

I do have an index on the column in question but
neither oracle nor postgresql choose to use it (which
given that we're visiting all rows is perhaps not
surprising).

I'm not as familiar with postgresql as I am with
Oracle but I think I've configured comparible
buffering and sort area sizes, certainly there isn't
much physical IO going on in either case.

What can I do to speed up this query? Other queries
are slightly slower than under Oracle on the same
hardware but nothing like this.

Thanks!

G





___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to
expressyourself http://uk.messenger.yahoo.com 

От:
Richard Huxton
Дата:

Gary Cowell wrote:
>
> I'm not as familiar with postgresql as I am with
> Oracle but I think I've configured comparible
> buffering and sort area sizes, certainly there isn't
> much physical IO going on in either case.

People are going to want to know:
1. version of PG
2. explain analyse output, rather than just explain
3. What values you've used for the postgresql.conf file

The actual plan from explain analyse isn't going to be much use - as you
say, a scan of the whole table followed by sorting is the best you'll
get. However, the actual costs of these steps might say something useful.

--
   Richard Huxton
   Archonet Ltd

От:
Paul Thomas
Дата:

On 18/06/2004 12:31 Gary Cowell wrote:
> [snip]
> I'm not as familiar with postgresql as I am with
> Oracle but I think I've configured comparible
> buffering and sort area sizes, certainly there isn't
> much physical IO going on in either case.
>
> What can I do to speed up this query? Other queries
> are slightly slower than under Oracle on the same
> hardware but nothing like this.

Usual questions:

have you vacuumed the table recently?
what are your postgresql.conf settings?
can you show us explain ANALYZE output rather than just explain output?

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for
Business             |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

От:
Jeff
Дата:

On Jun 18, 2004, at 7:31 AM, Gary Cowell wrote:

> The explain output on postgres shows the same
> execution with a scan on vers and a sort but the query
> time is 78.6 seconds.
>

Does it run just as slow if you run it again?
It could be a case of the caches being empty

> Oracle but I think I've configured comparible
> buffering and sort area sizes, certainly there isn't
> much physical IO going on in either case.
>

Configuring PG like Oracle isn't the best thing in the world.  The
general PG philosophy is to let the OS do all the caching & buffering
- this is reversed in the Oracle world.  In 7.4 the rule of thumb is no
more than 10k shared_buffers.. beyond that the overhead of maintaining
it becomes excessive.  (This isn't really the case in 7.5)

Curiously, what are your sort_mem and shared_buffers settings?


--
Jeff Trout <>
http://www.jefftrout.com/
http://www.stuarthamm.net/


От:
Shridhar Daithankar
Дата:

Gary Cowell wrote:
> The explain output on postgres shows the same
> execution with a scan on vers and a sort but the query
> time is 78.6 seconds.
>
> The explain output from PostgreSQL is:
>                                    QUERY PLAN
> ---------------------------------------------------------------------------------
>  Unique  (cost=117865.77..120574.48 rows=142
> width=132)
>    ->  Sort  (cost=117865.77..119220.13 rows=541741
> width=132)
>          Sort Key: "version"
>          ->  Seq Scan on vers  (cost=0.00..21367.41
> rows=541741 width=132)
>                Filter: ("version" IS NOT NULL)
>
> I do have an index on the column in question but
> neither oracle nor postgresql choose to use it (which
> given that we're visiting all rows is perhaps not
> surprising).

Can you post explain analyze for the same query? It contains actual numbers
alond side the chosen plan.

>
> I'm not as familiar with postgresql as I am with
> Oracle but I think I've configured comparible
> buffering and sort area sizes, certainly there isn't
> much physical IO going on in either case.

Well, for postgresql you should check out

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

HTH

  Shridhar

От:
Gary Cowell
Дата:

---  wrote: > You can roughly estimate time
spent for just scaning
> the table using
> something like this:
>
>     select sum(version) from ... where version is not
> null
>
>     and just
>
>     select sum(version) from ...
>
> The results would be interesting to compare.

To answer (I hope) everyones questions at once:

1) Oracle and postmaster were not running at the same
time
2) The queries were run once, to cache as much as
possible then run again to get the timing

3) Distinct vs. no distinct (i.e. sort performance).

select length(version) from vers where version is not
null;

Time: 9748.174 ms

select distinct(version) from vers where version is
not null;

Time: 67988.972 ms

So about an extra 60 seconds with the distinct on.

Here is the explain analyze output from psql:

# explain analyze select distinct version from vers
where version is not null;

     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=117865.77..120574.48 rows=142
width=132) (actual time=63623.428..68269.111 rows=536
loops=1)
   ->  Sort  (cost=117865.77..119220.13 rows=541741
width=132) (actual time=63623.417..66127.641
rows=541741 loops=1)
         Sort Key: "version"
         ->  Seq Scan on vers  (cost=0.00..21367.41
rows=541741 width=132) (actual time=0.218..7214.903
rows=541741 loops=1)
               Filter: ("version" IS NOT NULL)
 Total runtime: 68324.215 ms
(6 rows)

Time: 68326.062 ms


And the non-default .conf parameters:

tcpip_socket = true
max_connections = 100
password_encryption = true
shared_buffers = 2000
sort_mem = 16384
vacuum_mem = 8192
effective_cache_size = 4000
syslog = 2

postgresql version is 7.4.3
compiled with GCC 3.3.2 on sun4u architecture.








___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to
expressyourself http://uk.messenger.yahoo.com 

От:
pginfo
Дата:

Hi ,
I have similare problem and found that the problem is by pg sort.
It is extremly slow by me.

Also in my case I tryed to migrate one db from oracle to pg .

To solve this problem I dinamicaly set sort_mem to some big value.
In this case the sort is working into RAM and is relative fast.
You can try this and remember sort mem is per sort, not per connection.

In my migration I found the only advantage for oracle is the very good sort.

regards,
ivan.

Gary Cowell wrote:
---  wrote: > You can roughly estimate time
spent for just scaning 
the table using
something like this: 
select sum(version) from ... where version is not
null
and just 
select sum(version) from ...

The results would be interesting to compare.    
To answer (I hope) everyones questions at once:

1) Oracle and postmaster were not running at the same
time
2) The queries were run once, to cache as much as
possible then run again to get the timing

3) Distinct vs. no distinct (i.e. sort performance).

select length(version) from vers where version is not
null;

Time: 9748.174 ms

select distinct(version) from vers where version is
not null;

Time: 67988.972 ms

So about an extra 60 seconds with the distinct on.

Here is the explain analyze output from psql:

# explain analyze select distinct version from vers
where version is not null;                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------Unique  (cost=117865.77..120574.48 rows=142
width=132) (actual time=63623.428..68269.111 rows=536
loops=1)  ->  Sort  (cost=117865.77..119220.13 rows=541741
width=132) (actual time=63623.417..66127.641
rows=541741 loops=1)        Sort Key: "version"        ->  Seq Scan on vers  (cost=0.00..21367.41
rows=541741 width=132) (actual time=0.218..7214.903
rows=541741 loops=1)              Filter: ("version" IS NOT NULL)Total runtime: 68324.215 ms
(6 rows)

Time: 68326.062 ms


And the non-default .conf parameters:

tcpip_socket = true
max_connections = 100
password_encryption = true
shared_buffers = 2000
sort_mem = 16384     
vacuum_mem = 8192    
effective_cache_size = 4000
syslog = 2                 

postgresql version is 7.4.3
compiled with GCC 3.3.2 on sun4u architecture.




	
___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to express yourself http://uk.messenger.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command   (send "unregister YourEmailAddressHere" to )

 

От:
Richard Huxton
Дата:

Gary Cowell wrote:
> ---  wrote: > You can roughly estimate time
> spent for just scaning
>
>>the table using
>>something like this:
>>
>>    select sum(version) from ... where version is not
>>null
>>
>>    and just
>>
>>    select sum(version) from ...
>>
>>The results would be interesting to compare.
>
>
> To answer (I hope) everyones questions at once:
>
> 1) Oracle and postmaster were not running at the same
> time
> 2) The queries were run once, to cache as much as
> possible then run again to get the timing
>
> 3) Distinct vs. no distinct (i.e. sort performance).
>
> select length(version) from vers where version is not
> null;
>
> Time: 9748.174 ms
>
> select distinct(version) from vers where version is
> not null;
>
> Time: 67988.972 ms
>
> So about an extra 60 seconds with the distinct on.

Which is basically the sorting time...

> Here is the explain analyze output from psql:
>
> # explain analyze select distinct version from vers
> where version is not null;
>
>      QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=117865.77..120574.48 rows=142
> width=132) (actual time=63623.428..68269.111 rows=536
> loops=1)
>    ->  Sort  (cost=117865.77..119220.13 rows=541741
> width=132) (actual time=63623.417..66127.641
> rows=541741 loops=1)
>          Sort Key: "version"
>          ->  Seq Scan on vers  (cost=0.00..21367.41
> rows=541741 width=132) (actual time=0.218..7214.903
> rows=541741 loops=1)
>                Filter: ("version" IS NOT NULL)
>  Total runtime: 68324.215 ms
> (6 rows)
>
> Time: 68326.062 ms

Yep - the seq-scan takes 7214.903 ms, there's a huge setup time for the
sort (63623.417) and it's not finished until 66127.641ms have elapsed.

>
> And the non-default .conf parameters:
>
> tcpip_socket = true
> max_connections = 100
> password_encryption = true
> shared_buffers = 2000
> sort_mem = 16384
> vacuum_mem = 8192
> effective_cache_size = 4000
> syslog = 2

Well, I'd probably up vacuum_mem, and check how much RAM is being used
for disk cache - I'd guess it's more than 32MB (4000 * 8kb).

You might want to up the shared_buffers, but that's going to depend on
the load.

Try increasing sort_mem temporarily, and see if that makes a difference:
   SET sort_mem = 64000;
   EXPLAIN ANALYSE ...
The only thing I can think is that you're getting disk activity to get a
sort that slow. I'd be expecting a hash-sort if PG thought it could fit
the distinct values in memory.

--
   Richard Huxton
   Archonet Ltd

От:
Tom Lane
Дата:

=?iso-8859-1?q?Gary=20Cowell?= <> writes:
>    ->  Sort  (cost=117865.77..119220.13 rows=541741
> width=132) (actual time=63623.417..66127.641
> rows=541741 loops=1)

This is clearly where the time is going.

> sort_mem = 16384

Probably not enough for this problem.  The estimated data size is
upwards of 60 meg (132 bytes * half a mil rows); allowing for per-row
overhead I suspect that you'd need sort_mem approaching 100 meg for
a fully-in-memory sort.  (Also I'd take the width=132 with a *big*
grain of salt, unless you have reason to know that it's accurate.)

The on-disk sorting algorithm that we use is designed to favor minimum
disk space consumption over speed.  It has a fairly nonrandom access
pattern that can be pretty slow if your disks don't have good seek-time
specs.

I don't know whether Oracle's performance advantage is because they're
not swapping the sort to disk at all, or because they use a different
on-disk sort method with a more sequential access pattern.

[... thinks for awhile ...]  It seems possible that they may use sort
code that knows it is performing a DISTINCT operation and discards
duplicates on sight.  Given that there are only 534 distinct values,
the sort would easily stay in memory if that were happening.

It would be interesting to compare Oracle and PG times for a straight
sort of half a million rows, without the DISTINCT part; that would
give us a clue whether they simply have much better sort technology,
or whether they have a special optimization for sort+unique.

            regards, tom lane

От:
pginfo
Дата:

Hi,

Tom Lane wrote:
Gary Cowell <> writes: 
   ->  Sort  (cost=117865.77..119220.13 rows=541741
width=132) (actual time=63623.417..66127.641
rows=541741 loops=1)   
This is clearly where the time is going.
 
sort_mem = 16384        
Probably not enough for this problem.  The estimated data size is
upwards of 60 meg (132 bytes * half a mil rows); allowing for per-row
overhead I suspect that you'd need sort_mem approaching 100 meg for
a fully-in-memory sort.  (Also I'd take the width=132 with a *big*
grain of salt, unless you have reason to know that it's accurate.)

The on-disk sorting algorithm that we use is designed to favor minimum
disk space consumption over speed.  It has a fairly nonrandom access
pattern that can be pretty slow if your disks don't have good seek-time
specs.

I don't know whether Oracle's performance advantage is because they're
not swapping the sort to disk at all, or because they use a different
on-disk sort method with a more sequential access pattern.

[... thinks for awhile ...]  It seems possible that they may use sort
code that knows it is performing a DISTINCT operation and discards
duplicates on sight.  Given that there are only 534 distinct values,
the sort would easily stay in memory if that were happening.

It would be interesting to compare Oracle and PG times for a straight
sort of half a million rows, without the DISTINCT part; that would
give us a clue whether they simply have much better sort technology,
or whether they have a special optimization for sort+unique. 
I was tested this situation and found that oracle is working also in this case much faster (in some cases x10 ) compared to pg.
Also by in memory sort oracle is faster but the diferenc is not so big.
So I have oracle 8 and oracle 10 (also pg - it is my primary platform) installed and can run some tests.
I am ready to help in this direction or if you can send any example I will run it and post the result .

regards,
ivan.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

 

От:
Gary Cowell
Дата:

> Try increasing sort_mem temporarily, and see if that
> makes a difference:
>    SET sort_mem = 64000;
>    EXPLAIN ANALYSE ...

I did this (actualy 65536) and got the following:
pvcsdb=# explain analyze select distinct version from
vers where version is not null;

      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=117865.77..120574.48 rows=142
width=132) (actual time=81595.178..86573.228 rows=536
loops=1)
   ->  Sort  (cost=117865.77..119220.13 rows=541741
width=132) (actual time=81595.169..84412.069
rows=541741 loops=1)
         Sort Key: "version"
         ->  Seq Scan on vers  (cost=0.00..21367.41
rows=541741 width=132) (actual time=10.068..7397.374
rows=541741 loops=1)
               Filter: ("version" IS NOT NULL)
 Total runtime: 86647.495 ms
(6 rows)


In response to Tom Lane, I have compared a
select/order by on the same data in Oracle and PG to
see if this changes things:


PG:     Time: 67438.536 ms   541741 rows
Oracle: After an hour and a half I canned it

So it seems the idea that oracle is dropping duplicate
rows prior to the sort when using distinct may indeed
be the case.

From what I've seen here, it seems that PGs on-disk
sort performance is exceeding that of Oracle - it's
just that oracle sorts fewer rows for distinct.






___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to
expressyourself http://uk.messenger.yahoo.com 

От:
Stephen Frost
Дата:

* Tom Lane () wrote:
> [... thinks for awhile ...]  It seems possible that they may use sort
> code that knows it is performing a DISTINCT operation and discards
> duplicates on sight.  Given that there are only 534 distinct values,
> the sort would easily stay in memory if that were happening.

Could this optimization be added to PostgreSQL?  It sounds like a very
reasonable thing to do.  Hopefully there wouldn't be too much complexity
needed to add it.

    Stephen

От:
Tom Lane
Дата:

Stephen Frost <> writes:
> * Tom Lane () wrote:
>> [... thinks for awhile ...]  It seems possible that they may use sort
>> code that knows it is performing a DISTINCT operation and discards
>> duplicates on sight.  Given that there are only 534 distinct values,
>> the sort would easily stay in memory if that were happening.

> Could this optimization be added to PostgreSQL?  It sounds like a very
> reasonable thing to do.

That's what I was wondering about too.  But first I'd like to get
some kind of reading on how effective it would be.  If someone can
demonstrate that Oracle can do sort-and-drop-dups a lot faster than
it can do a straight sort of the same amount of input data, that
would be a strong indication that it's worth doing.  At this point
we don't know if that's the source of their win or not.

            regards, tom lane

От:
Stephen Frost
Дата:

* Tom Lane () wrote:
> Stephen Frost <> writes:
> > * Tom Lane () wrote:
> >> [... thinks for awhile ...]  It seems possible that they may use sort
> >> code that knows it is performing a DISTINCT operation and discards
> >> duplicates on sight.  Given that there are only 534 distinct values,
> >> the sort would easily stay in memory if that were happening.
>
> > Could this optimization be added to PostgreSQL?  It sounds like a very
> > reasonable thing to do.
>
> That's what I was wondering about too.  But first I'd like to get
> some kind of reading on how effective it would be.  If someone can
> demonstrate that Oracle can do sort-and-drop-dups a lot faster than
> it can do a straight sort of the same amount of input data, that
> would be a strong indication that it's worth doing.  At this point
> we don't know if that's the source of their win or not.

Alright, I did a couple tests, these are different systems with
different hardware, but in the end I think the difference is clear:

tsf=# explain analyze select distinct access_type_id from p_gen_dom_dedicated_swc_access ;
                                                                       QUERY PLAN
                                                               

--------------------------------------------------------------------------------------------------------------------------------------------------------
                              
 Unique  (cost=321591.00..333205.56 rows=16 width=10) (actual time=32891.141..37420.429 rows=16 loops=1)
   ->  Sort  (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32891.137..35234.810 rows=2322912 loops=1)
         Sort Key: access_type_id
         ->  Seq Scan on p_gen_dom_dedicated_swc_access  (cost=0.00..55492.12 rows=2322912 width=10) (actual
time=0.013..3743.470rows=2322912 loops=1) 
 Total runtime: 37587.519 ms
(5 rows)

tsf=# explain analyze select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id;
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32926.696..35278.847 rows=2322912 loops=1)
   Sort Key: access_type_id
   ->  Seq Scan on p_gen_dom_dedicated_swc_access  (cost=0.00..55492.12 rows=2322912 width=10) (actual
time=0.014..3753.443rows=2322912 loops=1) 
 Total runtime: 36737.628 ms
(4 rows)

So, about the same from postgres in each case.  From Oracle:

(select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id)
sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql > /dev/null

real    3m55.12s
user    2m25.87s
sys     0m10.59s

(select distinct access_type_id from p_gen_dom_dedicated_swc_access)
sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql > /dev/null

real    0m5.08s
user    0m3.86s
sys     0m0.95s

All the queries were run multiple times, though there wasn't all that
much difference in the times.  Both systems are pretty speedy, but I
tend to feel the Postgres box is faster in CPU/disk access time, which
is probably why the Oracle system took 4 minutes to do what the Postgres
systems does in 40 seconds.  My only other concern is the Oracle system
having to do the write I/O while the postgres one doesn't...  I don't
see an obvious way to get around that though, and I'm not sure if it'd
really make *that* big of a difference.

    Stephen

От:
Stephen Frost
Дата:

* Stephen Frost () wrote:
> systems does in 40 seconds.  My only other concern is the Oracle system
> having to do the write I/O while the postgres one doesn't...  I don't
> see an obvious way to get around that though, and I'm not sure if it'd
> really make *that* big of a difference.

Alright, after talking with some people on #postgresql I found that in
Oracle you can do 'set autotrace traceonly', which removes the I/O
factor from the Oracle query.  Doing this I also discovered that it
appears Oracle actually uses an index on that field that it knows about
to derive what the distinct results would be.  That probably invalidates
this test for what we were specifically looking for, but, hey, using the
index to figure out what the distinct values for the key are isn't
exactly a bad idea. :)

Here's the new results:

(select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id;)
-----------------------------------------------------------------------------------
sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql

SQL*Plus: Release 9.2.0.1.0 - Production on Fri Jun 18 14:10:12 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production


2322912 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11459 Card=1303962 B
          ytes=16951506)

   1    0   SORT* (ORDER BY) (Cost=11459 Card=1303962 Bytes=16951506)  :Q457001
   2    1     TABLE ACCESS* (FULL) OF 'P_GEN_DOM_DEDICATED_SWC_ACCESS' :Q457000
           (Cost=1550 Card=1303962 Bytes=16951506)



   1 PARALLEL_TO_SERIAL            SELECT A1.C0 C0 FROM :Q457000 A1 ORDER BY A1
                                   .C0

   2 PARALLEL_TO_PARALLEL          SELECT /*+ NO_EXPAND ROWID(A1) */ A1."ACCESS
                                   _TYPE_ID" C0 FROM "P_GEN_DOM_DEDICAT



Statistics
----------------------------------------------------------
         32  recursive calls
       1594  db block gets
      64495  consistent gets
     105975  physical reads
          0  redo size
   40109427  bytes sent via SQL*Net to client
    1704111  bytes received via SQL*Net from client
     154862  SQL*Net roundtrips to/from client
          2  sorts (memory)
          4  sorts (disk)
    2322912  rows processed

Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

real    1m38.55s
user    0m23.36s
sys     0m9.61s

-----------------------------------------------------------------------------------
(select distinct access_type_id from p_gen_dom_dedicated_swc_access)
-----------------------------------------------------------------------------------
sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql

SQL*Plus: Release 9.2.0.1.0 - Production on Fri Jun 18 14:13:54 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production


16 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=44874 Card=1303962 B
          ytes=16951506)

   1    0   SORT (UNIQUE) (Cost=44874 Card=1303962 Bytes=16951506)
   2    1     INDEX (FAST FULL SCAN) OF 'TABLE_8111_DUPLICATE_CHECK' (
          UNIQUE) (Cost=4 Card=1303962 Bytes=16951506)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      47069  consistent gets
      47067  physical reads
          0  redo size
        841  bytes sent via SQL*Net to client
        662  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         16  rows processed

Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

real    0m5.36s
user    0m0.04s
sys     0m0.07s
-----------------------------------------------------------------------------------

    Stephen

От:
Tom Lane
Дата:

=?iso-8859-1?q?Gary=20Cowell?= <> writes:
> So it seems the idea that oracle is dropping duplicate
> rows prior to the sort when using distinct may indeed
> be the case.

Okay.  We won't have any short-term solution for making DISTINCT do that,
but if you are on PG 7.4 you could get the same effect from using
GROUP BY: instead of
    select distinct version from vers where version is not null
try
    select version from vers where version is not null group by version
You should get a HashAggregate plan out of that, and I'd think it'd be
pretty quick when there are not many distinct values of version.

            regards, tom lane

От:
Gary Cowell
Дата:

--- Tom Lane <> wrote: >
=?iso-8859-1?q?Gary=20Cowell?=
> <> writes:
> > So it seems the idea that oracle is dropping
> duplicate
> > rows prior to the sort when using distinct may
> indeed
> > be the case.
>
> Okay.  We won't have any short-term solution for
> making DISTINCT do that,
> but if you are on PG 7.4 you could get the same
> effect from using
> GROUP BY: instead of
>     select distinct version from vers where version is
> not null
> try
>     select version from vers where version is not null
> group by version
> You should get a HashAggregate plan out of that, and
> I'd think it'd be
> pretty quick when there are not many distinct values
> of version.
>

Yeah out of the half million rows there are only ever
going to be 500 or so distinct values.

I do indeed get such a plan. It's much faster that
way. Down to 16 seconds. I'll get the chap to rewrite
his app to use group by instead of distinct.

Thanks (everyone) for the top class help!





___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to
expressyourself http://uk.messenger.yahoo.com 

От:
"Mischa Sandberg"
Дата:

Don't know about Oracle, but select-distinct in MSSQL2K will indeed throw
away duplicates, which chops the CPU time. Very easy to see in the graphic
query plan, both in terms of CPU and the number of rows retrieved from a
single-node or nested-loop subtree. Definitely a worthwhile optimization.

"Tom Lane" <> wrote in message
news:...
> Stephen Frost <> writes:
> > * Tom Lane () wrote:
> >> [... thinks for awhile ...]  It seems possible that they may use sort
> >> code that knows it is performing a DISTINCT operation and discards
> >> duplicates on sight.  Given that there are only 534 distinct values,
> >> the sort would easily stay in memory if that were happening.
>
> > Could this optimization be added to PostgreSQL?  It sounds like a very
> > reasonable thing to do.
>
> That's what I was wondering about too.  But first I'd like to get
> some kind of reading on how effective it would be.  If someone can
> demonstrate that Oracle can do sort-and-drop-dups a lot faster than
> it can do a straight sort of the same amount of input data, that
> would be a strong indication that it's worth doing.  At this point
> we don't know if that's the source of their win or not.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to 
>