Обсуждение: intersect performance (PG 7.1.3 vs 7.2)

Поиск
Список
Период
Сортировка

intersect performance (PG 7.1.3 vs 7.2)

От
d_nardini@btconnect.com
Дата:
Hi,

I'm running the following statement on a (development) Mac OS X (PG 7.2) 
box and on a (production) RedHat 7.0 (PG 7.1.3) box. On the OS X it's 
performing very well ... on RH it's VERY SLOW, with CPU utilization 
hitting 90%+ (on OS X it hardly exceeds 2%). BTW - replacing 'intersect' 
with 'union' in the SQL behaves the same. (RH 7.0 box has 4x the amount 
of memory and CPU power - plenty of spare capacity).

Do I need to upgrade the RH box with 7.2 (have there been significant 
enhancements in this area ?), or have I missed something obvious ?

Any comments / advice welcome !

SQL statement :

select i.master_reference from image i, image_keyword ik, keyword k
where k.keyword = 'pen'
and ik.keyword_id = k.keyword_id
and i.image_id = ik.image_id
intersect
select i.master_reference from image i, image_keyword ik, keyword k
where k.keyword = 'cheque'
and ik.keyword_id = k.keyword_id
and i.image_id = ik.image_id
intersect
select i.master_reference from image i, image_keyword ik, keyword k
where k.keyword = 'purchase'
and ik.keyword_id = k.keyword_id
and i.image_id = ik.image_id
order by master_reference
limit 12, 0;

DEFINITIONS statements :

drop table image;
drop sequence image_seq;
create sequence image_seq;
create table image (   image_id bigint not null default nextval('image_seq') primary key,   artist_id integer not null
default'0'
 
);

drop table image_keyword;
create table image_keyword (   image_id bigint not null default '0',   keyword_id bigint not null default '0'
);
create index image_keyword_image_id_ix on image_keyword (image_id);
create index image_keyword_keyword_id_ix on image_keyword (keyword_id);

drop table keyword;
drop sequence keyword_seq;
create sequence keyword_seq;
create table keyword (   keyword_id bigint not null default nextval('keyword_seq') primary key,   keyword varchar(50)
notnull unique
 
);
create index keyword_keyword_ix on keyword (keyword);

approx number of records/rows in the above tables :
image = 15000
image_keyword = 600000
keyword = 40000



Re: intersect performance (PG 7.1.3 vs 7.2)

От
Tom Lane
Дата:
d_nardini@btconnect.com writes:
> I'm running the following statement on a (development) Mac OS X (PG 7.2) 
> box and on a (production) RedHat 7.0 (PG 7.1.3) box. On the OS X it's 
> performing very well ... on RH it's VERY SLOW, with CPU utilization 
> hitting 90%+ (on OS X it hardly exceeds 2%).

Hmm.  Nested intersects are actually broken in 7.1.*,
cf.http://archives.postgresql.org/pgsql-bugs/2001-08/msg00064.php
Do you get the same results from both boxes?

> BTW - replacing 'intersect' 
> with 'union' in the SQL behaves the same.

In that case it might possibly be a different problem.  Do you get the
same EXPLAIN plan on both boxes?
        regards, tom lane


Re: intersect performance (PG 7.1.3 vs 7.2)

От
d_nardini@btconnect.com
Дата:
Tom,

Thanks for the info on the 'broken' intersect (the RH 7.1 box does give 
the same results (eventually)) - apologies for the size of email in 
advance ...

Following are EXPLAIN dumps from each platform for both intersect and 
union (both return the correct data on each platform).

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

OS X (PG 7.2) - intersect start-snip :

spdb=# explain select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'pen'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# intersect
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'cheque'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# intersect
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'purchase'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# order by master_reference
spdb-# limit 12, 0;
NOTICE:  QUERY PLAN:
Limit  (cost=80.41..80.41 rows=1 width=46)  ->  Sort  (cost=80.41..80.41 rows=1 width=46)        ->  SetOp Intersect
(cost=80.39..80.40rows=1 width=46)              ->  Sort  (cost=80.39..80.39 rows=2 width=46)                    ->
Append (cost=53.59..80.38 rows=2 width=46)                          ->  Result  (cost=53.59..53.59 rows=1 width=46)
                          ->  SetOp Intersect  (cost=53.59..53.59 
 
rows=1 width=46)                                      ->  Sort  (cost=53.59..53.59 
rows=2 width=46)                                            ->  Append  
(cost=0.00..53.58 rows=2 width=46)                                                  ->  Subquery Scan 
*SELECT* 1  (cost=0.00..26.79 rows=1 width=46)                                                        ->  Nested Loop

(cost=0.00..26.79 rows=1 width=46)                                                              ->  Nested 
Loop  (cost=0.00..21.95 rows=1 width=24)                                                                    ->  
Index Scan using keyword_keyword_ix on keyword k  (cost=0.00..4.82 
rows=1 width=8)                                                                    ->  
Index Scan using image_keyword_keyword_id_ix on image_keyword ik  
(cost=0.00..17.07 rows=5 width=16)                                                              ->  Index 
Scan using image_pkey on image i  (cost=0.00..4.82 rows=1 width=22)                                                  ->
Subquery Scan 
 
*SELECT* 2  (cost=0.00..26.79 rows=1 width=46)                                                        ->  Nested Loop

(cost=0.00..26.79 rows=1 width=46)                                                              ->  Nested 
Loop  (cost=0.00..21.95 rows=1 width=24)                                                                    ->  
Index Scan using keyword_keyword_ix on keyword k  (cost=0.00..4.82 
rows=1 width=8)                                                                    ->  
Index Scan using image_keyword_keyword_id_ix on image_keyword ik  
(cost=0.00..17.07 rows=5 width=16)                                                              ->  Index 
Scan using image_pkey on image i  (cost=0.00..4.82 rows=1 width=22)                          ->  Subquery Scan *SELECT*
3 
 
(cost=0.00..26.79 rows=1 width=46)                                ->  Nested Loop  (cost=0.00..26.79 
rows=1 width=46)                                      ->  Nested Loop  (cost=0.00..21.95 
rows=1 width=24)                                            ->  Index Scan using 
keyword_keyword_ix on keyword k  (cost=0.00..4.82 rows=1 width=8)                                            ->  Index
Scanusing 
 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..17.07 
rows=5 width=16)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..4.82 rows=1 width=22)
EXPLAIN
 master_reference
------------------ 0635-00003 1060-00018 1060-00019
(3 rows)

OS X (PG 7.2) - intersect end-snip :

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

RH 7.1 (PG 7.1.3) - intersect start-snip :

spdb=# explain select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'pen'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# intersect
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'cheque'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# intersect
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'purchase'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# order by master_reference
spdb-# limit 12, 0;
NOTICE:  QUERY PLAN:
Limit  (cost=651.20..651.20 rows=12 width=44)  ->  Sort  (cost=651.20..651.20 rows=120 width=44)        ->  SetOp
Intersect (cost=644.06..647.06 rows=120 width=44)              ->  Sort  (cost=644.06..644.06 rows=1200 width=44)
            ->  Append  (cost=421.67..582.68 rows=1200 width=44)                          ->  Result
(cost=421.67..426.67rows=200 
 
width=44)                                ->  SetOp Intersect  
(cost=421.67..426.67 rows=200 width=44)                                      ->  Sort  (cost=421.67..421.67 
rows=2000 width=44)                                            ->  Append  
(cost=83.25..312.02 rows=2000 width=44)                                                  ->  Subquery Scan 
*SELECT* 1  (cost=83.25..156.01 rows=1000 width=44)                                                        ->  Merge
Join 
 
(cost=83.25..156.01 rows=1000 width=44)                                                              ->  Index 
Scan using image_pkey on image i  (cost=0.00..59.00 rows=1000 width=20)
            ->  Sort  
 
(cost=83.25..83.25 rows=100 width=24)                                                                    ->  
Merge Join  (cost=8.30..79.93 rows=100 width=24)
 ->  Index Scan using image_keyword_keyword_id_ix on image_keyword ik  
 
(cost=0.00..59.00 rows=1000 width=16)                                                                          ->  Sort
(cost=8.30..8.30 rows=10 width=8)                                                                                ->
IndexScan using keyword_keyword_ix on keyword k  
 
(cost=0.00..8.14 rows=10 width=8)                                                  ->  Subquery Scan 
*SELECT* 2  (cost=83.25..156.01 rows=1000 width=44)                                                        ->  Merge
Join 
 
(cost=83.25..156.01 rows=1000 width=44)                                                              ->  Index 
Scan using image_pkey on image i  (cost=0.00..59.00 rows=1000 width=20)
            ->  Sort  
 
(cost=83.25..83.25 rows=100 width=24)                                                                    ->  
Merge Join  (cost=8.30..79.93 rows=100 width=24)
 ->  Index Scan using image_keyword_keyword_id_ix on image_keyword ik  
 
(cost=0.00..59.00 rows=1000 width=16)                                                                          ->  Sort
(cost=8.30..8.30 rows=10 width=8)                                                                                ->
IndexScan using keyword_keyword_ix on keyword k  
 
(cost=0.00..8.14 rows=10 width=8)                          ->  Subquery Scan *SELECT* 3  
(cost=83.25..156.01 rows=1000 width=44)                                ->  Merge Join  (cost=83.25..156.01 
rows=1000 width=44)                                      ->  Index Scan using image_pkey on
image i  (cost=0.00..59.00 rows=1000 width=20)                                      ->  Sort  (cost=83.25..83.25 
rows=100 width=24)                                            ->  Merge Join  
(cost=8.30..79.93 rows=100 width=24)                                                  ->  Index Scan using 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..59.00 
rows=1000 width=16)                                                  ->  Sort  
(cost=8.30..8.30 rows=10 width=8)                                                        ->  Index Scan 
using keyword_keyword_ix on keyword k  (cost=0.00..8.14 rows=10 width=8)
EXPLAIN
 master_reference
------------------ 0635-00003 1060-00018 1060-00019
(3 rows)

RH 7.1 (PG 7.1.3) - intersect end-snip :

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

OS X (PG 7.2) - union start-snip :
spdb=# explain select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'pen'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# union
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'cheque'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# union
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'purchase'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# order by master_reference
spdb-# limit 12, 0;
NOTICE:  QUERY PLAN:
Limit  (cost=80.41..80.41 rows=1 width=46)  ->  Sort  (cost=80.41..80.41 rows=1 width=46)        ->  Unique
(cost=80.39..80.40rows=1 width=46)              ->  Sort  (cost=80.39..80.39 rows=3 width=46)                    ->
Append (cost=0.00..80.37 rows=3 width=46)                          ->  Subquery Scan *SELECT* 1  
 
(cost=0.00..26.79 rows=1 width=46)                                ->  Nested Loop  (cost=0.00..26.79 
rows=1 width=46)                                      ->  Nested Loop  (cost=0.00..21.95 
rows=1 width=24)                                            ->  Index Scan using 
keyword_keyword_ix on keyword k  (cost=0.00..4.82 rows=1 width=8)                                            ->  Index
Scanusing 
 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..17.07 
rows=5 width=16)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..4.82 rows=1 width=22)                          ->  Subquery Scan *SELECT* 2  
(cost=0.00..26.79 rows=1 width=46)                                ->  Nested Loop  (cost=0.00..26.79 
rows=1 width=46)                                      ->  Nested Loop  (cost=0.00..21.95 
rows=1 width=24)                                            ->  Index Scan using 
keyword_keyword_ix on keyword k  (cost=0.00..4.82 rows=1 width=8)                                            ->  Index
Scanusing 
 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..17.07 
rows=5 width=16)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..4.82 rows=1 width=22)                          ->  Subquery Scan *SELECT* 3  
(cost=0.00..26.79 rows=1 width=46)                                ->  Nested Loop  (cost=0.00..26.79 
rows=1 width=46)                                      ->  Nested Loop  (cost=0.00..21.95 
rows=1 width=24)                                            ->  Index Scan using 
keyword_keyword_ix on keyword k  (cost=0.00..4.82 rows=1 width=8)                                            ->  Index
Scanusing 
 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..17.07 
rows=5 width=16)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..4.82 rows=1 width=22)
EXPLAIN
 master_reference
------------------ 0079-00047 0219-00166 0237-00131 0237-00140 0237-00146 0244-00037 0244-00038 0244-00046 0244-00055
0244-000670253-00004 0368-00094
 
(12 rows)

OS X (PG 7.2) - union end-snip

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

RH 7.1 (PG 7.1.3) - union start-snip :

spdb=# explain select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'pen'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# union
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'cheque'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# union
spdb-# select i.master_reference from image i, image_keyword ik, 
keyword k
spdb-# where k.keyword = 'purchase'
spdb-# and ik.keyword_id = k.keyword_id
spdb-# and i.image_id = ik.image_id
spdb-# order by master_reference
spdb-# limit 12, 0;
NOTICE:  QUERY PLAN:
Limit  (cost=661.13..661.13 rows=12 width=44)  ->  Sort  (cost=661.13..661.13 rows=300 width=44)        ->  Unique
(cost=641.29..648.79rows=300 width=44)              ->  Sort  (cost=641.29..641.29 rows=3000 width=44)
 ->  Append  (cost=83.25..468.02 rows=3000 width=44)                          ->  Subquery Scan *SELECT* 1  
 
(cost=83.25..156.01 rows=1000 width=44)                                ->  Merge Join  (cost=83.25..156.01 
rows=1000 width=44)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..59.00 rows=1000 width=20)                                      ->  Sort  (cost=83.25..83.25 
rows=100 width=24)                                            ->  Merge Join  
(cost=8.30..79.93 rows=100 width=24)                                                  ->  Index Scan using 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..59.00 
rows=1000 width=16)                                                  ->  Sort  
(cost=8.30..8.30 rows=10 width=8)                                                        ->  Index Scan 
using keyword_keyword_ix on keyword k  (cost=0.00..8.14 rows=10 width=8)                          ->  Subquery Scan
*SELECT*2  
 
(cost=83.25..156.01 rows=1000 width=44)                                ->  Merge Join  (cost=83.25..156.01 
rows=1000 width=44)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..59.00 rows=1000 width=20)                                      ->  Sort  (cost=83.25..83.25 
rows=100 width=24)                                            ->  Merge Join  
(cost=8.30..79.93 rows=100 width=24)                                                  ->  Index Scan using 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..59.00 
rows=1000 width=16)                                                  ->  Sort  
(cost=8.30..8.30 rows=10 width=8)                                                        ->  Index Scan 
using keyword_keyword_ix on keyword k  (cost=0.00..8.14 rows=10 width=8)                          ->  Subquery Scan
*SELECT*3  
 
(cost=83.25..156.01 rows=1000 width=44)                                ->  Merge Join  (cost=83.25..156.01 
rows=1000 width=44)                                      ->  Index Scan using image_pkey on 
image i  (cost=0.00..59.00 rows=1000 width=20)                                      ->  Sort  (cost=83.25..83.25 
rows=100 width=24)                                            ->  Merge Join  
(cost=8.30..79.93 rows=100 width=24)                                                  ->  Index Scan using 
image_keyword_keyword_id_ix on image_keyword ik  (cost=0.00..59.00 
rows=1000 width=16)                                                  ->  Sort  
(cost=8.30..8.30 rows=10 width=8)                                                        ->  Index Scan 
using keyword_keyword_ix on keyword k  (cost=0.00..8.14 rows=10 width=8)

EXPLAIN
 master_reference
------------------ 0079-00047 0219-00166 0237-00131 0237-00140 0237-00146 0244-00037 0244-00038 0244-00046 0244-00055
0244-000670253-00004 0368-00094
 
(12 rows)

RH 7.1 (PG 7.1.3) - union end-snip

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

As a side issue, I have tried to upgrade the RH 7.1 box with PG 7.2, but 
I get the following which I cannot get passed (apologies as this is 
clearly a Linux issue ... but it is very frustrating and have hit a 
brick wall; I have installed 'openssl-0.9.6-9.i386.rpm' and 
'krb5-libs-1.2.2-12.i386.rpm' but no change in behaviour) :

rpm -i --test postgresql-7.2-1PGDG.i686.rpm
error: failed dependencies:        libcrypto.so.0   is needed by postgresql-7.2-1PGDG        libk5crypto.so.2   is
neededby postgresql-7.2-1PGDG        libkrb5.so.2   is needed by postgresql-7.2-1PGDG        libreadline.so.4   is
neededby postgresql-7.2-1PGDG        libssl.so.0   is needed by postgresql-7.2-1PGDG
 

Any advice appreciated ;-)

Regards.

=====

On Friday, April 5, 2002, at 11:50  pm, Tom Lane wrote:

> d_nardini@btconnect.com writes:
>> I'm running the following statement on a (development) Mac OS X (PG 
>> 7.2)
>> box and on a (production) RedHat 7.0 (PG 7.1.3) box. On the OS X it's
>> performing very well ... on RH it's VERY SLOW, with CPU utilization
>> hitting 90%+ (on OS X it hardly exceeds 2%).
>
> Hmm.  Nested intersects are actually broken in 7.1.*, cf.
>     http://archives.postgresql.org/pgsql-bugs/2001-08/msg00064.php
> Do you get the same results from both boxes?
>
>> BTW - replacing 'intersect'
>> with 'union' in the SQL behaves the same.
>
> In that case it might possibly be a different problem.  Do you get the
> same EXPLAIN plan on both boxes?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>



Re: intersect performance (PG 7.1.3 vs 7.2)

От
Tom Lane
Дата:
d_nardini@btconnect.com writes:
> Following are EXPLAIN dumps from each platform for both intersect and 
> union (both return the correct data on each platform).

Did you ever VACUUM ANALYZE these tables on the 7.1 box?  The statistics
estimates from that machine look suspiciously like default values.

If you did, then maybe I can credit 7.2's improved statistics machinery
with a win ;-).  But I bet the better choice of plan on the 7.2 box is
due simply to having any stats at all.

Can't help you with the RPM dependency issues, sorry.
        regards, tom lane