Обсуждение: abnormally long time in performing a two-table join

Поиск
Список
Период
Сортировка

abnormally long time in performing a two-table join

От
Chris Mungall
Дата:
I've recently made the switch from mysql to postgres - i am now a
zealous postgres convert. however, i'm experiencing some performance
issues that i'm sure are just a reflection on my inexperience admining pg.

A simple two-table natural join (on integer primary/foreign keys) seems
abnormally slow.

(I have performed VACUUM ANALYZE on both tables)

both forced index scans and the default seq+index nested loop
combination are of comparably slow speed.

I include explain analyze output for both modes, plus details on the
tables and their indexes:


omicia29=# set enable_seqscan=off;
SET VARIABLE

omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and
seqfeature.seqfeature_key_id= 15; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1)
  ->  Index Scan using seqfeature_pkey on seqfeature  (cost=0.00..61715.62 rows=44674 width=4) (actual
time=0.29..14669.06rows=100030 loops=1) 
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030)

Total runtime: 44572.13 msec

omicia29=# set enable_seqscan=on;
SET VARIABLE

omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and
seqfeature.seqfeature_key_id= 15; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..186352.20 rows=223 width=8) (actual time=7803.68..14224.36 rows=15 loops=1)
  ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=44674 width=4) (actual time=0.13..4050.09 rows=100030 loops=1)
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.02 rows=1 width=4) (actual time=0.10..0.10 rows=0 loops=100030)
Total runtime: 14224.56 msec

omicia29=# select relkind,sum(relpages) from pg_class group by relkind;
 relkind |  sum
---------+--------
 S       |     26
 i       | 678149
 r       | 487314
 s       |      0
 t       | 162194
 v       |      0
(6 rows)

omicia29=# select count(*) from seqfeature;
  count
---------
 2632194
(1 row)

omicia29=# select count(*) from sfqv;
  count
---------
 8284617
(1 row)

omicia29=# select count(*) from seqfeature where seqfeature_key_id = 15;
 count
--------
 100030
(1 row)

omicia29=# select count(*) from sfqv where qualifier_value = 'BRCA1';
 count
-------
   110
(1 row)

I am obviously doing something wrong, as other joins between the
"seqfeature" table and other tables of comparable size to "sfqv" are
nice and fast.

I can make this go much faster by doing two seperate queries, and
plugging the foreign keys from "sfqv" into a query on "seqfeature";
but obviously I'd rather do this in one SQL query than two.

the pg planner seems to be attempting to do the query the other way
round - doing the query that returns the most rows first??

I have tried forcing the order with a CROSS JOIN and explicitly
constraining by the seqfeature_id field, it works out the same

these are the indexes:

CREATE INDEX sfqv_idx1 ON sfqv USING btree (seqfeature_id);
CREATE INDEX sfqv_idx2 ON sfqv USING btree (ontology_term_id);
CREATE INDEX sfqv_idx3 ON sfqv USING btree (qualifier_value);
CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value);

CREATE INDEX sf1 ON seqfeature USING btree (seqfeature_key_id);
CREATE INDEX sf2 ON seqfeature USING btree (seqfeature_source_id);
CREATE INDEX sf3 ON seqfeature USING btree (bioentry_id);
CREATE INDEX sf4 ON seqfeature USING btree (seqfeature_id, seqfeature_key_id);

as you can see from the EXPLAIN above, only sfqv_idx is being used.

I would have thought sfqv_idx4 would be useful in this particular query?

and also sf4?

and finally, here are the actual tables:

CREATE TABLE "seqfeature" (
        "seqfeature_id" integer DEFAULT nextval('seqfeature_pk_seq'::text) NOT N
ULL,
        "bioentry_id" integer NOT NULL,
        "seqfeature_key_id" integer,
        "seqfeature_source_id" integer,
        "seqfeature_rank" integer,
        Constraint "seqfeature_pkey" Primary Key ("seqfeature_id")
);

CREATE TABLE "sfqv" (
        "seqfeature_id" integer,
        "ontology_term_id" integer,
        "qualifier_rank" integer,
        "qualifier_value" character(255)
);


Re: abnormally long time in performing a two-table join

От
Tom Lane
Дата:
Chris Mungall <cjm@fruitfly.org> writes:
> Nested Loop  (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1)
>   ->  Index Scan using seqfeature_pkey on seqfeature  (cost=0.00..61715.62 rows=44674 width=4) (actual
time=0.29..14669.06rows=100030 loops=1) 
>   ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0
loops=100030)

Odd ... I'm surprised it doesn't choose a hash join.  What do you get if
you try it with "set enable_nestloop = off" ?

> CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value);

> I would have thought sfqv_idx4 would be useful in this particular query?

You'd have to write

select ... where qualifier_value = 'BRCA1' and
seqfeature.seqfeature_key_id = 15 and
sfqv.seqfeature_key_id = 15

to get it to consider that index.  You know and I know that the join
should imply sfqv.seqfeature_key_id = 15, but the planner doesn't.

            regards, tom lane

Re: abnormally long time in performing a two-table join

От
Chris Mungall
Дата:

On Sun, 11 Aug 2002, Tom Lane wrote:

> Chris Mungall <cjm@fruitfly.org> writes:
> > Nested Loop  (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1)
> >   ->  Index Scan using seqfeature_pkey on seqfeature  (cost=0.00..61715.62 rows=44674 width=4) (actual
time=0.29..14669.06rows=100030 loops=1) 
> >   ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0
loops=100030)
>
> Odd ... I'm surprised it doesn't choose a hash join.  What do you get if
> you try it with "set enable_nestloop = off" ?

Much better!

omicia29=# set enable_nestloop = off;
SET VARIABLE
omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
= 15;
NOTICE:  QUERY PLAN:

Hash Join  (cost=55921.31..223860.67 rows=778 width=8) (actual
time=4249.63..4259.47 rows=15 loops=1)
  ->  Index Scan using sfqv_idx3 on sfqv  (cost=0.00..167411.84 rows=41423
width=4) (actual time=38.05..44.50 rows=110 loops=1)
  ->  Hash  (cost=51056.43..51056.43 rows=49453 width=4) (actual
time=4211.15..4211.15 rows=0 loops=1)
        ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453
width=4) (actual time=0.14..3974.12 rows=100030 loops=1)
Total runtime: 4259.67 msec

EXPLAIN
omicia29=# set enable_nestloop = on;
SET VARIABLE
omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
= 15;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..203740.72 rows=778 width=8) (actual
time=8382.36..15421.84 rows=15 loops=1)
  ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453 width=4)
(actual time=0.14..4198.16 rows=100030 loops=1)
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.07 rows=1 width=4)
(actual time=0.11..0.11 rows=0 loops=100030)
Total runtime: 15422.03 msec

however I'm not sure what the implications of turning nestloop off
altogether are - maybe i can hardcode it just for this query

> > CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value);
>
> > I would have thought sfqv_idx4 would be useful in this particular query?
>
> You'd have to write
>
> select ... where qualifier_value = 'BRCA1' and
> seqfeature.seqfeature_key_id = 15 and
> sfqv.seqfeature_key_id = 15
>
> to get it to consider that index.  You know and I know that the join
> should imply sfqv.seqfeature_key_id = 15, but the planner doesn't.

sorry, my column names are confusing

the primary key of the seqfeature table is "seqfeature_id",
seqfeature_key_id is actually a FK into another table entirely

(ideally this would be a 3 table join, but the 3rd table is small so I
prefetch all the pkeys in my code which speeds things up)



>             regards, tom lane
>


Re: abnormally long time in performing a two-table join

От
Tom Lane
Дата:
Chris Mungall <cjm@fruitfly.org> writes:
> On Sun, 11 Aug 2002, Tom Lane wrote:
>> Odd ... I'm surprised it doesn't choose a hash join.  What do you get if
>> you try it with "set enable_nestloop = off" ?

> Much better!

> omicia29=# set enable_nestloop = off;
> SET VARIABLE
> omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
> JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
> = 15;
> NOTICE:  QUERY PLAN:

> Hash Join  (cost=55921.31..223860.67 rows=778 width=8) (actual
> time=4249.63..4259.47 rows=15 loops=1)
>   ->  Index Scan using sfqv_idx3 on sfqv  (cost=0.00..167411.84 rows=41423
> width=4) (actual time=38.05..44.50 rows=110 loops=1)
>   ->  Hash  (cost=51056.43..51056.43 rows=49453 width=4) (actual
> time=4211.15..4211.15 rows=0 loops=1)
>         ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453
> width=4) (actual time=0.14..3974.12 rows=100030 loops=1)
> Total runtime: 4259.67 msec

Hmm.  It looks like the reason the planner doesn't like this plan is
that it's vastly overestimating the number of rows it will get from
indexscanning sfqv_idx3 for qualifier_value = 'BRCA1' (viz, 41423
rather than the true 110).  This is a statistical failure.

What do you get from
    select * from pg_stats where tablename = 'sfqv'
?

You might try increasing the statistics target for the qualifier_value
column --- I'm guessing that you need more resolution in the stats to
deal correctly with low-probability data.  Try
    ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100;
    ANALYZE sfqv;        -- to recompute stats
then see how the EXPLAIN results change.  (The default stats target is
10; 100 might be more than you need, or perhaps not.)

> however I'm not sure what the implications of turning nestloop off
> altogether are - maybe i can hardcode it just for this query

It'd be best not to.  I'd counsel seeing if more stats help, first.

            regards, tom lane

Re: abnormally long time in performing a two-table join

От
Chris Mungall
Дата:

On Sun, 11 Aug 2002, Tom Lane wrote:

> Chris Mungall <cjm@fruitfly.org> writes:
> > On Sun, 11 Aug 2002, Tom Lane wrote:
> >> Odd ... I'm surprised it doesn't choose a hash join.  What do you get if
> >> you try it with "set enable_nestloop = off" ?
>
> > Much better!
>
> > omicia29=# set enable_nestloop = off;
> > SET VARIABLE
> > omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
> > JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
> > = 15;
> > NOTICE:  QUERY PLAN:
>
> > Hash Join  (cost=55921.31..223860.67 rows=778 width=8) (actual
> > time=4249.63..4259.47 rows=15 loops=1)
> >   ->  Index Scan using sfqv_idx3 on sfqv  (cost=0.00..167411.84 rows=41423
> > width=4) (actual time=38.05..44.50 rows=110 loops=1)
> >   ->  Hash  (cost=51056.43..51056.43 rows=49453 width=4) (actual
> > time=4211.15..4211.15 rows=0 loops=1)
> >         ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453
> > width=4) (actual time=0.14..3974.12 rows=100030 loops=1)
> > Total runtime: 4259.67 msec
>
> Hmm.  It looks like the reason the planner doesn't like this plan is
> that it's vastly overestimating the number of rows it will get from
> indexscanning sfqv_idx3 for qualifier_value = 'BRCA1' (viz, 41423
> rather than the true 110).  This is a statistical failure.
>
> What do you get from
>     select * from pg_stats where tablename = 'sfqv'
> ?

omicia29=# select * from pg_stats where tablename = 'sfqv';
 tablename |     attname      | null_frac | avg_width | n_distinct |
most_common_vals |          most_common_freqs          |
histogram_bounds                                   | correlation

-----------+------------------+-----------+-----------+------------+------------------+-------------------------------------+-------------------------------------------------------------------------------------+-------------
 sfqv      | seqfeature_id    |         0 |         4 |    -0.2135 |
{394598,942245}  | {0.000666667,0.000666667}           |
{1106,276334,566476,832842,1101265,1360912,1624199,1889730,2125178,2408495,2632006}
|           1
 sfqv      | ontology_term_id |         0 |         4 |         22 |
{13,7}           | {0.516,0.315667}                    |
{5,6,9,9,9,15,15,16,19,20,81}
|    0.342753
 sfqv      | qualifier_rank   |         0 |         4 |          4 |
{1,2,3,4}        | {0.738,0.26,0.00133333,0.000666667} |
|    0.552338
(3 rows)

i'm not sure how to interpret this, but it seems odd that it doesn't
contain the attname 'qualifier_value'

this attribute is definitely there!

omicia29=# explain select distinct qualifier_value from sfqv;
NOTICE:  QUERY PLAN:

Unique  (cost=5655636.66..5676348.20 rows=828462 width=259)
  ->  Sort  (cost=5655636.66..5655636.66 rows=8284617 width=259)
        ->  Seq Scan on sfqv  (cost=0.00..401486.17 rows=8284617
width=259)

most of values of this column will be fairly rare; however, there are
possibly a few distinct values that crop up again and again - this
possibly skews the statistics


> You might try increasing the statistics target for the qualifier_value
> column --- I'm guessing that you need more resolution in the stats to
> deal correctly with low-probability data.  Try
>     ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100;
>     ANALYZE sfqv;        -- to recompute stats
> then see how the EXPLAIN results change.  (The default stats target is
> 10; 100 might be more than you need, or perhaps not.)

Hmm, it still doesn't force a hash join

i've included the output from the same query again, twice - first with
nestloop disabled, then with it enabled.

omicia29=# ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS
100;
ALTER
omicia29=# vacuum analyze sfqv;
VACUUM
omicia29=# set enable_nestloop = off;
SET VARIABLE
omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
= 15;
NOTICE:  QUERY PLAN:

Hash Join  (cost=55921.31..223860.67 rows=778 width=8) (actual
time=16554.37..16566.89 rows=15 loops=1)
  ->  Index Scan using sfqv_idx3 on sfqv  (cost=0.00..167411.84 rows=41423
width=4) (actual time=82.68..91.59 rows=110 loops=1)
  ->  Hash  (cost=51056.43..51056.43 rows=49453 width=4) (actual
time=16471.29..16471.29 rows=0 loops=1)
        ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453
width=4) (actual time=0.19..16258.95 rows=100030 loops=1)
Total runtime: 16567.15 msec

EXPLAIN
omicia29=# set enable_nestloop = on;
SET VARIABLE
omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
= 15;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..202812.42 rows=778 width=8) (actual
time=56557.06..84107.30 rows=15 loops=1)
  ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453 width=4)
(actual time=0.13..4572.13 rows=100030 loops=1)
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.06 rows=1 width=4)
(actual time=0.79..0.79 rows=0 loops=100030)
Total runtime: 84107.46 msec



> > however I'm not sure what the implications of turning nestloop off
> > altogether are - maybe i can hardcode it just for this query
>
> It'd be best not to.  I'd counsel seeing if more stats help, first.

Ok, I'm going to try upping it from 100....

>             regards, tom lane
>



Re: abnormally long time in performing a two-table join

От
Chris Mungall
Дата:
On Sun, 11 Aug 2002, Chris Mungall wrote:

> > You might try increasing the statistics target for the qualifier_value
> > column --- I'm guessing that you need more resolution in the stats to
> > deal correctly with low-probability data.  Try
> >     ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100;
> >     ANALYZE sfqv;        -- to recompute stats
> > then see how the EXPLAIN results change.  (The default stats target is
> > 10; 100 might be more than you need, or perhaps not.)
>
> Hmm, it still doesn't force a hash join
>
> i've included the output from the same query again, twice - first with
> nestloop disabled, then with it enabled.
>

[SNIP]

>
>
> > > however I'm not sure what the implications of turning nestloop off
> > > altogether are - maybe i can hardcode it just for this query
> >
> > It'd be best not to.  I'd counsel seeing if more stats help, first.
>
> Ok, I'm going to try upping it from 100....

I tried 1000, it still insists on doing a nested loop. any other
suggestions for forcing a hash join over a nested loop?

> >             regards, tom lane
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


Re: abnormally long time in performing a two-table join

От
Tom Lane
Дата:
Chris Mungall <cjm@fruitfly.org> writes:
> i'm not sure how to interpret this, but it seems odd that it doesn't
> contain the attname 'qualifier_value'

That's not just odd, but strikingly wrong.  You did say that
qualifier_value is a plain old varchar column, no?  I'm really
at a loss how we could fail to garner any stats for it.

Would you be willing to let me have access to your system to
look into this?  If so, contact me off-list to arrange details.

            regards, tom lane