Re: abnormally long time in performing a two-table join

Поиск
Список
Период
Сортировка
От Chris Mungall
Тема Re: abnormally long time in performing a two-table join
Дата
Msg-id Pine.LNX.4.33.0208111657090.16003-100000@sos.lbl.gov
обсуждение исходный текст
Ответ на Re: abnormally long time in performing a two-table join  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: abnormally long time in performing a two-table join  (Chris Mungall <cjm@fruitfly.org>)
Re: abnormally long time in performing a two-table join  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-admin

On Sun, 11 Aug 2002, Tom Lane wrote:

> Chris Mungall <cjm@fruitfly.org> writes:
> > On Sun, 11 Aug 2002, Tom Lane wrote:
> >> Odd ... I'm surprised it doesn't choose a hash join.  What do you get if
> >> you try it with "set enable_nestloop = off" ?
>
> > Much better!
>
> > omicia29=# set enable_nestloop = off;
> > SET VARIABLE
> > omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
> > JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
> > = 15;
> > NOTICE:  QUERY PLAN:
>
> > Hash Join  (cost=55921.31..223860.67 rows=778 width=8) (actual
> > time=4249.63..4259.47 rows=15 loops=1)
> >   ->  Index Scan using sfqv_idx3 on sfqv  (cost=0.00..167411.84 rows=41423
> > width=4) (actual time=38.05..44.50 rows=110 loops=1)
> >   ->  Hash  (cost=51056.43..51056.43 rows=49453 width=4) (actual
> > time=4211.15..4211.15 rows=0 loops=1)
> >         ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453
> > width=4) (actual time=0.14..3974.12 rows=100030 loops=1)
> > Total runtime: 4259.67 msec
>
> Hmm.  It looks like the reason the planner doesn't like this plan is
> that it's vastly overestimating the number of rows it will get from
> indexscanning sfqv_idx3 for qualifier_value = 'BRCA1' (viz, 41423
> rather than the true 110).  This is a statistical failure.
>
> What do you get from
>     select * from pg_stats where tablename = 'sfqv'
> ?

omicia29=# select * from pg_stats where tablename = 'sfqv';
 tablename |     attname      | null_frac | avg_width | n_distinct |
most_common_vals |          most_common_freqs          |
histogram_bounds                                   | correlation

-----------+------------------+-----------+-----------+------------+------------------+-------------------------------------+-------------------------------------------------------------------------------------+-------------
 sfqv      | seqfeature_id    |         0 |         4 |    -0.2135 |
{394598,942245}  | {0.000666667,0.000666667}           |
{1106,276334,566476,832842,1101265,1360912,1624199,1889730,2125178,2408495,2632006}
|           1
 sfqv      | ontology_term_id |         0 |         4 |         22 |
{13,7}           | {0.516,0.315667}                    |
{5,6,9,9,9,15,15,16,19,20,81}
|    0.342753
 sfqv      | qualifier_rank   |         0 |         4 |          4 |
{1,2,3,4}        | {0.738,0.26,0.00133333,0.000666667} |
|    0.552338
(3 rows)

i'm not sure how to interpret this, but it seems odd that it doesn't
contain the attname 'qualifier_value'

this attribute is definitely there!

omicia29=# explain select distinct qualifier_value from sfqv;
NOTICE:  QUERY PLAN:

Unique  (cost=5655636.66..5676348.20 rows=828462 width=259)
  ->  Sort  (cost=5655636.66..5655636.66 rows=8284617 width=259)
        ->  Seq Scan on sfqv  (cost=0.00..401486.17 rows=8284617
width=259)

most of values of this column will be fairly rare; however, there are
possibly a few distinct values that crop up again and again - this
possibly skews the statistics


> You might try increasing the statistics target for the qualifier_value
> column --- I'm guessing that you need more resolution in the stats to
> deal correctly with low-probability data.  Try
>     ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100;
>     ANALYZE sfqv;        -- to recompute stats
> then see how the EXPLAIN results change.  (The default stats target is
> 10; 100 might be more than you need, or perhaps not.)

Hmm, it still doesn't force a hash join

i've included the output from the same query again, twice - first with
nestloop disabled, then with it enabled.

omicia29=# ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS
100;
ALTER
omicia29=# vacuum analyze sfqv;
VACUUM
omicia29=# set enable_nestloop = off;
SET VARIABLE
omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
= 15;
NOTICE:  QUERY PLAN:

Hash Join  (cost=55921.31..223860.67 rows=778 width=8) (actual
time=16554.37..16566.89 rows=15 loops=1)
  ->  Index Scan using sfqv_idx3 on sfqv  (cost=0.00..167411.84 rows=41423
width=4) (actual time=82.68..91.59 rows=110 loops=1)
  ->  Hash  (cost=51056.43..51056.43 rows=49453 width=4) (actual
time=16471.29..16471.29 rows=0 loops=1)
        ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453
width=4) (actual time=0.19..16258.95 rows=100030 loops=1)
Total runtime: 16567.15 msec

EXPLAIN
omicia29=# set enable_nestloop = on;
SET VARIABLE
omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL
JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id
= 15;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..202812.42 rows=778 width=8) (actual
time=56557.06..84107.30 rows=15 loops=1)
  ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=49453 width=4)
(actual time=0.13..4572.13 rows=100030 loops=1)
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.06 rows=1 width=4)
(actual time=0.79..0.79 rows=0 loops=100030)
Total runtime: 84107.46 msec



> > however I'm not sure what the implications of turning nestloop off
> > altogether are - maybe i can hardcode it just for this query
>
> It'd be best not to.  I'd counsel seeing if more stats help, first.

Ok, I'm going to try upping it from 100....

>             regards, tom lane
>



В списке pgsql-admin по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: abnormally long time in performing a two-table join
Следующее
От: Chris Ruprecht
Дата:
Сообщение: ProstGreSQL on RAID Question