Re: abnormally long time in performing a two-table join
От | Chris Mungall |
---|---|
Тема | Re: abnormally long time in performing a two-table join |
Дата | |
Msg-id | Pine.LNX.4.33.0208111657090.16003-100000@sos.lbl.gov обсуждение исходный текст |
Ответ на | Re: abnormally long time in performing a two-table join (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: abnormally long time in performing a two-table join
(Chris Mungall <cjm@fruitfly.org>)
Re: abnormally long time in performing a two-table join (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-admin |
On Sun, 11 Aug 2002, Tom Lane wrote: > Chris Mungall <cjm@fruitfly.org> writes: > > On Sun, 11 Aug 2002, Tom Lane wrote: > >> Odd ... I'm surprised it doesn't choose a hash join. What do you get if > >> you try it with "set enable_nestloop = off" ? > > > Much better! > > > omicia29=# set enable_nestloop = off; > > SET VARIABLE > > omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL > > JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id > > = 15; > > NOTICE: QUERY PLAN: > > > Hash Join (cost=55921.31..223860.67 rows=778 width=8) (actual > > time=4249.63..4259.47 rows=15 loops=1) > > -> Index Scan using sfqv_idx3 on sfqv (cost=0.00..167411.84 rows=41423 > > width=4) (actual time=38.05..44.50 rows=110 loops=1) > > -> Hash (cost=51056.43..51056.43 rows=49453 width=4) (actual > > time=4211.15..4211.15 rows=0 loops=1) > > -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 > > width=4) (actual time=0.14..3974.12 rows=100030 loops=1) > > Total runtime: 4259.67 msec > > Hmm. It looks like the reason the planner doesn't like this plan is > that it's vastly overestimating the number of rows it will get from > indexscanning sfqv_idx3 for qualifier_value = 'BRCA1' (viz, 41423 > rather than the true 110). This is a statistical failure. > > What do you get from > select * from pg_stats where tablename = 'sfqv' > ? omicia29=# select * from pg_stats where tablename = 'sfqv'; tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation -----------+------------------+-----------+-----------+------------+------------------+-------------------------------------+-------------------------------------------------------------------------------------+------------- sfqv | seqfeature_id | 0 | 4 | -0.2135 | {394598,942245} | {0.000666667,0.000666667} | {1106,276334,566476,832842,1101265,1360912,1624199,1889730,2125178,2408495,2632006} | 1 sfqv | ontology_term_id | 0 | 4 | 22 | {13,7} | {0.516,0.315667} | {5,6,9,9,9,15,15,16,19,20,81} | 0.342753 sfqv | qualifier_rank | 0 | 4 | 4 | {1,2,3,4} | {0.738,0.26,0.00133333,0.000666667} | | 0.552338 (3 rows) i'm not sure how to interpret this, but it seems odd that it doesn't contain the attname 'qualifier_value' this attribute is definitely there! omicia29=# explain select distinct qualifier_value from sfqv; NOTICE: QUERY PLAN: Unique (cost=5655636.66..5676348.20 rows=828462 width=259) -> Sort (cost=5655636.66..5655636.66 rows=8284617 width=259) -> Seq Scan on sfqv (cost=0.00..401486.17 rows=8284617 width=259) most of values of this column will be fairly rare; however, there are possibly a few distinct values that crop up again and again - this possibly skews the statistics > You might try increasing the statistics target for the qualifier_value > column --- I'm guessing that you need more resolution in the stats to > deal correctly with low-probability data. Try > ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100; > ANALYZE sfqv; -- to recompute stats > then see how the EXPLAIN results change. (The default stats target is > 10; 100 might be more than you need, or perhaps not.) Hmm, it still doesn't force a hash join i've included the output from the same query again, twice - first with nestloop disabled, then with it enabled. omicia29=# ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100; ALTER omicia29=# vacuum analyze sfqv; VACUUM omicia29=# set enable_nestloop = off; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15; NOTICE: QUERY PLAN: Hash Join (cost=55921.31..223860.67 rows=778 width=8) (actual time=16554.37..16566.89 rows=15 loops=1) -> Index Scan using sfqv_idx3 on sfqv (cost=0.00..167411.84 rows=41423 width=4) (actual time=82.68..91.59 rows=110 loops=1) -> Hash (cost=51056.43..51056.43 rows=49453 width=4) (actual time=16471.29..16471.29 rows=0 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 width=4) (actual time=0.19..16258.95 rows=100030 loops=1) Total runtime: 16567.15 msec EXPLAIN omicia29=# set enable_nestloop = on; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..202812.42 rows=778 width=8) (actual time=56557.06..84107.30 rows=15 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 width=4) (actual time=0.13..4572.13 rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.06 rows=1 width=4) (actual time=0.79..0.79 rows=0 loops=100030) Total runtime: 84107.46 msec > > however I'm not sure what the implications of turning nestloop off > > altogether are - maybe i can hardcode it just for this query > > It'd be best not to. I'd counsel seeing if more stats help, first. Ok, I'm going to try upping it from 100.... > regards, tom lane >
В списке pgsql-admin по дате отправления: