Обсуждение: abnormally long time in performing a two-table join
I've recently made the switch from mysql to postgres - i am now a zealous postgres convert. however, i'm experiencing some performance issues that i'm sure are just a reflection on my inexperience admining pg. A simple two-table natural join (on integer primary/foreign keys) seems abnormally slow. (I have performed VACUUM ANALYZE on both tables) both forced index scans and the default seq+index nested loop combination are of comparably slow speed. I include explain analyze output for both modes, plus details on the tables and their indexes: omicia29=# set enable_seqscan=off; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id= 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1) -> Index Scan using seqfeature_pkey on seqfeature (cost=0.00..61715.62 rows=44674 width=4) (actual time=0.29..14669.06rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030) Total runtime: 44572.13 msec omicia29=# set enable_seqscan=on; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id= 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..186352.20 rows=223 width=8) (actual time=7803.68..14224.36 rows=15 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=44674 width=4) (actual time=0.13..4050.09 rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.10..0.10 rows=0 loops=100030) Total runtime: 14224.56 msec omicia29=# select relkind,sum(relpages) from pg_class group by relkind; relkind | sum ---------+-------- S | 26 i | 678149 r | 487314 s | 0 t | 162194 v | 0 (6 rows) omicia29=# select count(*) from seqfeature; count --------- 2632194 (1 row) omicia29=# select count(*) from sfqv; count --------- 8284617 (1 row) omicia29=# select count(*) from seqfeature where seqfeature_key_id = 15; count -------- 100030 (1 row) omicia29=# select count(*) from sfqv where qualifier_value = 'BRCA1'; count ------- 110 (1 row) I am obviously doing something wrong, as other joins between the "seqfeature" table and other tables of comparable size to "sfqv" are nice and fast. I can make this go much faster by doing two seperate queries, and plugging the foreign keys from "sfqv" into a query on "seqfeature"; but obviously I'd rather do this in one SQL query than two. the pg planner seems to be attempting to do the query the other way round - doing the query that returns the most rows first?? I have tried forcing the order with a CROSS JOIN and explicitly constraining by the seqfeature_id field, it works out the same these are the indexes: CREATE INDEX sfqv_idx1 ON sfqv USING btree (seqfeature_id); CREATE INDEX sfqv_idx2 ON sfqv USING btree (ontology_term_id); CREATE INDEX sfqv_idx3 ON sfqv USING btree (qualifier_value); CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value); CREATE INDEX sf1 ON seqfeature USING btree (seqfeature_key_id); CREATE INDEX sf2 ON seqfeature USING btree (seqfeature_source_id); CREATE INDEX sf3 ON seqfeature USING btree (bioentry_id); CREATE INDEX sf4 ON seqfeature USING btree (seqfeature_id, seqfeature_key_id); as you can see from the EXPLAIN above, only sfqv_idx is being used. I would have thought sfqv_idx4 would be useful in this particular query? and also sf4? and finally, here are the actual tables: CREATE TABLE "seqfeature" ( "seqfeature_id" integer DEFAULT nextval('seqfeature_pk_seq'::text) NOT N ULL, "bioentry_id" integer NOT NULL, "seqfeature_key_id" integer, "seqfeature_source_id" integer, "seqfeature_rank" integer, Constraint "seqfeature_pkey" Primary Key ("seqfeature_id") ); CREATE TABLE "sfqv" ( "seqfeature_id" integer, "ontology_term_id" integer, "qualifier_rank" integer, "qualifier_value" character(255) );
Chris Mungall <cjm@fruitfly.org> writes: > Nested Loop (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1) > -> Index Scan using seqfeature_pkey on seqfeature (cost=0.00..61715.62 rows=44674 width=4) (actual time=0.29..14669.06rows=100030 loops=1) > -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030) Odd ... I'm surprised it doesn't choose a hash join. What do you get if you try it with "set enable_nestloop = off" ? > CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value); > I would have thought sfqv_idx4 would be useful in this particular query? You'd have to write select ... where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15 and sfqv.seqfeature_key_id = 15 to get it to consider that index. You know and I know that the join should imply sfqv.seqfeature_key_id = 15, but the planner doesn't. regards, tom lane
On Sun, 11 Aug 2002, Tom Lane wrote: > Chris Mungall <cjm@fruitfly.org> writes: > > Nested Loop (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1) > > -> Index Scan using seqfeature_pkey on seqfeature (cost=0.00..61715.62 rows=44674 width=4) (actual time=0.29..14669.06rows=100030 loops=1) > > -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030) > > Odd ... I'm surprised it doesn't choose a hash join. What do you get if > you try it with "set enable_nestloop = off" ? Much better! omicia29=# set enable_nestloop = off; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15; NOTICE: QUERY PLAN: Hash Join (cost=55921.31..223860.67 rows=778 width=8) (actual time=4249.63..4259.47 rows=15 loops=1) -> Index Scan using sfqv_idx3 on sfqv (cost=0.00..167411.84 rows=41423 width=4) (actual time=38.05..44.50 rows=110 loops=1) -> Hash (cost=51056.43..51056.43 rows=49453 width=4) (actual time=4211.15..4211.15 rows=0 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 width=4) (actual time=0.14..3974.12 rows=100030 loops=1) Total runtime: 4259.67 msec EXPLAIN omicia29=# set enable_nestloop = on; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..203740.72 rows=778 width=8) (actual time=8382.36..15421.84 rows=15 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 width=4) (actual time=0.14..4198.16 rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.07 rows=1 width=4) (actual time=0.11..0.11 rows=0 loops=100030) Total runtime: 15422.03 msec however I'm not sure what the implications of turning nestloop off altogether are - maybe i can hardcode it just for this query > > CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value); > > > I would have thought sfqv_idx4 would be useful in this particular query? > > You'd have to write > > select ... where qualifier_value = 'BRCA1' and > seqfeature.seqfeature_key_id = 15 and > sfqv.seqfeature_key_id = 15 > > to get it to consider that index. You know and I know that the join > should imply sfqv.seqfeature_key_id = 15, but the planner doesn't. sorry, my column names are confusing the primary key of the seqfeature table is "seqfeature_id", seqfeature_key_id is actually a FK into another table entirely (ideally this would be a 3 table join, but the 3rd table is small so I prefetch all the pkeys in my code which speeds things up) > regards, tom lane >
Chris Mungall <cjm@fruitfly.org> writes: > On Sun, 11 Aug 2002, Tom Lane wrote: >> Odd ... I'm surprised it doesn't choose a hash join. What do you get if >> you try it with "set enable_nestloop = off" ? > Much better! > omicia29=# set enable_nestloop = off; > SET VARIABLE > omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL > JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id > = 15; > NOTICE: QUERY PLAN: > Hash Join (cost=55921.31..223860.67 rows=778 width=8) (actual > time=4249.63..4259.47 rows=15 loops=1) > -> Index Scan using sfqv_idx3 on sfqv (cost=0.00..167411.84 rows=41423 > width=4) (actual time=38.05..44.50 rows=110 loops=1) > -> Hash (cost=51056.43..51056.43 rows=49453 width=4) (actual > time=4211.15..4211.15 rows=0 loops=1) > -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 > width=4) (actual time=0.14..3974.12 rows=100030 loops=1) > Total runtime: 4259.67 msec Hmm. It looks like the reason the planner doesn't like this plan is that it's vastly overestimating the number of rows it will get from indexscanning sfqv_idx3 for qualifier_value = 'BRCA1' (viz, 41423 rather than the true 110). This is a statistical failure. What do you get from select * from pg_stats where tablename = 'sfqv' ? You might try increasing the statistics target for the qualifier_value column --- I'm guessing that you need more resolution in the stats to deal correctly with low-probability data. Try ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100; ANALYZE sfqv; -- to recompute stats then see how the EXPLAIN results change. (The default stats target is 10; 100 might be more than you need, or perhaps not.) > however I'm not sure what the implications of turning nestloop off > altogether are - maybe i can hardcode it just for this query It'd be best not to. I'd counsel seeing if more stats help, first. regards, tom lane
On Sun, 11 Aug 2002, Tom Lane wrote: > Chris Mungall <cjm@fruitfly.org> writes: > > On Sun, 11 Aug 2002, Tom Lane wrote: > >> Odd ... I'm surprised it doesn't choose a hash join. What do you get if > >> you try it with "set enable_nestloop = off" ? > > > Much better! > > > omicia29=# set enable_nestloop = off; > > SET VARIABLE > > omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL > > JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id > > = 15; > > NOTICE: QUERY PLAN: > > > Hash Join (cost=55921.31..223860.67 rows=778 width=8) (actual > > time=4249.63..4259.47 rows=15 loops=1) > > -> Index Scan using sfqv_idx3 on sfqv (cost=0.00..167411.84 rows=41423 > > width=4) (actual time=38.05..44.50 rows=110 loops=1) > > -> Hash (cost=51056.43..51056.43 rows=49453 width=4) (actual > > time=4211.15..4211.15 rows=0 loops=1) > > -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 > > width=4) (actual time=0.14..3974.12 rows=100030 loops=1) > > Total runtime: 4259.67 msec > > Hmm. It looks like the reason the planner doesn't like this plan is > that it's vastly overestimating the number of rows it will get from > indexscanning sfqv_idx3 for qualifier_value = 'BRCA1' (viz, 41423 > rather than the true 110). This is a statistical failure. > > What do you get from > select * from pg_stats where tablename = 'sfqv' > ? omicia29=# select * from pg_stats where tablename = 'sfqv'; tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation -----------+------------------+-----------+-----------+------------+------------------+-------------------------------------+-------------------------------------------------------------------------------------+------------- sfqv | seqfeature_id | 0 | 4 | -0.2135 | {394598,942245} | {0.000666667,0.000666667} | {1106,276334,566476,832842,1101265,1360912,1624199,1889730,2125178,2408495,2632006} | 1 sfqv | ontology_term_id | 0 | 4 | 22 | {13,7} | {0.516,0.315667} | {5,6,9,9,9,15,15,16,19,20,81} | 0.342753 sfqv | qualifier_rank | 0 | 4 | 4 | {1,2,3,4} | {0.738,0.26,0.00133333,0.000666667} | | 0.552338 (3 rows) i'm not sure how to interpret this, but it seems odd that it doesn't contain the attname 'qualifier_value' this attribute is definitely there! omicia29=# explain select distinct qualifier_value from sfqv; NOTICE: QUERY PLAN: Unique (cost=5655636.66..5676348.20 rows=828462 width=259) -> Sort (cost=5655636.66..5655636.66 rows=8284617 width=259) -> Seq Scan on sfqv (cost=0.00..401486.17 rows=8284617 width=259) most of values of this column will be fairly rare; however, there are possibly a few distinct values that crop up again and again - this possibly skews the statistics > You might try increasing the statistics target for the qualifier_value > column --- I'm guessing that you need more resolution in the stats to > deal correctly with low-probability data. Try > ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100; > ANALYZE sfqv; -- to recompute stats > then see how the EXPLAIN results change. (The default stats target is > 10; 100 might be more than you need, or perhaps not.) Hmm, it still doesn't force a hash join i've included the output from the same query again, twice - first with nestloop disabled, then with it enabled. omicia29=# ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100; ALTER omicia29=# vacuum analyze sfqv; VACUUM omicia29=# set enable_nestloop = off; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15; NOTICE: QUERY PLAN: Hash Join (cost=55921.31..223860.67 rows=778 width=8) (actual time=16554.37..16566.89 rows=15 loops=1) -> Index Scan using sfqv_idx3 on sfqv (cost=0.00..167411.84 rows=41423 width=4) (actual time=82.68..91.59 rows=110 loops=1) -> Hash (cost=51056.43..51056.43 rows=49453 width=4) (actual time=16471.29..16471.29 rows=0 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 width=4) (actual time=0.19..16258.95 rows=100030 loops=1) Total runtime: 16567.15 msec EXPLAIN omicia29=# set enable_nestloop = on; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..202812.42 rows=778 width=8) (actual time=56557.06..84107.30 rows=15 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=49453 width=4) (actual time=0.13..4572.13 rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.06 rows=1 width=4) (actual time=0.79..0.79 rows=0 loops=100030) Total runtime: 84107.46 msec > > however I'm not sure what the implications of turning nestloop off > > altogether are - maybe i can hardcode it just for this query > > It'd be best not to. I'd counsel seeing if more stats help, first. Ok, I'm going to try upping it from 100.... > regards, tom lane >
On Sun, 11 Aug 2002, Chris Mungall wrote: > > You might try increasing the statistics target for the qualifier_value > > column --- I'm guessing that you need more resolution in the stats to > > deal correctly with low-probability data. Try > > ALTER TABLE sfqv ALTER COLUMN qualifier_value SET STATISTICS 100; > > ANALYZE sfqv; -- to recompute stats > > then see how the EXPLAIN results change. (The default stats target is > > 10; 100 might be more than you need, or perhaps not.) > > Hmm, it still doesn't force a hash join > > i've included the output from the same query again, twice - first with > nestloop disabled, then with it enabled. > [SNIP] > > > > > however I'm not sure what the implications of turning nestloop off > > > altogether are - maybe i can hardcode it just for this query > > > > It'd be best not to. I'd counsel seeing if more stats help, first. > > Ok, I'm going to try upping it from 100.... I tried 1000, it still insists on doing a nested loop. any other suggestions for forcing a hash join over a nested loop? > > regards, tom lane > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
Chris Mungall <cjm@fruitfly.org> writes: > i'm not sure how to interpret this, but it seems odd that it doesn't > contain the attname 'qualifier_value' That's not just odd, but strikingly wrong. You did say that qualifier_value is a plain old varchar column, no? I'm really at a loss how we could fail to garner any stats for it. Would you be willing to let me have access to your system to look into this? If so, contact me off-list to arrange details. regards, tom lane