Joining two large tables on a tiny subset of rows
От | Dmitri Bichko |
---|---|
Тема | Joining two large tables on a tiny subset of rows |
Дата | |
Msg-id | F18A6F7CF1661F46920F2CF713122FED46CC35@mail.aveo.aveopharma.com обсуждение исходный текст |
Ответы |
Re: Joining two large tables on a tiny subset of rows
Re: Joining two large tables on a tiny subset of rows |
Список | pgsql-sql |
Hello, I have two tables, one has a foreing key from the other (only showing the relevant columns and indices here): Table "expresso.probes" Column | Type | Modifiers -------------+------------------------+-----------platform_id | integer | not nullprobe_num | integer | not nullmrna_acc | character varying(50) | Indexes: "idx_probes_mrna_acc" btree (mrna_acc, platform_id) WHERE mrna_acc IS NOT NULL Table "expresso.mrna_info" Column | Type | Modifiers -------------+-----------------------+-----------mrna_acc | character varying(25) | not nullsymbol | character varying(50)| not null Indexes: "idx_mrna_info_symbol" btree (upper(symbol::text)) WHERE symbol IS NOT NULL I want to select all the records in "probes" which have an "mrna_acc" that is associated with a specific symbol in "mrna_info". This works fine if I do it in two steps: select from "mrna_info" (12 records out of ~250 thousand) and paste those into an IN clause to select from "probes" (83 records out of ~4.5 million). But if I try to do this as one query I always end up with a seq scan on "probes". I've tried doing it with joins, subselects, joining with a dynamic table (select from () as foo), nothing seems to get the desired result. Here are the explains for the two step process: expression=> EXPLAIN ANALYZE SELECT mrna_acc FROM mrna_info WHERE upper(symbol) = upper('pparg') AND symbol IS NOT NULL; QUERYPLAN ------------------------------------------------------------------------ ---------------------------------------------------------------Index Scan using idx_mrna_info_symbol on mrna_info (cost=0.00..2934.78 rows=930 width=12) (actual time=0.038..0.089 rows=12 loops=1) Index Cond: (upper((symbol)::text) = 'PPARG'::text) Filter:(symbol IS NOT NULL)Total runtime: 0.123 ms (4 rows) EXPLAIN ANALYZE SELECT platform_id, probe_num FROM probes WHERE mrna_acc IN ('U10374','U09138','U01841','U01664','NM_015869','NM_013124','NM_011146' ,'NM_005037','D83233','BC021798','BC006811','AB011365') AND mrna_acc IS NOT NULL; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------------------------------ --------------------------------------------------------------------Index Scan using idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc on probes (cost=0.00..14710.63 rows=4151 width=8) (actual time=0.040..0.719 rows=142 loops=1) Index Cond: (((mrna_acc)::text = 'U10374'::text) OR ((mrna_acc)::text = 'U09138'::text) OR ((mrna_acc)::text = 'U01841'::text) OR ((mrna_acc)::text = 'U01664'::text) OR ((mrna_acc)::text = 'NM_015869'::text) OR ((mrna_acc)::text = 'NM_013124'::text) OR ((mrna_acc)::text = 'NM_011146'::text) OR ((mrna_acc)::text = 'NM_005037'::text) OR ((mrna_acc)::text = 'D83233'::text) OR ((mrna_acc)::text = 'BC021798'::text) OR ((mrna_acc)::text = 'BC006811'::text) OR ((mrna_acc)::text = 'AB011365'::text)) Filter: (mrna_acc IS NOT NULL)Total runtime: 0.844 ms (4 rows) And here is the explain of the join (it's essentially the same plan as the subselect and all the other ways I've tried): expression=> explain SELECT platform_id, probe_num FROM mrna_info m, probes p WHERE m.mrna_acc = p.mrna_acc and p.mrna_acc is not null and UPPER(symbol) = UPPER('pparg') AND m.symbol IS NOT NULL; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------Hash Join (cost=2937.10..133392.04 rows=9939 width=8) Hash Cond: (("outer".mrna_acc)::text= ("inner".mrna_acc)::text) -> Seq Scan on probes p (cost=0.00..115434.22 rows=2984265 width=19) Filter: (mrna_acc IS NOT NULL) -> Hash (cost=2934.78..2934.78 rows=930 width=12) -> Index Scanusing idx_mrna_info_symbol on mrna_info m (cost=0.00..2934.78 rows=930 width=12) Index Cond: (upper((symbol)::text) = 'PPARG'::text) Filter:(symbol IS NOT NULL) (8 rows) What am I doing wrong? Dmitri The information transmitted is intended only for the person or entity to which it is addressed and may contain confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you receivedthis in error, please contact the sender and delete the material from any computer
В списке pgsql-sql по дате отправления: