Re: Joining two large tables on a tiny subset of rows

Поиск
Список
Период
Сортировка
От Gnanavel S
Тема Re: Joining two large tables on a tiny subset of rows
Дата
Msg-id eec3b03c050726204813908f41@mail.gmail.com
обсуждение исходный текст
Ответ на Joining two large tables on a tiny subset of rows  ("Dmitri Bichko" <dbichko@aveopharma.com>)
Список pgsql-sql


On 7/27/05, Dmitri Bichko <dbichko@aveopharma.com> wrote:
Hello,

I have two tables, one has a foreing key from the other (only showing
the relevant columns and indices here):

             Table "expresso.probes"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
platform_id | integer                | not null
probe_num   | integer                | not null
mrna_acc    | character varying(50)  |
Indexes:
    "idx_probes_mrna_acc" btree (mrna_acc, platform_id) WHERE mrna_acc
IS NOT NULL

           Table "expresso.mrna_info"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
mrna_acc    | character varying(25) | not null
symbol      | character varying(50) | not null
Indexes:
    "idx_mrna_info_symbol" btree (upper(symbol::text)) WHERE symbol IS
NOT NULL

I want to select all the records in "probes" which have an "mrna_acc"
that is associated with a specific symbol in "mrna_info".

This works fine if I do it in two steps: select from "mrna_info" (12
records out of ~250 thousand) and paste those into an IN clause to
select from "probes" (83 records out of ~4.5 million).

But if I try to do this as one query I always end up with a seq scan on
"probes".  I've tried doing it with joins, subselects, joining with a
dynamic table (select from () as foo), nothing seems to get the desired
result.

Here are the explains for the two step process:

expression=> EXPLAIN ANALYZE SELECT mrna_acc FROM mrna_info WHERE
upper(symbol) = upper('pparg') AND symbol IS NOT NULL;
                                                              QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------------------------
Index Scan using idx_mrna_info_symbol on mrna_info  (cost=0.00..2934.78
rows=930 width=12) (actual time=0.038..0.089 rows=12 loops=1)
   Index Cond: (upper((symbol)::text) = 'PPARG'::text)
   Filter: (symbol IS NOT NULL)
Total runtime: 0.123 ms
(4 rows)

Whether any records are returned by this query. Moreover "symbol is not null" condition is not needed as that column cannot be null by table definition.

EXPLAIN ANALYZE SELECT platform_id, probe_num FROM probes WHERE mrna_acc
IN
('U10374','U09138','U01841','U01664','NM_015869','NM_013124','NM_011146'
,'NM_005037','D83233','BC021798','BC006811','AB011365') AND mrna_acc IS
NOT NULL;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------------------------------------------------
Index Scan using idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc on probes  (cost=0.00..14710.63 rows=4151 width=8)
(actual time=0.040..0.719 rows=142 loops=1)
   Index Cond: (((mrna_acc)::text = 'U10374'::text) OR ((mrna_acc)::text
= 'U09138'::text) OR ((mrna_acc)::text = 'U01841'::text) OR
((mrna_acc)::text = 'U01664'::text) OR ((mrna_acc)::text =
'NM_015869'::text) OR ((mrna_acc)::text = 'NM_013124'::text) OR
((mrna_acc)::text = 'NM_011146'::text) OR ((mrna_acc)::text =
'NM_005037'::text) OR ((mrna_acc)::text = 'D83233'::text) OR
((mrna_acc)::text = 'BC021798'::text) OR ((mrna_acc)::text =
'BC006811'::text) OR ((mrna_acc)::text = 'AB011365'::text))
   Filter: (mrna_acc IS NOT NULL)
Total runtime: 0.844 ms
(4 rows)

And here is the explain of the join (it's essentially the same plan as
the subselect and all the other ways I've tried):

expression=> explain SELECT platform_id, probe_num FROM mrna_info m,
probes p WHERE m.mrna_acc = p.mrna_acc and p.mrna_acc is not null and
UPPER(symbol) = UPPER('pparg') AND m.symbol IS NOT NULL;

Here m.mrna_acc cannot be null, so condition on p.mrna_acc is not null is useless as m.mrna_acc = p.mrna_acc (for null) will not give any records
Try to  use outer join.

                                                QUERY PLAN
------------------------------------------------------------------------
----------------------------------
Hash Join  (cost=2937.10..133392.04 rows=9939 width=8)
   Hash Cond: (("outer".mrna_acc)::text = ("inner".mrna_acc)::text)
   ->  Seq Scan on probes p  (cost=0.00..115434.22 rows=2984265
width=19)
         Filter: (mrna_acc IS NOT NULL)
   ->  Hash  (cost=2934.78..2934.78 rows=930 width=12)
         ->  Index Scan using idx_mrna_info_symbol on mrna_info m
(cost=0.00..2934.78 rows=930 width=12)
               Index Cond: (upper((symbol)::text) = 'PPARG'::text)
               Filter: (symbol IS NOT NULL)
(8 rows)

What am I doing wrong?

Dmitri

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org



--
with regards,
S.Gnanavel
Satyam Computer Services Ltd.

В списке pgsql-sql по дате отправления:

Предыдущее
От: "Dmitri Bichko"
Дата:
Сообщение: Joining two large tables on a tiny subset of rows
Следующее
От: Achilleus Mantzios
Дата:
Сообщение: Re: REINDEX DATABASE