Обсуждение: Inner join on two OR conditions dont use index

Поиск
Список
Период
Сортировка

Inner join on two OR conditions dont use index

От
Jocelyn Turcotte
Дата:
Hi all
i dont know if this is normal, but if yes i would like to know why and
how I could do it another way other than using unions.

(I tried on postgresql 7.4 and 8.0.3, made my vacuum analyse just before)
Here is my simple query:

select *
from rt_node n, rt_edge e
where node_id = 2
and e.start_node_id = n.node_id;

which give me the following query plan:

 Nested Loop  (cost=0.00..79.46 rows=24 width=60)
   ->  Index Scan using rt_node_pkey on rt_node n  (cost=0.00..5.94
rows=1 width=36)
         Index Cond: (node_id = 2)
   ->  Index Scan using rt_edge_start_node on rt_edge e
(cost=0.00..73.28 rows=24 width=24)
         Index Cond: (start_node_id = 2)


But if I plug another condition with a OR like this:
select *
from rt_node n, rt_edge e
where node_id = 2
and (e.start_node_id = n.node_id or e.end_node_id = n.node_id);

I get this plan, it stop using the index!:

 Nested Loop  (cost=0.00..158.94 rows=4 width=60)
   Join Filter: (("inner".start_node_id = "outer".node_id) OR
("inner".end_node_id = "outer".node_id))
   ->  Index Scan using rt_node_pkey on rt_node n  (cost=0.00..5.94
rows=1 width=36)
         Index Cond: (node_id = 2)
   ->  Seq Scan on rt_edge e  (cost=0.00..81.60 rows=4760 width=24)

I tried SET enable_seqscan = OFF and it give me this (woah) :

 Nested Loop  (cost=100000000.00..100000158.94 rows=4 width=60)
   Join Filter: (("inner".start_node_id = "outer".node_id) OR
("inner".end_node_id = "outer".node_id))
   ->  Index Scan using rt_node_pkey on rt_node n  (cost=0.00..5.94
rows=1 width=36)
         Index Cond: (node_id = 2)
   ->  Seq Scan on rt_edge e  (cost=100000000.00..100000081.60
rows=4760 width=24)

These are my tables definitions:
CREATE TABLE rt_node (
    node_id         INTEGER     PRIMARY KEY
);

CREATE TABLE rt_edge (
    edge_id         INTEGER     PRIMARY KEY,
    start_node_id   INTEGER     NOT NULL,
    end_node_id     INTEGER     NOT NULL,
    CONSTRAINT start_node_ref FOREIGN KEY (start_node_id) REFERENCES
rt_node(node_id),
    CONSTRAINT end_node_ref FOREIGN KEY (end_node_id) REFERENCES
rt_node(node_id)
    );

    CREATE INDEX rt_edge_start_node ON rt_edge(start_node_id);
    CREATE INDEX rt_edge_end_node ON rt_edge(end_node_id);


I cant figure why it cant use my index
I know I can use a UNION instead on two query like the first one only
different on "start_node_id"/"end_node_id", and it works,
but this is a part of a bigger query which is already ugly and I hate
using 5 lines for something I could in 5 words.

thank you!

Re: Inner join on two OR conditions dont use index

От
John A Meinel
Дата:
Jocelyn Turcotte wrote:

>Hi all
>i dont know if this is normal, but if yes i would like to know why and
>how I could do it another way other than using unions.
>
>

The only thing that *might* work is if you used an index on both keys.
So if you did:

CREATE INDEX rt_edge_start_end_node ON rt_edge(start_node_id,end_node_id);

The reason is that in an "OR" construct, you have to check both for being true. So in the general case where you don't
knowthe correlation between the columns, you have to check all of the entries, because even if you know the status of
oneside of the OR, you don't know the other. 

Another possibility would be to try this index:

CREATE INDEX rt_edge_stare_or_end ON rt_edge(start_node_id OR end_node_id);

I'm not sure how smart the planner can be, though.

John
=:->



Вложения

Re: Inner join on two OR conditions dont use index

От
Jocelyn Turcotte
Дата:
Thanks John

it dont seems to work, but in my context I only needed data from the
rt_node table so I tried this:

select *
from rt_node n
where node_id = 2
and exists (select edge_id from rt_edge where start_node_id =
n.node_id or end_node_id = n.node_id)

and it gave me this plan (even if I remove the stupid node_id = 2 condition):

 Index Scan using rt_node_pkey on rt_node n  (cost=0.00..6.15 rows=1 width=25)
   Index Cond: (node_id = 2)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using rt_edge_start_node, rt_edge_end_node on
rt_edge  (cost=0.00..12.56 rows=4 width=4)
           Index Cond: ((start_node_id = $0) OR (end_node_id = $0))


this time it use my two indexes, maybe because he know that the same
value is compared in the two condition... I should ask my mother if
she got an idea, mothers know a lot of stuff!

On 5/25/05, John A Meinel <john@arbash-meinel.com> wrote:
> Jocelyn Turcotte wrote:
>
> >Hi all
> >i dont know if this is normal, but if yes i would like to know why and
> >how I could do it another way other than using unions.
> >
> >
>
> The only thing that *might* work is if you used an index on both keys.
> So if you did:
>
> CREATE INDEX rt_edge_start_end_node ON rt_edge(start_node_id,end_node_id);
>
> The reason is that in an "OR" construct, you have to check both for being true. So in the general case where you
don'tknow the correlation between the columns, you have to check all of the entries, because even if you know the
statusof one side of the OR, you don't know the other. 
>
> Another possibility would be to try this index:
>
> CREATE INDEX rt_edge_stare_or_end ON rt_edge(start_node_id OR end_node_id);
>
> I'm not sure how smart the planner can be, though.
>
> John
> =:->
>
>
>
>
>

Re: Inner join on two OR conditions dont use index

От
Tom Lane
Дата:
Jocelyn Turcotte <turcotte.j@gmail.com> writes:
> But if I plug another condition with a OR like this:
> select *
> from rt_node n, rt_edge e
> where node_id = 2
> and (e.start_node_id = n.node_id or e.end_node_id = n.node_id);

> I get this plan, it stop using the index!:

I'm afraid you're stuck with faking it with a UNION for now; the current
planner is incapable of recognizing that a join OR condition can be
handled with OR indexscans.

FWIW, this is fixed for 8.1 --- in CVS tip, I get this from your
example:

                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.06..17.57 rows=2 width=16)
   ->  Index Scan using rt_node_pkey on rt_node n  (cost=0.00..4.82 rows=1 width=4)
         Index Cond: (node_id = 2)
   ->  Bitmap Heap Scan on rt_edge e  (cost=2.06..12.47 rows=18 width=12)
         Recheck Cond: ((e.start_node_id = "outer".node_id) OR (e.end_node_id = "outer".node_id))
         ->  BitmapOr  (cost=2.06..2.06 rows=18 width=0)
               ->  Bitmap Index Scan on rt_edge_start_node  (cost=0.00..1.03 rows=9 width=0)
                     Index Cond: (e.start_node_id = "outer".node_id)
               ->  Bitmap Index Scan on rt_edge_end_node  (cost=0.00..1.03 rows=9 width=0)
                     Index Cond: (e.end_node_id = "outer".node_id)
(10 rows)

(This is with no data in the tables, so the cost estimates are small,
but it does show that the planner knows how to generate this kind of
query plan now.)

            regards, tom lane