Обсуждение: Subqueries and indexes

Поиск
Список
Период
Сортировка

Subqueries and indexes

От
Bruce Momjian
Дата:
In this QUERY:
SELECT keynameFROM markmainWHERE mark_id NOT IN(SELECT mark_id              FROM markaty)

I have an index on markaty.mark_id, and have vacuum analyzed.  EXPLAIN
shows:
Seq Scan on markmain  (cost=2051.43 size=45225 width=12)  SubPlan    ->  Seq Scan on markaty  (cost=2017.41 size=52558
width=4)

Vadim, why isn't this using the index?  Each table has 50k rows.  Is it
NOT IN that is causing the problem?  IN produces the same plan, though. 
If I do a traditional join:        
SELECT keyname       FROM markmain , markaty       WHERE markmain.mark_id = markaty.mark_id

I then get a hash join plan:Hash Join  (cost=10768.51 size=90519 width=20)  ->  Seq Scan on markmain  (cost=2051.43
size=45225width=16)  ->  Hash  (cost=0.00 size=0 width=0)        ->  Seq Scan on markaty  (cost=2017.41 size=52558
width=4)

Seems the optimizer could either hash the subquery, or us an index. 
Certainly would be faster than a sequental scan, no?




--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life IS a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Subqueries and indexes

От
Vadim Mikheev
Дата:
Bruce Momjian wrote:
> 
> In this QUERY:
> 
>         SELECT keyname
>         FROM markmain
>         WHERE mark_id NOT IN(SELECT mark_id
>                              FROM markaty)
> 
> I have an index on markaty.mark_id, and have vacuum analyzed.  EXPLAIN
> shows:
> 
>         Seq Scan on markmain  (cost=2051.43 size=45225 width=12)
>           SubPlan
>             ->  Seq Scan on markaty  (cost=2017.41 size=52558 width=4)
> 
> Vadim, why isn't this using the index?  Each table has 50k rows.  Is it
> NOT IN that is causing the problem?  IN produces the same plan, though.
....
> 
> Seems the optimizer could either hash the subquery, or us an index.
> Certainly would be faster than a sequental scan, no?

Optimizer should hash the subquery, but I didn't implement this -:(
Try to rewrite query using NOT EXISTS and index will be used.

Vadim


Re: [HACKERS] Subqueries and indexes

От
Bruce Momjian
Дата:
> Bruce Momjian wrote:
> > 
> > In this QUERY:
> > 
> >         SELECT keyname
> >         FROM markmain
> >         WHERE mark_id NOT IN(SELECT mark_id
> >                              FROM markaty)
> > 
> > I have an index on markaty.mark_id, and have vacuum analyzed.  EXPLAIN
> > shows:
> > 
> >         Seq Scan on markmain  (cost=2051.43 size=45225 width=12)
> >           SubPlan
> >             ->  Seq Scan on markaty  (cost=2017.41 size=52558 width=4)
> > 
> > Vadim, why isn't this using the index?  Each table has 50k rows.  Is it
> > NOT IN that is causing the problem?  IN produces the same plan, though.
> ....
> > 
> > Seems the optimizer could either hash the subquery, or us an index.
> > Certainly would be faster than a sequental scan, no?
> 
> Optimizer should hash the subquery, but I didn't implement this -:(
> Try to rewrite query using NOT EXISTS and index will be used.

How hard would it be to implement it?  I know you are deep into MVCC,
but doing a nested loop to join a subquery is really bad.

Now, in our defense, I tried this with commercial Ingres 6.4, and it
took so long I copied the data into PostgreSQL and tried to run it
there.  Eventually, I copied the data into a second table, and did a
DELETE FROM using two tables in the WHERE clause, and the rows left
where my NOT IN result.  It did use a hash join in that case.

Obviously, Ingres was doing a nested loop do, but I want to do better
than Ingres.

I think we really need to get that hash enabled.  Is there something I
can do to enable it, or can I do something to help you enable it?

All queries can't be rewritten as EXISTS.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Subqueries and indexes

От
Vadim Mikheev
Дата:
Bruce Momjian wrote:
> 
> > >
> > > Seems the optimizer could either hash the subquery, or us an index.
> > > Certainly would be faster than a sequental scan, no?
> >
> > Optimizer should hash the subquery, but I didn't implement this -:(
> > Try to rewrite query using NOT EXISTS and index will be used.
> 
> How hard would it be to implement it?  I know you are deep into MVCC,
> but doing a nested loop to join a subquery is really bad.

Not very hard, for un-correlated subqueries at least.
I have no time to do this for 6.5...

> 
...
> 
> All queries can't be rewritten as EXISTS.

All except of subqueries with aggregates in target list.

Vadim


Re: [HACKERS] Subqueries and indexes

От
Bruce Momjian
Дата:
> Bruce Momjian wrote:
> > 
> > > >
> > > > Seems the optimizer could either hash the subquery, or us an index.
> > > > Certainly would be faster than a sequental scan, no?
> > >
> > > Optimizer should hash the subquery, but I didn't implement this -:(
> > > Try to rewrite query using NOT EXISTS and index will be used.
> > 
> > How hard would it be to implement it?  I know you are deep into MVCC,
> > but doing a nested loop to join a subquery is really bad.
> 
> Not very hard, for un-correlated subqueries at least.
> I have no time to do this for 6.5...


Is it possible before 6.5 final?


> > All queries can't be rewritten as EXISTS.
> 
> All except of subqueries with aggregates in target list.

I am confused.  How do I rewrite this to use exists?
        SELECT keyname        FROM markmain        WHERE mark_id NOT IN(SELECT mark_id                             FROM
markaty)


Even if I use IN instead of NOT IN, I don't see how to do it without
making it a correlated subquery.
        SELECT keyname        FROM markmain        WHERE EXISTS (SELECT mark_id                      FROM markaty
   WHERE markmain.mark_id = markaty.mark_id)
 

This is a correlated subquery.  It did not use hash, but it did use the
index on markaty:
Seq Scan on markmain  (cost=16.02 size=334 width=12)  SubPlan    ->  Index Scan using i_markaty on markaty  (cost=2.10
size=3width=4)
 

While the index usage is good, the fact is the subquery is executed for
every row of markmain, isn't it?  That's one query executed for each row
in markmain, isn't it?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026