Обсуждение: Subqueries and indexes
In this QUERY: SELECT keynameFROM markmainWHERE mark_id NOT IN(SELECT mark_id FROM markaty) I have an index on markaty.mark_id, and have vacuum analyzed. EXPLAIN shows: Seq Scan on markmain (cost=2051.43 size=45225 width=12) SubPlan -> Seq Scan on markaty (cost=2017.41 size=52558 width=4) Vadim, why isn't this using the index? Each table has 50k rows. Is it NOT IN that is causing the problem? IN produces the same plan, though. If I do a traditional join: SELECT keyname FROM markmain , markaty WHERE markmain.mark_id = markaty.mark_id I then get a hash join plan:Hash Join (cost=10768.51 size=90519 width=20) -> Seq Scan on markmain (cost=2051.43 size=45225width=16) -> Hash (cost=0.00 size=0 width=0) -> Seq Scan on markaty (cost=2017.41 size=52558 width=4) Seems the optimizer could either hash the subquery, or us an index. Certainly would be faster than a sequental scan, no? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life IS a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > In this QUERY: > > SELECT keyname > FROM markmain > WHERE mark_id NOT IN(SELECT mark_id > FROM markaty) > > I have an index on markaty.mark_id, and have vacuum analyzed. EXPLAIN > shows: > > Seq Scan on markmain (cost=2051.43 size=45225 width=12) > SubPlan > -> Seq Scan on markaty (cost=2017.41 size=52558 width=4) > > Vadim, why isn't this using the index? Each table has 50k rows. Is it > NOT IN that is causing the problem? IN produces the same plan, though. .... > > Seems the optimizer could either hash the subquery, or us an index. > Certainly would be faster than a sequental scan, no? Optimizer should hash the subquery, but I didn't implement this -:( Try to rewrite query using NOT EXISTS and index will be used. Vadim
> Bruce Momjian wrote: > > > > In this QUERY: > > > > SELECT keyname > > FROM markmain > > WHERE mark_id NOT IN(SELECT mark_id > > FROM markaty) > > > > I have an index on markaty.mark_id, and have vacuum analyzed. EXPLAIN > > shows: > > > > Seq Scan on markmain (cost=2051.43 size=45225 width=12) > > SubPlan > > -> Seq Scan on markaty (cost=2017.41 size=52558 width=4) > > > > Vadim, why isn't this using the index? Each table has 50k rows. Is it > > NOT IN that is causing the problem? IN produces the same plan, though. > .... > > > > Seems the optimizer could either hash the subquery, or us an index. > > Certainly would be faster than a sequental scan, no? > > Optimizer should hash the subquery, but I didn't implement this -:( > Try to rewrite query using NOT EXISTS and index will be used. How hard would it be to implement it? I know you are deep into MVCC, but doing a nested loop to join a subquery is really bad. Now, in our defense, I tried this with commercial Ingres 6.4, and it took so long I copied the data into PostgreSQL and tried to run it there. Eventually, I copied the data into a second table, and did a DELETE FROM using two tables in the WHERE clause, and the rows left where my NOT IN result. It did use a hash join in that case. Obviously, Ingres was doing a nested loop do, but I want to do better than Ingres. I think we really need to get that hash enabled. Is there something I can do to enable it, or can I do something to help you enable it? All queries can't be rewritten as EXISTS. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > > > > > > Seems the optimizer could either hash the subquery, or us an index. > > > Certainly would be faster than a sequental scan, no? > > > > Optimizer should hash the subquery, but I didn't implement this -:( > > Try to rewrite query using NOT EXISTS and index will be used. > > How hard would it be to implement it? I know you are deep into MVCC, > but doing a nested loop to join a subquery is really bad. Not very hard, for un-correlated subqueries at least. I have no time to do this for 6.5... > ... > > All queries can't be rewritten as EXISTS. All except of subqueries with aggregates in target list. Vadim
> Bruce Momjian wrote: > > > > > > > > > > Seems the optimizer could either hash the subquery, or us an index. > > > > Certainly would be faster than a sequental scan, no? > > > > > > Optimizer should hash the subquery, but I didn't implement this -:( > > > Try to rewrite query using NOT EXISTS and index will be used. > > > > How hard would it be to implement it? I know you are deep into MVCC, > > but doing a nested loop to join a subquery is really bad. > > Not very hard, for un-correlated subqueries at least. > I have no time to do this for 6.5... Is it possible before 6.5 final? > > All queries can't be rewritten as EXISTS. > > All except of subqueries with aggregates in target list. I am confused. How do I rewrite this to use exists? SELECT keyname FROM markmain WHERE mark_id NOT IN(SELECT mark_id FROM markaty) Even if I use IN instead of NOT IN, I don't see how to do it without making it a correlated subquery. SELECT keyname FROM markmain WHERE EXISTS (SELECT mark_id FROM markaty WHERE markmain.mark_id = markaty.mark_id) This is a correlated subquery. It did not use hash, but it did use the index on markaty: Seq Scan on markmain (cost=16.02 size=334 width=12) SubPlan -> Index Scan using i_markaty on markaty (cost=2.10 size=3width=4) While the index usage is good, the fact is the subquery is executed for every row of markmain, isn't it? That's one query executed for each row in markmain, isn't it? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026