Re: Can someone explain the problem with this select

Поиск
Список
Период
Сортировка
От Ted Allen
Тема Re: Can someone explain the problem with this select
Дата
Msg-id 4575E11F.8000803@blackducksoftware.com
обсуждение исходный текст
Ответ на Can someone explain the problem with this select  (Richard Ray <rray@mstc.state.ms.us>)
Список pgsql-sql
Hey Ray,

I'm by no means a guru but here is my simple analysis.  In the first 
query, the 10 "documents" specified 'IN' the in are first selected from 
the "documents" table.  Then, those 10 rows are joined with the 
"comments" table. 

In the second query, every row in the "documents" table is joined with 
the "comments" table, which took forever according to the Explain 
Analyse.  Then, the results of the complete join of those two tables 
from are compared against the sub-select. 

I'm guessing (emphasis on guessing) that the query planner chose this 
approach because it does not know how many rows the sub-select will 
return so it does the join and then checks those results against that 
sub-select.

Another approach may be to do this.

Create Temp table limit_documents as select doc_num from documents limit 10;
select * from  limit_documents left outer join comments on 
(limit_documents.doc_num = comments.doc_num);


Hope that helps,
Ted

Richard Ray wrote:
> Allow me to demonstrate my pitiful SQL knowledge
> I have tables documents and comments
> If I run join and list doc_nums the query is quite fast
> If I run join and use subselect the query is extremely slow
> Can someone offer analysis
>
> Thanks
> Richard
>
> dcc=# EXPLAIN ANALYZE select doc_num from documents limit 10;
>                                                       QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
 
>
>  Limit  (cost=0.00..1.01 rows=10 width=13) (actual time=0.013..0.061 
> rows=10 loops=1)
>    ->  Seq Scan on documents  (cost=0.00..909333.85 rows=9014885 
> width=13) (actual time=0.008..0.027 rows=10 loops=1)
>  Total runtime: 0.125 ms
> (3 rows)
>
> dcc=#
>
>
>
> dcc=#  EXPLAIN ANALYZE select * from documents left outer join 
> comments on (documents.doc_num = comments.doc_num) where 
> documents.doc_num in 
>
('105364107','105513059','105513095','105513112','105585627','102933195','014650340','014650361','014650362','105419865');

>
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>  Hash Left Join  (cost=21.23..61.54 rows=10 width=444) (actual 
> time=0.507..0.574 rows=10 loops=1)
>    Hash Cond: ("outer".doc_num = "inner".doc_num)
>    ->  Bitmap Heap Scan on documents  (cost=20.03..60.28 rows=10 
> width=361) (actual time=0.397..0.432 rows=10 loops=1)
>          Recheck Cond: ((doc_num = '105364107'::bpchar) OR (doc_num = 
> '105513059'::bpchar) OR (doc_num = '105513095'::bpchar) OR (doc_num = 
> '105513112'::bpchar) OR (doc_num = '105585627'::bpchar) OR (doc_num = 
> '102933195'::bpchar) OR (doc_num = '014650340'::bpchar) OR (doc_num = 
> '014650361'::bpchar) OR (doc_num = '014650362'::bpchar) OR (doc_num = 
> '105419865'::bpchar))
>          ->  BitmapOr  (cost=20.03..20.03 rows=10 width=0) (actual 
> time=0.383..0.383 rows=0 loops=1)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.059..0.059 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105364107'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.039..0.039 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105513059'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105513095'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105513112'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105585627'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '102933195'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '014650340'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '014650361'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '014650362'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.035..0.035 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105419865'::bpchar)
>    ->  Hash  (cost=1.16..1.16 rows=16 width=83) (actual 
> time=0.080..0.080 rows=16 loops=1)
>          ->  Seq Scan on comments  (cost=0.00..1.16 rows=16 width=83) 
> (actual time=0.005..0.037 rows=16 loops=1)
>  Total runtime: 0.775 ms
> (28 rows)
>
> dcc=#
>
>
>
> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments 
> on (documents.doc_num = comments.doc_num) where documents.doc_num in 
> (select doc_num from documents limit 10);
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------- 
>
> -------------------------------------------------------------------------------- 
>
> -- 
>  Merge IN Join  (cost=100000002.19..136154797.93 rows=10 width=654) 
> (actual time
> =23.534..2216180.550 rows=10 loops=1)
>    Merge Cond: ("outer".doc_num = "inner".doc_num)
>    ->  Merge Left Join  (cost=0.00..36129585.92 rows=10083868 
> width=654) (actual
>  time=23.239..2188733.430 rows=6696218 loops=1)
>          Merge Cond: ("outer".doc_num = "inner".doc_num)
>          ->  Index Scan using documents_pkey on documents 
> (cost=0.00..35723277.
> 60 rows=10083868 width=569) (actual time=6.845..2107300.767 
> rows=6695853 loops=1
> )
>          ->  Index Scan using doc_num_idx on comments 
> (cost=0.00..377203.50 row
> s=311612 width=85) (actual time=16.368..6984.365 rows=243797 loops=1)
>    ->  Sort  (cost=100000002.19..100000002.22 rows=10 width=13) 
> (actual time=0.2
> 52..0.293 rows=10 loops=1)
>          Sort Key: "IN_subquery".doc_num
>          ->  Limit  (cost=100000000.00..100000001.92 rows=10 width=13) 
> (actual t
> ime=0.019..0.128 rows=10 loops=1)
>                ->  Seq Scan on documents  
> (cost=100000000.00..101940460.68 rows=
> 10083868 width=13) (actual time=0.010..0.055 rows=10 loops=1)
>  Total runtime: 2216180.973 ms
> (11 rows)
>
> dcc=#
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>


-- 

*Edward Allen*
Software Engineer
Black Duck Software, Inc.

tallen@blackducksoftware.com <mailto:tallen@blackducksoftware.com>
T +1.781.891.5100 x133
F +1.781.891.5145
http://www.blackducksoftware.com



В списке pgsql-sql по дате отправления:

Предыдущее
От: Richard Ray
Дата:
Сообщение: Re: Can someone explain the problem with this select
Следующее
От: Ragnar
Дата:
Сообщение: Re: Using Control Flow Functions in a SELECT Statement