query optimization differs between view and explicit query

От: Reece Hart
Тема: query optimization differs between view and explicit query
Дата: ,
Msg-id: 1075422699.4062.164.camel@tallac
(см: обсуждение, исходный текст)
Ответы: Re: query optimization differs between view and explicit  (Christopher Kings-Lynne)
Re: query optimization differs between view and explicit  (Stephan Szabo)
Список: pgsql-performance

Скрыть дерево обсуждения

query optimization differs between view and explicit query  (Reece Hart, )
 Re: query optimization differs between view and explicit  (Christopher Kings-Lynne, )
 Re: query optimization differs between view and explicit  (Stephan Szabo, )
  Re: query optimization differs between view and explicit  (Tom Lane, )
   Re: query optimization differs between view and explicit  (Reece Hart, )

I have a large query which I would like to place in a view. The explicit query is sufficiently fast, but the same query as a view is much slower and uses a different plan. I would appreciate an explanation of why this is, and, more importantly whether/how I might coax the view to use a different plan.


The original query:
rkh@csb-dev=> select distinct on (AH.p2gblataln_id) AH.p2gblataln_id,H.pseq_id,min(H.pstart) as "pstart",
max(H.pstop) as "pstop",A.ident,(A.ident/Q.len::float*100)::int as "pct_ident",
sum(H.pstop-H.pstart+1) as "aln_length",H.genasm_id,H.chr,H.plus_strand,min(H.gstart) as "gstart",
max(H.gstop) as "gstop"
from p2gblathsp H
join p2gblatalnhsp AH on H.p2gblathsp_id=AH.p2gblathsp_id
join p2gblataln A on AH.p2gblataln_id=A.p2gblataln_id
join pseq Q on H.pseq_id=Q.pseq_id
where H.pseq_id=76
group by AH.p2gblataln_id,H.pseq_id,H.genasm_id,H.chr,H.plus_strand,A.ident,Q.len
\g /dev/null
Time: 277.804 ms
Now as a view:
rkh@csb-dev=> create view v1 as
select distinct on (AH.p2gblataln_id) AH.p2gblataln_id,H.pseq_id,min(H.pstart) as "pstart",
max(H.pstop) as "pstop",A.ident,(A.ident/Q.len::float*100)::int as "pct_ident",
sum(H.pstop-H.pstart+1) as "aln_length",H.genasm_id,H.chr,H.plus_strand,min(H.gstart) as "gstart",
max(H.gstop) as "gstop"
from p2gblathsp H
join p2gblatalnhsp AH on H.p2gblathsp_id=AH.p2gblathsp_id
join p2gblataln A on AH.p2gblataln_id=A.p2gblataln_id
join pseq Q on H.pseq_id=Q.pseq_id
group by AH.p2gblataln_id,H.pseq_id,H.genasm_id,H.chr,H.plus_strand,A.ident,Q.len;
CREATE VIEW
Time: 103.041 ms

rkh@csb-dev=> select * from v1 where pseq_id=76 \g /dev/null
Time: 31973.979 ms

Okay, that's ~100x slower. The plans:

rkh@csb-dev=> explain select distinct on <snip... same as the first query above>
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------- Unique  (cost=11157.75..11187.26 rows=454 width=40)
   ->  GroupAggregate  (cost=11157.75..11186.13 rows=454 width=40)
         ->  Sort  (cost=11157.75..11158.89 rows=454 width=40)
               Sort Key: ah.p2gblataln_id, h.pseq_id, h.genasm_id, h.chr, h.plus_strand, a.ident, q.len
               ->  Nested Loop  (cost=11125.62..11137.71 rows=454 width=40)
                     ->  Index Scan using pseq_pkey on pseq q  (cost=0.00..3.01 rows=2 width=6)
                           Index Cond: (76 = pseq_id)
                     ->  Materialize  (cost=11125.62..11127.89 rows=227 width=38)
                           ->  Nested Loop  (cost=546.15..11125.62 rows=227 width=38)
                                 ->  Hash Join  (cost=546.15..10438.72 rows=227 width=34)
                                       Hash Cond: ("outer".p2gblathsp_id = "inner".p2gblathsp_id)
                                       ->  Seq Scan on p2gblatalnhsp ah  (cost=0.00..6504.03 rows=451503 width=8)
                                       ->  Hash  (cost=545.58..545.58 rows=227 width=34)
                                             ->  Index Scan using p2gblathsp_p_lookup on p2gblathsp h  (cost=0.00..545.58 rows=227 wid                                                   Index Cond: (pseq_id = 76)
                                 ->  Index Scan using p2gblataln_pkey on p2gblataln a  (cost=0.00..3.01 rows=1 width=8)
                                       Index Cond: ("outer".p2gblataln_id = a.p2gblataln_id)
(17 rows)


rkh@csb-dev=> explain select * from v1 where pseq_id=76;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan v1  (cost=246907.54..281897.70 rows=2258 width=77)
   Filter: (pseq_id = 76)
   ->  Unique  (cost=246907.54..276254.13 rows=451486 width=40)
         ->  GroupAggregate  (cost=246907.54..275125.41 rows=451486 width=40)
               ->  Sort  (cost=246907.54..248036.25 rows=451486 width=40)
                     Sort Key: ah.p2gblataln_id, h.pseq_id, h.genasm_id, h.chr, h.plus_strand, a.ident, q.len
                     ->  Hash Join  (cost=14019.29..204503.24 rows=451486 width=40)
                           Hash Cond: ("outer".p2gblataln_id = "inner".p2gblataln_id)
                           ->  Hash Join  (cost=7632.79..191344.45 rows=451486 width=36)
                                 Hash Cond: ("outer".p2gblathsp_id = "inner".p2gblathsp_id)
                                 ->  Merge Join  (cost=0.00..176939.38 rows=451486 width=36)
                                       Merge Cond: ("outer".pseq_id = "inner".pseq_id)
                                       ->  Index Scan using p2gblathsp_p_lookup on p2gblathsp h  (cost=0.00..16102.40 rows=451485 widt        

В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: query optimization differs between view and explicit
От: Dennis Bjorklund
Дата:
Сообщение: Re: query optimization question