On Friday 20 February 2009, Tena Sakai wrote:
> Hi Scott,
>
> > What does explain and (it'll take a while to get
> > it) explain analyze select ... have to say?
>
> ---------------------------------------------------------------------------
>---------- Hash Join (cost=165264.65..55486119.31 rows=601095277 width=32)
> Hash Cond: (genotype.allele1id = a1.alleleid)
> -> Hash Join (cost=82632.33..34731274.54 rows=601095277 width=34)
> Hash Cond: (genotype.allele2id = a2.alleleid)
> -> Seq Scan on genotype (cost=0.00..13976429.77 rows=601095277
> width=36) -> Hash (cost=42474.59..42474.59 rows=2447659 width=6) -> Seq
> Scan on allele a2 (cost=0.00..42474.59 rows=2447659 width=6) -> Hash
> (cost=42474.59..42474.59 rows=2447659 width=6)
> -> Seq Scan on allele a1 (cost=0.00..42474.59 rows=2447659
> width=6) (9 rows)
The above tells you that you don't have indices in place. Postgres chooses a
seq scan - which as the name implies scans all the rows in sequencial order.
I'd add an index on genotype.allele1id and genotype.allele2id
aka
create index gtallele1idx on genotype (allele1id);
create index gtallele2idx on genotype (allele2id);
and also on allele.alleleid:
create index alleleididx on allele (alleleid);
After a "analyze genotype" and "analyze allele" the query should perform much
better. The explain analyze should show you an index scan instead of the seq
scan after that.
HTH
Uwe