John Arbash Meinel <john@arbash-meinel.com> writes:
> > -> Hash (cost=1418.68..1418.68 rows=3226 width=4) (actual
> > time=77.062..77.062 rows=0 loops=1)
>
> This seems to be at least one of the problems. The planner thinks there
> are going to be 3000+ rows, but in reality there are 0.
No, that's a red herring. Hash nodes always report 0 rows.
> > Nested Loop (cost=0.00..23849.81 rows=7533 width=8) (actual time=0.341..198.162 rows=5798 loops=1)
> > -> Seq Scan on pdb_entry (cost=0.00..1418.68 rows=3226 width=4) (actual time=0.145..78.177 rows=3329 loops=1)
> > Filter: ((resolution > 0::double precision) AND (resolution < 1.7::double precision))
> > -> Index Scan using chain_pdb_id_ind on "chain" (cost=0.00..6.87 rows=6 width=8) (actual time=0.021..0.027
rows=2loops=3329)
> > Index Cond: ("outer".id = "chain".pdb_id)
The actual number of records is pretty close to the estimated number. And the
difference seems to come primarily from selectivity of the join where it
thinks an average of 6 rows will match every row whereas in fact an average of
about 2 rows matches.
So it thinks it's going to read about 18,000 records out of 67,000 or about
25%. In that case the sequential scan is almost certainly better. In fact it's
going to read about 6,000 or just under 10%, in which case the sequential scan
is probably still better but it's not so clear.
I suspect the only reason you're seeing such a big difference when I would
expect it to be about even is because nearly all the data is cached. In that
case the non-sequential access pattern of the nested loop has little effect.
You might get away with lowering random_page_cost but since it thinks it's
going to read 25% of the table I suspect you'll have to get very close to 1
before it switches over, if it does even then. Be careful about tuning
settings like this based on a single query, especially to unrealistically low
values.
You might also want to try raising the statistics target on pdb_entry. See if
that makes the estimate go down from 6 to closer to 2.
--
greg