Scott Marlowe wrote:
> Douglas Alan wrote:
>> Okay -- no problem:
>>
>> set enable_seqscan = on;
>> explain analyze select * from maindb_astobject
>> where survey_id = 2
>> limit 1000;
>>
>> "Limit (cost=0.00..48.03 rows=1000 width=78) (actual
>> time=84837.835..265938.258 rows=1000 loops=1)"
>> " -> Seq Scan on maindb_astobject (cost=0.00..3538556.10
>> rows=73675167 width=78) (actual time=84837.825..265932.121 rows=1000
>> loops=1)"
>> " Filter: (survey_id = 2)"
>> "Total runtime: 265942.416 ms"
>>
>> set enable_seqscan = off;
>> explain analyze select * from maindb_astobject
>> where survey_id = 2
>> limit 1000;
>>
>> "Limit (cost=0.00..67.37 rows=1000 width=78) (actual
>> time=172.248..225.219 rows=1000 loops=1)"
>> " -> Index Scan using maindb_astobject_survey_id on
>> maindb_astobject (cost=0.00..4963500.87 rows=73675167 width=78)
>> (actual time=172.240..221.078 rows=1000 loops=1)"
>> " Index Cond: (survey_id = 2)"
>> "Total runtime: 227.412 ms"
> What was the random_page_cost during these two queries?
4
> Assuming seq_page_cost is 1,
Yes, it is.
> and random_page_cost was 4 or something, lowering it should
> force the move to an index scan.
I just tried changing random_page_cost to 1, but the query still
does a seq scan.
> If you were already at 1.0 or so, then yeah, the cost
> estimation is off. Since index scans cost CPU (at least I
> think they do), you might try lowering your cpu_* costs to see
> if that helps
How would lowering random_page_cost and all the cpu_* costs differ
from just increasing seq_page_cost?
I have to raise seq_page_cost from 1 to 34 to force an index
scan, and I can't imagine that changing this value so radically
would be a good idea.
Alternatively, if I set random_page_cost to 1, and
cpu_tuple_cost, cpu_index_tuple_cost, and cpu_operator_cost all
to 0, this still doesn't cause Postgres to do an index scan for
this query.
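For reference, the combinations I tried were roughly these (run in
the same session, before re-running the EXPLAIN ANALYZE above):

```sql
-- Raising seq_page_cost: an index scan only wins at an absurd value.
SET seq_page_cost = 34;

-- Going the other way instead: even zeroing out all the CPU costs
-- doesn't tip the planner toward the index.
SET seq_page_cost = 1;
SET random_page_cost = 1;
SET cpu_tuple_cost = 0;
SET cpu_index_tuple_cost = 0;
SET cpu_operator_cost = 0;
-- EXPLAIN still shows a seq scan for the LIMIT 1000 query.
```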
>> P.S. Here are the stats on the column. It appears that my recollection
>> of 25% of the table matching was a bit off. It's actually 98.5%! That
>> might explain more why Postgres wants to do a sequential scan. The
>> problem is that still means that it has to scan a million rows
>> sequentially before it finds a single matching row, as the matching
>> rows are at the end of the database:
> Yeah, that's a really skewed distribution. Partitioning may work out,
> especially if you often select on that one field.
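Just so I understand the partitioning suggestion — it would look
something like this, I assume? (A sketch only; the child-table name
is invented, and this is the inheritance-based scheme, since that's
what this version of Postgres supports.)

```sql
-- Child table holding only survey_id = 2 rows; the CHECK constraint
-- is what lets the planner exclude it for other surveys.
CREATE TABLE maindb_astobject_survey2 (
    CHECK (survey_id = 2)
) INHERITS (maindb_astobject);

-- With constraint exclusion enabled, a query filtering on
-- survey_id = 2 would scan only the matching child table.
SET constraint_exclusion = on;
```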
Is there a way for me to alter the statistics table? I tried
changing the values in pg_stats, but that table is just a view,
so Postgres won't let me do it. pg_statistic, on the other hand,
is rather opaque.
Alternatively, can I just delete the statistics for the column?
It's the statistics that are hurting me here.
To delete the statistics, I tried setting the statistics target
for the column to 0 and re-analyzing, but that just left the
current statistics in place. So I tried setting it to 1, but
that's one value too many to eliminate this problem!
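For the record, the commands I used were along these lines:

```sql
-- Attempt to wipe the column's statistics:
ALTER TABLE maindb_astobject ALTER COLUMN survey_id SET STATISTICS 0;
ANALYZE maindb_astobject (survey_id);
-- With a target of 0, ANALYZE leaves the existing pg_statistic rows
-- alone; with SET STATISTICS 1, it keeps one most-common value,
-- which is exactly the value (survey_id = 2) I want it to forget.
```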
|>ouglas