Обсуждение: Strange statistics

Поиск
Список
Период
Сортировка

Strange statistics

От
Henrik
Дата:
Hi list,

I'm having a table with a lots of file names in it. (Aprox 3 million)
in a 8.3.1 db.

Doing this simple query shows that the statistics is way of but I can
get them right even when I raise the statistics to 1000.

db=# alter table tbl_file alter file_name set statistics 1000;
ALTER TABLE
db=# analyze tbl_file;
ANALYZE
db=# explain analyze select * from tbl_file where lower(file_name)
like lower('to%');
                                                          QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on tbl_file  (cost=23.18..2325.13 rows=625
width=134) (actual time=7.938..82.386 rows=17553 loops=1)
    Filter: (lower((file_name)::text) ~~ 'to%'::text)
    ->  Bitmap Index Scan on tbl_file_idx  (cost=0.00..23.02 rows=625
width=0) (actual time=6.408..6.408 rows=17553 loops=1)
          Index Cond: ((lower((file_name)::text) ~>=~ 'to'::text) AND
(lower((file_name)::text) ~<~ 'tp'::text))
  Total runtime: 86.230 ms
(5 rows)


How can it be off by a magnitude of 28??

Cheers,
Henke

Re: Strange statistics

От
Joris Dobbelsteen
Дата:
Henrik wrote:
> Hi list,
>
> I'm having a table with a lots of file names in it. (Aprox 3 million) in
> a 8.3.1 db.
>
> Doing this simple query shows that the statistics is way of but I can
> get them right even when I raise the statistics to 1000.
>
> db=# alter table tbl_file alter file_name set statistics 1000;
> ALTER TABLE
> db=# analyze tbl_file;
> ANALYZE
> db=# explain analyze select * from tbl_file where lower(file_name) like
> lower('to%');
>                                                          QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------

>
>  Bitmap Heap Scan on tbl_file  (cost=23.18..2325.13 rows=625 width=134)
> (actual time=7.938..82.386 rows=17553 loops=1)
>    Filter: (lower((file_name)::text) ~~ 'to%'::text)
>    ->  Bitmap Index Scan on tbl_file_idx  (cost=0.00..23.02 rows=625
> width=0) (actual time=6.408..6.408 rows=17553 loops=1)
>          Index Cond: ((lower((file_name)::text) ~>=~ 'to'::text) AND
> (lower((file_name)::text) ~<~ 'tp'::text))
>  Total runtime: 86.230 ms
> (5 rows)
>
>
> How can it be off by a magnitude of 28??

These are statistics and represent an only estimate! In this case, the
planner seems to be doing the right thing(tm) anyway.

Statistics is a frequently misunderstood subject and usually provides
excellent material to draw plain wrong conclusions. There is a good
chance that due to the physical layout of your data, the algorithms in
the statistics collector, the existence of uncertainty and some more
unknown factors your statistics will be biased. This is a situations
where you noticed it.

Running "SELECT * FROM pg_stats;" will give you the statistics the
planner uses and can provide some hints to why the planner has chosen
these estimates.
Probably statistics will vary between ANALYZE runs. Its also possible to
try "CLUSTER" and friends. Try different queries and look at the deviations.

All in all, you should really start worrying when the planner starts
planning inefficient queries. Since its a filename, it might be highly
irregular (random) and a low statistics target might be good enough anyways.

Unfortunately I'm not a statistics expert...

- Joris

Re: Strange statistics

От
Henrik
Дата:
3 jun 2008 kl. 23.31 skrev Joris Dobbelsteen:

> Henrik wrote:
>> Hi list,
>> I'm having a table with a lots of file names in it. (Aprox 3
>> million) in a 8.3.1 db.
>> Doing this simple query shows that the statistics is way of but I
>> can get them right even when I raise the statistics to 1000.
>> db=# alter table tbl_file alter file_name set statistics 1000;
>> ALTER TABLE
>> db=# analyze tbl_file;
>> ANALYZE
>> db=# explain analyze select * from tbl_file where lower(file_name)
>> like lower('to%');
>>                                                         QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------
Bitmap  
>>  Heap Scan on tbl_file  (cost=23.18..2325.13 rows=625 width=134)
>> (actual time=7.938..82.386 rows=17553 loops=1)
>>   Filter: (lower((file_name)::text) ~~ 'to%'::text)
>>   ->  Bitmap Index Scan on tbl_file_idx  (cost=0.00..23.02 rows=625
>> width=0) (actual time=6.408..6.408 rows=17553 loops=1)
>>         Index Cond: ((lower((file_name)::text) ~>=~ 'to'::text) AND
>> (lower((file_name)::text) ~<~ 'tp'::text))
>> Total runtime: 86.230 ms
>> (5 rows)
>> How can it be off by a magnitude of 28??
>
> These are statistics and represent an only estimate! In this case,
> the planner seems to be doing the right thing(tm) anyway.
>
> Statistics is a frequently misunderstood subject and usually
> provides excellent material to draw plain wrong conclusions. There
> is a good chance that due to the physical layout of your data, the
> algorithms in the statistics collector, the existence of uncertainty
> and some more unknown factors your statistics will be biased. This
> is a situations where you noticed it.
>
> Running "SELECT * FROM pg_stats;" will give you the statistics the
> planner uses and can provide some hints to why the planner has
> chosen these estimates.
> Probably statistics will vary between ANALYZE runs. Its also
> possible to try "CLUSTER" and friends. Try different queries and
> look at the deviations.
Thanks Joris for your input. You are the second person that suggests
CLUSTER for me. Maybe I should take a look. The problem is that our
select queries are kinda random. Would CLUSTER help then also? Should
I just CLUSTER on the moste used index or?

Thanks
/henke


>
>
> All in all, you should really start worrying when the planner starts
> planning inefficient queries. Since its a filename, it might be
> highly irregular (random) and a low statistics target might be good
> enough anyways.
>
> Unfortunately I'm not a statistics expert...
>
> - Joris