Re: Inconsistant use of index.
| От | Ron Mayer | 
|---|---|
| Тема | Re: Inconsistant use of index. | 
| Дата | |
| Msg-id | Pine.LNX.4.33.0204031108250.24107-100000@ron обсуждение исходный текст | 
| Ответ на | Re: Inconsistant use of index. (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Ответы | Re: Inconsistant use of index. | 
| Список | pgsql-bugs | 
On Tue, 26 Mar 2002, Tom Lane wrote:
> Ron Mayer <ron@intervideo.com> writes:
> >> I'm particularly interested in the correlation estimate for the dat
> >> column.  (Would you happen to have an idea whether the data has been
> >> inserted more-or-less in dat order?)
>
> > I beleve much of February was loaded first, then we back-filled January,
> > and daily I've been adding March's results.  I don't believe the index-usage
> > stopped when we did the january fill... something happend a few days ago after
> > a pretty routine daily load.
>
> The correlation estimate for dat is pretty low (0.086088), which I think
> reflects the fact that on a macro level your data is not very orderly
> (Feb/Jan/Mar).  However, if it's been loaded on a daily basis then the
> records for any single day will be together --- which is why the indexed
> probe for a single day is so fast.  I don't see any way that we can
> expect the system to model this effect with only one ordering-correlation
> number :-( ... so a proper fix will have to wait for some future release
> when we can think about having more extensive stats about ordering.
>
> In the meantime, it would be interesting to see if re-establishing the
> big-picture order correlation would persuade the planner to do the right
> thing.  Could you do something like this:
>
>     CREATE TABLE foo AS SELECT * FROM fact ORDER BY dat;
>     TRUNCATE TABLE fact;
>     INSERT INTO fact SELECT * FROM foo;
>     DROP TABLE foo;
>     VACUUM ANALYZE fact;
>
> (this should leave you in a state where pg_stats shows correlation 1.0
> for fact.dat) and then see what you get from EXPLAIN?
>
>             regards, tom lane
I did quite a bit more playing with this, and no matter what the
correlation was (1, -0.001), it never seemed to have any effect
at all on the execution plan.
Should it?  With a high correlation the index scan is a much better choice.
   Ron
---
--- create the table with a correlation of "1".
---
logs2=# CREATE TABLE fact_by_dat AS SELECT * FROM fact ORDER BY dat;
SELECT
logs2=# CREATE INDEX fact_by_dat__dat ON fact_by_dat(dat);
CREATE
logs2=# vacuum analyze fact_by_dat;
VACUUM
logs2=# select correlation from pg_stats where tablename='fact_by_dat' and attname='dat';
 correlation
-------------
           1
(1 row)
---
--- Still does the "Seq Scan"
---
logs2=#  explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE:  QUERY PLAN:
Aggregate  (cost=380347.31..380347.31 rows=1 width=0) (actual time=77785.14..77785.14 rows=1 loops=1)
  ->  Seq Scan on fact  (cost=0.00..379816.25 rows=212423 width=0) (actual time=20486.16..77420.05 rows=180295 loops=1)
Total runtime: 77785.28 msec
EXPLAIN
---
--- Disable Seq Scan...  30 times faster.
---
logs2=# set enable_seqscan to off;
SET VARIABLE
logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE:  QUERY PLAN:
Aggregate  (cost=5502.57..5502.57 rows=1 width=0) (actual time=2671.20..2671.20 rows=1 loops=1)
  ->  Index Scan using fact_by_dat__dat on fact_by_dat  (cost=0.00..4974.99 rows=211036 width=0)
(actual time=90.24..2339.64 rows=180295 loops=1)
Total runtime: 2693.87 msec
		
	В списке pgsql-bugs по дате отправления: