Обсуждение: VACUUM kills Index Scans ?!
I'm confused over two question involving PostgreSQL index scans. I'm using 
Linux Mandrake 7.2 on Intel, and PostgreSQL 7.0.3. If someone can explain 
what's going on, I'd greatly appreciate it.
---------------------------------------------
1) When I create a empty table, and then immediate create an index on a 
column, I can get /index scans/ when searching on that column. But when I 
then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it 
still be an index scan? What's going on here?
test1=> create table t1 (a varchar(64), b int);
CREATE
test1=> create index t1_a_ndx on t1 (a);
CREATE
test1=> explain select * from t1 where a='asd';
NOTICE:  QUERY PLAN:
Index Scan using t1_a_ndx on t1  (cost=0.00..8.14 rows=10 width=16)
EXPLAIN
test1=> vacuum;
NOTICE:  Skipping "pg_type" --- only table owner can VACUUM it (a bunch of 
these)
VACUUM
test1=> explain select * from t1 where a='asd';
NOTICE:  QUERY PLAN:
Seq Scan on t1  (cost=0.00..0.00 rows=1 width=16)
EXPLAIN
test1=>
---------------------------------------------
2) If I already have some data in a table and I create an index on a 
column, why doesn't subsequent searches then change from sequential scans 
to index scans?
test1=> create table t2 (a varchar(64), b int);
CREATE
test1=> insert into t2 values ('a', 1);
INSERT 41255 1
test1=> insert into t2 values ('b', 2);
INSERT 41256 1
test1=> insert into t2 values ('c', 3);
INSERT 41257 1
test1=> explain select * from t2 where a='a';
NOTICE:  QUERY PLAN:
Seq Scan on t2  (cost=0.00..22.50 rows=10 width=16)
EXPLAIN
test1=> create index t2_a_ndx on t2 (a);
CREATE
test1=> explain select * from t2 where a='a';
NOTICE:  QUERY PLAN:
Seq Scan on t2  (cost=0.00..1.04 rows=1 width=16)
EXPLAIN
test1=>
			
		
As additional information that I neglected to include in the first message, 
after both tests, the indices clearly still exist and can be seed in the 
following commands:
\d t1
\d t2
\di
\d t1_a_ndx
\d t2_a_ndx
The output shows what's expected, e.g:
test1=> \di
List of relations
Name | Type | Owner
----------+-------+-------
t1_a_ndx | index | gutz
t2_a_ndx | index | gutz
(1 row)
test1=> \d t2_a_ndx
Index "t2_a_ndx"
Attribute | Type
-----------+-----------
a | varchar()
btree
At 11:42 AM 3/15/2001 -0800, Gerald Gutierrez wrote:
>I'm confused over two question involving PostgreSQL index scans. I'm using 
>Linux Mandrake 7.2 on Intel, and PostgreSQL 7.0.3. If someone can explain 
>what's going on, I'd greatly appreciate it.
>
>---------------------------------------------
>
>1) When I create a empty table, and then immediate create an index on a 
>column, I can get /index scans/ when searching on that column. But when I 
>then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it 
>still be an index scan? What's going on here?
>
>
>test1=> create table t1 (a varchar(64), b int);
>CREATE
>test1=> create index t1_a_ndx on t1 (a);
>CREATE
>test1=> explain select * from t1 where a='asd';
>NOTICE:  QUERY PLAN:
>
>Index Scan using t1_a_ndx on t1  (cost=0.00..8.14 rows=10 width=16)
>
>EXPLAIN
>test1=> vacuum;
>NOTICE:  Skipping "pg_type" --- only table owner can VACUUM it (a bunch of 
>these)
>VACUUM
>test1=> explain select * from t1 where a='asd';
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t1  (cost=0.00..0.00 rows=1 width=16)
>
>EXPLAIN
>test1=>
>
>---------------------------------------------
>
>2) If I already have some data in a table and I create an index on a 
>column, why doesn't subsequent searches then change from sequential scans 
>to index scans?
>
>
>test1=> create table t2 (a varchar(64), b int);
>CREATE
>test1=> insert into t2 values ('a', 1);
>INSERT 41255 1
>test1=> insert into t2 values ('b', 2);
>INSERT 41256 1
>test1=> insert into t2 values ('c', 3);
>INSERT 41257 1
>test1=> explain select * from t2 where a='a';
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t2  (cost=0.00..22.50 rows=10 width=16)
>
>EXPLAIN
>test1=> create index t2_a_ndx on t2 (a);
>CREATE
>test1=> explain select * from t2 where a='a';
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t2  (cost=0.00..1.04 rows=1 width=16)
>
>EXPLAIN
>test1=>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
			
		Gerald Gutierrez <gutz@kalador.com> writes:
> 1) When I create a empty table, and then immediate create an index on a 
> column, I can get /index scans/ when searching on that column. But when I 
> then run VACUUM, the same search becomes a /sequential scan/.
VACUUM updates the planner's statistics so that it knows the table is
empty (note the change in cost estimates).  The default numbers for a
never-yet-vacuumed table (10 disk blocks and 1000 rows, IIRC) just
happen to be large enough to cause an indexscan.  Put in a reasonable
amount of data and then repeat the VACUUM, and it'll go back to index
scan.
> 2) If I already have some data in a table and I create an index on a 
> column, why doesn't subsequent searches then change from sequential scans 
> to index scans?
Again, you haven't got enough data to justify an indexscan.  You need at
least several disk blocks worth of data before an indexscan can possibly
save more table I/O than it costs to read the index.
There is an undocumented little factoid here: CREATE INDEX will update
(some of) the planner stats, but only if it finds some data in the
table.  CREATE INDEX on an empty table leaves the initial default
numbers alone.  This may be contributing to your confusion, but it was
deemed necessary so that the common sequence
CREATE TABLECREATE INDEXload data
wouldn't leave the planner believing the table to be completely empty
(and hence generating abysmally bad plans if you had actually loaded
quite a bit of data).  On the other hand, the preferred bulk-load
method is
CREATE TABLEload dataCREATE INDEX
and this leaves the planner's stats set correctly.
        regards, tom lane
			
		On Thu, 15 Mar 2001, Gerald Gutierrez wrote: > 1) When I create a empty table, and then immediate create an index on a > column, I can get /index scans/ when searching on that column. But when I > then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it > still be an index scan? What's going on here? > --------------------------------------------- > > 2) If I already have some data in a table and I create an index on a > column, why doesn't subsequent searches then change from sequential scans > to index scans? With a small number of rows, a sequence scan will require less reads/seeks from the filesystem. It's not always correct for the optimizer to choose to use an index even if it's there. If you put in lots of rows with distinct values and vacuum analyze (you want to do that rather than just vacuum) and do a comparison it should use the index, with only a few rows, the seq scan is probably better.
>There is an undocumented little factoid here: CREATE INDEX will update >(some of) the planner stats, but only if it finds some data in the >table. CREATE INDEX on an empty table leaves the initial default >numbers alone. This may be contributing to your confusion, but it was >deemed necessary ... I understand now; it makes sense. I'll be using a number of tables that are initially very small, perhaps 5 or 10 records. But I expect that the tables will grow very quickly to several tens (or hundreds) of thousands of records. It seems reasonable to me that the table should then be set up to use index scan right from the beginning so that as the table grows the index scan will become more useful. Thus, the correct sequence for me is probably: > CREATE TABLE > CREATE INDEX > load data I also understand that VACUUM and VACUUM ANALYZE takes a significant amount of time and locks the tables that it works on. Does it do locking and unlocking per table as it goes through them (i.e. only lock while it's reading the table) or does it do something else? If the locks are for large amounts of time I'll have to shut down my application to avoid connections from timing out and JDBC exceptions from being thrown. Thanks for your help :)
Gerald Gutierrez wrote: > > >There is an undocumented little factoid here: CREATE INDEX will update > >(some of) the planner stats, but only if it finds some data in the > >table. CREATE INDEX on an empty table leaves the initial default > >numbers alone. This may be contributing to your confusion, but it was > >deemed necessary ... > > I understand now; it makes sense. I'll be using a number of tables that are > initially very small, perhaps 5 or 10 records. But I expect that the tables > will grow very quickly to several tens (or hundreds) of thousands of > records. It seems reasonable to me that the table should then be set up to > use index scan right from the beginning so that as the table grows the > index scan will become more useful. Thus, the correct sequence for me is > probably: > > > CREATE TABLE > > CREATE INDEX > > load data > no, the correct sequence is to create the index last, which will create statistics that will tell postgres if it really wants to use an index or not. Don't try and second guess postgres. Even better do a VACUUM ANALYZE. -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com