On Fri, 19 May 2000, Matthias Urlichs wrote:
# If a table has an index (let's say it is a btree on fields a,b), and
# if a SELECT/INSERT/UPDATE/DELETE is issued with field a being either
# inserted or in the where clause, then the database needs to use that
# index. Period.
That's not the case. What if I only have two rows in it? It
would take more resources to use the index than it would to do a
sequential scan.
# Requiring the application to call VACUUM in order to get any kind of
# performance is not a solution. When exactly am I supposed to do that?
# Before inserting one million records into my temporary table it's of
# no use whatsoever, and afterwards it's next week already. Literally.
I had the same conversation with some of my Sybase DBAs, they
explained to me why I was wrong, and why they needed to manually update
statistics for smarter index usage instead of having the hot point during
the inserts. It can probably be designed in such a way that the
statistics can be updated constantly without slowing everything down too
much, but I'm not a postgres developer and don't have the time to find out
if that's true.
# Unfortunately, the observable behavior in this case is something like
# - create table
# - create index
# - call VACUUM or not, doesn't make a difference because the table is
# empty anyway
# - do a whole lot of INSERTs during which PostgreSQL is slow as molasses.
#
# Ouch.
You'll save a tremendous amount of time by loading the data before
you add an index. This is probably a big part of the reason you spend a
week loading one million entries into a table. I don't think it takes me
an hour to load my 15,627,696 row table from scratch, after which I create
the index in about half that time, and a vacuum takes me approximately
five minutes.
Now, it's true, I don't remember having to vacuum before, but the
vacuum isn't very painful.
--
dustin sallings The world is watching America,
http://2852210114/~dustin/ and America is watching TV.