Index question

Поиск
Список
Период
Сортировка
От Troy D
Тема Index question
Дата
Msg-id 3D386642.9000501@yahoo.ca
обсуждение исходный текст
Список pgadmin-support
Hi,

I am running pg7.1 on SuSE 7.2.
I have a table with three varchar fields making up the primary key, plus
four data fields.

create table "foo"(
  "pk1" varchar(7) not null,
  "pk2" varchar(7) not null,
  "pk3" varchar(7) not null,
  "data1" float8 default 0 not null,
  "data2" float8 default 0 not null,
  "data3" float8 default 0 not null,
  "data4" float8 default 0 not null
  constraint "foo_pkey" primary key ("pk1", "pk2", "pk3")
);

This automatically creates an index (so says pgadmin)

create unique index foo_pkey on foo using btree ( pk1 varchar_ops, pk2
varchar_ops, pk3 varchar_ops );

On this table, I run an insert/update routine that reads from table bar
and inserts or updates table foo.  I have used php to read in (about
2000) records from table bar and then insert or update table foo, then
print the record number (of result from bar) and the time.

I was getting about one transaction (each bar record requires [4
selects, an insert and an update]*64) every two to four seconds, getting
obviously progressively slower with each batch.

So, I did something stupid(?) - I created an index:

create index foo_index2 on foo using btree ( pk1 varchar_ops, pk2
varchar_ops, pk3 varchar_ops );

and now I get better than one transaction per second, with minor
degradation from one batch to the next.

Why?
The second index is hardly different than the first.  (This is the
opposite of what Josh Berkus recommended doing in 'The Joy of Indexing'. )
Did the index not get created in the first place?  Or did the stats get
updated when I created the second index?

Other info:
The empty table foo was vacuum'd before the first batch insert. Not
vacuum analyze'd.
The index was created halfway through the first batch - the time stamp
shows the interval change.
I wiped the table (delete from foo), removed the index and did it again,
hence, repeatable at least twice.
Before the index was created, an 'explain' on the insert statement
showed a seq scan on the table...

Insert into foo
select 'pk1value', 'pk2value', 'pk3value', 0, 0, 0, 0
where (select count(*) from foo where pk1 = 'pk1value' and pk2 =
'pk2value' and pk3 = 'pk3value') = 0

i.e don't insert if it's already there.

After the index was created, an 'explain' on the insert statement showed
an index scan on the table using foo_index2.

Version 0.01 of the query took three and a half days, but applying Bruce
Momjian's optimizing techniques (PostgreSQL Hardware Performance Tuning)
and other optimizing techniques got it down to three hours.  Adding the
rogue index dropped it to an hour.

Any other suggestions would be greatly appreciated.

Thanks in advance,
Troy


______________________________________________________________________
Post your ad for free now! http://personals.yahoo.caa

В списке pgadmin-support по дате отправления:

Предыдущее
От: "Roger Fisher"
Дата:
Сообщение: Newbie - PG_DUMP
Следующее
От: cdxalegal_ins@juno.com
Дата:
Сообщение: If you had a legal problem today, what would you do? afnak