Обсуждение: Index question

Поиск
Список
Период
Сортировка

Index question

От
Troy D
Дата:
Hi,

I am running pg7.1 on SuSE 7.2.
I have a table with three varchar fields making up the primary key, plus
four data fields.

create table "foo"(
  "pk1" varchar(7) not null,
  "pk2" varchar(7) not null,
  "pk3" varchar(7) not null,
  "data1" float8 default 0 not null,
  "data2" float8 default 0 not null,
  "data3" float8 default 0 not null,
  "data4" float8 default 0 not null
  constraint "foo_pkey" primary key ("pk1", "pk2", "pk3")
);

This automatically creates an index (so says pgadmin)

create unique index foo_pkey on foo using btree ( pk1 varchar_ops, pk2
varchar_ops, pk3 varchar_ops );

On this table, I run an insert/update routine that reads from table bar
and inserts or updates table foo.  I have used php to read in (about
2000) records from table bar and then insert or update table foo, then
print the record number (of result from bar) and the time.

I was getting about one transaction (each bar record requires [4
selects, an insert and an update]*64) every two to four seconds, getting
obviously progressively slower with each batch.

So, I did something stupid(?) - I created an index:

create index foo_index2 on foo using btree ( pk1 varchar_ops, pk2
varchar_ops, pk3 varchar_ops );

and now I get better than one transaction per second, with minor
degradation from one batch to the next.

Why?
The second index is hardly different than the first.  (This is the
opposite of what Josh Berkus recommended doing in 'The Joy of Indexing'. )
Did the index not get created in the first place?  Or did the stats get
updated when I created the second index?

Other info:
The empty table foo was vacuum'd before the first batch insert. Not
vacuum analyze'd.
The index was created halfway through the first batch - the time stamp
shows the interval change.
I wiped the table (delete from foo), removed the index and did it again,
hence, repeatable at least twice.
Before the index was created, an 'explain' on the insert statement
showed a seq scan on the table...

Insert into foo
select 'pk1value', 'pk2value', 'pk3value', 0, 0, 0, 0
where (select count(*) from foo where pk1 = 'pk1value' and pk2 =
'pk2value' and pk3 = 'pk3value') = 0

i.e don't insert if it's already there.

After the index was created, an 'explain' on the insert statement showed
an index scan on the table using foo_index2.

Version 0.01 of the query took three and a half days, but applying Bruce
Momjian's optimizing techniques (PostgreSQL Hardware Performance Tuning)
and other optimizing techniques got it down to three hours.  Adding the
rogue index dropped it to an hour.

Any other suggestions would be greatly appreciated.

Thanks in advance,
Troy


______________________________________________________________________
Post your ad for free now! http://personals.yahoo.caa

Re: Index question

От
"Dave Page"
Дата:

> -----Original Message-----
> From: Troy D [mailto:ugly_hippo@yahoo.ca]
> Sent: 19 July 2002 20:20
> To: pgadmin-support@postgresql.org
> Subject: [pgadmin-support] Index question
>
>
> Hi,
>
> I am running pg7.1 on SuSE 7.2.
> I have a table with three varchar fields making up the
> primary key, plus
> four data fields.
>
> create table "foo"(
>   "pk1" varchar(7) not null,
>   "pk2" varchar(7) not null,
>   "pk3" varchar(7) not null,
>   "data1" float8 default 0 not null,
>   "data2" float8 default 0 not null,
>   "data3" float8 default 0 not null,
>   "data4" float8 default 0 not null
>   constraint "foo_pkey" primary key ("pk1", "pk2", "pk3")
> );
>
> This automatically creates an index (so says pgadmin)

Does it? Well, I mean it should create an index, but I don't recall
making pgAdmin warn you about it :-).

>
> create unique index foo_pkey on foo using btree ( pk1
> varchar_ops, pk2
> varchar_ops, pk3 varchar_ops );
>
> On this table, I run an insert/update routine that reads from
> table bar
> and inserts or updates table foo.  I have used php to read in (about
> 2000) records from table bar and then insert or update table
> foo, then
> print the record number (of result from bar) and the time.

To be honest, I don't know what's causing your problem. I would suggest
you might find better help on one of the PostgreSQL lists.

Regards, Dave.