Index creation time and distribution

Поиск
Список
Период
Сортировка
От Guillaume Smet
Тема Index creation time and distribution
Дата
Msg-id 1d4e0c10805220532v514c2bd5je4c6c20ceb485495@mail.gmail.com
обсуждение исходный текст
Ответы Re: Index creation time and distribution  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Index creation time and distribution  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Список pgsql-performance
Hi -performance,

I experienced this morning a performance problem when we imported a
dump in a 8.1 database.

The table is 5 millions rows large and when the dump creates an index
on a specific text column called clazz it takes 27 minutes while on
the other columns, it only takes a couple of seconds:
LOG:  duration: 1636301.317 ms  statement: CREATE INDEX
index_journal_clazz ON journal USING btree (clazz);
LOG:  duration: 20613.009 ms  statement: CREATE INDEX
index_journal_date ON journal USING btree (date);
LOG:  duration: 10653.290 ms  statement: CREATE INDEX
index_journal_modifieur ON journal USING btree (modifieur);
LOG:  duration: 15031.579 ms  statement: CREATE INDEX
index_journal_objectid ON journal USING btree (objectid);

The only weird thing about this column is that 4.7 millions of rows
have the exact same value. A partial index excluding this value is
really fast to create but, as the database is used via JDBC and
prepared statements, this index is totally useless (the plan is
created before the BIND so it can't use the partial index). FWIW we
can't use ?protocolVersion=2 with this application so it's not an
option.

As part of the deployment process of this application, we often need
to drop/create/restore the database and 25 minutes is really longer
than we can afford.

So my questions are:
- is the index creation time so correlated with the distribution? I
was quite surprised by this behaviour. The time is essentially CPU
time.
- if not, what can I check to diagnose this problem?
- IIRC, 8.3 could allow me to use the partial index as the query
should be planned after the BIND (plans are unnamed). Am I right?

Thanks for any input.

--
Guillaume

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Robins Tharakan"
Дата:
Сообщение: Re: Varchar pkey instead of integer
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Index creation time and distribution