Обсуждение: About when we should setup index?

Поиск
Список
Период
Сортировка

About when we should setup index?

От
Emi Lu
Дата:
Hello,

I have some questions about index.

For example, we have a table
test(id varchar(7) not null primary key,
       name varchar(15) ,
       sex varchar(1) ,
       birth_date date,
       valid_in_table boolean not null,
       time_stamp timestamp not null default now() );

. id is the primary key, so a default unique index is generated
automatically ?
. if queries based on name are often, index should be setup for name?
  if there are 30,000 records, 29,000 records' names are different, will
the index for name still be useful?

. possible values for sex are  F/M and null, should we setup index for sex?

. How about index for date and timestamp?

Basically, I'd like to know is there a percentage of the differences
among data to decide whether index will help or not? For example,  among
30,000 records, for a column, its value choices are less than A% and
greater than B% so that we know index will help a lot?

thanks a lot!
Emi







Re: About when we should setup index?

От
Michael Fuhr
Дата:
On Mon, Mar 06, 2006 at 03:01:15PM -0500, Emi Lu wrote:
> . id is the primary key, so a default unique index is generated
> automatically ?

Yes.  When you issue the CREATE TABLE statement you should see a
notice like the following:

CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"

You can see the index if you look at the table's description, such
as when doing "\d test" in psql.

> . if queries based on name are often, index should be setup for name?
>  if there are 30,000 records, 29,000 records' names are different, will
> the index for name still be useful?

If you have 29,000 unique names out of 30,000 rows then an index
should definitely speed up queries by name.

> . possible values for sex are  F/M and null, should we setup index for sex?

Probably not, although 8.1 can make better use of indexes on
low-cardinality columns than previous versions could.  If you're
using 8.1 then try running typical queries with and without such
an index to see if it makes much difference.  EXPLAIN ANALYZE will
show whether the index is being used.  Unless you see a significant
improvement in query performance then don't bother indexing this
column.

> . How about index for date and timestamp?

Probably, if you regularly query on those columns.

> Basically, I'd like to know is there a percentage of the differences
> among data to decide whether index will help or not? For example,  among
> 30,000 records, for a column, its value choices are less than A% and
> greater than B% so that we know index will help a lot?

There's no absolute rule; among other things physical order on disk
influences the planner's decision to use an index.  If a table is
clustered on a particular index then the planner might use that
index to fetch 80% of the table's rows, but if the data is randomly
scattered then the planner might prefer a sequential scan to fetch
only 3% of the rows.

If you regularly query a column that has more than a handful of
distinct values then queries will probably benefit from an index
on that column; in 8.1 even queries against low-cardinality columns
might benefit from an index.  If you perform a lot of queries for
values that are close together, or if queries for a certain value
tend to return a lot of rows, then consider clustering the table
on that column's index (and be sure to run ANALYZE afterwards to
update the table's statistics).  Indexes have costs, however: they
take up disk space and they can slow down other operations like
inserts and updates because each index has to be updated as well.
Create whatever indexes you need to realize a significant improvement
in query performance, but don't overdo it.

--
Michael Fuhr

Re: About when we should setup index?

От
Emi Lu
Дата:
Thank you very much Michael. Your inputs are very helpful for me. Just
have one small question, the example you gave is based on  postgresql
8.1, does it apply to PostgreSQL 8.0.1 as well (i686-pc-linux-gnu,
compiled by GCC gcc 3.3.2)?

- Emi



>>. id is the primary key, so a default unique index is generated
>>automatically ?
>>
>>
>
>Yes.  When you issue the CREATE TABLE statement you should see a
>notice like the following:
>
>CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
>
>You can see the index if you look at the table's description, such
>as when doing "\d test" in psql.
>
>
>
>>. if queries based on name are often, index should be setup for name?
>> if there are 30,000 records, 29,000 records' names are different, will
>>the index for name still be useful?
>>
>>
>
>If you have 29,000 unique names out of 30,000 rows then an index
>should definitely speed up queries by name.
>
>
>
>>. possible values for sex are  F/M and null, should we setup index for sex?
>>
>>
>
>Probably not, although 8.1 can make better use of indexes on
>low-cardinality columns than previous versions could.  If you're
>using 8.1 then try running typical queries with and without such
>an index to see if it makes much difference.  EXPLAIN ANALYZE will
>show whether the index is being used.  Unless you see a significant
>improvement in query performance then don't bother indexing this
>column.
>
>
>
>>. How about index for date and timestamp?
>>
>>
>
>Probably, if you regularly query on those columns.
>
>
>
>>Basically, I'd like to know is there a percentage of the differences
>>among data to decide whether index will help or not? For example,  among
>>30,000 records, for a column, its value choices are less than A% and
>>greater than B% so that we know index will help a lot?
>>
>>
>
>There's no absolute rule; among other things physical order on disk
>influences the planner's decision to use an index.  If a table is
>clustered on a particular index then the planner might use that
>index to fetch 80% of the table's rows, but if the data is randomly
>scattered then the planner might prefer a sequential scan to fetch
>only 3% of the rows.
>
>If you regularly query a column that has more than a handful of
>distinct values then queries will probably benefit from an index
>on that column; in 8.1 even queries against low-cardinality columns
>might benefit from an index.  If you perform a lot of queries for
>values that are close together, or if queries for a certain value
>tend to return a lot of rows, then consider clustering the table
>on that column's index (and be sure to run ANALYZE afterwards to
>update the table's statistics).  Indexes have costs, however: they
>take up disk space and they can slow down other operations like
>inserts and updates because each index has to be updated as well.
>Create whatever indexes you need to realize a significant improvement
>in query performance, but don't overdo it.
>
>
>


Re: About when we should setup index?

От
Michael Fuhr
Дата:
On Tue, Mar 07, 2006 at 09:54:11AM -0500, Emi Lu wrote:
> Thank you very much Michael. Your inputs are very helpful for me. Just
> have one small question, the example you gave is based on  postgresql
> 8.1, does it apply to PostgreSQL 8.0.1 as well (i686-pc-linux-gnu,
> compiled by GCC gcc 3.3.2)?

What example?  I only mentioned 8.1 in the context of queries against
low-cardinality columns (columns with only a few distinct values) but
I didn't show an example of that.

> >Yes.  When you issue the CREATE TABLE statement you should see a
> >notice like the following:
> >
> >CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for
> >table "test"

Is this the example you meant?  Earlier versions of PostgreSQL work
the same way.

--
Michael Fuhr

Re: About when we should setup index?

От
Francisco Reyes
Дата:
Emi Lu writes:

One more thing to consider. If you have a column with lots of repeated
values and a handfull of selective values, you could use a partial index.

http://www.postgresql.org/docs/8.0/interactive/indexes-partial.html

For example imagine you have an accounts table like
Accounts
account_id      integer
name            varchar
special_cust    boolean

Where special_cust are customers that subscribe to some premiun and
expensive service.. but there are very few customers that do... you could do
an index like:

CREATE INDEX accuonts_special_cust_idx ON accounts (special_cust)
    WHERE special_cust;

In that case if you wanted to see a list of premiun accounts, that index
should bring those records quickly.

See the link above for examples and links to a couple of papers explaining
why/when you want to use a partial index. In particular the document "The
case for partial indexes" , pages 3 and up.