Обсуждение: About when we should setup index?
Hello, I have some questions about index. For example, we have a table test(id varchar(7) not null primary key, name varchar(15) , sex varchar(1) , birth_date date, valid_in_table boolean not null, time_stamp timestamp not null default now() ); . id is the primary key, so a default unique index is generated automatically ? . if queries based on name are often, index should be setup for name? if there are 30,000 records, 29,000 records' names are different, will the index for name still be useful? . possible values for sex are F/M and null, should we setup index for sex? . How about index for date and timestamp? Basically, I'd like to know is there a percentage of the differences among data to decide whether index will help or not? For example, among 30,000 records, for a column, its value choices are less than A% and greater than B% so that we know index will help a lot? thanks a lot! Emi
On Mon, Mar 06, 2006 at 03:01:15PM -0500, Emi Lu wrote: > . id is the primary key, so a default unique index is generated > automatically ? Yes. When you issue the CREATE TABLE statement you should see a notice like the following: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test" You can see the index if you look at the table's description, such as when doing "\d test" in psql. > . if queries based on name are often, index should be setup for name? > if there are 30,000 records, 29,000 records' names are different, will > the index for name still be useful? If you have 29,000 unique names out of 30,000 rows then an index should definitely speed up queries by name. > . possible values for sex are F/M and null, should we setup index for sex? Probably not, although 8.1 can make better use of indexes on low-cardinality columns than previous versions could. If you're using 8.1 then try running typical queries with and without such an index to see if it makes much difference. EXPLAIN ANALYZE will show whether the index is being used. Unless you see a significant improvement in query performance then don't bother indexing this column. > . How about index for date and timestamp? Probably, if you regularly query on those columns. > Basically, I'd like to know is there a percentage of the differences > among data to decide whether index will help or not? For example, among > 30,000 records, for a column, its value choices are less than A% and > greater than B% so that we know index will help a lot? There's no absolute rule; among other things physical order on disk influences the planner's decision to use an index. If a table is clustered on a particular index then the planner might use that index to fetch 80% of the table's rows, but if the data is randomly scattered then the planner might prefer a sequential scan to fetch only 3% of the rows. If you regularly query a column that has more than a handful of distinct values then queries will probably benefit from an index on that column; in 8.1 even queries against low-cardinality columns might benefit from an index. If you perform a lot of queries for values that are close together, or if queries for a certain value tend to return a lot of rows, then consider clustering the table on that column's index (and be sure to run ANALYZE afterwards to update the table's statistics). Indexes have costs, however: they take up disk space and they can slow down other operations like inserts and updates because each index has to be updated as well. Create whatever indexes you need to realize a significant improvement in query performance, but don't overdo it. -- Michael Fuhr
Thank you very much Michael. Your inputs are very helpful for me. Just have one small question, the example you gave is based on postgresql 8.1, does it apply to PostgreSQL 8.0.1 as well (i686-pc-linux-gnu, compiled by GCC gcc 3.3.2)? - Emi >>. id is the primary key, so a default unique index is generated >>automatically ? >> >> > >Yes. When you issue the CREATE TABLE statement you should see a >notice like the following: > >CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test" > >You can see the index if you look at the table's description, such >as when doing "\d test" in psql. > > > >>. if queries based on name are often, index should be setup for name? >> if there are 30,000 records, 29,000 records' names are different, will >>the index for name still be useful? >> >> > >If you have 29,000 unique names out of 30,000 rows then an index >should definitely speed up queries by name. > > > >>. possible values for sex are F/M and null, should we setup index for sex? >> >> > >Probably not, although 8.1 can make better use of indexes on >low-cardinality columns than previous versions could. If you're >using 8.1 then try running typical queries with and without such >an index to see if it makes much difference. EXPLAIN ANALYZE will >show whether the index is being used. Unless you see a significant >improvement in query performance then don't bother indexing this >column. > > > >>. How about index for date and timestamp? >> >> > >Probably, if you regularly query on those columns. > > > >>Basically, I'd like to know is there a percentage of the differences >>among data to decide whether index will help or not? For example, among >>30,000 records, for a column, its value choices are less than A% and >>greater than B% so that we know index will help a lot? >> >> > >There's no absolute rule; among other things physical order on disk >influences the planner's decision to use an index. If a table is >clustered on a particular index then the planner might use that >index to fetch 80% of the table's rows, but if the data is randomly >scattered then the planner might prefer a sequential scan to fetch >only 3% of the rows. > >If you regularly query a column that has more than a handful of >distinct values then queries will probably benefit from an index >on that column; in 8.1 even queries against low-cardinality columns >might benefit from an index. If you perform a lot of queries for >values that are close together, or if queries for a certain value >tend to return a lot of rows, then consider clustering the table >on that column's index (and be sure to run ANALYZE afterwards to >update the table's statistics). Indexes have costs, however: they >take up disk space and they can slow down other operations like >inserts and updates because each index has to be updated as well. >Create whatever indexes you need to realize a significant improvement >in query performance, but don't overdo it. > > >
On Tue, Mar 07, 2006 at 09:54:11AM -0500, Emi Lu wrote: > Thank you very much Michael. Your inputs are very helpful for me. Just > have one small question, the example you gave is based on postgresql > 8.1, does it apply to PostgreSQL 8.0.1 as well (i686-pc-linux-gnu, > compiled by GCC gcc 3.3.2)? What example? I only mentioned 8.1 in the context of queries against low-cardinality columns (columns with only a few distinct values) but I didn't show an example of that. > >Yes. When you issue the CREATE TABLE statement you should see a > >notice like the following: > > > >CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for > >table "test" Is this the example you meant? Earlier versions of PostgreSQL work the same way. -- Michael Fuhr
Emi Lu writes: One more thing to consider. If you have a column with lots of repeated values and a handfull of selective values, you could use a partial index. http://www.postgresql.org/docs/8.0/interactive/indexes-partial.html For example imagine you have an accounts table like Accounts account_id integer name varchar special_cust boolean Where special_cust are customers that subscribe to some premiun and expensive service.. but there are very few customers that do... you could do an index like: CREATE INDEX accuonts_special_cust_idx ON accounts (special_cust) WHERE special_cust; In that case if you wanted to see a list of premiun accounts, that index should bring those records quickly. See the link above for examples and links to a couple of papers explaining why/when you want to use a partial index. In particular the document "The case for partial indexes" , pages 3 and up.