Обсуждение: What kind of index to use for many rows with few unique values?
Hi, I have a table with a column called "state". Each row can be in one of four states, let's call them 'new', 'pending', 'ok', and 'bad'. On average, about 95% of the rows will be 'bad', with the remaining 5% being in one of the other three states. If the table has 50K rows and I just want to pull out the 'ok' rows, I don't want to do a sequential scan. To pull out the 'bad' rows, obviously, sequential scan is fine. I've heard that a btree index performs badly in this situation. Is a hash index appropriate? I've heard bad things about hash indexes in PostgreSQL. Regards, David. Roaring Penguin Software Inc. | http://www.roaringpenguin.com GPG fingerprint: C523 771C 3710 0F54 B2D2 4B0D C6EF 6991 34AB 95BA GPG public key: http://www.roaringpenguin.com/dskoll-key-2002.txt ID: 34AB95BA
On Mon, Dec 02, 2002 at 05:10:00PM -0500, David F. Skoll wrote:
> Hi,
>
> I have a table with a column called "state".  Each row can be in one
> of four states, let's call them 'new', 'pending', 'ok', and 'bad'.
> On average, about 95% of the rows will be 'bad', with the remaining
> 5% being in one of the other three states.  If the table has 50K rows
> and I just want to pull out the 'ok' rows, I don't want to do a sequential
> scan.  To pull out the 'bad' rows, obviously, sequential scan is fine.
>
> I've heard that a btree index performs badly in this situation.  Is
> a hash index appropriate?  I've heard bad things about hash indexes in
> PostgreSQL.
create table states (id serial primary key, state varchar(10), t text );
create function fill_states(varchar, int) returns bool as '
  begin for i in 1 .. $2
  loop
    insert into states (state, t) values ($1, ''random'');
  end loop;
  return true;
  end;
' language plpgsql;
select fill_states('ok',45000);
select fill_states('bad',5000);
select fill_states('warning',5000);
analyze states;
joel@joel=# explain select * from states where state='warning';
QUERY PLAN
-------------------------------------------------------------------------------
  Index Scan using state_idx on states (cost=0.00..1013.00 rows=5533 width=20)
    Index Cond: (state = 'warning'::character varying)
(2 rows)
joel@joel=# explain select * from states where state='ok';
QUERY PLAN
--------------------------------------------------------------
  Seq Scan on states  (cost=0.00..1171.51 rows=44627 width=20)
  Filter: (state = 'ok'::character varying)
(2 rows)
Looks right to me: index scan for the less-common option, seqscan for
the most common. Why don't you think this, as a btree, will work for
you?
--
Joel BURTON  |  joel@joelburton.com  |  joelburton.com  |  aim: wjoelburton
Independent Knowledge Management Consultant
			
		On Mon, 2 Dec 2002, Joel Burton wrote: > Looks right to me: index scan for the less-common option, seqscan for > the most common. Why don't you think this, as a btree, will work for > you? No, I'm sure a btree will work. However, won't the index be inefficient (i.e., very flat) if there are many entries with the same value? Or is this not a problem? -- David.
On 2 Dec 2002 at 17:10, David F. Skoll wrote: > I've heard that a btree index performs badly in this situation. As another poster has shown, it should be OK. I recently dealt with distributions simlar to your example. > Is a hash index appropriate? I've heard bad things about hash > indexes in PostgreSQL. No, don't use them. I didn't get any performance increase out of them. Does your experience show poor btree results? -- Dan Langille : http://www.langille.org/
David F. Skoll wrote: > No, I'm sure a btree will work. However, won't the index be > inefficient (i.e., very flat) if there are many entries with the same > value? Or is this not a problem? > If you're concerned, why not try a partial index? Joe
On 2 Dec 2002 at 15:15, Joe Conway wrote: > David F. Skoll wrote: > > No, I'm sure a btree will work. However, won't the index be > > inefficient (i.e., very flat) if there are many entries with the same > > value? Or is this not a problem? > > > > If you're concerned, why not try a partial index? FWIW, partial indexes didn't help for my distributions. But btrees were satisfactory. -- Dan Langille : http://www.langille.org/