Обсуждение: partial indexed not being used.
Hello I have a table that has a boolean column telling if that particular record has been billed already. So most of the time that column holds only true values. To select the unbilled record I use the following query: # select * where billed = false order by calldate; and defined the following index: # Create index cdr_billed_index on cdr (billed) where billed = false; This query took surprisingly long, and explain seems to be telling me the query is using a sequential scan: =# explain select cdr.* from cdr where billed = false order by calldate; QUERY PLAN ------------------------------------------------------------------- Sort (cost=37448.75..37526.94 rows=31273 width=465) Sort Key: calldate -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465) Filter: (billed = false) How can I ensure the patial billing index will be used? thanks for any insights, Ron
Вложения
On Mon, Feb 12, 2007 at 01:23:13AM +0100, Ron Arts wrote: > I have a table that has a boolean column telling if that particular > record has been billed already. So most of the time that column holds > only true values. How often is "most of the time"? What are the results of the following queries? select version(); select count(*) from cdr; select count(*) from cdr where billed = false; > =# explain select cdr.* from cdr where billed = false order by calldate; > > QUERY PLAN > ------------------------------------------------------------------- > Sort (cost=37448.75..37526.94 rows=31273 width=465) > Sort Key: calldate > -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465) > Filter: (billed = false) > > How can I ensure the patial billing index will be used? If using the index would be slower than a sequential scan then you don't want the query to use the index. Let's see how accurate the row count estimates are and whether using an index really would be faster -- please post the output of the following queries: set enable_seqscan to on; explain analyze select cdr.* from cdr where billed = false order by calldate; set enable_seqscan to off; explain analyze select cdr.* from cdr where billed = false order by calldate; If the number of rows returned isn't close to the planner's estimate (31273 in the output you showed) then trying running ANALYZE or VACUUM ANALYZE on the table, then run the above statements again. -- Michael Fuhr
On Sun, Feb 11, 2007 at 07:27:29PM -0700, Michael Fuhr wrote: > If using the index would be slower than a sequential scan then you > don't want the query to use the index. Let's see how accurate the > row count estimates are and whether using an index really would be > faster -- please post the output of the following queries: Also, what's your hardware configuration and what non-default settings do you have in postgresql.conf? In particular, how much memory do you have and what values do you have for shared_buffers, effective_cache_size, work_mem, and random_page_cost? -- Michael Fuhr
Michael Fuhr schreef: > On Mon, Feb 12, 2007 at 01:23:13AM +0100, Ron Arts wrote: >> I have a table that has a boolean column telling if that particular >> record has been billed already. So most of the time that column holds >> only true values. > > How often is "most of the time"? What are the results of the following > queries? > Michael, most of the time means: 9 out of ten times. > select version(); # select version(); version ------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 7.4.7 on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7) (1 row) > select count(*) from cdr; # select count(*) from cdr; count ------- 62547 (1 row) > select count(*) from cdr where billed = false; > # select count(*) from cdr where billed = false; count ------- 3 (1 row) I am worried that is is doing a sequential scan. On production systems the cdr table might contain of millions of records. Thanks, Ron >> =# explain select cdr.* from cdr where billed = false order by calldate; >> >> QUERY PLAN >> ------------------------------------------------------------------- >> Sort (cost=37448.75..37526.94 rows=31273 width=465) >> Sort Key: calldate >> -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465) >> Filter: (billed = false) >> >> How can I ensure the patial billing index will be used? > > If using the index would be slower than a sequential scan then you > don't want the query to use the index. Let's see how accurate the > row count estimates are and whether using an index really would be > faster -- please post the output of the following queries: > > set enable_seqscan to on; > explain analyze select cdr.* from cdr where billed = false order by calldate; > set enable_seqscan to off; > explain analyze select cdr.* from cdr where billed = false order by calldate; > > If the number of rows returned isn't close to the planner's estimate > (31273 in the output you showed) then trying running ANALYZE or > VACUUM ANALYZE on the table, then run the above statements again. >
Вложения
Ron Arts <ron.arts@neonova.nl> writes: > # select count(*) from cdr where billed = false; > count > ------- > 3 > (1 row) >> =# explain select cdr.* from cdr where billed = false order by calldate; >> >> QUERY PLAN >> ------------------------------------------------------------------- >> Sort (cost=37448.75..37526.94 rows=31273 width=465) >> Sort Key: calldate >> -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465) ^^^^^^^^^^ >> Filter: (billed = false) You haven't ANALYZEd this table (lately, or perhaps ever). The planner is thus working with a default selectivity estimate (which I think is 50% for a bool column --- that seems to match your numbers anyway), and that leads it to the conclusion that a seqscan is the right thing. Which it would indeed be, if half the table has to be retrieved. regards, tom lane
Tom Lane schreef: > Ron Arts <ron.arts@neonova.nl> writes: >> # select count(*) from cdr where billed = false; >> count >> ------- >> 3 >> (1 row) > >>> =# explain select cdr.* from cdr where billed = false order by calldate; >>> >>> QUERY PLAN >>> ------------------------------------------------------------------- >>> Sort (cost=37448.75..37526.94 rows=31273 width=465) >>> Sort Key: calldate >>> -> Seq Scan on cdr (cost=0.00..20323.81 rows=31273 width=465) > ^^^^^^^^^^ >>> Filter: (billed = false) > > You haven't ANALYZEd this table (lately, or perhaps ever). The planner > is thus working with a default selectivity estimate (which I think is > 50% for a bool column --- that seems to match your numbers anyway), and > that leads it to the conclusion that a seqscan is the right thing. > Which it would indeed be, if half the table has to be retrieved. > > regards, tom lane > Oh man, I totally forgot about analyze because I have it running daily from a script. And of course the script was broken, and I didn't notice. Thanks! Ron -- NeoNova BV, The Netherlands Professional internet and VoIP solutions http://www.neonova.nl Kruislaan 419 1098 VA Amsterdam info: 020-5628292 servicedesk: 020-5628292 fax: 020-5628291 KvK Amsterdam 34151241 The following disclaimer applies to this email: http://www.neonova.nl/maildisclaimer