Обсуждение: Doesn't use index, why?
Hello,
I have some doubt about index.
Look the follow example:
naslog=# \d desconexao
Table "desconexao"
Attribute | Type | Modifier
----------------+-------------+-------------------------------------------------
id | integer | not null default
nextval('id_desconexao'::text)
time | timestamp |
client_user | varchar(20) |
client | varchar(40) |
ip_nas | inet |
ip_client_user | inet |
disconnect | smallint |
user_time | interval |
data_rate | integer |
called_number | varchar(14) |
calling_number | varchar(14) |
filtrado | boolean | default 'f'::bool
Indices: client_desconexao_idx,
desconexao_pkey,
filtro_idx,
time_idx
naslog=# \d time_idx
Index "time_idx"
Attribute | Type
-----------+-----------
time | timestamp
btree
naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE: QUERY PLAN:
Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
width=103)
EXPLAIN
Every thing is perfect, but when i execute de follow query:
naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE: QUERY PLAN:
Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)
EXPLAIN
Why postgresql didn't use the time_idx index? I only changed the
operator
"=" to ">" in clause "where".
Thank you.
Regards
-- Marco Catunda
Marco Catunda wrote:
> Hello,
>
> I have some doubt about index.
> Look the follow example:
>
>
> naslog=# \d desconexao
> Table "desconexao"
> Attribute | Type | Modifier
> ----------------+-------------+-------------------------------------------------
> id | integer | not null default
> nextval('id_desconexao'::text)
> time | timestamp |
> client_user | varchar(20) |
> client | varchar(40) |
> ip_nas | inet |
> ip_client_user | inet |
> disconnect | smallint |
> user_time | interval |
> data_rate | integer |
> called_number | varchar(14) |
> calling_number | varchar(14) |
> filtrado | boolean | default 'f'::bool
> Indices: client_desconexao_idx,
> desconexao_pkey,
> filtro_idx,
> time_idx
>
>
> naslog=# \d time_idx
> Index "time_idx"
> Attribute | Type
> -----------+-----------
> time | timestamp
> btree
>
>
> naslog=# explain select * from desconexao where time = '2000-12-01';
> NOTICE: QUERY PLAN:
>
> Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
> width=103)
>
> EXPLAIN
>
>
>
> Every thing is perfect, but when i execute de follow query:
>
>
> naslog=# explain select * from desconexao where time > '2000-12-01';
> NOTICE: QUERY PLAN:
>
> Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)
>
> EXPLAIN
>
>
> Why postgresql didn't use the time_idx index? I only changed the
> operator
> "=" to ">" in clause "where".
>
>
> Thank you.
> Regards
> -- Marco Catunda
>
>
Firstly you should include your version along with the report.
Two thoughts ... did you run vacuum?
Of the total number of records how many are greater than 2000-12-01? It
could be that postgresql figures that a large portion of the result set
is greater than this date and it is faster to do a sequential scan
Marco Catunda <catunda@pobox.com> writes:
> naslog=# explain select * from desconexao where time = '2000-12-01';
> NOTICE: QUERY PLAN:
> Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
> width=103)
> naslog=# explain select * from desconexao where time > '2000-12-01';
> NOTICE: QUERY PLAN:
> Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)
> Why postgresql didn't use the time_idx index? I only changed the
> operator "=" to ">" in clause "where".
Because the '>' condition is much less restrictive (notice the
difference in estimated row counts: 5 versus 657958). Unless the
planner's row count estimates are way off, it very likely *is* faster
to do the second query by sequential scan.
regards, tom lane
On 04 Jan 2001 11:55:14 -0500, Dave Smith wrote:
> Marco Catunda wrote:
>
> > Hello,
> >
> > I have some doubt about index.
> > Look the follow example:
> >
> >
> > naslog=# \d desconexao
> > Table "desconexao"
> > Attribute | Type | Modifier
> > ----------------+-------------+-------------------------------------------------
> > id | integer | not null default
> > nextval('id_desconexao'::text)
> > time | timestamp |
> > client_user | varchar(20) |
> > client | varchar(40) |
> > ip_nas | inet |
> > ip_client_user | inet |
> > disconnect | smallint |
> > user_time | interval |
> > data_rate | integer |
> > called_number | varchar(14) |
> > calling_number | varchar(14) |
> > filtrado | boolean | default 'f'::bool
> > Indices: client_desconexao_idx,
> > desconexao_pkey,
> > filtro_idx,
> > time_idx
> >
> >
> > naslog=# \d time_idx
> > Index "time_idx"
> > Attribute | Type
> > -----------+-----------
> > time | timestamp
> > btree
> >
> >
> > naslog=# explain select * from desconexao where time = '2000-12-01';
> > NOTICE: QUERY PLAN:
> >
> > Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
> > width=103)
> >
> > EXPLAIN
> >
> >
> >
> > Every thing is perfect, but when i execute de follow query:
> >
> >
> > naslog=# explain select * from desconexao where time > '2000-12-01';
> > NOTICE: QUERY PLAN:
> >
> > Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)
> >
> > EXPLAIN
> >
> >
> > Why postgresql didn't use the time_idx index? I only changed the
> > operator
> > "=" to ">" in clause "where".
> >
> >
> > Thank you.
> > Regards
> > -- Marco Catunda
> >
> >
> Firstly you should include your version along with the report.
>
> Two thoughts ... did you run vacuum?
> Of the total number of records how many are greater than 2000-12-01? It
> could be that postgresql figures that a large portion of the result set
> is greater than this date and it is faster to do a sequential scan
Sorry, the version of Postgresql is 7.0.3
This is the number of records in database:
naslog=# select count(*) from desconexao;
count
--------
658617
(1 row)
I changed the data of example because all records is greater than
'2000-12-01', sorry.
But when I try '2000-12-10' the samething happens.
naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE: QUERY PLAN:
Seq Scan on desconexao (cost=0.00..19547.71 rows=231489 width=103)
EXPLAIN
naslog=# explain select * from desconexao where time >=
'2000-12-10';
NOTICE: QUERY PLAN:
Seq Scan on desconexao (cost=0.00..19547.71 rows=427128 width=103)
EXPLAIN
The number of records are:
naslog=# select count(*) from desconexao where time >= '2000-12-10';
count
--------
585789
(1 row)
naslog=# select count(*) from desconexao where time < '2000-12-10';
count
-------
72828
(1 row)
I think the estimate rows (231489) in query < '2000-12-10' is far away
to real value (72828).
So I execute vacuum analyze:
naslog=# vacuum verbose analyze desconexao;
NOTICE: --Relation desconexao--
NOTICE: Pages 11318: Changed 0, reaped 2616, Empty 0, New 0; Tup
658617: Vac 200, Keep/VTL 0/0, Crash 0, UnUsed 8007, MinLen 105,
MaxLen 166; Re-using: Free/Avail. Space 150084/35492;
EndEmpty/Avail. Pages 0/105. CPU 3.08s/27.90u sec.
NOTICE: Index client_desconexao_idx: Pages 9786; Tuples 658617:
Deleted 200. CPU 2.50s/3.33u sec.
NOTICE: Index filtro_idx: Pages 2946; Tuples 658617: Deleted 200.
CPU 0.70s/3.20u sec.
NOTICE: Index time_idx: Pages 5503; Tuples 658617: Deleted 200. CPU
1.52s/3.04u sec.
NOTICE: Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted
200. CPU 1.59s/3.12u sec.
NOTICE: Rel desconexao: Pages: 11318 --> 11315; Tuple(s) moved:
199. CPU 5.41s/1.03u sec.
NOTICE: Index client_desconexao_idx: Pages 9786; Tuples 658617:
Deleted 199. CPU 2.85s/1.79u sec.
NOTICE: Index filtro_idx: Pages 2946; Tuples 658617: Deleted 199.
CPU 0.71s/1.60u sec.
NOTICE: Index time_idx: Pages 5503; Tuples 658617: Deleted 199. CPU
1.51s/1.65u sec.
NOTICE: Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted
199. CPU 1.59s/1.58u sec.
VACUUM
naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE: QUERY PLAN:
Seq Scan on desconexao (cost=0.00..19547.71 rows=231489 width=103)
EXPLAIN
naslog=# explain select * from desconexao where time >=
'2000-12-10';
NOTICE: QUERY PLAN:
Seq Scan on desconexao (cost=0.00..19547.71 rows=427128 width=103)
EXPLAIN
The same estimates values happen.
Is there a way to force index?
Thank you
-- Marco Catunda
Marco Catunda <catunda@pobox.com> writes:
> naslog=# explain select * from desconexao where time < '2000-12-10';
> NOTICE: QUERY PLAN:
> Seq Scan on desconexao (cost=0.00..19547.71 rows=231489 width=103)
> naslog=# explain select * from desconexao where time >=
> '2000-12-10';
> NOTICE: QUERY PLAN:
> Seq Scan on desconexao (cost=0.00..19547.71 rows=427128 width=103)
> The number of records are:
> naslog=# select count(*) from desconexao where time >= '2000-12-10';
> count
> --------
> 585789
> (1 row)
> naslog=# select count(*) from desconexao where time < '2000-12-10';
> count
> -------
> 72828
> (1 row)
In this case the planner is doing *exactly* the right thing; it is
smarter than you are. If you want to prove it, force the planner to
use an indexscan by doing SET ENABLE_SEQSCAN TO OFF. Then time the
query, and compare the runtime against the seqscan version.
The bottom line here is that a query that needs to touch more than a
few percent of the rows in a table is better off being done as a
seqscan.
regards, tom lane
> In this case the planner is doing *exactly* the right thing; it is > smarter than you are. If you want to prove it, force the planner to > use an indexscan by doing SET ENABLE_SEQSCAN TO OFF. Then time the > query, and compare the runtime against the seqscan version. > > The bottom line here is that a query that needs to touch more than a > few percent of the rows in a table is better off being done as a > seqscan. The only other workaround is to CLUSTER the table on an index, then force an index scan. That _may_ be faster. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026