Обсуждение: Do not understand high estimates of index scan vs seq scan
Hi all,
I do not understand why postgreSQL estimates an index scan only half as fast as a seq scan:
system=# explain select * from queuelog;                            QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on queuelog  (cost=0.00..20530.29 rows=610929 width=148)
(1 row)
system=# explain select * from queuelog where queuelog.start_time >= '2013-05-20 8:30' and queuelog.start_time <=
'2013-06-2117:0'; 
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using queuelog_start_time on queuelog  (cost=0.00..13393.18 rows=316090 width=148)
   Index Cond: ((start_time >= '2013-05-20 08:30:00+02'::timestamp with time zone) AND (start_time <= '2013-06-21
17:00:00+02'::timestampwith time zone)) 
Is that solely because it nees to compare each index value to a fixed date/time?
I would assume the index would be much smaller then the actual data, or is it only based on the amount of rows?
Thanks,
Antonio
PS: here's the queuelog definition:
                                       Table "public.queuelog"
      Column      |           Type           |                       Modifiers
------------------+--------------------------+-------------------------------------------------------
 id               | integer                  | not null default nextval('queuelog_id_seq'::regclass)
 created          | timestamp with time zone | not null default now()
 lastupdate       | timestamp with time zone | not null default now()
 start_time       | timestamp with time zone | not null default now()
 sessionid        | character varying(50)    | not null default ''::character varying
 call_seq         | integer                  | not null default 1
 queue            | integer                  | not null default 1
 dial             | character varying(24)    | not null default ''::character varying
 agent            | integer                  | not null default 1
 agents           | integer                  | not null default 0
 agents_logged_in | integer                  | not null default 0
 agents_avail     | integer                  | not null default 0
 queue_pos        | integer                  | not null default 1
 waittime         | numeric                  | not null default (0)::numeric
 ringtime         | numeric                  | not null default (0)::numeric
 talktime         | numeric                  | not null default (0)::numeric
 cause            | integer                  | not null default 16
 from_function    | character varying(24)    |
 from_lookupid    | integer                  | not null default 1
 to_function      | character varying(24)    |
 to_lookupid      | integer                  | not null default 1
 maxcallers       | integer                  | not null default 0
Indexes:
    "queuelog_pkey" PRIMARY KEY, btree (id)
    "queuelog_start_time" btree (start_time)
			
		On Thu, Jun 20, 2013 at 6:47 AM, Antonio Goméz Soto <antonio.gomez.soto@gmail.com> wrote:
			
		Hi all,
I do not understand why postgreSQL estimates an index scan only half as fast as a seq scan:
system=# explain select * from queuelog; QUERY PLAN
-------------------------------------------------------------------
Seq Scan on queuelog (cost=0.00..20530.29 rows=610929 width=148)
(1 row)
system=# explain select * from queuelog where queuelog.start_time >= '2013-05-20 8:30' and queuelog.start_time <= '2013-06-21 17:0';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using queuelog_start_time on queuelog (cost=0.00..13393.18 rows=316090 width=148)
Index Cond: ((start_time >= '2013-05-20 08:30:00+02'::timestamp with time zone) AND (start_time <= '2013-06-21 17:00:00+02'::timestamp with time zone))
Is that solely because it nees to compare each index value to a fixed date/time?
I would assume the index would be much smaller then the actual data, or is it only based on the amount of rows?
Surely the index is smaller, but it has to visit both the index and the table, because the index cannot satisfy the "select *", and possibly for visibility reasons as well.  
The table must be well-clustered on the start_time column, or else the estimate would be even worse.
Cheers,
Jeff