Обсуждение: Strange Index behavior

Поиск
Список
Период
Сортировка

Strange Index behavior

От
Együd Csaba
Дата:
Hi,
Is it a normal behavior that if I give a where clause with an existent index
key, then postgres uses the index, but if I give it a non existent value
than it refuses to use the index.

An example to make it more clear:

CREATE TABLE measured_1
(
  tstamp timestamp(0) NOT NULL,
  meterid int4 NOT NULL,
  pp numeric NOT NULL DEFAULT 0,
  pm numeric NOT NULL DEFAULT 0,
  qp numeric NOT NULL DEFAULT 0,
  qm numeric NOT NULL DEFAULT 0,
  status bit(5),
  CONSTRAINT measured_1_pkey PRIMARY KEY (tstamp, meterid)
) ;

--
-- The table contains rows with tstamp values from '2004.12.22 12:00' so the
00:00 row does not exists!!!
--

# explain analyze select meterid, tstamp, pp, pm, status from measured_1
where tstamp >= '2004.12.22 30:00' and tstamp <= '2004.12.22 23:59' order by
tstamp, meterid;
"Index Scan using measured_1_pkey on measured_1  (cost=0.00..5.34 rows=1
width=42) (actual time=0.000..111.000 rows=6016 loops=1)"
"  Index Cond: ((tstamp >= '2004-12-22 13:00:00'::timestamp without time
zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
"Total runtime: 111.000 ms"
-- This is quite an acceptable result time

-- BUT!!!!

# explain analyze select meterid, tstamp, pp, pm, status from measured_1
where tstamp >= '2004.12.22 00:00' and tstamp <= '2004.12.22 23:59' order by
tstamp, meterid;
"Sort  (cost=2619.02..2622.78 rows=1505 width=42) (actual
time=1672.000..1682.000 rows=14523 loops=1)"
"  Sort Key: tstamp, meterid"
"  ->  Seq Scan on measured_1  (cost=0.00..2539.59 rows=1505 width=42)
(actual time=0.000..1292.000 rows=14523 loops=1)"
"        Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
"Total runtime: 1802.000 ms"
-- this is definitely not acceptable.

Is this normal??? Or what do I wrong???

Thanks,

-- Csaba

--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


Re: Strange Index behavior

От
Tom Lane
Дата:
=?iso-8859-2?Q?Egy=FCd_Csaba?= <csegyud@vnet.hu> writes:
> # explain analyze select meterid, tstamp, pp, pm, status from measured_1
> where tstamp >= '2004.12.22 00:00' and tstamp <= '2004.12.22 23:59' order by
> tstamp, meterid;
> "Sort  (cost=2619.02..2622.78 rows=1505 width=42) (actual
> time=1672.000..1682.000 rows=14523 loops=1)"
> "  Sort Key: tstamp, meterid"
> "  ->  Seq Scan on measured_1  (cost=0.00..2539.59 rows=1505 width=42)
> (actual time=0.000..1292.000 rows=14523 loops=1)"
> "        Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
> zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
> "Total runtime: 1802.000 ms"

Why do you say there are no such rows, when the explain output clearly
shows there are 14523 of them?  The plan shift looks fairly reasonable
to me given the large number of rows to be retrieved.

            regards, tom lane

Re: Strange Index behavior

От
Martijn van Oosterhout
Дата:
On Wed, Dec 22, 2004 at 03:09:08PM +0100, Együd Csaba wrote:
> Hi,
> Is it a normal behavior that if I give a where clause with an existent index
> key, then postgres uses the index, but if I give it a non existent value
> than it refuses to use the index.

Whether the value exists is irrelevent. What matters is the number of
rows expected to be returned. Notice:

> "Index Scan using measured_1_pkey on measured_1  (cost=0.00..5.34 rows=1
> width=42) (actual time=0.000..111.000 rows=6016 loops=1)"
> "  Index Cond: ((tstamp >= '2004-12-22 13:00:00'::timestamp without time
> zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
> "Total runtime: 111.000 ms"
> -- This is quite an acceptable result time

> "  ->  Seq Scan on measured_1  (cost=0.00..2539.59 rows=1505 width=42)
> (actual time=0.000..1292.000 rows=14523 loops=1)"
> "        Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
> zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
> "Total runtime: 1802.000 ms"
> -- this is definitely not acceptable.

Note how the first one expected to return only one value, so an index
in a good choice. The second expected 1500 matches, so it decided a seq
scan would be faster.

Maybe you need to review your value for random_page_cost and maybe an
ANALYZE would be good too... Look at the effects of enabling and
disabling index scans.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Вложения

Re: Strange Index behavior

От
Együd Csaba (Freemail)
Дата:
Hi Tom,
may be it was misunderstandable...

I meant that there is no rows with tstamp='2004.12.22 00:00'. Certainly
there are rows matching the range (00:00 - 23:59); from 13:00 to 23:00, but
not before 13:00.


The only difference betwen the two queries is this value. If the begining of
the range is a non existent value, the index is not used.
The point is that there are cases where a primary key index is not used -
even if the condition is formaly good. I was wonder what can be the reason
of this. What can I do in a different way to avoid this.

Thank you.

bye,
  -- Csaba



-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, December 22, 2004 5:47 PM
To: Együd Csaba
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange Index behavior

=?iso-8859-2?Q?Egy=FCd_Csaba?= <csegyud@vnet.hu> writes:
> # explain analyze select meterid, tstamp, pp, pm, status from
> measured_1 where tstamp >= '2004.12.22 00:00' and tstamp <=
> '2004.12.22 23:59' order by tstamp, meterid; "Sort
> (cost=2619.02..2622.78 rows=1505 width=42) (actual
> time=1672.000..1682.000 rows=14523 loops=1)"
> "  Sort Key: tstamp, meterid"
> "  ->  Seq Scan on measured_1  (cost=0.00..2539.59 rows=1505 width=42)
> (actual time=0.000..1292.000 rows=14523 loops=1)"
> "        Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
> zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
> "Total runtime: 1802.000 ms"

Why do you say there are no such rows, when the explain output clearly shows
there are 14523 of them?  The plan shift looks fairly reasonable to me given
the large number of rows to be retrieved.

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org



--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


Re: Strange Index behavior

От
Tom Lane
Дата:
=?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
> The point is that there are cases where a primary key index is not used -
> even if the condition is formaly good.

You haven't actually shown us such a case.  In the case you gave,
I think the planner probably made the right choice not to use the index.

            regards, tom lane

Re: Strange Index behavior

От
Együd Csaba (Freemail)
Дата:
But why? I thought the planner is for choose the quicker way to the target
point. If there is an index which is probably would speed up the query then
why does the planner ignore that?

The difference between the result times is 16x. I can't understand why the
planner thinks it is the better way...

-- Csaba

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, December 22, 2004 6:34 PM
To: Együd Csaba (Freemail)
Cc: 'Együd Csaba'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange Index behavior

=?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
> The point is that there are cases where a primary key index is not
> used - even if the condition is formaly good.

You haven't actually shown us such a case.  In the case you gave, I think
the planner probably made the right choice not to use the index.

            regards, tom lane



--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


Re: Strange Index behavior

От
Bruno Wolff III
Дата:
On Wed, Dec 22, 2004 at 18:44:00 +0100,
  "Együd Csaba (Freemail)" <csegyud@freemail.hu> wrote:
> But why? I thought the planner is for choose the quicker way to the target
> point. If there is an index which is probably would speed up the query then
> why does the planner ignore that?

Because doing an index scan is slower if you are retrieving a significant
(very roughly 5%) fraction of a table.

Re: Strange Index behavior

От
Tom Lane
Дата:
=?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
> The difference between the result times is 16x. I can't understand why the
> planner thinks it is the better way...

The planner thinks that because it thinks the second query will fetch
1500 times as many rows as the first.  Now that I look more carefully
I see that both of those row estimates are off, the first by a lot.
Have you ANALYZEd this table lately?  If you have, it might be a good
idea to increase the statistics target for the tstamp column.

You might also need to play around with random_page_cost, but it will be
counterproductive to touch that until the rowcount estimates are in the
vicinity of reality.

            regards, tom lane

Re: Strange Index behavior

От
Együd Csaba
Дата:
Yes, I vacuum analyze it once in every hour.

Thank you for the suggestions, now I'm getting understand the point...

With default_statistics_target=500 and random_page_cost=0.1 and reanalyzing
the db the planner always uses the index. Certainly if there were more rows
in the table the planner somewhere would switch to seq scan.


Some exlpain results:
------------------------
#explain analyze select tstamp, pp, pm, status from measured_1 where tstamp
>= '2004.12.22 00:00' and tstamp <= '2004.12.22 23:59' order by tstamp;
"Index Scan using measured_1_pkey on measured_1  (cost=0.00..2258.91
rows=26769 width=38) (actual time=0.000..271.000 rows=26743 loops=1)"
"  Index Cond: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
"Total runtime: 361.000 ms"

#explain analyze select tstamp, pp, pm, status from measured_1 where tstamp
>= '2004.12.22 13:00' and tstamp <= '2004.12.22 23:59' order by tstamp;
"Index Scan using measured_1_pkey on measured_1  (cost=0.00..1560.35
rows=18377 width=38) (actual time=0.000..140.000 rows=18424 loops=1)"
"  Index Cond: ((tstamp >= '2004-12-22 13:00:00'::timestamp without time
zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
"Total runtime: 200.000 ms"


Tom and Martijn,
many thanks.

-- Csaba

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, December 22, 2004 7:01 PM
To: Együd Csaba (Freemail)
Cc: 'Együd Csaba'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange Index behavior

=?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
> The difference between the result times is 16x. I can't understand why
> the planner thinks it is the better way...

The planner thinks that because it thinks the second query will fetch 1500
times as many rows as the first.  Now that I look more carefully I see that
both of those row estimates are off, the first by a lot.
Have you ANALYZEd this table lately?  If you have, it might be a good idea
to increase the statistics target for the tstamp column.

You might also need to play around with random_page_cost, but it will be
counterproductive to touch that until the rowcount estimates are in the
vicinity of reality.

            regards, tom lane


--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.