Обсуждение: BUG #4170: Rows estimation which are cast from TEXT is inaccurate.
The following bug has been logged online:
Bug reference: 4170
Logged by: Tashuhito Kasahara
Email address: kasahara.tatsuhito@oss.ntt.co.jp
PostgreSQL version: 8.3.1
Operating system: Linux
Description: Rows estimation which are cast from TEXT is inaccurate.
Details:
I noticed that rows estimation is not accurate when we cast some datetype to
TEXT.
See the following example. (TEXT -> TIMESTAMP)
============================================================================
====
test=# SELECT count(*) FROM test WHERE t < '2008-05-14 23:55:00';
count
-------
86099
(1 row)
test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 23:55:00';
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on test (cost=0.00..1727.00 rows=85721 width=12)
Filter: (t < '2008-05-14 23:55:00'::timestamp without time zone)
(2 rows)
test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14
23:55:00'::text::timestamp;
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on test (cost=0.00..2209.00 rows=32133 width=12) <- too little
number of the estimates
Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone)
(2 rows)
test=# SELECT count(*) FROM test WHERE t < '2008-05-14 23:55:00';
count
-------
86099
(1 row)
============================================================================
====
We can avoid this problem by setting appropriate cast-function.
============================================================================
====
CREATE FUNCTION text2timestamp(text) RETURNS timestamp AS
$$
SELECT timestamp_in(textout($1), 0, 0);
$$
LANGUAGE sql STRICT STABLE;
CREATE CAST (text AS timestamp) WITH FUNCTION text2timestamp(text) AS
ASSIGNMENT;
test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14
23:55:00'::text::timestamp;
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on test (cost=0.00..1968.00 rows=85721 width=12)
Filter: (t < timestamp_in('2008-05-14 23:55:00'::cstring, 0::oid, 0))
(2 rows)
============================================================================
====
I think it's a bug and will be troubled at plan optimization.
Best regards.
"Tashuhito Kasahara" <kasahara.tatsuhito@oss.ntt.co.jp> writes:
> test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14
> 23:55:00'::text::timestamp;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Seq Scan on test (cost=0.00..2209.00 rows=32133 width=12) <- too little
> number of the estimates
> Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone)
> (2 rows)
Hmm ... as of 8.3 this will generate a CoerceViaIO node, and it looks
like I forgot to teach eval_const_expressions how to simplify those.
regards, tom lane
Hi.
Tom Lane wrote:
> "Tashuhito Kasahara" <kasahara.tatsuhito@oss.ntt.co.jp> writes:
>> test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14
>> 23:55:00'::text::timestamp;
>> QUERY PLAN
>> ----------------------------------------------------------------------------
>
>> Seq Scan on test (cost=0.00..2209.00 rows=32133 width=12) <- too little
>> number of the estimates
>> Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone)
>> (2 rows)
>
> Hmm ... as of 8.3 this will generate a CoerceViaIO node, and it looks
> like I forgot to teach eval_const_expressions how to simplify those.
Relevant issues ocurred on PostgreSQL versions 7.4 also.
8.2, 8.1 and 8.0 seemed to be accurate estimates on simple test.
============= 7.4
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..224.98 rows=3667 width=8) (actual time=0.026..30.987 rows=60 loops=1)
Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone)
Total runtime: 31.074 ms
(3 rows)
============= 8.2
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..224.98 rows=55 width=8) (actual time=0.043..13.896 rows=60 loops=1)
Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone)
Total runtime: 13.951 ms
(3 rows)
============= 8.1
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..224.98 rows=60 width=8) (actual time=0.034..15.409 rows=60 loops=1)
Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone)
Total runtime: 15.464 ms
(3 rows)
============= 8.0
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE id < '2008-5-14 00:01:00'::text::timestamp;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..224.98 rows=63 width=8) (actual time=0.041..18.645 rows=60 loops=1)
Filter: (id < ('2008-5-14 00:01:00'::text)::timestamp without time zone)
Total runtime: 18.706 ms
(3 rows)
=============
Best regards.
--
Tatsuhito Kasahara
kasahara.tatsuhito@oss.ntt.co.jp