Обсуждение: Weird row estimate

Поиск
Список
Период
Сортировка

Weird row estimate

От
Marc Cousin
Дата:

Hi,

I'm having a weird problem on a query :

I've simplified it to get the significant part (see end of message).

The point is I've got a simple

SELECT field FROM table WHERE 'condition1'

Estimated returned rows : 5453

Then

SELECT field FROM table WHERE 'condition2'

Estimated returned rows : 705

Then

SELECT field FROM table WHERE 'condition1' OR 'condition2'

Estimated returned rows : 143998

Condition2 is a bit complicated (it's a subquery).

Nevertheless, shouldn't the third estimate be smaller or equal to the sum of the two others ?

Postgresql is 8.2.4 on Linux, stats are up to date,

show default_statistics_target;

default_statistics_target

---------------------------

1000

Any ideas ?

explain analyze

SELECT stc.CMD_ID

FROM STOL_STC stc

WHERE (stc.STC_DATE>='2007-07-05' AND stc.STC_DATEPLAN<='2007-07-05');

QUERY PLAN

------------------------------------------------------------------------------------------------------------------

Seq Scan on stol_stc stc (cost=0.00..24265.15 rows=5453 width=8) (actual time=17.186..100.941 rows=721 loops=1)

Filter: ((stc_date >= '2007-07-05'::date) AND (stc_dateplan <= '2007-07-05'::date))

Total runtime: 101.656 ms

(3 rows)

explain analyze

SELECT stc.CMD_ID

FROM STOL_STC stc

WHERE stc.STC_ID IN

(SELECT STC_ID FROM STOL_TRJ

WHERE TRJ_DATEARRT>='2007-07-05'

AND TRJ_DATEDEPT>=TRJ_DATEARRT

AND (TRJ_DATEDEPT<='2007-07-05'

OR TRJ_DATECREAT<='2007-07-05') );

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=4649.62..10079.52 rows=705 width=8) (actual time=6.266..13.037 rows=640 loops=1)

-> HashAggregate (cost=4649.62..4657.13 rows=751 width=8) (actual time=6.242..6.975 rows=648 loops=1)

-> Index Scan using stol_trj_fk5 on stol_trj (cost=0.00..4647.61 rows=803 width=8) (actual time=0.055..4.901 rows=688 loops=1)

Index Cond: (trj_datearrt >= '2007-07-05'::date)

Filter: ((trj_datedept >= trj_datearrt) AND ((trj_datedept <= '2007-07-05'::date) OR (trj_datecreat <= '2007-07-05'::date)))

-> Index Scan using stol_stc_pk on stol_stc stc (cost=0.00..7.21 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=648)

Index Cond: (stc.stc_id = stol_trj.stc_id)

Total runtime: 13.765 ms

(8 rows)

explain analyze

SELECT stc.CMD_ID

FROM STOL_STC stc

WHERE (stc.STC_DATE>='2007-07-05' AND stc.STC_DATEPLAN<='2007-07-05')

OR

(stc.STC_ID IN

(SELECT STC_ID FROM STOL_TRJ

WHERE TRJ_DATEARRT>='2007-07-05'

AND TRJ_DATEDEPT>=TRJ_DATEARRT

AND (TRJ_DATEDEPT<='2007-07-05'

OR TRJ_DATECREAT<='2007-07-05') ));

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------

Seq Scan on stol_stc stc (cost=4649.62..29621.12 rows=143998 width=8) (actual time=21.564..146.365 rows=1048 loops=1)

Filter: (((stc_date >= '2007-07-05'::date) AND (stc_dateplan <= '2007-07-05'::date)) OR (hashed subplan))

SubPlan

-> Index Scan using stol_trj_fk5 on stol_trj (cost=0.00..4647.61 rows=803 width=8) (actual time=0.054..4.941 rows=688 loops=1)

Index Cond: (trj_datearrt >= '2007-07-05'::date)

Filter: ((trj_datedept >= trj_datearrt) AND ((trj_datedept <= '2007-07-05'::date) OR (trj_datecreat <= '2007-07-05'::date)))

Total runtime: 147.407 ms

SELECT count(*) from stol_stc ;

count

--------

140960

(1 row)

Re: Weird row estimate

От
Tom Lane
Дата:
Marc Cousin <mcousin@sigma.fr> writes:
> Nevertheless, shouldn't the third estimate be smaller or equal to the sum of the two others ?

The planner's estimation for subplan conditions is pretty primitive
compared to joinable conditions.  When you add the OR, it's no longer
possible to treat the IN like a join, and everything gets an order of
magnitude dumber :-(

It might be worth trying this as a UNION of the two simple queries.

            regards, tom lane

Re: Weird row estimate

От
Marc Cousin
Дата:
Le Wednesday 11 July 2007 22:35:31 Tom Lane, vous avez écrit :
> Marc Cousin <mcousin@sigma.fr> writes:
> > Nevertheless, shouldn't the third estimate be smaller or equal to the sum
> > of the two others ?
>
> The planner's estimation for subplan conditions is pretty primitive
> compared to joinable conditions.  When you add the OR, it's no longer
> possible to treat the IN like a join, and everything gets an order of
> magnitude dumber :-(
>
> It might be worth trying this as a UNION of the two simple queries.

Yes, it's much better on this query with a UNION.
The problem is that this is a small set of the query, and there are several
nested IN with an OR condition... But at least now I understand where it
comes from.
Thanks a lot.