Re: *very* inefficient choice made by the planner (regarding

От: Jean-Luc Lachance
Тема: Re: *very* inefficient choice made by the planner (regarding
Дата: ,
Msg-id: 40C881FE.5000301@sympatico.ca
(см: обсуждение, исходный текст)
Ответ на: *very* inefficient choice made by the planner (regarding IN(...))  (Frank van Vugt)
Ответы: Re: *very* inefficient choice made by the planner (regarding  (Tom Lane)
Список: pgsql-performance

Скрыть дерево обсуждения

*very* inefficient choice made by the planner (regarding IN(...))  (Frank van Vugt, )
 Re: *very* inefficient choice made by the planner (regarding  (Stephan Szabo, )
 Re: *very* inefficient choice made by the planner (regarding IN(...))  (Tom Lane, )
  Re: *very* inefficient choice made by the planner (regarding IN(...))  (Frank van Vugt, )
 Re: *very* inefficient choice made by the planner (regarding  (Jean-Luc Lachance, )
  Re: *very* inefficient choice made by the planner (regarding  (Tom Lane, )
   Re: *very* inefficient choice made by the planner (regarding  (Jean-Luc Lachance, )
    Re: *very* inefficient choice made by the planner (regarding  (Stephan Szabo, )
     Re: *very* inefficient choice made by the planner (regarding  (Stephan Szabo, )
 Re: *very* inefficient choice made by the planner (regarding  (SZUCS Gábor, )
  Re: *very* inefficient choice made by the planner (regarding  (Tom Lane, )
  Re: *very* inefficient choice made by the planner (regarding  (Stephan Szabo, )

The real question is:

If the two statments are functionally equivalent, why can't PG rewrite
the "NOT IN" version into the more efficient "NOT EXISTS"?



Frank van Vugt wrote:

> L.S.
>
> Could anybody explain why the planner is doing what it is doing?
>
> What could I do to make it easier to choose a better plan?
>
>
>
> *********
> Summary
> *********
> On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this:
>
> select id from location where id not in (select location_id from
> location_carrier);
>
> takes 581546,497 ms
>
>
> While a variant like:
>
> select id from location where not exists (select 1 from location_carrier where
> location_id = location.id);
>
> takes only 124,625 ms
>
>
> *********
> Details
> *********
> =# select version();
>                                version
> ---------------------------------------------------------------------
>  PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
> (1 row)
>
>
> =# \d location
>                Table "public.location"
>    Column   |            Type             | Modifiers
> ------------+-----------------------------+-----------
>  id         | integer                     | not null
> Indexes:
>     "location_pkey" primary key, btree (id)
>
>
> =# select count(*) from location;
>  count
> -------
>   7389
> (1 row)
>
>
> =# \d location_carrier
>                 Table "public.location_carrier"
>        Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  location_id         | integer                     | not null
>  carrier_id          | integer                     | not null
> Indexes:
>     "location_carrier_pkey" primary key, btree (location_id, carrier_id)
>
>
> =# select count(*) from location_carrier;
>  count
> -------
>  64333
> (1 row)
>
>
> =# explain select id from location where id not in (select location_id from
> location_carrier);
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Seq Scan on "location"  (cost=0.00..5077093.72 rows=3695 width=4)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Seq Scan on location_carrier  (cost=0.00..1213.33 rows=64333 width=4)
> (4 rows)
>
>
> =# explain analyse select id from location where id not in (select location_id
> from location_carrier);
>                                                            QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on "location"  (cost=0.00..5077093.72 rows=3695 width=4) (actual
> time=248.310..581541.483 rows=240 loops=1)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Seq Scan on location_carrier  (cost=0.00..1213.33 rows=64333 width=4)
> (actual time=0.007..48.517 rows=19364 loops=7389)
>  Total runtime: 581542.560 ms
> (5 rows)
>
> Time: 581546,497 ms
>
>
> =# explain analyse select id from location l left outer join location_carrier
> lc on l.id = lc.location_id where lc.location_id is null;
>                                                                           QUERY
> PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Left Join  (cost=0.00..3022.51 rows=7389 width=4) (actual
> time=0.083..435.841 rows=240 loops=1)
>    Merge Cond: ("outer".id = "inner".location_id)
>    Filter: ("inner".location_id IS NULL)
>    ->  Index Scan using location_pkey on "location" l  (cost=0.00..258.85
> rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1)
>    ->  Index Scan using location_carrier_pkey on location_carrier lc
> (cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305
> rows=64333 loops=1)
>  Total runtime: 436.213 ms
> (6 rows)
>
> Time: 440,787 ms
>
>
> megafox=# explain analyse select id from location where not exists (select 1
> from location_carrier where location_id = location.id);
>                                                                      QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on "location"  (cost=0.00..13242.14 rows=3695 width=4) (actual
> time=0.078..120.785 rows=240 loops=1)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Index Scan using location_carrier_pkey on location_carrier
> (cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1
> loops=7389)
>            Index Cond: (location_id = $0)
>  Total runtime: 121.165 ms
> (6 rows)
>
> Time: 124,625 ms
>
>
>
>
>
>
>



В списке pgsql-performance по дате сообщения:

От: Stephan Szabo
Дата:
Сообщение: Re: *very* inefficient choice made by the planner (regarding
От: Nick Trainor
Дата:
Сообщение: ORDER BY user defined function performance issues