Обсуждение: Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

Поиск

Список

Период

Сортировка

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

"Ing. Marcos Ortiz Valmaseda"

Дата:

17 января 2011 г., 15:44:53

Query is :
SELECT distinct m.id,coalesce(m.givenname,''),
       coalesce(m.midname,''),
       m.surname from marinerstates ms,vessels vsl,mariner m
WHERE  m.id=ms.marinerid and ms.vslid=vsl.id
AND    ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15'
AND    ms.starttime::date <= '2007-01-11'  AND
       m.marinertype='Mariner'  and m.id
NOT IN (SELECT distinct mold.id
        FROM marinerstates msold,
        vessels vslold,
        mariner mold
        WHERE mold.id=msold.marinerid
        AND  msold.vslid=vslold.id
        AND msold.state='Active'
        AND coalesce(msold.endtime,now())::date >= '2006-07-15'
        AND msold.starttime::date <= '2007-01-11'
        AND EXISTS (SELECT 1
                    FROM marinerstates msold2
                    WHERE msold2.marinerid=msold.marinerid
                    AND msold2.state='Active'
                    AND msold2.id <> msold.id
                    AND msold2.starttime<msold.starttime
                    AND (msold.starttime-msold2.endtime)<='18 months')
                    AND mold.marinertype='Mariner' )
       ORDER BY m.surname,coalesce(m.givenname,'')
      ,coalesce(m.midname,'');

i get the following execution times: (with \timing)
FBSD_DEV : query : 240.419 ms
LINUX_PROD : query : 219.568 ms
FBSD_TEST : query :  2285.509 ms
LINUX_TEST : query : 5788.988 ms

Re writing the query in the "NOT EXIST" variation like:

SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm where  
m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15' and
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and NOT EXISTS
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.idand  
   msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <=
'2007-01-11'and  
   exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and
msold2.id<> msold.id and  
              msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')
   and mold.marinertype='Mariner' AND mold.id=m.id)
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');
gives:

FBSD_DEV : query : 154.000 ms
LINUX_PROD : query : 153.408 ms
FBSD_TEST : query :  137.000 ms
LINUX_TEST : query : 404.000 ms


Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS
instead NOT IN, because the first clause has a better performance. So, you can use it on that way.

Other questions?
- Do you have a partial index on marinerstates.marinerid where this condition is accomplished?
- Do you have a index on mariner.id?
- Can you provide a explain of these queries on the PostgreSQL-9.0 machines?

Regards


Ing. Marcos Luís Ortíz Valmaseda
Linux User # 418229 && PostgreSQL DBA
Centro de Tecnologías Gestión de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Achilleas Mantzios

Дата:

18 января 2011 г., 07:06:15

Thanx,

Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε:
>
> Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS
> instead NOT IN, because the first clause has a better performance. So, you can use it on that way.
>
You mean this?
(from 8.4 changes)
"Create explicit concepts of semi-joins and anti-joins (Tom)
       This work formalizes our previous ad-hoc treatment of IN (SELECT
       ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
       should result in significantly better planning of EXISTS and NOT
       EXISTS queries. In general, logically equivalent IN and EXISTS
       clauses should now have similar performance, whereas previously IN
       often won."

I haven't found any other recent reference to this issue. And this is far from what you suggest.
Here the entry talks about "similar" performance.

Also a similar issue was hot back in 7.4 days :
"IN / NOT IN subqueries are now much more efficient
          In previous releases, IN/NOT IN subqueries were joined to the
          upper query by sequentially scanning the subquery looking for a
          match. The 7.4 code uses the same sophisticated techniques used
          by ordinary joins and so is much faster. An IN will now usually
          be as fast as or faster than an equivalent EXISTS subquery; this
          reverses the conventional wisdom that applied to previous
          releases."

> Other questions?
> - Do you have a partial index on marinerstates.marinerid where this condition is accomplished?
No, but i just tried it (on state='Active') with no impact.
> - Do you have a index on mariner.id?
Yes, It is the primary key.
> - Can you provide a explain of these queries on the PostgreSQL-9.0 machines?
Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s)

mariner
=====
 id                           | integer                  | not null default
nextval(('public.mariner_id_seq'::text)::regclass)
 givenname                    | character varying(200)   |
 midname                      | character varying(100)   |
 surname                      | character varying(200)   | not null
...
Indexes:
    "mariner_pkey" PRIMARY KEY, btree (id)
    "mariner_smauid" UNIQUE, btree (smauid)
    "mariner_username_key" UNIQUE, btree (username)
    "mariner_nationalityid" btree (nationalityid)
    "mariner_parentid" btree (parentid)
    "mariner_surname" btree (surname)

marinerstates
========
 id                     | integer                  | not null default
nextval(('public.marinerstates_id_seq'::text)::regclass)
 marinerid              | integer                  | not null
 state                  | character varying(20)    | not null
 vslid                  | integer                  |
 leave_period_days      | integer                  |
 comment                | text                     |
 starttime              | timestamp with time zone | not null
 endtime                | timestamp with time zone |
 trid                   | integer                  |
 sal_bw                 | real                     | not null default 0.0
 sal_ot                 | real                     | not null default 0.0
 sal_lp                 | real                     | not null default 0.0
 sal_misc               | real                     | not null default 0.0
 rankid                 | integer                  |
 system_vslid           | integer                  |
 startport              | text                     |
 endport                | text                     |
.....
Indexes:
    "marinerstates_pkey" PRIMARY KEY, btree (id)
    "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL
    "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid)
    "marinerstates__system_vslid" btree (system_vslid)
    "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL
    "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_marinerid" btree (marinerid)
    "marinerstates_marinerid_starttime" btree (marinerid, starttime)
    "marinerstates_rankid" btree (rankid)
    "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL
    "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_state" btree (state)
    "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL
    "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_vslid" btree (vslid)
    "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL
    "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL

vessels
=====
 name               | character varying(200) | not null
 id                 | integer                | not null default nextval(('public.vessels_id_seq'::text)::regclass)
...
Indexes:
    "vessels_pkey" PRIMARY KEY, btree (id)
    "vessels_name_key" UNIQUE, btree (name)
    "idx_name" btree (name)
    "vessels_flag" btree (flag)
    "vessels_groupno" btree (groupno)
    "vessels_vslstatus_idx" btree (vslstatus)

dynacom=# SELECT count(*) from mariner;
 count
-------
 14447

dynacom=# SELECT count(*) from marinerstates;
 count
-------
 51013

dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstates
msWHERE m.id=ms.marinerid group by m.id) AS marqry; 
        avg         |       stddev
--------------------+--------------------
 3.5665944207508914 | 4.4416879361829170

(vessels do not play any impact in the query, so i'll leave them out)

Slow plan in 9.0.2 :
                                                                                                      QUERY PLAN
                                                                                               

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 71kB
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8) (actual
time=0.011..33.494rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 10kB
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118
rows=272loops=1) 
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832
loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 702kB
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23) (actual
time=10409.498..10419.971rows=12832 loops=1) 
                           Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446
rows=1454loops=1) 
                                   ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4) (actual
time=87.493..10407.517rows=1835 loops=1) 
                                         Join Filter: (msold.marinerid = mold.id)
                                         ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60
rows=14286width=4) (actual time=0.007..13.931 rows=14286 loops=1) 
                                               Filter: ((marinertype)::text = 'Mariner'::text)
                                         ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8) (actual
time=0.003..0.330rows=1876 loops=14286) 
                                               ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8) (actual
time=39.723..85.401rows=1876 loops=1) 
                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=39.708..81.501 rows=1876 loops=1) 
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) 
                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=39.613..39.613 rows=41250 loops=1) 
                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
                                                                 ->  Seq Scan on marinerstates msold2
(cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1) 
                                                                       Filter: ((state)::text = 'Active'::text)
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
 Total runtime: 10463.619 ms
(37 rows)

Fast plan in 8.3.13 :
                                                                                                          QUERY PLAN
                                                                                                       

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 53kB
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603
loops=1)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual
time=0.017..80.153rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832
loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)(actual time=311.023..380.168 rows=12832 loops=1) 
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981
rows=1454loops=1) 
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual
time=0.120..289.961rows=1835 loops=1) 
                                               ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4) (actual
time=0.099..237.128rows=1876 loops=1) 
                                                     ->  Index Scan using marinerstates_marinerid on marinerstates
msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1) 
                                                           Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) 
                                                           SubPlan
                                                             ->  Bitmap Heap Scan on marinerstates msold2
(cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) 
                                                                   Recheck Cond: ((marinerid = $0) AND (starttime <
$2))
                                                                   Filter: ((id <> $1) AND ((state)::text =
'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) 
                                                                   ->  Bitmap Index Scan on
marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) 
                                                                         Index Cond: ((marinerid = $0) AND (starttime <
$2))
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
                                               ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92
rows=1width=4) (actual time=0.007..0.012 rows=1 loops=1876) 
                                                     Index Cond: (mold.id = msold.marinerid)
                                                     Filter: ((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306
rows=273loops=1) 
 Total runtime: 553.601 ms
(33 rows)

Is there any other data i could post (pg_stat,...) that would help?

thanx a lot.

>
> Regards
>
>
> Ing. Marcos Luís Ortíz Valmaseda
> Linux User # 418229 && PostgreSQL DBA
> Centro de Tecnologías Gestión de Datos (DATEC)
> http://postgresql.uci.cu
> http://www.postgresql.org
> http://it.toolbox.com/blogs/sql-apprentice
>



--
Achilleas Mantzios

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Mladen Gogala

Дата:

18 января 2011 г., 13:31:51

Achilleas Mantzios wrote:
> Thanx,
>
> Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε:
>
>> Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS
>> instead NOT IN, because the first clause has a better performance. So, you can use it on that way.
>>
>>
> You mean this?
> (from 8.4 changes)
> "Create explicit concepts of semi-joins and anti-joins (Tom)
>        This work formalizes our previous ad-hoc treatment of IN (SELECT
>        ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
>        should result in significantly better planning of EXISTS and NOT
>        EXISTS queries. In general, logically equivalent IN and EXISTS
>        clauses should now have similar performance, whereas previously IN
>        often won."
>
> I haven't found any other recent reference to this issue. And this is far from what you suggest.
> Here the entry talks about "similar" performance.
>
> Also a similar issue was hot back in 7.4 days :
> "IN / NOT IN subqueries are now much more efficient
>           In previous releases, IN/NOT IN subqueries were joined to the
>           upper query by sequentially scanning the subquery looking for a
>           match. The 7.4 code uses the same sophisticated techniques used
>           by ordinary joins and so is much faster. An IN will now usually
>           be as fast as or faster than an equivalent EXISTS subquery; this
>           reverses the conventional wisdom that applied to previous
>           releases."
>
>
>> Other questions?
>> - Do you have a partial index on marinerstates.marinerid where this condition is accomplished?
>>
> No, but i just tried it (on state='Active') with no impact.
>
>> - Do you have a index on mariner.id?
>>
> Yes, It is the primary key.
>
>> - Can you provide a explain of these queries on the PostgreSQL-9.0 machines?
>>
> Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s)
>
> mariner
> =====
>  id                           | integer                  | not null default
nextval(('public.mariner_id_seq'::text)::regclass)
>  givenname                    | character varying(200)   |
>  midname                      | character varying(100)   |
>  surname                      | character varying(200)   | not null
> ...
> Indexes:
>     "mariner_pkey" PRIMARY KEY, btree (id)
>     "mariner_smauid" UNIQUE, btree (smauid)
>     "mariner_username_key" UNIQUE, btree (username)
>     "mariner_nationalityid" btree (nationalityid)
>     "mariner_parentid" btree (parentid)
>     "mariner_surname" btree (surname)
>
> marinerstates
> ========
>  id                     | integer                  | not null default
nextval(('public.marinerstates_id_seq'::text)::regclass)
>  marinerid              | integer                  | not null
>  state                  | character varying(20)    | not null
>  vslid                  | integer                  |
>  leave_period_days      | integer                  |
>  comment                | text                     |
>  starttime              | timestamp with time zone | not null
>  endtime                | timestamp with time zone |
>  trid                   | integer                  |
>  sal_bw                 | real                     | not null default 0.0
>  sal_ot                 | real                     | not null default 0.0
>  sal_lp                 | real                     | not null default 0.0
>  sal_misc               | real                     | not null default 0.0
>  rankid                 | integer                  |
>  system_vslid           | integer                  |
>  startport              | text                     |
>  endport                | text                     |
> .....
> Indexes:
>     "marinerstates_pkey" PRIMARY KEY, btree (id)
>     "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL
>     "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid)
>     "marinerstates__system_vslid" btree (system_vslid)
>     "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL
>     "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL
>     "marinerstates_marinerid" btree (marinerid)
>     "marinerstates_marinerid_starttime" btree (marinerid, starttime)
>     "marinerstates_rankid" btree (rankid)
>     "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL
>     "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL
>     "marinerstates_state" btree (state)
>     "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL
>     "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL
>     "marinerstates_vslid" btree (vslid)
>     "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL
>     "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL
>
> vessels
> =====
>  name               | character varying(200) | not null
>  id                 | integer                | not null default nextval(('public.vessels_id_seq'::text)::regclass)
> ...
> Indexes:
>     "vessels_pkey" PRIMARY KEY, btree (id)
>     "vessels_name_key" UNIQUE, btree (name)
>     "idx_name" btree (name)
>     "vessels_flag" btree (flag)
>     "vessels_groupno" btree (groupno)
>     "vessels_vslstatus_idx" btree (vslstatus)
>
> dynacom=# SELECT count(*) from mariner;
>  count
> -------
>  14447
>
> dynacom=# SELECT count(*) from marinerstates;
>  count
> -------
>  51013
>
> dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m,
marinerstatesms WHERE m.id=ms.marinerid group by m.id) AS marqry; 
>         avg         |       stddev
> --------------------+--------------------
>  3.5665944207508914 | 4.4416879361829170
>
> (vessels do not play any impact in the query, so i'll leave them out)
>
> Slow plan in 9.0.2 :
>                                                                                                       QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1)
>    ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1)
>          Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
>          Sort Method:  quicksort  Memory: 71kB
>          ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603
loops=1)
>                Hash Cond: (ms.marinerid = m.id)
>                ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1)
>                      Hash Cond: (ms.vslid = vsl.id)
>                      ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8) (actual
time=0.011..33.494rows=2625 loops=1) 
>                            Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date)
AND((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
>                      ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                            ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118
rows=272loops=1) 
>                ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832
loops=1)
>                      Buckets: 1024  Batches: 1  Memory Usage: 702kB
>                      ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23) (actual
time=10409.498..10419.971rows=12832 loops=1) 
>                            Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
>                            SubPlan 1
>                              ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446
rows=1454loops=1) 
>                                    ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4) (actual
time=87.493..10407.517rows=1835 loops=1) 
>                                          Join Filter: (msold.marinerid = mold.id)
>                                          ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60
rows=14286width=4) (actual time=0.007..13.931 rows=14286 loops=1) 
>                                                Filter: ((marinertype)::text = 'Mariner'::text)
>                                          ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8) (actual
time=0.003..0.330rows=1876 loops=14286) 
>                                                ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8) (actual
time=39.723..85.401rows=1876 loops=1) 
>                                                      ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=39.708..81.501 rows=1876 loops=1) 
>                                                            Hash Cond: (msold.marinerid = msold2.marinerid)
>                                                            Join Filter: ((msold2.id <> msold.id) AND
(msold2.starttime< msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
>                                                            ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) 
>                                                                  Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
>                                                            ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=39.613..39.613 rows=41250 loops=1) 
>                                                                  Buckets: 8192  Batches: 1  Memory Usage: 2246kB
>                                                                  ->  Seq Scan on marinerstates msold2
(cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1) 
>                                                                        Filter: ((state)::text = 'Active'::text)
>                                                      ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) 
>                                                            Index Cond: (vslold.id = msold.vslid)
>  Total runtime: 10463.619 ms
> (37 rows)
>
> Fast plan in 8.3.13 :
>                                                                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1)
>    ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1)
>          Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
>          Sort Method:  quicksort  Memory: 53kB
>          ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603
loops=1)
>                Hash Cond: (ms.vslid = vsl.id)
>                ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603
loops=1)
>                      Hash Cond: (ms.marinerid = m.id)
>                      ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual
time=0.017..80.153rows=2625 loops=1) 
>                            Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date)
AND((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
>                      ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307
rows=12832loops=1) 
>                            ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)(actual time=311.023..380.168 rows=12832 loops=1) 
>                                  Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
>                                  SubPlan
>                                    ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981
rows=1454loops=1) 
>                                          ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual
time=0.120..289.961rows=1835 loops=1) 
>                                                ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4) (actual
time=0.099..237.128rows=1876 loops=1) 
>                                                      ->  Index Scan using marinerstates_marinerid on marinerstates
msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1) 
>                                                            Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) 
>                                                            SubPlan
>                                                              ->  Bitmap Heap Scan on marinerstates msold2
(cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) 
>                                                                    Recheck Cond: ((marinerid = $0) AND (starttime <
$2))
>                                                                    Filter: ((id <> $1) AND ((state)::text =
'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) 
>                                                                    ->  Bitmap Index Scan on
marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) 
>                                                                          Index Cond: ((marinerid = $0) AND (starttime
<$2)) 
>                                                      ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) 
>                                                            Index Cond: (vslold.id = msold.vslid)
>                                                ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92
rows=1width=4) (actual time=0.007..0.012 rows=1 loops=1876) 
>                                                      Index Cond: (mold.id = msold.marinerid)
>                                                      Filter: ((mold.marinertype)::text = 'Mariner'::text)
>                ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1)
>                      ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306
rows=273loops=1) 
>  Total runtime: 553.601 ms
> (33 rows)
>
> Is there any other data i could post (pg_stat,...) that would help?
>
> thanx a lot.
>
>
>> Regards
>>
>>
>> Ing. Marcos Luís Ortíz Valmaseda
>> Linux User # 418229 && PostgreSQL DBA
>> Centro de Tecnologías Gestión de Datos (DATEC)
>> http://postgresql.uci.cu
>> http://www.postgresql.org
>> http://it.toolbox.com/blogs/sql-apprentice
>>
>>
>
>
>
> --
> Achilleas Mantzios
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
Achilleas, here is the slow part from 9.02:

                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=39.708..81.501 rows=1876 loops=1) 
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) 
                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=39.613..39.613 rows=41250 loops=1) 
                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
                                                                 ->  Seq Scan on marinerstates msold2
(cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882  


The same part from 8.3.3 looks like this:

Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1)
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832
loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)(actual time=311.023..380.168 rows=12832 loops=1) 
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981
rows=1454loops=1) 
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual
time=0.120..289.961rows=1835 loops=1) 


This leads me to the conclusion that the queries differ significantly.
8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering
conditions look differently. Are you sure that the plans are from the
same query?

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Achilleas Mantzios

Дата:

19 января 2011 г., 08:10:17

Στις Tuesday 18 January 2011 16:26:21 ο/η Mladen Gogala έγραψε:

> This leads me to the conclusion that the queries differ significantly.
> 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering
> conditions look differently. Are you sure that the plans are from the
> same query?

First the num of rows in the two portions are different so you might be comparing apples and oranges here.
Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part.

8.3.13

Unique  (cost=633677.56..633700.48 rows=1834 width=23)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8)
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4)
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4)
                                               ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4)
                                                     ->  Index Scan using marinerstates_marinerid on marinerstates
msold (cost=0.00..626316.07 rows=2299 width=8) 
                                                           Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) 
                                                           SubPlan
                                                             ->  Bitmap Heap Scan on marinerstates msold2
(cost=4.28..12.11rows=1 width=0) 
                                                                   Recheck Cond: ((marinerid = $0) AND (starttime <
$2))
                                                                   Filter: ((id <> $1) AND ((state)::text =
'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) 
                                                                   ->  Bitmap Index Scan on
marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) 
                                                                         Index Cond: ((marinerid = $0) AND (starttime <
$2))
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.30rows=1 width=4) 
                                                           Index Cond: (vslold.id = msold.vslid)
                                               ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92
rows=1width=4) 
                                                     Index Cond: (mold.id = msold.marinerid)
                                                     Filter: ((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4)
(31 rows)

9.0.2

Unique  (cost=11525.09..11571.55 rows=3717 width=23)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8)
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4)
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4)
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23)
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23)
                           Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4)
                                   ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4)
                                         Join Filter: (msold.marinerid = mold.id)
                                         ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60
rows=14286width=4) 
                                               Filter: ((marinertype)::text = 'Mariner'::text)
                                         ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8)
                                               ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8)
                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) 
                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
                                                                 ->  Seq Scan on marinerstates msold2
(cost=0.00..2251.66rows=41307 width=24) 
                                                                       Filter: ((state)::text = 'Active'::text)
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.28rows=1 width=4) 
                                                           Index Cond: (vslold.id = msold.vslid)
(32 rows)



>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>



--
Achilleas Mantzios

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Tom Lane

Дата:

19 января 2011 г., 16:27:07

Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part.

Please show EXPLAIN ANALYZE, not just EXPLAIN, results.  When
complaining that the planner did the wrong thing, it's not very helpful
to see only its estimates and not reality.

            regards, tom lane

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Achilleas Mantzios

Дата:

20 января 2011 г., 06:05:37

Στις Wednesday 19 January 2011 19:26:56 ο/η Tom Lane έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> > Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part.
>
> Please show EXPLAIN ANALYZE, not just EXPLAIN, results.  When
> complaining that the planner did the wrong thing, it's not very helpful
> to see only its estimates and not reality.

I did so two posts before but one more won't do any harm. Here we go:

9.0.2 - SLOW

                                                                                                      QUERY PLAN
                                                                                               

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10439.797..10440.152 rows=603 loops=1)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10439.795..10439.905 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 71kB
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10402.338..10438.875 rows=603 loops=1)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..35.178 rows=2625 loops=1)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8) (actual
time=0.015..33.634rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual time=0.203..0.203 rows=272 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 10kB
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.117
rows=272loops=1) 
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10402.075..10402.075 rows=12832
loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 702kB
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23) (actual
time=10386.549..10397.193rows=12832 loops=1) 
                           Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) (actual time=86.937..10385.379
rows=1454loops=1) 
                                   ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4) (actual
time=86.936..10384.555rows=1835 loops=1) 
                                         Join Filter: (msold.marinerid = mold.id)
                                         ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60
rows=14286width=4) (actual time=0.007..14.250 rows=14286 loops=1) 
                                               Filter: ((marinertype)::text = 'Mariner'::text)
                                         ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8) (actual
time=0.003..0.328rows=1876 loops=14286) 
                                               ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8) (actual
time=39.259..84.889rows=1876 loops=1) 
                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=39.249..81.025 rows=1876 loops=1) 
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.003..33.964 rows=2625 loops=1) 
                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=39.156..39.156 rows=41250 loops=1) 
                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
                                                                 ->  Seq Scan on marinerstates msold2
(cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.552 rows=41250 loops=1) 
                                                                       Filter: ((state)::text = 'Active'::text)
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
 Total runtime: 10440.690 ms
(37 rows)


8.3.13 - FAST

                                                                                                          QUERY PLAN
                                                                                                       

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual time=551.166..558.487 rows=603 loops=1)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual time=551.156..553.548 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 53kB
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual time=447.773..547.934 rows=603 loops=1)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) (actual time=445.320..540.291 rows=603
loops=1)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual
time=0.018..79.742rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=445.216..445.216 rows=12832
loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)(actual time=319.675..388.383 rows=12832 loops=1) 
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.196..312.728
rows=1454loops=1) 
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual
time=0.187..298.780rows=1835 loops=1) 
                                               ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4) (actual
time=0.165..244.706rows=1876 loops=1) 
                                                     ->  Index Scan using marinerstates_marinerid on marinerstates
msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.138..194.165 rows=1876 loops=1) 
                                                           Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) 
                                                           SubPlan
                                                             ->  Bitmap Heap Scan on marinerstates msold2
(cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) 
                                                                   Recheck Cond: ((marinerid = $0) AND (starttime <
$2))
                                                                   Filter: ((id <> $1) AND ((state)::text =
'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) 
                                                                   ->  Bitmap Index Scan on
marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) 
                                                                         Index Cond: ((marinerid = $0) AND (starttime <
$2))
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
                                               ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92
rows=1width=4) (actual time=0.008..0.012 rows=1 loops=1876) 
                                                     Index Cond: (mold.id = msold.marinerid)
                                                     Filter: ((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual time=2.432..2.432 rows=273 loops=1)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.033..1.220
rows=273loops=1) 
 Total runtime: 561.208 ms
(33 rows)



>
>             regards, tom lane
>



--
Achilleas Mantzios

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Robert Haas

Дата:

21 января 2011 г., 16:09:21

On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios
<achill@matrix.gatewaynet.com> wrote:
>                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=39.249..81.025 rows=1876 loops=1) 
>                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
>                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
>                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.003..33.964 rows=2625 loops=1) 
>                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
>                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=39.156..39.156 rows=41250 loops=1) 
>                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
>                                                                 ->  Seq Scan on marinerstates msold2
 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.552 rows=41250 loops=1) 
>                                                                       Filter: ((state)::text = 'Active'::text)

Looks like the bad selectivity estimate there is what's killing it.
Not sure I completely understand why 9.0.2 is coming up with such a
bad estimate, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Mladen Gogala

Дата:

21 января 2011 г., 16:42:46

On 1/21/2011 12:09 PM, Robert Haas wrote:
> Looks like the bad selectivity estimate there is what's killing it.
> Not sure I completely understand why 9.0.2 is coming up with such a
> bad estimate, though.
>

I would recommend setting default_statistics_target to 1024 and
effective cache size to 20480MB and see what happens.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Robert Haas

Дата:

21 января 2011 г., 16:51:28

On Fri, Jan 21, 2011 at 12:42 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> On 1/21/2011 12:09 PM, Robert Haas wrote:
>>
>> Looks like the bad selectivity estimate there is what's killing it.
>> Not sure I completely understand why 9.0.2 is coming up with such a
>> bad estimate, though.
>>
>
> I would recommend setting default_statistics_target to 1024 and effective
> cache size to 20480MB and see what happens.

I am starting to suspect that there is a bug in the join selectivity
logic in 9.0.  We've had a few complaints where the join was projected
to return more rows than the product of the inner side and outer side
of the join, which is clearly nonsense.  I read the function and I
don't see anything weird... and it clearly can't be too bad or we
would have had more complaints... but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Mladen Gogala

Дата:

21 января 2011 г., 17:13:03

On 1/21/2011 12:51 PM, Robert Haas wrote:
> I am starting to suspect that there is a bug in the join selectivity
> logic in 9.0.  We've had a few complaints where the join was projected
> to return more rows than the product of the inner side and outer side
> of the join, which is clearly nonsense.  I read the function and I
> don't see anything weird... and it clearly can't be too bad or we
> would have had more complaints... but...

Well the way to test it would be to take the function from 8.3, input
the same arguments and see if there is any difference with the results.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Tom Lane

Дата:

21 января 2011 г., 18:26:58

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios
> <achill@matrix.gatewaynet.com> wrote:
>>     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 loops=1)
>>           Hash Cond: (msold.marinerid = msold2.marinerid)
>>           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime -
msold2.endtime)<= '1 year 6 mons'::interval)) 

> Looks like the bad selectivity estimate there is what's killing it.
> Not sure I completely understand why 9.0.2 is coming up with such a
> bad estimate, though.

Hm ... it's the <> clause.  Look at this, in the regression database:

regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.hundred = b.hundred);
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.00..1134.65 rows=10000 width=244) (actual time=0.362..960.732 rows=10000 loops=1)
   ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.070..45.287 rows=10000 loops=1)
   ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..2.16 rows=100 width=4) (actual time=0.073..0.073 rows=1
loops=10000)
         Index Cond: (hundred = a.hundred)
 Total runtime: 996.990 ms
(5 rows)

regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.hundred = b.hundred and
a.thousand<> b.thousand); 
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=583.00..1078.50 rows=1 width=244) (actual time=142.738..344.823 rows=10000 loops=1)
   Hash Cond: (a.hundred = b.hundred)
   Join Filter: (a.thousand <> b.thousand)
   ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.051..44.137 rows=10000 loops=1)
   ->  Hash  (cost=458.00..458.00 rows=10000 width=8) (actual time=142.526..142.526 rows=10000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 313kB
         ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=8) (actual time=0.027..71.778 rows=10000 loops=1)
 Total runtime: 384.017 ms
(8 rows)

(This is with enable_hashagg off, to make the two plans more obviously
comparable; but that's cosmetic.  The important point is that the join
rowcount estimate is dead on in the first case and dead wrong in the
second.)

Some digging turns up the fact that the semi-join selectivity of
"a.thousand <> b.thousand" is being estimated at *zero*.  This is
because the semi-join selectivity of "a.thousand = b.thousand" is
estimated at 1.0 (correctly: every row of a has at least one join
partner in b).  And then neqjoinsel is computed as 1 - eqjoinsel,
which is a false conclusion for semijoins: joining to at least one row
doesn't mean joining to every row.

I'm a bit inclined to fix this by having neqjoinsel hard-wire a result
of 1 for semi and anti join cases --- that is, assume there's always
at least one inner row that isn't equal to the outer row.  That's
presumably too high for real-world cases where the clause is probably
being used together with other, correlated, clauses; but we've got no
info available that would help narrow that down.  The best we can do
here is a forced estimate.  If it should be less than 1, then what?

            regards, tom lane

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

"Kevin Grittner"

Дата:

21 января 2011 г., 19:00:55

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If it should be less than 1, then what?

1 - (estimated tuples / estimated distinct values) ?

-Kevin

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Tom Lane

Дата:

21 января 2011 г., 19:22:36

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If it should be less than 1, then what?

> 1 - (estimated tuples / estimated distinct values) ?

Uh, no.  The number we're after is the probability that an outer tuple
has at least one unequal value in the inner relation.  This is not 1
minus the probability that a *specific* inner value is equal, which is
what I think your formula is estimating.

            regards, tom lane

Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

От

Achilleas Mantzios

Дата:

24 января 2011 г., 14:40:50

Στις Friday 21 January 2011 22:22:24 ο/η Tom Lane έγραψε:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> If it should be less than 1, then what?
>
> > 1 - (estimated tuples / estimated distinct values) ?
>
> Uh, no.  The number we're after is the probability that an outer tuple
> has at least one unequal value in the inner relation.  This is not 1
> minus the probability that a *specific* inner value is equal, which is
> what I think your formula is estimating.

Isn't this probablity (an outer tuple has at least one unequal value in the inner relation)
= 1 - (probability that all values in the inner relation are equal to the value of the outer tuple)

Anyways, glad to see smth came out of this.
Thx

>
>             regards, tom lane
>



--
Achilleas Mantzios

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2