Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

Поиск
Список
Период
Сортировка
От Achilleas Mantzios
Тема Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Дата
Msg-id 201101181006.03790.achill@matrix.gatewaynet.com
обсуждение исходный текст
Ответ на Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2  ("Ing. Marcos Ortiz Valmaseda" <mlortiz@uci.cu>)
Ответы Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Список pgsql-performance
Thanx,

Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε:
>
> Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS
> instead NOT IN, because the first clause has a better performance. So, you can use it on that way.
>
You mean this?
(from 8.4 changes)
"Create explicit concepts of semi-joins and anti-joins (Tom)
       This work formalizes our previous ad-hoc treatment of IN (SELECT
       ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
       should result in significantly better planning of EXISTS and NOT
       EXISTS queries. In general, logically equivalent IN and EXISTS
       clauses should now have similar performance, whereas previously IN
       often won."

I haven't found any other recent reference to this issue. And this is far from what you suggest.
Here the entry talks about "similar" performance.

Also a similar issue was hot back in 7.4 days :
"IN / NOT IN subqueries are now much more efficient
          In previous releases, IN/NOT IN subqueries were joined to the
          upper query by sequentially scanning the subquery looking for a
          match. The 7.4 code uses the same sophisticated techniques used
          by ordinary joins and so is much faster. An IN will now usually
          be as fast as or faster than an equivalent EXISTS subquery; this
          reverses the conventional wisdom that applied to previous
          releases."

> Other questions?
> - Do you have a partial index on marinerstates.marinerid where this condition is accomplished?
No, but i just tried it (on state='Active') with no impact.
> - Do you have a index on mariner.id?
Yes, It is the primary key.
> - Can you provide a explain of these queries on the PostgreSQL-9.0 machines?
Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s)

mariner
=====
 id                           | integer                  | not null default
nextval(('public.mariner_id_seq'::text)::regclass)
 givenname                    | character varying(200)   |
 midname                      | character varying(100)   |
 surname                      | character varying(200)   | not null
...
Indexes:
    "mariner_pkey" PRIMARY KEY, btree (id)
    "mariner_smauid" UNIQUE, btree (smauid)
    "mariner_username_key" UNIQUE, btree (username)
    "mariner_nationalityid" btree (nationalityid)
    "mariner_parentid" btree (parentid)
    "mariner_surname" btree (surname)

marinerstates
========
 id                     | integer                  | not null default
nextval(('public.marinerstates_id_seq'::text)::regclass)
 marinerid              | integer                  | not null
 state                  | character varying(20)    | not null
 vslid                  | integer                  |
 leave_period_days      | integer                  |
 comment                | text                     |
 starttime              | timestamp with time zone | not null
 endtime                | timestamp with time zone |
 trid                   | integer                  |
 sal_bw                 | real                     | not null default 0.0
 sal_ot                 | real                     | not null default 0.0
 sal_lp                 | real                     | not null default 0.0
 sal_misc               | real                     | not null default 0.0
 rankid                 | integer                  |
 system_vslid           | integer                  |
 startport              | text                     |
 endport                | text                     |
.....
Indexes:
    "marinerstates_pkey" PRIMARY KEY, btree (id)
    "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL
    "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid)
    "marinerstates__system_vslid" btree (system_vslid)
    "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL
    "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_marinerid" btree (marinerid)
    "marinerstates_marinerid_starttime" btree (marinerid, starttime)
    "marinerstates_rankid" btree (rankid)
    "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL
    "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_state" btree (state)
    "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL
    "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_vslid" btree (vslid)
    "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL
    "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL

vessels
=====
 name               | character varying(200) | not null
 id                 | integer                | not null default nextval(('public.vessels_id_seq'::text)::regclass)
...
Indexes:
    "vessels_pkey" PRIMARY KEY, btree (id)
    "vessels_name_key" UNIQUE, btree (name)
    "idx_name" btree (name)
    "vessels_flag" btree (flag)
    "vessels_groupno" btree (groupno)
    "vessels_vslstatus_idx" btree (vslstatus)

dynacom=# SELECT count(*) from mariner;
 count
-------
 14447

dynacom=# SELECT count(*) from marinerstates;
 count
-------
 51013

dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstates
msWHERE m.id=ms.marinerid group by m.id) AS marqry; 
        avg         |       stddev
--------------------+--------------------
 3.5665944207508914 | 4.4416879361829170

(vessels do not play any impact in the query, so i'll leave them out)

Slow plan in 9.0.2 :
                                                                                                      QUERY PLAN
                                                                                               

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 71kB
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8) (actual
time=0.011..33.494rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 10kB
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118
rows=272loops=1) 
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832
loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 702kB
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23) (actual
time=10409.498..10419.971rows=12832 loops=1) 
                           Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446
rows=1454loops=1) 
                                   ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4) (actual
time=87.493..10407.517rows=1835 loops=1) 
                                         Join Filter: (msold.marinerid = mold.id)
                                         ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60
rows=14286width=4) (actual time=0.007..13.931 rows=14286 loops=1) 
                                               Filter: ((marinertype)::text = 'Mariner'::text)
                                         ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8) (actual
time=0.003..0.330rows=1876 loops=14286) 
                                               ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8) (actual
time=39.723..85.401rows=1876 loops=1) 
                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=39.708..81.501 rows=1876 loops=1) 
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) 
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) 
                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) 
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=39.613..39.613 rows=41250 loops=1) 
                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
                                                                 ->  Seq Scan on marinerstates msold2
(cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1) 
                                                                       Filter: ((state)::text = 'Active'::text)
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
 Total runtime: 10463.619 ms
(37 rows)

Fast plan in 8.3.13 :
                                                                                                          QUERY PLAN
                                                                                                       

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 53kB
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603
loops=1)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual
time=0.017..80.153rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) 
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832
loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)(actual time=311.023..380.168 rows=12832 loops=1) 
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981
rows=1454loops=1) 
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual
time=0.120..289.961rows=1835 loops=1) 
                                               ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4) (actual
time=0.099..237.128rows=1876 loops=1) 
                                                     ->  Index Scan using marinerstates_marinerid on marinerstates
msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1) 
                                                           Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) 
                                                           SubPlan
                                                             ->  Bitmap Heap Scan on marinerstates msold2
(cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) 
                                                                   Recheck Cond: ((marinerid = $0) AND (starttime <
$2))
                                                                   Filter: ((id <> $1) AND ((state)::text =
'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) 
                                                                   ->  Bitmap Index Scan on
marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) 
                                                                         Index Cond: ((marinerid = $0) AND (starttime <
$2))
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
                                               ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92
rows=1width=4) (actual time=0.007..0.012 rows=1 loops=1876) 
                                                     Index Cond: (mold.id = msold.marinerid)
                                                     Filter: ((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306
rows=273loops=1) 
 Total runtime: 553.601 ms
(33 rows)

Is there any other data i could post (pg_stat,...) that would help?

thanx a lot.

>
> Regards
>
>
> Ing. Marcos Luís Ortíz Valmaseda
> Linux User # 418229 && PostgreSQL DBA
> Centro de Tecnologías Gestión de Datos (DATEC)
> http://postgresql.uci.cu
> http://www.postgresql.org
> http://it.toolbox.com/blogs/sql-apprentice
>



--
Achilleas Mantzios

В списке pgsql-performance по дате отправления:

Предыдущее
От: Fernando Hevia
Дата:
Сообщение: Re: Problem with query
Следующее
От: Devrim GÜNDÜZ
Дата:
Сообщение: Re: The good, old times