Обсуждение: Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
От
"Ing. Marcos Ortiz Valmaseda"
Дата:
Query is : SELECT distinct m.id,coalesce(m.givenname,''), coalesce(m.midname,''), m.surname from marinerstates ms,vessels vsl,mariner m WHERE m.id=ms.marinerid and ms.vslid=vsl.id AND ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15' AND ms.starttime::date <= '2007-01-11' AND m.marinertype='Mariner' and m.id NOT IN (SELECT distinct mold.id FROM marinerstates msold, vessels vslold, mariner mold WHERE mold.id=msold.marinerid AND msold.vslid=vslold.id AND msold.state='Active' AND coalesce(msold.endtime,now())::date >= '2006-07-15' AND msold.starttime::date <= '2007-01-11' AND EXISTS (SELECT 1 FROM marinerstates msold2 WHERE msold2.marinerid=msold.marinerid AND msold2.state='Active' AND msold2.id <> msold.id AND msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months') AND mold.marinertype='Mariner' ) ORDER BY m.surname,coalesce(m.givenname,'') ,coalesce(m.midname,''); i get the following execution times: (with \timing) FBSD_DEV : query : 240.419 ms LINUX_PROD : query : 219.568 ms FBSD_TEST : query : 2285.509 ms LINUX_TEST : query : 5788.988 ms Re writing the query in the "NOT EXIST" variation like: SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels vsl,marinerm where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15' and ms.starttime::date <= '2007-01-11' and m.marinertype='Mariner' and NOT EXISTS (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold where mold.id=msold.marinerid and msold.vslid=vslold.idand msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <= '2007-01-11'and exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id<> msold.id and msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months') and mold.marinertype='Mariner' AND mold.id=m.id) order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,''); gives: FBSD_DEV : query : 154.000 ms LINUX_PROD : query : 153.408 ms FBSD_TEST : query : 137.000 ms LINUX_TEST : query : 404.000 ms Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS instead NOT IN, because the first clause has a better performance. So, you can use it on that way. Other questions? - Do you have a partial index on marinerstates.marinerid where this condition is accomplished? - Do you have a index on mariner.id? - Can you provide a explain of these queries on the PostgreSQL-9.0 machines? Regards Ing. Marcos Luís Ortíz Valmaseda Linux User # 418229 && PostgreSQL DBA Centro de Tecnologías Gestión de Datos (DATEC) http://postgresql.uci.cu http://www.postgresql.org http://it.toolbox.com/blogs/sql-apprentice
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
От
Achilleas Mantzios
Дата:
Thanx, Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε: > > Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS > instead NOT IN, because the first clause has a better performance. So, you can use it on that way. > You mean this? (from 8.4 changes) "Create explicit concepts of semi-joins and anti-joins (Tom) This work formalizes our previous ad-hoc treatment of IN (SELECT ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It should result in significantly better planning of EXISTS and NOT EXISTS queries. In general, logically equivalent IN and EXISTS clauses should now have similar performance, whereas previously IN often won." I haven't found any other recent reference to this issue. And this is far from what you suggest. Here the entry talks about "similar" performance. Also a similar issue was hot back in 7.4 days : "IN / NOT IN subqueries are now much more efficient In previous releases, IN/NOT IN subqueries were joined to the upper query by sequentially scanning the subquery looking for a match. The 7.4 code uses the same sophisticated techniques used by ordinary joins and so is much faster. An IN will now usually be as fast as or faster than an equivalent EXISTS subquery; this reverses the conventional wisdom that applied to previous releases." > Other questions? > - Do you have a partial index on marinerstates.marinerid where this condition is accomplished? No, but i just tried it (on state='Active') with no impact. > - Do you have a index on mariner.id? Yes, It is the primary key. > - Can you provide a explain of these queries on the PostgreSQL-9.0 machines? Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s) mariner ===== id | integer | not null default nextval(('public.mariner_id_seq'::text)::regclass) givenname | character varying(200) | midname | character varying(100) | surname | character varying(200) | not null ... Indexes: "mariner_pkey" PRIMARY KEY, btree (id) "mariner_smauid" UNIQUE, btree (smauid) "mariner_username_key" UNIQUE, btree (username) "mariner_nationalityid" btree (nationalityid) "mariner_parentid" btree (parentid) "mariner_surname" btree (surname) marinerstates ======== id | integer | not null default nextval(('public.marinerstates_id_seq'::text)::regclass) marinerid | integer | not null state | character varying(20) | not null vslid | integer | leave_period_days | integer | comment | text | starttime | timestamp with time zone | not null endtime | timestamp with time zone | trid | integer | sal_bw | real | not null default 0.0 sal_ot | real | not null default 0.0 sal_lp | real | not null default 0.0 sal_misc | real | not null default 0.0 rankid | integer | system_vslid | integer | startport | text | endport | text | ..... Indexes: "marinerstates_pkey" PRIMARY KEY, btree (id) "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid) "marinerstates__system_vslid" btree (system_vslid) "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL "marinerstates_marinerid" btree (marinerid) "marinerstates_marinerid_starttime" btree (marinerid, starttime) "marinerstates_rankid" btree (rankid) "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL "marinerstates_state" btree (state) "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL "marinerstates_vslid" btree (vslid) "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL vessels ===== name | character varying(200) | not null id | integer | not null default nextval(('public.vessels_id_seq'::text)::regclass) ... Indexes: "vessels_pkey" PRIMARY KEY, btree (id) "vessels_name_key" UNIQUE, btree (name) "idx_name" btree (name) "vessels_flag" btree (flag) "vessels_groupno" btree (groupno) "vessels_vslstatus_idx" btree (vslstatus) dynacom=# SELECT count(*) from mariner; count ------- 14447 dynacom=# SELECT count(*) from marinerstates; count ------- 51013 dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstates msWHERE m.id=ms.marinerid group by m.id) AS marqry; avg | stddev --------------------+-------------------- 3.5665944207508914 | 4.4416879361829170 (vessels do not play any impact in the query, so i'll leave them out) Slow plan in 9.0.2 : QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1) -> Sort (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id Sort Method: quicksort Memory: 71kB -> Hash Join (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1) Hash Cond: (ms.marinerid = m.id) -> Hash Join (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1) Hash Cond: (ms.vslid = vsl.id) -> Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.011..33.494rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118 rows=272loops=1) -> Hash (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 702kB -> Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10409.498..10419.971rows=12832 loops=1) Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan 1 -> Unique (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446 rows=1454loops=1) -> Nested Loop (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.493..10407.517rows=1835 loops=1) Join Filter: (msold.marinerid = mold.id) -> Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286width=4) (actual time=0.007..13.931 rows=14286 loops=1) Filter: ((marinertype)::text = 'Mariner'::text) -> Materialize (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.330rows=1876 loops=14286) -> Nested Loop (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.723..85.401rows=1876 loops=1) -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.708..81.501 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime <msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.613..39.613 rows=41250 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2246kB -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1) Filter: ((state)::text = 'Active'::text) -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) Index Cond: (vslold.id = msold.vslid) Total runtime: 10463.619 ms (37 rows) Fast plan in 8.3.13 : QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1) -> Sort (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id Sort Method: quicksort Memory: 53kB -> Hash Join (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1) Hash Cond: (ms.vslid = vsl.id) -> Hash Join (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603 loops=1) Hash Cond: (ms.marinerid = m.id) -> Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832 loops=1) -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23)(actual time=311.023..380.168 rows=12832 loops=1) Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan -> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454loops=1) -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961rows=1835 loops=1) -> Nested Loop (cost=0.00..627027.98 rows=1865 width=4) (actual time=0.099..237.128rows=1876 loops=1) -> Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) SubPlan -> Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) Recheck Cond: ((marinerid = $0) AND (starttime < $2)) Filter: ((id <> $1) AND ((state)::text = 'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) -> Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) Index Cond: ((marinerid = $0) AND (starttime < $2)) -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) Index Cond: (vslold.id = msold.vslid) -> Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1width=4) (actual time=0.007..0.012 rows=1 loops=1876) Index Cond: (mold.id = msold.marinerid) Filter: ((mold.marinertype)::text = 'Mariner'::text) -> Hash (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1) -> Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306 rows=273loops=1) Total runtime: 553.601 ms (33 rows) Is there any other data i could post (pg_stat,...) that would help? thanx a lot. > > Regards > > > Ing. Marcos Luís Ortíz Valmaseda > Linux User # 418229 && PostgreSQL DBA > Centro de Tecnologías Gestión de Datos (DATEC) > http://postgresql.uci.cu > http://www.postgresql.org > http://it.toolbox.com/blogs/sql-apprentice > -- Achilleas Mantzios
Achilleas Mantzios wrote: > Thanx, > > Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε: > >> Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS >> instead NOT IN, because the first clause has a better performance. So, you can use it on that way. >> >> > You mean this? > (from 8.4 changes) > "Create explicit concepts of semi-joins and anti-joins (Tom) > This work formalizes our previous ad-hoc treatment of IN (SELECT > ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It > should result in significantly better planning of EXISTS and NOT > EXISTS queries. In general, logically equivalent IN and EXISTS > clauses should now have similar performance, whereas previously IN > often won." > > I haven't found any other recent reference to this issue. And this is far from what you suggest. > Here the entry talks about "similar" performance. > > Also a similar issue was hot back in 7.4 days : > "IN / NOT IN subqueries are now much more efficient > In previous releases, IN/NOT IN subqueries were joined to the > upper query by sequentially scanning the subquery looking for a > match. The 7.4 code uses the same sophisticated techniques used > by ordinary joins and so is much faster. An IN will now usually > be as fast as or faster than an equivalent EXISTS subquery; this > reverses the conventional wisdom that applied to previous > releases." > > >> Other questions? >> - Do you have a partial index on marinerstates.marinerid where this condition is accomplished? >> > No, but i just tried it (on state='Active') with no impact. > >> - Do you have a index on mariner.id? >> > Yes, It is the primary key. > >> - Can you provide a explain of these queries on the PostgreSQL-9.0 machines? >> > Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s) > > mariner > ===== > id | integer | not null default nextval(('public.mariner_id_seq'::text)::regclass) > givenname | character varying(200) | > midname | character varying(100) | > surname | character varying(200) | not null > ... > Indexes: > "mariner_pkey" PRIMARY KEY, btree (id) > "mariner_smauid" UNIQUE, btree (smauid) > "mariner_username_key" UNIQUE, btree (username) > "mariner_nationalityid" btree (nationalityid) > "mariner_parentid" btree (parentid) > "mariner_surname" btree (surname) > > marinerstates > ======== > id | integer | not null default nextval(('public.marinerstates_id_seq'::text)::regclass) > marinerid | integer | not null > state | character varying(20) | not null > vslid | integer | > leave_period_days | integer | > comment | text | > starttime | timestamp with time zone | not null > endtime | timestamp with time zone | > trid | integer | > sal_bw | real | not null default 0.0 > sal_ot | real | not null default 0.0 > sal_lp | real | not null default 0.0 > sal_misc | real | not null default 0.0 > rankid | integer | > system_vslid | integer | > startport | text | > endport | text | > ..... > Indexes: > "marinerstates_pkey" PRIMARY KEY, btree (id) > "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL > "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid) > "marinerstates__system_vslid" btree (system_vslid) > "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL > "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL > "marinerstates_marinerid" btree (marinerid) > "marinerstates_marinerid_starttime" btree (marinerid, starttime) > "marinerstates_rankid" btree (rankid) > "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL > "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL > "marinerstates_state" btree (state) > "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL > "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL > "marinerstates_vslid" btree (vslid) > "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL > "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL > > vessels > ===== > name | character varying(200) | not null > id | integer | not null default nextval(('public.vessels_id_seq'::text)::regclass) > ... > Indexes: > "vessels_pkey" PRIMARY KEY, btree (id) > "vessels_name_key" UNIQUE, btree (name) > "idx_name" btree (name) > "vessels_flag" btree (flag) > "vessels_groupno" btree (groupno) > "vessels_vslstatus_idx" btree (vslstatus) > > dynacom=# SELECT count(*) from mariner; > count > ------- > 14447 > > dynacom=# SELECT count(*) from marinerstates; > count > ------- > 51013 > > dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstatesms WHERE m.id=ms.marinerid group by m.id) AS marqry; > avg | stddev > --------------------+-------------------- > 3.5665944207508914 | 4.4416879361829170 > > (vessels do not play any impact in the query, so i'll leave them out) > > Slow plan in 9.0.2 : > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Unique (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1) > -> Sort (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1) > Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id > Sort Method: quicksort Memory: 71kB > -> Hash Join (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1) > Hash Cond: (ms.marinerid = m.id) > -> Hash Join (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1) > Hash Cond: (ms.vslid = vsl.id) > -> Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.011..33.494rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 10kB > -> Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118 rows=272loops=1) > -> Hash (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 702kB > -> Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10409.498..10419.971rows=12832 loops=1) > Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text)) > SubPlan 1 > -> Unique (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446 rows=1454loops=1) > -> Nested Loop (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.493..10407.517rows=1835 loops=1) > Join Filter: (msold.marinerid = mold.id) > -> Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286width=4) (actual time=0.007..13.931 rows=14286 loops=1) > Filter: ((marinertype)::text = 'Mariner'::text) > -> Materialize (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.330rows=1876 loops=14286) > -> Nested Loop (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.723..85.401rows=1876 loops=1) > -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.708..81.501 rows=1876 loops=1) > Hash Cond: (msold.marinerid = msold2.marinerid) > Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime< msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) > -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.613..39.613 rows=41250 loops=1) > Buckets: 8192 Batches: 1 Memory Usage: 2246kB > -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1) > Filter: ((state)::text = 'Active'::text) > -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) > Index Cond: (vslold.id = msold.vslid) > Total runtime: 10463.619 ms > (37 rows) > > Fast plan in 8.3.13 : > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Unique (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1) > -> Sort (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1) > Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id > Sort Method: quicksort Memory: 53kB > -> Hash Join (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1) > Hash Cond: (ms.vslid = vsl.id) > -> Hash Join (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603 loops=1) > Hash Cond: (ms.marinerid = m.id) > -> Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832loops=1) > -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23)(actual time=311.023..380.168 rows=12832 loops=1) > Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) > SubPlan > -> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454loops=1) > -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961rows=1835 loops=1) > -> Nested Loop (cost=0.00..627027.98 rows=1865 width=4) (actual time=0.099..237.128rows=1876 loops=1) > -> Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) > SubPlan > -> Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) > Recheck Cond: ((marinerid = $0) AND (starttime < $2)) > Filter: ((id <> $1) AND ((state)::text = 'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) > -> Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) > Index Cond: ((marinerid = $0) AND (starttime <$2)) > -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) > Index Cond: (vslold.id = msold.vslid) > -> Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1width=4) (actual time=0.007..0.012 rows=1 loops=1876) > Index Cond: (mold.id = msold.marinerid) > Filter: ((mold.marinertype)::text = 'Mariner'::text) > -> Hash (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1) > -> Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306 rows=273loops=1) > Total runtime: 553.601 ms > (33 rows) > > Is there any other data i could post (pg_stat,...) that would help? > > thanx a lot. > > >> Regards >> >> >> Ing. Marcos Luís Ortíz Valmaseda >> Linux User # 418229 && PostgreSQL DBA >> Centro de Tecnologías Gestión de Datos (DATEC) >> http://postgresql.uci.cu >> http://www.postgresql.org >> http://it.toolbox.com/blogs/sql-apprentice >> >> > > > > -- > Achilleas Mantzios > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > Achilleas, here is the slow part from 9.02: -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.708..81.501 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime <msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.613..39.613 rows=41250 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2246kB -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 The same part from 8.3.3 looks like this: Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832 loops=1) -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23)(actual time=311.023..380.168 rows=12832 loops=1) Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan -> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454loops=1) -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961rows=1835 loops=1) This leads me to the conclusion that the queries differ significantly. 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering conditions look differently. Are you sure that the plans are from the same query? -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
От
Achilleas Mantzios
Дата:
Στις Tuesday 18 January 2011 16:26:21 ο/η Mladen Gogala έγραψε: > This leads me to the conclusion that the queries differ significantly. > 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering > conditions look differently. Are you sure that the plans are from the > same query? First the num of rows in the two portions are different so you might be comparing apples and oranges here. Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part. 8.3.13 Unique (cost=633677.56..633700.48 rows=1834 width=23) -> Sort (cost=633677.56..633682.14 rows=1834 width=23) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id -> Hash Join (cost=630601.65..633578.15 rows=1834 width=23) Hash Cond: (ms.vslid = vsl.id) -> Hash Join (cost=630580.33..633530.01 rows=2261 width=27) Hash Cond: (ms.marinerid = m.id) -> Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=630491.54..630491.54 rows=7103 width=23) -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23) Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan -> Unique (cost=0.00..628772.30 rows=1834 width=4) -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) -> Nested Loop (cost=0.00..627027.98 rows=1865 width=4) -> Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) SubPlan -> Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11rows=1 width=0) Recheck Cond: ((marinerid = $0) AND (starttime < $2)) Filter: ((id <> $1) AND ((state)::text = 'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) -> Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) Index Cond: ((marinerid = $0) AND (starttime < $2)) -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30rows=1 width=4) Index Cond: (vslold.id = msold.vslid) -> Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1width=4) Index Cond: (mold.id = msold.marinerid) Filter: ((mold.marinertype)::text = 'Mariner'::text) -> Hash (cost=17.81..17.81 rows=281 width=4) -> Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (31 rows) 9.0.2 Unique (cost=11525.09..11571.55 rows=3717 width=23) -> Sort (cost=11525.09..11534.38 rows=3717 width=23) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id -> Hash Join (cost=8281.98..11304.67 rows=3717 width=23) Hash Cond: (ms.marinerid = m.id) -> Hash Join (cost=20.12..2963.83 rows=3717 width=4) Hash Cond: (ms.vslid = vsl.id) -> Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=16.72..16.72 rows=272 width=4) -> Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) -> Hash (cost=8172.57..8172.57 rows=7143 width=23) -> Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan 1 -> Unique (cost=2768.00..7614.86 rows=1 width=4) -> Nested Loop (cost=2768.00..7614.86 rows=1 width=4) Join Filter: (msold.marinerid = mold.id) -> Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286width=4) Filter: ((marinertype)::text = 'Mariner'::text) -> Materialize (cost=2768.00..5671.97 rows=1 width=8) -> Nested Loop (cost=2768.00..5671.96 rows=1 width=8) -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime <msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) -> Hash (cost=2251.66..2251.66 rows=41307 width=24) -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) Filter: ((state)::text = 'Active'::text) -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.28rows=1 width=4) Index Cond: (vslold.id = msold.vslid) (32 rows) > > -- > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com > > -- Achilleas Mantzios
Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part. Please show EXPLAIN ANALYZE, not just EXPLAIN, results. When complaining that the planner did the wrong thing, it's not very helpful to see only its estimates and not reality. regards, tom lane
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
От
Achilleas Mantzios
Дата:
Στις Wednesday 19 January 2011 19:26:56 ο/η Tom Lane έγραψε: > Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > > Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part. > > Please show EXPLAIN ANALYZE, not just EXPLAIN, results. When > complaining that the planner did the wrong thing, it's not very helpful > to see only its estimates and not reality. I did so two posts before but one more won't do any harm. Here we go: 9.0.2 - SLOW QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10439.797..10440.152 rows=603 loops=1) -> Sort (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10439.795..10439.905 rows=603 loops=1) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id Sort Method: quicksort Memory: 71kB -> Hash Join (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10402.338..10438.875 rows=603 loops=1) Hash Cond: (ms.marinerid = m.id) -> Hash Join (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..35.178 rows=2625 loops=1) Hash Cond: (ms.vslid = vsl.id) -> Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.015..33.634rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=16.72..16.72 rows=272 width=4) (actual time=0.203..0.203 rows=272 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.117 rows=272loops=1) -> Hash (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10402.075..10402.075 rows=12832 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 702kB -> Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10386.549..10397.193rows=12832 loops=1) Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan 1 -> Unique (cost=2768.00..7614.86 rows=1 width=4) (actual time=86.937..10385.379 rows=1454loops=1) -> Nested Loop (cost=2768.00..7614.86 rows=1 width=4) (actual time=86.936..10384.555rows=1835 loops=1) Join Filter: (msold.marinerid = mold.id) -> Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286width=4) (actual time=0.007..14.250 rows=14286 loops=1) Filter: ((marinertype)::text = 'Mariner'::text) -> Materialize (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.328rows=1876 loops=14286) -> Nested Loop (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.259..84.889rows=1876 loops=1) -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.249..81.025 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime <msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.964 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.156..39.156 rows=41250 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2246kB -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.552 rows=41250 loops=1) Filter: ((state)::text = 'Active'::text) -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) Index Cond: (vslold.id = msold.vslid) Total runtime: 10440.690 ms (37 rows) 8.3.13 - FAST QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=633677.56..633700.48 rows=1834 width=23) (actual time=551.166..558.487 rows=603 loops=1) -> Sort (cost=633677.56..633682.14 rows=1834 width=23) (actual time=551.156..553.548 rows=603 loops=1) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id Sort Method: quicksort Memory: 53kB -> Hash Join (cost=630601.65..633578.15 rows=1834 width=23) (actual time=447.773..547.934 rows=603 loops=1) Hash Cond: (ms.vslid = vsl.id) -> Hash Join (cost=630580.33..633530.01 rows=2261 width=27) (actual time=445.320..540.291 rows=603 loops=1) Hash Cond: (ms.marinerid = m.id) -> Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.018..79.742rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=445.216..445.216 rows=12832 loops=1) -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23)(actual time=319.675..388.383 rows=12832 loops=1) Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan -> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.196..312.728 rows=1454loops=1) -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.187..298.780rows=1835 loops=1) -> Nested Loop (cost=0.00..627027.98 rows=1865 width=4) (actual time=0.165..244.706rows=1876 loops=1) -> Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.138..194.165 rows=1876 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) SubPlan -> Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) Recheck Cond: ((marinerid = $0) AND (starttime < $2)) Filter: ((id <> $1) AND ((state)::text = 'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) -> Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) Index Cond: ((marinerid = $0) AND (starttime < $2)) -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) Index Cond: (vslold.id = msold.vslid) -> Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1width=4) (actual time=0.008..0.012 rows=1 loops=1876) Index Cond: (mold.id = msold.marinerid) Filter: ((mold.marinertype)::text = 'Mariner'::text) -> Hash (cost=17.81..17.81 rows=281 width=4) (actual time=2.432..2.432 rows=273 loops=1) -> Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (actual time=0.033..1.220 rows=273loops=1) Total runtime: 561.208 ms (33 rows) > > regards, tom lane > -- Achilleas Mantzios
On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote: > -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.249..81.025 rows=1876 loops=1) > Hash Cond: (msold.marinerid = msold2.marinerid) > Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime <msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) > -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.964 rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.156..39.156 rows=41250 loops=1) > Buckets: 8192 Batches: 1 Memory Usage: 2246kB > -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.552 rows=41250 loops=1) > Filter: ((state)::text = 'Active'::text) Looks like the bad selectivity estimate there is what's killing it. Not sure I completely understand why 9.0.2 is coming up with such a bad estimate, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/21/2011 12:09 PM, Robert Haas wrote: > Looks like the bad selectivity estimate there is what's killing it. > Not sure I completely understand why 9.0.2 is coming up with such a > bad estimate, though. > I would recommend setting default_statistics_target to 1024 and effective cache size to 20480MB and see what happens. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
On Fri, Jan 21, 2011 at 12:42 PM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote: > On 1/21/2011 12:09 PM, Robert Haas wrote: >> >> Looks like the bad selectivity estimate there is what's killing it. >> Not sure I completely understand why 9.0.2 is coming up with such a >> bad estimate, though. >> > > I would recommend setting default_statistics_target to 1024 and effective > cache size to 20480MB and see what happens. I am starting to suspect that there is a bug in the join selectivity logic in 9.0. We've had a few complaints where the join was projected to return more rows than the product of the inner side and outer side of the join, which is clearly nonsense. I read the function and I don't see anything weird... and it clearly can't be too bad or we would have had more complaints... but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/21/2011 12:51 PM, Robert Haas wrote: > I am starting to suspect that there is a bug in the join selectivity > logic in 9.0. We've had a few complaints where the join was projected > to return more rows than the product of the inner side and outer side > of the join, which is clearly nonsense. I read the function and I > don't see anything weird... and it clearly can't be too bad or we > would have had more complaints... but... Well the way to test it would be to take the function from 8.3, input the same arguments and see if there is any difference with the results. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios > <achill@matrix.gatewaynet.com> wrote: >> -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 loops=1) >> Hash Cond: (msold.marinerid = msold2.marinerid) >> Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime - msold2.endtime)<= '1 year 6 mons'::interval)) > Looks like the bad selectivity estimate there is what's killing it. > Not sure I completely understand why 9.0.2 is coming up with such a > bad estimate, though. Hm ... it's the <> clause. Look at this, in the regression database: regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.hundred = b.hundred); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Nested Loop Semi Join (cost=0.00..1134.65 rows=10000 width=244) (actual time=0.362..960.732 rows=10000 loops=1) -> Seq Scan on tenk1 a (cost=0.00..458.00 rows=10000 width=244) (actual time=0.070..45.287 rows=10000 loops=1) -> Index Scan using tenk1_hundred on tenk1 b (cost=0.00..2.16 rows=100 width=4) (actual time=0.073..0.073 rows=1 loops=10000) Index Cond: (hundred = a.hundred) Total runtime: 996.990 ms (5 rows) regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.hundred = b.hundred and a.thousand<> b.thousand); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Hash Semi Join (cost=583.00..1078.50 rows=1 width=244) (actual time=142.738..344.823 rows=10000 loops=1) Hash Cond: (a.hundred = b.hundred) Join Filter: (a.thousand <> b.thousand) -> Seq Scan on tenk1 a (cost=0.00..458.00 rows=10000 width=244) (actual time=0.051..44.137 rows=10000 loops=1) -> Hash (cost=458.00..458.00 rows=10000 width=8) (actual time=142.526..142.526 rows=10000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 313kB -> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=8) (actual time=0.027..71.778 rows=10000 loops=1) Total runtime: 384.017 ms (8 rows) (This is with enable_hashagg off, to make the two plans more obviously comparable; but that's cosmetic. The important point is that the join rowcount estimate is dead on in the first case and dead wrong in the second.) Some digging turns up the fact that the semi-join selectivity of "a.thousand <> b.thousand" is being estimated at *zero*. This is because the semi-join selectivity of "a.thousand = b.thousand" is estimated at 1.0 (correctly: every row of a has at least one join partner in b). And then neqjoinsel is computed as 1 - eqjoinsel, which is a false conclusion for semijoins: joining to at least one row doesn't mean joining to every row. I'm a bit inclined to fix this by having neqjoinsel hard-wire a result of 1 for semi and anti join cases --- that is, assume there's always at least one inner row that isn't equal to the outer row. That's presumably too high for real-world cases where the clause is probably being used together with other, correlated, clauses; but we've got no info available that would help narrow that down. The best we can do here is a forced estimate. If it should be less than 1, then what? regards, tom lane
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
От
"Kevin Grittner"
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote: > If it should be less than 1, then what? 1 - (estimated tuples / estimated distinct values) ? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If it should be less than 1, then what? > 1 - (estimated tuples / estimated distinct values) ? Uh, no. The number we're after is the probability that an outer tuple has at least one unequal value in the inner relation. This is not 1 minus the probability that a *specific* inner value is equal, which is what I think your formula is estimating. regards, tom lane
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
От
Achilleas Mantzios
Дата:
Στις Friday 21 January 2011 22:22:24 ο/η Tom Lane έγραψε: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> If it should be less than 1, then what? > > > 1 - (estimated tuples / estimated distinct values) ? > > Uh, no. The number we're after is the probability that an outer tuple > has at least one unequal value in the inner relation. This is not 1 > minus the probability that a *specific* inner value is equal, which is > what I think your formula is estimating. Isn't this probablity (an outer tuple has at least one unequal value in the inner relation) = 1 - (probability that all values in the inner relation are equal to the value of the outer tuple) Anyways, glad to see smth came out of this. Thx > > regards, tom lane > -- Achilleas Mantzios