Обсуждение: bad plan
Hello, I have an extremely bad plan for one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: antabif=# select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit The machines has 4GB of RAM with the following config: - shared_buffers: 512MB - effective_cache_size: 2GB - work_mem: 32MB - maintenance_work_mem: 128MB - default_statistics_target: 300 - temp_buffers: 64MB - wal_buffers: 8MB - checkpoint_segments = 15 The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on: - http://www.pastie.org/3731956 : with default config - http://www.pastie.org/3731960 : this is with enable_seq_scan = off - http://www.pastie.org/3731962 : I tried to play on the various cost settings but it's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries, so not a solution) - http://www.pastie.org/3732035 : with enable_hashagg and enable_hashjoin to false I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ? Thank you, Julien -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
Вложения
Julien,<br /><br /> It looks like you forgot to post the query.<br /><br /> Mario<br /> On 2012-04-05 1:38 PM, Julien Cigarwrote: <blockquote cite="mid:4F7D843E.4060304@ulb.ac.be" type="cite">Hello, <br /><br /> I have an extremely bad planfor one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: <br /><br/> antabif=# select version(); <br /> version <br /> ----------------------------------------------------------------------------------------------------------<br /> PostgreSQL9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit <br /><br /> Themachines has 4GB of RAM with the following config: <br /> - shared_buffers: 512MB <br /> - effective_cache_size: 2GB <br/> - work_mem: 32MB <br /> - maintenance_work_mem: 128MB <br /> - default_statistics_target: 300 <br /> - temp_buffers:64MB <br /> - wal_buffers: 8MB <br /> - checkpoint_segments = 15 <br /><br /> The tables have been ANALYZE'd.I've put the EXPLAIN ANALYZE on: <br /><br /> - <a class="moz-txt-link-freetext" href="http://www.pastie.org/3731956">http://www.pastie.org/3731956</a>: with default config <br /> - <a class="moz-txt-link-freetext"href="http://www.pastie.org/3731960">http://www.pastie.org/3731960</a> : this is with enable_seq_scan= off <br /> - <a class="moz-txt-link-freetext" href="http://www.pastie.org/3731962">http://www.pastie.org/3731962</a>: I tried to play on the various cost settings butit's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries, sonot a solution) <br /> - <a class="moz-txt-link-freetext" href="http://www.pastie.org/3732035">http://www.pastie.org/3732035</a>: with enable_hashagg and enable_hashjoin to false<br /><br /> I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ? <br/><br /> Thank you, <br /> Julien <br /><br /><br /><fieldset class="mimeAttachmentHeader"></fieldset><br /><pre wrap=""> </pre></blockquote><br />
Hello Mario, The query is in the pastie.org links, here is a copy: SELECT t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id , count(*) AS frequence FROM hit.context_to_context_links WHERE NOT geo_id IS NULL ANDtaxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; Thank you, Julien On 04/05/2012 14:03, Mario Dankoor wrote: > Julien, > > It looks like you forgot to post the query. > > Mario > On 2012-04-05 1:38 PM, Julien Cigar wrote: >> Hello, >> >> I have an extremely bad plan for one of my colleague's query. >> Basically PostgreSQL chooses to seq scan instead of index scan. This >> is on: >> >> antabif=# select version(); >> version >> ---------------------------------------------------------------------------------------------------------- >> >> PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc >> (GCC) 4.2.1 20070719 [FreeBSD], 64-bit >> >> The machines has 4GB of RAM with the following config: >> - shared_buffers: 512MB >> - effective_cache_size: 2GB >> - work_mem: 32MB >> - maintenance_work_mem: 128MB >> - default_statistics_target: 300 >> - temp_buffers: 64MB >> - wal_buffers: 8MB >> - checkpoint_segments = 15 >> >> The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on: >> >> - http://www.pastie.org/3731956 : with default config >> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off >> - http://www.pastie.org/3731962 : I tried to play on the various cost >> settings but it's doesn't change anything, except setting >> random_page_cost to 1 (which will lead to bad plans for other >> queries, so not a solution) >> - http://www.pastie.org/3732035 : with enable_hashagg and >> enable_hashjoin to false >> >> I'm currently out of idea why PostgreSQL still chooses a bad plan for >> this query ... any hint :) ? >> >> Thank you, >> Julien >> >> >> > -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
Вложения
Julien, I generally try avoid NOT predicates. If your geo_id is an integer, try geo_id > -1 or if it's a varchar geo_id > ''. The idea is that geo_id > 0 is false for null values. I don't think query optimizers are fond of NOT predicates. SELECT t_geo.frequence ,ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id ,count(*) AS frequence FROM hit.context_to_context_links WHERE geo_id> -1 AND taxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id )as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; On 2012-04-05 2:08 PM, Julien Cigar wrote: > SELECT > t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom > FROM ( > SELECT > geo_id , count(*) AS frequence > FROM > hit.context_to_context_links > WHERE > NOT geo_id IS NULL ANDtaxon_id= ANY ( > SELECT > taxon_id > FROM > rab.ancestors > WHERE > ancestors.subphylum_id = 18830 > ) GROUP BY geo_id > ) as t_geo > JOIN > hit.geo_candidates gc ON gc.id = t_geo.geo_id;
Hi Mario, Setting it to > -1, or even removing the condition doesn't change anything .. the problem is more that the estimate of the join point sucks pretty much .. Julien On 04/05/2012 14:32, Mario Dankoor wrote: > Julien, > > I generally try avoid NOT predicates. > If your geo_id is an integer, try geo_id > -1 > or if it's a varchar geo_id > ''. > The idea is that geo_id > 0 is false for null values. > > I don't think query optimizers are fond of NOT predicates. > > SELECT t_geo.frequence > ,ST_SetSRID(gc.geom, 4326) as geom > FROM ( > SELECT geo_id > ,count(*) AS frequence > FROM hit.context_to_context_links > WHERE geo_id > -1 > AND taxon_id= ANY ( > SELECT taxon_id > FROM rab.ancestors > WHERE ancestors.subphylum_id = 18830 > ) GROUP BY geo_id > ) as t_geo > JOIN hit.geo_candidates gc > ON gc.id = t_geo.geo_id; > > > > > > > > > > On 2012-04-05 2:08 PM, Julien Cigar wrote: >> SELECT >> t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom >> FROM ( >> SELECT >> geo_id , count(*) AS frequence >> FROM >> hit.context_to_context_links >> WHERE >> NOT geo_id IS NULL ANDtaxon_id= ANY ( >> SELECT >> taxon_id >> FROM >> rab.ancestors >> WHERE >> ancestors.subphylum_id = 18830 >> ) GROUP BY geo_id >> ) as t_geo >> JOIN >> hit.geo_candidates gc ON gc.id = t_geo.geo_id; > > -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
Вложения
FYI, raising cpu_tuple_cost from 0.01 to 0.5 fixed the problem..! On 04/05/2012 14:43, Julien Cigar wrote: > Hi Mario, > > Setting it to > -1, or even removing the condition doesn't change > anything .. the problem is more that the estimate of the join point > sucks pretty much .. > > Julien > > On 04/05/2012 14:32, Mario Dankoor wrote: >> Julien, >> >> I generally try avoid NOT predicates. >> If your geo_id is an integer, try geo_id > -1 >> or if it's a varchar geo_id > ''. >> The idea is that geo_id > 0 is false for null values. >> >> I don't think query optimizers are fond of NOT predicates. >> >> SELECT t_geo.frequence >> ,ST_SetSRID(gc.geom, 4326) as geom >> FROM ( >> SELECT geo_id >> ,count(*) AS frequence >> FROM hit.context_to_context_links >> WHERE geo_id > -1 >> AND taxon_id= ANY ( >> SELECT taxon_id >> FROM rab.ancestors >> WHERE ancestors.subphylum_id = 18830 >> ) GROUP BY geo_id >> ) as t_geo >> JOIN hit.geo_candidates gc >> ON gc.id = t_geo.geo_id; >> >> >> >> >> >> >> >> >> >> On 2012-04-05 2:08 PM, Julien Cigar wrote: >>> SELECT >>> t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom >>> FROM ( >>> SELECT >>> geo_id , count(*) AS frequence >>> FROM >>> hit.context_to_context_links >>> WHERE >>> NOT geo_id IS NULL ANDtaxon_id= ANY ( >>> SELECT >>> taxon_id >>> FROM >>> rab.ancestors >>> WHERE >>> ancestors.subphylum_id = 18830 >>> ) GROUP BY geo_id >>> ) as t_geo >>> JOIN >>> hit.geo_candidates gc ON gc.id = t_geo.geo_id; >> >> > > > > -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.