Обсуждение: bad plan
Hello,
I have an extremely bad plan for one of my colleague's query. Basically
PostgreSQL chooses to seq scan instead of index scan. This is on:
antabif=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC)
4.2.1 20070719 [FreeBSD], 64-bit
The machines has 4GB of RAM with the following config:
- shared_buffers: 512MB
- effective_cache_size: 2GB
- work_mem: 32MB
- maintenance_work_mem: 128MB
- default_statistics_target: 300
- temp_buffers: 64MB
- wal_buffers: 8MB
- checkpoint_segments = 15
The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:
- http://www.pastie.org/3731956 : with default config
- http://www.pastie.org/3731960 : this is with enable_seq_scan = off
- http://www.pastie.org/3731962 : I tried to play on the various cost
settings but it's doesn't change anything, except setting
random_page_cost to 1 (which will lead to bad plans for other queries,
so not a solution)
- http://www.pastie.org/3732035 : with enable_hashagg and
enable_hashjoin to false
I'm currently out of idea why PostgreSQL still chooses a bad plan for
this query ... any hint :) ?
Thank you,
Julien
--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
Вложения
Julien,<br /><br /> It looks like you forgot to post the query.<br /><br /> Mario<br /> On 2012-04-05 1:38 PM, Julien Cigarwrote: <blockquote cite="mid:4F7D843E.4060304@ulb.ac.be" type="cite">Hello, <br /><br /> I have an extremely bad planfor one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: <br /><br/> antabif=# select version(); <br /> version <br /> ----------------------------------------------------------------------------------------------------------<br /> PostgreSQL9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit <br /><br /> Themachines has 4GB of RAM with the following config: <br /> - shared_buffers: 512MB <br /> - effective_cache_size: 2GB <br/> - work_mem: 32MB <br /> - maintenance_work_mem: 128MB <br /> - default_statistics_target: 300 <br /> - temp_buffers:64MB <br /> - wal_buffers: 8MB <br /> - checkpoint_segments = 15 <br /><br /> The tables have been ANALYZE'd.I've put the EXPLAIN ANALYZE on: <br /><br /> - <a class="moz-txt-link-freetext" href="http://www.pastie.org/3731956">http://www.pastie.org/3731956</a>: with default config <br /> - <a class="moz-txt-link-freetext"href="http://www.pastie.org/3731960">http://www.pastie.org/3731960</a> : this is with enable_seq_scan= off <br /> - <a class="moz-txt-link-freetext" href="http://www.pastie.org/3731962">http://www.pastie.org/3731962</a>: I tried to play on the various cost settings butit's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries, sonot a solution) <br /> - <a class="moz-txt-link-freetext" href="http://www.pastie.org/3732035">http://www.pastie.org/3732035</a>: with enable_hashagg and enable_hashjoin to false<br /><br /> I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ? <br/><br /> Thank you, <br /> Julien <br /><br /><br /><fieldset class="mimeAttachmentHeader"></fieldset><br /><pre wrap=""> </pre></blockquote><br />
Hello Mario,
The query is in the pastie.org links, here is a copy:
SELECT
t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
FROM (
SELECT
geo_id , count(*) AS frequence
FROM
hit.context_to_context_links
WHERE
NOT geo_id IS NULL ANDtaxon_id= ANY (
SELECT
taxon_id
FROM
rab.ancestors
WHERE
ancestors.subphylum_id = 18830
) GROUP BY geo_id
) as t_geo
JOIN
hit.geo_candidates gc ON gc.id = t_geo.geo_id;
Thank you,
Julien
On 04/05/2012 14:03, Mario Dankoor wrote:
> Julien,
>
> It looks like you forgot to post the query.
>
> Mario
> On 2012-04-05 1:38 PM, Julien Cigar wrote:
>> Hello,
>>
>> I have an extremely bad plan for one of my colleague's query.
>> Basically PostgreSQL chooses to seq scan instead of index scan. This
>> is on:
>>
>> antabif=# select version();
>> version
>> ----------------------------------------------------------------------------------------------------------
>>
>> PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc
>> (GCC) 4.2.1 20070719 [FreeBSD], 64-bit
>>
>> The machines has 4GB of RAM with the following config:
>> - shared_buffers: 512MB
>> - effective_cache_size: 2GB
>> - work_mem: 32MB
>> - maintenance_work_mem: 128MB
>> - default_statistics_target: 300
>> - temp_buffers: 64MB
>> - wal_buffers: 8MB
>> - checkpoint_segments = 15
>>
>> The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:
>>
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
>> - http://www.pastie.org/3731962 : I tried to play on the various cost
>> settings but it's doesn't change anything, except setting
>> random_page_cost to 1 (which will lead to bad plans for other
>> queries, so not a solution)
>> - http://www.pastie.org/3732035 : with enable_hashagg and
>> enable_hashjoin to false
>>
>> I'm currently out of idea why PostgreSQL still chooses a bad plan for
>> this query ... any hint :) ?
>>
>> Thank you,
>> Julien
>>
>>
>>
>
--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
Вложения
Julien, I generally try avoid NOT predicates. If your geo_id is an integer, try geo_id > -1 or if it's a varchar geo_id > ''. The idea is that geo_id > 0 is false for null values. I don't think query optimizers are fond of NOT predicates. SELECT t_geo.frequence ,ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id ,count(*) AS frequence FROM hit.context_to_context_links WHERE geo_id> -1 AND taxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id )as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; On 2012-04-05 2:08 PM, Julien Cigar wrote: > SELECT > t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom > FROM ( > SELECT > geo_id , count(*) AS frequence > FROM > hit.context_to_context_links > WHERE > NOT geo_id IS NULL ANDtaxon_id= ANY ( > SELECT > taxon_id > FROM > rab.ancestors > WHERE > ancestors.subphylum_id = 18830 > ) GROUP BY geo_id > ) as t_geo > JOIN > hit.geo_candidates gc ON gc.id = t_geo.geo_id;
Hi Mario, Setting it to > -1, or even removing the condition doesn't change anything .. the problem is more that the estimate of the join point sucks pretty much .. Julien On 04/05/2012 14:32, Mario Dankoor wrote: > Julien, > > I generally try avoid NOT predicates. > If your geo_id is an integer, try geo_id > -1 > or if it's a varchar geo_id > ''. > The idea is that geo_id > 0 is false for null values. > > I don't think query optimizers are fond of NOT predicates. > > SELECT t_geo.frequence > ,ST_SetSRID(gc.geom, 4326) as geom > FROM ( > SELECT geo_id > ,count(*) AS frequence > FROM hit.context_to_context_links > WHERE geo_id > -1 > AND taxon_id= ANY ( > SELECT taxon_id > FROM rab.ancestors > WHERE ancestors.subphylum_id = 18830 > ) GROUP BY geo_id > ) as t_geo > JOIN hit.geo_candidates gc > ON gc.id = t_geo.geo_id; > > > > > > > > > > On 2012-04-05 2:08 PM, Julien Cigar wrote: >> SELECT >> t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom >> FROM ( >> SELECT >> geo_id , count(*) AS frequence >> FROM >> hit.context_to_context_links >> WHERE >> NOT geo_id IS NULL ANDtaxon_id= ANY ( >> SELECT >> taxon_id >> FROM >> rab.ancestors >> WHERE >> ancestors.subphylum_id = 18830 >> ) GROUP BY geo_id >> ) as t_geo >> JOIN >> hit.geo_candidates gc ON gc.id = t_geo.geo_id; > > -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
Вложения
FYI, raising cpu_tuple_cost from 0.01 to 0.5 fixed the problem..! On 04/05/2012 14:43, Julien Cigar wrote: > Hi Mario, > > Setting it to > -1, or even removing the condition doesn't change > anything .. the problem is more that the estimate of the join point > sucks pretty much .. > > Julien > > On 04/05/2012 14:32, Mario Dankoor wrote: >> Julien, >> >> I generally try avoid NOT predicates. >> If your geo_id is an integer, try geo_id > -1 >> or if it's a varchar geo_id > ''. >> The idea is that geo_id > 0 is false for null values. >> >> I don't think query optimizers are fond of NOT predicates. >> >> SELECT t_geo.frequence >> ,ST_SetSRID(gc.geom, 4326) as geom >> FROM ( >> SELECT geo_id >> ,count(*) AS frequence >> FROM hit.context_to_context_links >> WHERE geo_id > -1 >> AND taxon_id= ANY ( >> SELECT taxon_id >> FROM rab.ancestors >> WHERE ancestors.subphylum_id = 18830 >> ) GROUP BY geo_id >> ) as t_geo >> JOIN hit.geo_candidates gc >> ON gc.id = t_geo.geo_id; >> >> >> >> >> >> >> >> >> >> On 2012-04-05 2:08 PM, Julien Cigar wrote: >>> SELECT >>> t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom >>> FROM ( >>> SELECT >>> geo_id , count(*) AS frequence >>> FROM >>> hit.context_to_context_links >>> WHERE >>> NOT geo_id IS NULL ANDtaxon_id= ANY ( >>> SELECT >>> taxon_id >>> FROM >>> rab.ancestors >>> WHERE >>> ancestors.subphylum_id = 18830 >>> ) GROUP BY geo_id >>> ) as t_geo >>> JOIN >>> hit.geo_candidates gc ON gc.id = t_geo.geo_id; >> >> > > > > -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.