Обсуждение: bad plan

Поиск
Список
Период
Сортировка

bad plan

От
Julien Cigar
Дата:
Hello,

I have an extremely bad plan for one of my colleague's query. Basically
PostgreSQL chooses to seq scan instead of index scan. This is on:

antabif=# select version();
                                                  version
----------------------------------------------------------------------------------------------------------
  PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC)
4.2.1 20070719  [FreeBSD], 64-bit

The machines has 4GB of RAM with the following config:
- shared_buffers: 512MB
- effective_cache_size: 2GB
- work_mem: 32MB
- maintenance_work_mem: 128MB
- default_statistics_target: 300
- temp_buffers: 64MB
- wal_buffers: 8MB
- checkpoint_segments = 15

The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:

- http://www.pastie.org/3731956 : with default config
- http://www.pastie.org/3731960 : this is with enable_seq_scan = off
- http://www.pastie.org/3731962 : I tried to play on the various cost
settings but it's doesn't change anything, except setting
random_page_cost to 1 (which will lead to bad plans for other queries,
so not a solution)
- http://www.pastie.org/3732035 : with enable_hashagg and
enable_hashjoin to false

I'm currently out of idea why PostgreSQL still chooses a bad plan for
this query ... any hint :) ?

Thank you,
Julien

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.


Вложения

Re: bad plan

От
Mario Dankoor
Дата:
Julien,<br /><br /> It looks like you forgot to post the query.<br /><br /> Mario<br /> On 2012-04-05 1:38 PM, Julien
Cigarwrote: <blockquote cite="mid:4F7D843E.4060304@ulb.ac.be" type="cite">Hello, <br /><br /> I have an extremely bad
planfor one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: <br
/><br/> antabif=# select version(); <br />                                                  version <br />
----------------------------------------------------------------------------------------------------------<br />
 PostgreSQL9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 64-bit <br /><br />
Themachines has 4GB of RAM with the following config: <br /> - shared_buffers: 512MB <br /> - effective_cache_size: 2GB
<br/> - work_mem: 32MB <br /> - maintenance_work_mem: 128MB <br /> - default_statistics_target: 300 <br /> -
temp_buffers:64MB <br /> - wal_buffers: 8MB <br /> - checkpoint_segments = 15 <br /><br /> The tables have been
ANALYZE'd.I've put the EXPLAIN ANALYZE on: <br /><br /> - <a class="moz-txt-link-freetext"
href="http://www.pastie.org/3731956">http://www.pastie.org/3731956</a>: with default config <br /> - <a
class="moz-txt-link-freetext"href="http://www.pastie.org/3731960">http://www.pastie.org/3731960</a> : this is with
enable_seq_scan= off <br /> - <a class="moz-txt-link-freetext"
href="http://www.pastie.org/3731962">http://www.pastie.org/3731962</a>: I tried to play on the various cost settings
butit's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries,
sonot a solution) <br /> - <a class="moz-txt-link-freetext"
href="http://www.pastie.org/3732035">http://www.pastie.org/3732035</a>: with enable_hashagg and enable_hashjoin to
false<br /><br /> I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ?
<br/><br /> Thank you, <br /> Julien <br /><br /><br /><fieldset class="mimeAttachmentHeader"></fieldset><br /><pre
wrap="">
</pre></blockquote><br />

Re: bad plan

От
Julien Cigar
Дата:
Hello Mario,

The query is in the pastie.org links, here is a copy:

SELECT
   t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
FROM (
   SELECT
     geo_id , count(*) AS frequence
   FROM
hit.context_to_context_links
   WHERE
     NOT geo_id IS NULL ANDtaxon_id= ANY (
       SELECT
         taxon_id
       FROM
         rab.ancestors
       WHERE
         ancestors.subphylum_id = 18830
       ) GROUP BY geo_id
) as t_geo
JOIN
   hit.geo_candidates gc ON gc.id = t_geo.geo_id;

Thank you,
Julien

On 04/05/2012 14:03, Mario Dankoor wrote:
> Julien,
>
> It looks like you forgot to post the query.
>
> Mario
> On 2012-04-05 1:38 PM, Julien Cigar wrote:
>> Hello,
>>
>> I have an extremely bad plan for one of my colleague's query.
>> Basically PostgreSQL chooses to seq scan instead of index scan. This
>> is on:
>>
>> antabif=# select version();
>>                                                  version
>> ----------------------------------------------------------------------------------------------------------
>>
>>  PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc
>> (GCC) 4.2.1 20070719  [FreeBSD], 64-bit
>>
>> The machines has 4GB of RAM with the following config:
>> - shared_buffers: 512MB
>> - effective_cache_size: 2GB
>> - work_mem: 32MB
>> - maintenance_work_mem: 128MB
>> - default_statistics_target: 300
>> - temp_buffers: 64MB
>> - wal_buffers: 8MB
>> - checkpoint_segments = 15
>>
>> The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:
>>
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
>> - http://www.pastie.org/3731962 : I tried to play on the various cost
>> settings but it's doesn't change anything, except setting
>> random_page_cost to 1 (which will lead to bad plans for other
>> queries, so not a solution)
>> - http://www.pastie.org/3732035 : with enable_hashagg and
>> enable_hashjoin to false
>>
>> I'm currently out of idea why PostgreSQL still chooses a bad plan for
>> this query ... any hint :) ?
>>
>> Thank you,
>> Julien
>>
>>
>>
>


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.


Вложения

Re: bad plan

От
Mario Dankoor
Дата:
Julien,

I generally try avoid NOT predicates.
If your geo_id is an integer,  try geo_id > -1
or if it's a varchar geo_id > ''.
The idea is that geo_id > 0 is false for null values.

I don't think query optimizers are fond of NOT predicates.

SELECT  t_geo.frequence       ,ST_SetSRID(gc.geom, 4326) as geom
FROM (       SELECT  geo_id              ,count(*) AS frequence       FROM hit.context_to_context_links       WHERE
geo_id> -1       AND taxon_id= ANY (                          SELECT taxon_id                          FROM
rab.ancestors                         WHERE ancestors.subphylum_id = 18830                        ) GROUP BY geo_id
)as t_geo
 
JOIN  hit.geo_candidates gc
ON gc.id = t_geo.geo_id;









On 2012-04-05 2:08 PM, Julien Cigar wrote:
> SELECT
>   t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
> FROM (
>   SELECT
>     geo_id , count(*) AS frequence
>   FROM
> hit.context_to_context_links
>   WHERE
>     NOT geo_id IS NULL ANDtaxon_id= ANY (
>       SELECT
>         taxon_id
>       FROM
>         rab.ancestors
>       WHERE
>         ancestors.subphylum_id = 18830
>       ) GROUP BY geo_id
> ) as t_geo
> JOIN
>   hit.geo_candidates gc ON gc.id = t_geo.geo_id; 



Re: bad plan

От
Julien Cigar
Дата:
Hi Mario,

Setting it to > -1, or even removing the condition doesn't change
anything .. the problem is more that the estimate of the join point
sucks pretty much ..

Julien

On 04/05/2012 14:32, Mario Dankoor wrote:
> Julien,
>
> I generally try avoid NOT predicates.
> If your geo_id is an integer,  try geo_id > -1
> or if it's a varchar geo_id > ''.
> The idea is that geo_id > 0 is false for null values.
>
> I don't think query optimizers are fond of NOT predicates.
>
> SELECT  t_geo.frequence
>        ,ST_SetSRID(gc.geom, 4326) as geom
> FROM (
>        SELECT  geo_id
>               ,count(*) AS frequence
>        FROM hit.context_to_context_links
>        WHERE geo_id > -1
>        AND taxon_id= ANY (
>                           SELECT taxon_id
>                           FROM rab.ancestors
>                           WHERE ancestors.subphylum_id = 18830
>                         ) GROUP BY geo_id
>      ) as t_geo
> JOIN  hit.geo_candidates gc
> ON gc.id = t_geo.geo_id;
>
>
>
>
>
>
>
>
>
> On 2012-04-05 2:08 PM, Julien Cigar wrote:
>> SELECT
>>   t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
>> FROM (
>>   SELECT
>>     geo_id , count(*) AS frequence
>>   FROM
>> hit.context_to_context_links
>>   WHERE
>>     NOT geo_id IS NULL ANDtaxon_id= ANY (
>>       SELECT
>>         taxon_id
>>       FROM
>>         rab.ancestors
>>       WHERE
>>         ancestors.subphylum_id = 18830
>>       ) GROUP BY geo_id
>> ) as t_geo
>> JOIN
>>   hit.geo_candidates gc ON gc.id = t_geo.geo_id;
>
>


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.


Вложения

Re: bad plan

От
Julien Cigar
Дата:
FYI, raising cpu_tuple_cost from 0.01 to 0.5 fixed the problem..!

On 04/05/2012 14:43, Julien Cigar wrote:
> Hi Mario,
>
> Setting it to > -1, or even removing the condition doesn't change
> anything .. the problem is more that the estimate of the join point
> sucks pretty much ..
>
> Julien
>
> On 04/05/2012 14:32, Mario Dankoor wrote:
>> Julien,
>>
>> I generally try avoid NOT predicates.
>> If your geo_id is an integer,  try geo_id > -1
>> or if it's a varchar geo_id > ''.
>> The idea is that geo_id > 0 is false for null values.
>>
>> I don't think query optimizers are fond of NOT predicates.
>>
>> SELECT  t_geo.frequence
>>        ,ST_SetSRID(gc.geom, 4326) as geom
>> FROM (
>>        SELECT  geo_id
>>               ,count(*) AS frequence
>>        FROM hit.context_to_context_links
>>        WHERE geo_id > -1
>>        AND taxon_id= ANY (
>>                           SELECT taxon_id
>>                           FROM rab.ancestors
>>                           WHERE ancestors.subphylum_id = 18830
>>                         ) GROUP BY geo_id
>>      ) as t_geo
>> JOIN  hit.geo_candidates gc
>> ON gc.id = t_geo.geo_id;
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 2012-04-05 2:08 PM, Julien Cigar wrote:
>>> SELECT
>>>   t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
>>> FROM (
>>>   SELECT
>>>     geo_id , count(*) AS frequence
>>>   FROM
>>> hit.context_to_context_links
>>>   WHERE
>>>     NOT geo_id IS NULL ANDtaxon_id= ANY (
>>>       SELECT
>>>         taxon_id
>>>       FROM
>>>         rab.ancestors
>>>       WHERE
>>>         ancestors.subphylum_id = 18830
>>>       ) GROUP BY geo_id
>>> ) as t_geo
>>> JOIN
>>>   hit.geo_candidates gc ON gc.id = t_geo.geo_id;
>>
>>
>
>
>
>


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.


Вложения