Re: bad plan

Поиск
Список
Период
Сортировка
От Julien Cigar
Тема Re: bad plan
Дата
Msg-id 4F7ED097.70807@ulb.ac.be
обсуждение исходный текст
Ответ на Re: bad plan  (Ants Aasma <ants@cybertec.at>)
Список pgsql-performance
On 04/05/2012 21:47, Ants Aasma wrote:
> On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar<jcigar@ulb.ac.be>  wrote:
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
> It looks like the join selectivity of (context_to_context_links,
> ancestors) is being overestimated by almost two orders of magnitude.
> The optimizer thinks that there are 564 rows in the
> context_to_context_links table for each taxon_id, while in fact for
> this query the number is 9. To confirm that this, you can force the
> selectivity estimate to be 200x lower by adding a geo_id = geod_id
> where clause to the subquery.

adding a geo_id = geo_id to the subquery helped a little bit with a
cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 :

without:

Index Scan using ltlc_taxon_id_idxoncontext_to_context_links  (cost=0.00..146.93  rows=341  width=8) (actual
time=0.004..0.019 rows=9  loops=736) 

with geo_id = geo_id:

Index Scan using ltlc_taxon_id_idxoncontext_to_context_links  (cost=0.00..148.11  rows=2  width=8) (actual
time=0.004..0.020 rows=9  loops=736) 


> If it does help, then the next question would be why is the estimate
> so much off. It could be either because the stats for
> context_to_context_links.taxon_id are wrong or because
> ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
> figuring this is out, you could run the following to queries and post
> the results:
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
> dist GROUP BY 1 ORDER BY 1;
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
> and taxon_id= ANY ( select taxon_id from rab.ancestors  where
>     ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
> 1 ORDER BY 1;

I'm sorry but I get an "ERROR:  division by zero" for both of your queries..

> If the second distribution has a significantly different shape then
> cross column statistics are necessary to get good plans. As it happens
> I'm working on adding this functionality to PostgreSQL and would love
> to hear more details about your use-case to understand if it would be
> solved by this work.

Thank you for your help,
Julien

> Regards,
> Ants Aasma


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.


Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Kim Hansen
Дата:
Сообщение: Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster