On 04/05/2012 21:47, Ants Aasma wrote:
> On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar<jcigar@ulb.ac.be> wrote:
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
> It looks like the join selectivity of (context_to_context_links,
> ancestors) is being overestimated by almost two orders of magnitude.
> The optimizer thinks that there are 564 rows in the
> context_to_context_links table for each taxon_id, while in fact for
> this query the number is 9. To confirm that this, you can force the
> selectivity estimate to be 200x lower by adding a geo_id = geod_id
> where clause to the subquery.
adding a geo_id = geo_id to the subquery helped a little bit with a
cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 :
without:
Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..146.93 rows=341 width=8) (actual
time=0.004..0.019 rows=9 loops=736)
with geo_id = geo_id:
Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..148.11 rows=2 width=8) (actual
time=0.004..0.020 rows=9 loops=736)
> If it does help, then the next question would be why is the estimate
> so much off. It could be either because the stats for
> context_to_context_links.taxon_id are wrong or because
> ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
> figuring this is out, you could run the following to queries and post
> the results:
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
> dist GROUP BY 1 ORDER BY 1;
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
> and taxon_id= ANY ( select taxon_id from rab.ancestors where
> ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
> 1 ORDER BY 1;
I'm sorry but I get an "ERROR: division by zero" for both of your queries..
> If the second distribution has a significantly different shape then
> cross column statistics are necessary to get good plans. As it happens
> I'm working on adding this functionality to PostgreSQL and would love
> to hear more details about your use-case to understand if it would be
> solved by this work.
Thank you for your help,
Julien
> Regards,
> Ants Aasma
--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.