Re: Query with "like" is really slow
От | Christian Schröder |
---|---|
Тема | Re: Query with "like" is really slow |
Дата | |
Msg-id | 46E4224A.6050807@deriva.de обсуждение исходный текст |
Ответ на | Re: Query with "like" is really slow (Gregory Stark <stark@enterprisedb.com>) |
Список | pgsql-general |
Gregory Stark wrote:<br /><br /><blockquote type="cite"><pre wrap="">Christian Schröder <a class="moz-txt-link-rfc2396E"href="mailto:cs@deriva.de"><cs@deriva.de></a> writes: ... </pre><blockquote type="cite"><pre wrap="">-> Seq Scan on table2 (cost=0.00..186.64 rows=2 width=4) (actual time=0.052..2.259rows=42 loops=1) Filter: (c ~~ '1131%'::text) </pre></blockquote><pre wrap="">... </pre><blockquotetype="cite"><pre wrap="">-> Seq Scan on table2 (cost=0.00..200.89 rows=14 width=4) (actual time=0.084..3.419rows=42 loops=1) Filter: ("substring"((c)::text, 1, 4) = 1131'::text) </pre></blockquote><pre wrap="">... </pre><blockquote type="cite"><pre wrap="">My question is: Why do I have to optimize my query (use "substring" instead of "like") instead of having the database do this for me? Or is there a difference between both queries which I cannot see? </pre></blockquote><pre wrap=""> The only difference is that the optimizer understands LIKE better than it does substring and so it tries harder to come up with a good estimate of how many rows will match. In this case it seems its estimate is actually better (by pure luck) with the substring() call. But it's still not very good. Have these tables been analyzed recently? If so try raising the statistics target on the "c" column. If the number of rows estimated goes up from 2 to the 14 it's estimating with substring() then you'll get the better plan. Hopefully it would be even better than that though. </pre></blockquote><br /> Yes, all tables are "vacuum analyzed" twiceper day. (I did not have time to configure the auto-vacuum feature.)<br /><br /> But after increasing the statisticstarget of the column to 20 and re-analyzing the table the query planner chose the better plan and the query gotsped up dramatically. You seem to have found the problem!<br /> I have now increased the default statistics target from10 to 20 and the statistics target of this column to 500. We have about 190 distinct values in this column, so with astatistics target of 500 the statistics should be as exact as possible. (At least if I have understood well what this parametermeans.) Since we have many queries that rely on this column to me it seems to be a good idea to have best statisticsabout it. I cannot see any disadvantage of this approach, at least if I do it only for one single column. Or doI overlook anything?<br /><br /><blockquote type="cite"><pre wrap=""></pre><blockquote type="cite"><pre wrap="">And lastquestion: I do not really understand the first query plan. The actual time for the outer nested loop is 532673.631 ms. As far as I have understood the docs this includes the child nodes. But I cannot find the time-consuming child node. I only see two child nodes: The inner nested loop (which took 31.692 ms) and the index scan (which took 243.643 ms). Or do I have to multiply the 243.643 ms with 1627 (number of loops)? But even then I get 396407.161 ms, which is still far away from the 532673.631 ms in the parent node. </pre></blockquote><pre wrap=""> The nested loop still has to do some work. Actually it's quite possible that that extra overhead in nested loop is largely gettimeofday() calls for the explain analyze. Does the query take less time run without explain analyze than it does run with it? </pre></blockquote> You seem to be right with your assumption that most of the extra time is spentin the gettimeofday() calls: Without "explain analyze" the query took about 6 minutes which is close to 380 secondsthat I get from multiplying the number of loops (1627) with the actual time per loop (234.643 ms).<br /><br /> Manythanks for your very helpful explanations!<br /><br /> Regards,<br /> Christian<br /><pre class="moz-signature" cols="72">-- Deriva GmbH Tel.: +49 551 489500-42 Financial IT and Consulting Fax: +49 551 489500-91 Hans-Böckler-Straße 2 <a class="moz-txt-link-freetext" href="http://www.deriva.de">http://www.deriva.de</a> D-37079 Göttingen Deriva CA Certificate: <a class="moz-txt-link-freetext" href="http://www.deriva.de/deriva-ca.cer">http://www.deriva.de/deriva-ca.cer</a></pre>
В списке pgsql-general по дате отправления: