Обсуждение: [7.0.3] optimizing a LIKE query ...

Поиск
Список
Период
Сортировка

[7.0.3] optimizing a LIKE query ...

От
The Hermit Hacker
Дата:
Can someone suggest how to improve the following query, so as to make the
LIKE section operate on the results found by the = one?

SELECT ndict.url_id,ndict.intag FROM ndict,urlWHERE ndict.word_id=-720551816  AND url.rec_id=ndict.url_id  AND
((url.url|| '') LIKE '%http://www.postgresql.org/%%')
 

ndict.word_id=-720551816 returns 5895 records

((url.url || '') LIKE '%http://www.postgresql.org/%%')- returns 138k records

explain shows:

NOTICE:  QUERY PLAN:

Hash Join  (cost=10163.01..26647.09 rows=42 width=12) ->  Index Scan using n_word on ndict  (cost=0.00..16299.52
rows=4180width=8) ->  Hash  (cost=10159.53..10159.53 rows=1390 width=4)       ->  Seq Scan on url  (cost=0.00..10159.53
rows=1390width=4)
 

EXPLAIN


I'm figuring that if I can somehow get the query (using subselects,
maybe?), to have the LIKE part of the query work only on the 6k records
returned by the "=" part of it, the overall results should be faster ...

Possible?


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org



Re: [7.0.3] optimizing a LIKE query ...

От
Tom Lane
Дата:
The Hermit Hacker <scrappy@hub.org> writes:
> I'm figuring that if I can somehow get the query (using subselects,
> maybe?), to have the LIKE part of the query work only on the 6k records
> returned by the "=" part of it, the overall results should be faster ...

In 7.0.* I think the only way to do that is to select into a temp table
and then apply the LIKE while selecting from the temp table.

In 7.1 you could possibly force the order by using a subselect (although
offhand I think the planner might be smart enough to see through that,
and do what it thinks is right anyway).  The real problem is the planner
thinks that LIKE '%http://www.postgresql.org/%%' is really selective; it
has no idea that most of your table mentions pgsql.org URLs :-(.  We
need better statistics to fix this properly.  (On my list for 7.2.)
        regards, tom lane