Обсуждение: sequential scan performance
Hi -
I have a table of about 3 million rows of city "aliases" that I need
to query using LIKE - for example:
select * from city_alias where city_name like '%FRANCISCO'
When I do an EXPLAIN ANALYZE on the above query, the result is:
Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42)
(actual time=73.369..3330.281 rows=407 loops=1)
Filter: ((name)::text ~~ '%FRANCISCO'::text)
Total runtime: 3330.524 ms
(3 rows)
this is a query that our system needs to do a LOT. Is there any way
to improve the performance on this either with changes to our query
or by configuring the database deployment? We have an index on
city_name but when using the % operator on the front of the query
string postgresql can't use the index .
Thanks for any help.
Mike
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote: > this is a query that our system needs to do a LOT. Is there any way > to improve the performance on this either with changes to our query > or by configuring the database deployment? We have an index on > city_name but when using the % operator on the front of the query > string postgresql can't use the index . Try tsearch2 from contrib, it might help you. /* Steinar */ -- Homepage: http://www.sesse.net/
> When I do an EXPLAIN ANALYZE on the above query, the result is: > > Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42) (actual > time=73.369..3330.281 rows=407 loops=1) > Filter: ((name)::text ~~ '%FRANCISCO'::text) > Total runtime: 3330.524 ms > (3 rows) > > > this is a query that our system needs to do a LOT. Is there any way > to improve the performance on this either with changes to our query or > by configuring the database deployment? We have an index on city_name > but when using the % operator on the front of the query string > postgresql can't use the index . Of course not. There really is now way to make your literal query above fast. You could try making a functional index on the reverse() of the string and querying for the reverse() of 'francisco'. Or, if you want a general full text index, you should absolutely be using contrib/tsearch2. Chris
Michael, I'd recommend our contrib/pg_trgm module, which provides trigram based fuzzy search and return results ordered by similarity to your query. Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm for more details. Oleg On Sun, 29 May 2005, Michael Engelhart wrote: > Hi - > > I have a table of about 3 million rows of city "aliases" that I need to query > using LIKE - for example: > > select * from city_alias where city_name like '%FRANCISCO' > > > When I do an EXPLAIN ANALYZE on the above query, the result is: > > Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42) (actual > time=73.369..3330.281 rows=407 loops=1) > Filter: ((name)::text ~~ '%FRANCISCO'::text) > Total runtime: 3330.524 ms > (3 rows) > > > this is a query that our system needs to do a LOT. Is there any way to > improve the performance on this either with changes to our query or by > configuring the database deployment? We have an index on city_name but when > using the % operator on the front of the query string postgresql can't use > the index . > > Thanks for any help. > > Mike > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Thanks everyone for all the suggestions. I'll check into those contrib modules. Michael On May 29, 2005, at 2:44 PM, Oleg Bartunov wrote: > Michael, > > I'd recommend our contrib/pg_trgm module, which provides > trigram based fuzzy search and return results ordered by similarity > to your query. Read http://www.sai.msu.su/~megera/postgres/gist/ > pg_trgm/README.pg_trgm > for more details. > > Oleg > On Sun, 29 May 2005, Michael Engelhart wrote: > > >> Hi - >> >> I have a table of about 3 million rows of city "aliases" that I >> need to query using LIKE - for example: >> >> select * from city_alias where city_name like '%FRANCISCO' >> >> >> When I do an EXPLAIN ANALYZE on the above query, the result is: >> >> Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42) >> (actual time=73.369..3330.281 rows=407 loops=1) >> Filter: ((name)::text ~~ '%FRANCISCO'::text) >> Total runtime: 3330.524 ms >> (3 rows) >> >> >> this is a query that our system needs to do a LOT. Is there any >> way to improve the performance on this either with changes to our >> query or by configuring the database deployment? We have an >> index on city_name but when using the % operator on the front of >> the query string postgresql can't use the index . >> >> Thanks for any help. >> >> Mike >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 2: you can get off all lists at once with the unregister command >> (send "unregister YourEmailAddressHere" to >> majordomo@postgresql.org) >> >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> Hi -
>
> I have a table of about 3 million rows of city "aliases" that I need
> to query using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
>
> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
> Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42)
> (actual time=73.369..3330.281 rows=407 loops=1)
> Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT. Is there any way
> to improve the performance on this either with changes to our query
> or by configuring the database deployment? We have an index on
> city_name but when using the % operator on the front of the query
> string postgresql can't use the index .
If that's really what you're doing (the wildcard is always at the beginning)
then something like this
create index city_name_idx on foo (reverse(city_name));
select * from city_alias where reverse(city_name) like reverse('%FRANCISCO');
should do just what you need.
I use this, with a plpgsql implementation of reverse, and it works nicely.
CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
original alias for $1;
reverse_str text;
i int4;
BEGIN
reverse_str = '''';
FOR i IN REVERSE LENGTH(original)..1 LOOP
reverse_str = reverse_str || substr(original,i,1);
END LOOP;
return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;
Someone will no doubt suggest using tsearch2, and you might want to
take a look at it if you actually need full-text search, but my
experience has been that it's too slow to be useful in production, and
it's not needed for the simple "leading wildcard" case.
Cheers,
Steve