Wildcard searches & performance question
От | Grega Bremec |
---|---|
Тема | Wildcard searches & performance question |
Дата | |
Msg-id | 20030527210908.A571@elbereth.noviforum.si обсуждение исходный текст |
Ответы |
Re: Wildcard searches & performance question
(Andrew Sullivan <andrew@libertyrms.info>)
Re: Wildcard searches & performance question (Josh Berkus <josh@agliodbs.com>) Re: Wildcard searches & performance question ("scott.marlowe" <scott.marlowe@ihs.com>) Re: Wildcard searches & performance question (Rod Taylor <rbt@rbt.ca>) Re: Wildcard searches & performance question (Grega Bremec <grega.bremec@noviforum.si>) |
Список | pgsql-performance |
Hello, I have a database with the following layout: searchterms=# \d+ searches_2002 Table "public.searches_2002" Column | Type | Modifiers | Description -----------+------------------------+-----------+------------- srchdate | date | not null | srchtime | time without time zone | not null | client_ip | inet | not null | srchquery | character varying(50) | not null | fhvalue | smallint | | Indexes: searches_2002_client_ip btree (client_ip), searches_2002_srchdate btree (srchdate), searches_2002_srchdatetime btree (srchdate, srchtime), searches_2002_srchquery btree (srchquery), searches_2002_srchquery_lcase btree (lower(srchquery)), searches_2002_srchquery_withfh btree (srchquery, fhvalue), searches_2002_srchtime btree (srchtime) There are no uniqueness properties that would make it possible for this table to have a primary key, as it is a list of searches performed on a search engine and the users' behaviour is, well... umm, odd, to be mild. :) Also, do note that this is a test table, so nevermind the excessive amount of indexes - performance is not an issue here, I am still evaluating the need and benefits of having various indexes on those columns. The particular case I am interested is this: when executing queries involving pattern searches using various operators on srchquery, none of the indexes are used in certain cases, namely those LIKE and regexp filters that start with a wildcard. This makes perfect sense, because wildcard pattern searches that start with a wildcard, can not really benefit from an index scan, because a sequential scan is probably going to be faster: we are only going to benefit from scanning an index in those special cases where the wildcard evaluates to a zero-length string. One example of a query plan: searchterms=# explain select count(*) from searches_2002 where srchquery like '%curriculum%'; QUERY PLAN -------------------------------------------------------------------------- Aggregate (cost=4583.26..4583.26 rows=1 width=0) -> Seq Scan on searches_2002 (cost=0.00..4583.26 rows=1 width=0) Filter: (srchquery ~~ '%curriculum%'::text) There is 211061 records in this test table, but the real-life tables would contain a much much larger amount of data, more like 50+ million rows. This promise of a hell on earth trying to optimize performance makes me wonder: would there be a sensible way/reason for avoiding sequential scans on queries that start with a wildcard, and would avoiding sequential scans even be feasible in such cases? Or in other words, can I somehow optimize LIKE and regexp queries that start with a wildcard? TIA, -- Grega Bremec System Administration & Development Support grega.bremec-at-noviforum.si http://najdi.si/ http://www.noviforum.si/
В списке pgsql-performance по дате отправления:
Предыдущее
От: Reece HartДата:
Сообщение: Re: [GENERAL] Q: Structured index - which one runs faster?