Wildcard searches & performance question

Поиск
Список
Период
Сортировка
От Grega Bremec
Тема Wildcard searches & performance question
Дата
Msg-id 20030527210908.A571@elbereth.noviforum.si
обсуждение исходный текст
Ответы Re: Wildcard searches & performance question  (Andrew Sullivan <andrew@libertyrms.info>)
Re: Wildcard searches & performance question  (Josh Berkus <josh@agliodbs.com>)
Re: Wildcard searches & performance question  ("scott.marlowe" <scott.marlowe@ihs.com>)
Re: Wildcard searches & performance question  (Rod Taylor <rbt@rbt.ca>)
Re: Wildcard searches & performance question  (Grega Bremec <grega.bremec@noviforum.si>)
Список pgsql-performance
Hello,

I have a database with the following layout:

    searchterms=# \d+ searches_2002
        Table "public.searches_2002"
      Column   |          Type          | Modifiers | Description
    -----------+------------------------+-----------+-------------
     srchdate  | date                   | not null  |
     srchtime  | time without time zone | not null  |
     client_ip | inet                   | not null  |
     srchquery | character varying(50)  | not null  |
     fhvalue   | smallint               |           |
    Indexes: searches_2002_client_ip btree (client_ip),
         searches_2002_srchdate btree (srchdate),
         searches_2002_srchdatetime btree (srchdate, srchtime),
         searches_2002_srchquery btree (srchquery),
         searches_2002_srchquery_lcase btree (lower(srchquery)),
         searches_2002_srchquery_withfh btree (srchquery, fhvalue),
         searches_2002_srchtime btree (srchtime)

There are no uniqueness properties that would make it possible for this table
to have a primary key, as it is a list of searches performed on a search
engine and the users' behaviour is, well... umm, odd, to be mild. :)

Also, do note that this is a test table, so nevermind the excessive amount of
indexes - performance is not an issue here, I am still evaluating the need and
benefits of having various indexes on those columns.

The particular case I am interested is this: when executing queries involving
pattern searches using various operators on srchquery, none of the indexes are
used in certain cases, namely those LIKE and regexp filters that start with
a wildcard.

This makes perfect sense, because wildcard pattern searches that start with a
wildcard, can not really benefit from an index scan, because a sequential scan
is probably going to be faster: we are only going to benefit from scanning an
index in those special cases where the wildcard evaluates to a zero-length
string.

One example of a query plan:

    searchterms=# explain select count(*)
            from searches_2002
            where srchquery like '%curriculum%';
                    QUERY PLAN
    --------------------------------------------------------------------------
     Aggregate  (cost=4583.26..4583.26 rows=1 width=0)
       ->  Seq Scan on searches_2002  (cost=0.00..4583.26 rows=1 width=0)
         Filter: (srchquery ~~ '%curriculum%'::text)

There is 211061 records in this test table, but the real-life tables would
contain a much much larger amount of data, more like 50+ million rows.

This promise of a hell on earth trying to optimize performance makes me wonder:
would there be a sensible way/reason for avoiding sequential scans on queries
that start with a wildcard, and would avoiding sequential scans even be
feasible in such cases? Or in other words, can I somehow optimize LIKE and
regexp queries that start with a wildcard?

TIA,
--
    Grega Bremec
    System Administration & Development Support
    grega.bremec-at-noviforum.si
    http://najdi.si/
    http://www.noviforum.si/

В списке pgsql-performance по дате отправления:

Предыдущее
От: Reece Hart
Дата:
Сообщение: Re: [GENERAL] Q: Structured index - which one runs faster?
Следующее
От: Andrew Sullivan
Дата:
Сообщение: Re: Wildcard searches & performance question