Re: How to boost performance of queries containing pattern matching characters

From: Artur Zając
Subject: Re: How to boost performance of queries containing pattern matching characters
Msg-id: 003801cbcc1a$37d2f7e0$a778e7a0$@ang.com.pl
In reply to: Re: How to boost performance of queries containing pattern matching characters ("Gnanakumar" <gnanam@zoniac.com>)
List: pgsql-performance

> How can we boost performance of queries containing pattern matching
> characters?  In my case, we're using a percent sign (%) that matches any
> string of zero or more characters.
>
> QUERY:  DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
>
> EMAIL column is VARCHAR(256).
>
> As it is clear from the above query, email is matched "partially and
> case-insensitively", which my application requirement demands.
>
> In case, if it were a full match, I could easily define a functional
> INDEX on EMAIL column (lower(EMAIL)) and I could rewrite my DELETE where
> criteria like lower(EMAIL) = 'someemail@domain.com'.
>
> MYTABLE currently contains 2 million records and grows consistently.

I had almost the same problem.
To solve it, I created my own text search parser (used by the myftscfg
configuration), which splits the text in the column into three-character
parts. For example, someemail@domain.com is split into:

som, ome, mee, eem, ema, mai, ail, il@, l@d, @do, dom, oma, mai, ain,
in., n.c, .co, com
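
To make the splitting rule concrete, here is a minimal Python sketch of it. It only illustrates which parts are produced; it is not the actual parser, and the function name trigrams is a hypothetical helper, not something from my setup.

```python
def trigrams(text):
    """Return every run of three consecutive characters in text, in order."""
    # A sliding window of width 3; strings shorter than 3 yield no parts.
    return [text[i:i + 3] for i in range(len(text) - 2)]


parts = trigrams("someemail@domain.com")
print(", ".join(parts))
```

Note that a search term shorter than three characters produces no parts at all, so the index cannot help with such a term.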

There should also be an index on the email column:

CREATE INDEX email_fts ON mytable USING gin
(to_tsvector('myftscfg'::regconfig, email));

Every query like email ILIKE '%domain.com%' should be rewritten as:

WHERE
to_tsvector('myftscfg',email) @@ to_tsquery('dom') AND
to_tsvector('myftscfg',email) @@ to_tsquery('oma') AND
to_tsvector('myftscfg',email) @@ to_tsquery('mai') AND
to_tsvector('myftscfg',email) @@ to_tsquery('ain') AND
to_tsvector('myftscfg',email) @@ to_tsquery('in.') AND
to_tsvector('myftscfg',email) @@ to_tsquery('n.c') AND
to_tsvector('myftscfg',email) @@ to_tsquery('.co') AND
to_tsvector('myftscfg',email) @@ to_tsquery('com') AND
email ILIKE '%domain.com%';
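
Writing this rewrite by hand for every search term gets tedious, so it could be generated by the application. The following Python sketch builds the same WHERE clause for an arbitrary term. The function name build_where, its default arguments, and the %s placeholder style (as used by drivers such as psycopg2) are assumptions made for illustration only.

```python
def build_where(term, column="email", config="myftscfg"):
    """Build a parameterized WHERE clause for a substring search on term.

    Returns (sql, params). The %s placeholders are an assumed driver style.
    """
    # Distinct three-character parts of the term, in first-seen order.
    parts = list(dict.fromkeys(term[i:i + 3] for i in range(len(term) - 2)))
    vector = f"to_tsvector('{config}', {column})"
    # One index-assisted condition per part.
    conditions = [f"{vector} @@ to_tsquery(%s)" for _ in parts]
    # The final ILIKE rechecks the rows that the index lets through.
    conditions.append(f"{column} ILIKE %s")
    return "\nAND ".join(conditions), parts + [f"%{term}%"]


sql, params = build_where("domain.com")
print("WHERE\n" + sql)
print(params)
```

The sketch does not escape % or _ inside the term, so those characters would still act as wildcards in the ILIKE part.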

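The index reduces the number of candidate records, and the final clause
email ILIKE '%domain.com%' then keeps only the records that really match
(a record can contain all the three-character parts without containing the
whole string).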
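
A small Python sketch shows why the final ILIKE is still needed. It emulates both conditions in memory. The helper names are hypothetical, and lower-casing both sides is an assumption about how the text search configuration normalizes its input.

```python
def trigrams(text):
    """Set of all three-character parts of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def index_candidate(value, term):
    """Emulate the index conditions: every part of term occurs in value."""
    return trigrams(term.lower()) <= trigrams(value.lower())


def ilike_contains(value, term):
    """Emulate ILIKE '%term%' for a term without wildcard characters."""
    return term.lower() in value.lower()


term = "domain.com"
for value in ["someemail@domain.com", "com@domain.co", "user@example.org"]:
    print(value, index_candidate(value, term), ilike_contains(value, term))
```

The value com@domain.co contains all eight parts of domain.com, so the index conditions accept it, but it does not contain the string itself, so only the ILIKE clause rejects it.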

I didn't find a better solution.

-------------------------------------------
Artur Zajac

