Обсуждение: why index scan not working when using 'like'?

Поиск
Список
Период
Сортировка

why index scan not working when using 'like'?

От
LIANHE SHAO
Дата:
Hi all,

I want to use index on the gene_symbol column in my
query and gene_symbol is indexed. but when I use
lower (gene_symbol) like lower('%mif%'), the index
is not used. While when I change to
lower(gene_symbol) = lower('mif'), the index is used
and index scan works, but this is not what I like. I
want all the gene_symbols  containing substring
'mif' are pulled out, and not necessarily exactly match.

could anybody give me some hints how to deal with
this. If I do not used index, it take too long for
the query.


PGA> explain select distinct probeset_id, chip,
gene_symbol, title, sequence_description, pfam from
affy_array_annotation where lower(gene_symbol) like
upper('%mif%');
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique  (cost=29576.44..29591.44 rows=86 width=265)
   ->  Sort  (cost=29576.44..29578.59 rows=857
width=265)
         Sort Key: probeset_id, chip, gene_symbol,
title, sequence_description, pfam
         ->  Seq Scan on affy_array_annotation
(cost=0.00..29534.70 rows=857 width=265)
               Filter: (lower((gene_symbol)::text)
~~ 'MIF%'::text)
(5 rows)


PGA=> explain select distinct probeset_id, chip,
gene_symbol, title, sequence_description, pfam from
affy_array_annotation where lower(gene_symbol) =
upper('%mif%');

 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Unique  (cost=3433.44..3448.44 rows=86 width=265)
   ->  Sort  (cost=3433.44..3435.58 rows=857 width=265)
         Sort Key: probeset_id, chip, gene_symbol,
title, sequence_description, pfam
         ->  Index Scan using gene_symbol_idx_fun1
on affy_array_annotation  (cost=0.00..3391.70
rows=857 width=265)
               Index Cond:
(lower((gene_symbol)::text) = '%MIF%'::text)
(5 rows)





Regards,
William


Re: why index scan not working when using 'like'?

От
Dror Matalon
Дата:
Hi,

Searches with like or regexes often can't use the index. Think of the index as
a sorted list of your items. It's easy to find an item when you know it
starts with mif so ('mif%' should use the index). But when you use a
'like' that starts with '%' the index is useless and the search needs to
do a sequential scan.

Regards,

Dror

On Tue, Nov 25, 2003 at 07:48:49PM +0000, LIANHE SHAO wrote:
> Hi all,
>
> I want to use index on the gene_symbol column in my
> query and gene_symbol is indexed. but when I use
> lower (gene_symbol) like lower('%mif%'), the index
> is not used. While when I change to
> lower(gene_symbol) = lower('mif'), the index is used
> and index scan works, but this is not what I like. I
> want all the gene_symbols  containing substring
> 'mif' are pulled out, and not necessarily exactly match.
>
> could anybody give me some hints how to deal with
> this. If I do not used index, it take too long for
> the query.
>
>
> PGA> explain select distinct probeset_id, chip,
> gene_symbol, title, sequence_description, pfam from
> affy_array_annotation where lower(gene_symbol) like
> upper('%mif%');
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Unique  (cost=29576.44..29591.44 rows=86 width=265)
>    ->  Sort  (cost=29576.44..29578.59 rows=857
> width=265)
>          Sort Key: probeset_id, chip, gene_symbol,
> title, sequence_description, pfam
>          ->  Seq Scan on affy_array_annotation
> (cost=0.00..29534.70 rows=857 width=265)
>                Filter: (lower((gene_symbol)::text)
> ~~ 'MIF%'::text)
> (5 rows)
>
>
> PGA=> explain select distinct probeset_id, chip,
> gene_symbol, title, sequence_description, pfam from
> affy_array_annotation where lower(gene_symbol) =
> upper('%mif%');
>
>  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=3433.44..3448.44 rows=86 width=265)
>    ->  Sort  (cost=3433.44..3435.58 rows=857 width=265)
>          Sort Key: probeset_id, chip, gene_symbol,
> title, sequence_description, pfam
>          ->  Index Scan using gene_symbol_idx_fun1
> on affy_array_annotation  (cost=0.00..3391.70
> rows=857 width=265)
>                Index Cond:
> (lower((gene_symbol)::text) = '%MIF%'::text)
> (5 rows)
>
>
>
>
>
> Regards,
> William
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

Re: why index scan not working when using 'like'?

От
Josh Berkus
Дата:
Lianhe,

> I want to use index on the gene_symbol column in my
> query and gene_symbol is indexed. but when I use
> lower (gene_symbol) like lower('%mif%'), the index
> is not used. While when I change to
> lower(gene_symbol) = lower('mif'), the index is used
> and index scan works, but this is not what I like. I
> want all the gene_symbols  containing substring
> 'mif' are pulled out, and not necessarily exactly match.

LIKE '%mif%' is what's called an "unanchored text search" and it cannot use an
index.   The database *has* to scan the full text looking for the substring.
This is true of all database platforms I know of.

In regular text fields containing words, your problem is solvable with full
text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary
non-language strings.  It could be adapted, but would require a lot of
hacking.

So you will need to find a way to restructure you data to avoid needing
unanchored text searches.    One way would be to break down the gene_symbol
field into its smallest atomic components and store those in an indexed child
table.   Or if you're searching on the same values all the time, you can
create a partial index.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: why index scan not working when using 'like'?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> In regular text fields containing words, your problem is solvable with full
> text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary
> non-language strings.  It could be adapted, but would require a lot of
> hacking.

I'm not sure why you say that FTI isn't a usable solution.  As long as
the gene symbols are separated by whitespace or some other non-letters
(eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.

            regards, tom lane

Re: why index scan not working when using 'like'?

От
Hannu Krosing
Дата:
Tom Lane kirjutas T, 25.11.2003 kell 23:29:
> Josh Berkus <josh@agliodbs.com> writes:
> > In regular text fields containing words, your problem is solvable with full
> > text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary
> > non-language strings.  It could be adapted, but would require a lot of
> > hacking.
>
> I'm not sure why you say that FTI isn't a usable solution.  As long as
> the gene symbols are separated by whitespace or some other non-letters
> (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.

If he wants to search on arbitrary substring, he could change tokeniser
in FTI to produce trigrams, so that "foomifbar" would be indexed as if
it were text "foo oom omi mif ifb fba bar" and search for things like
%mifb% should first do a FTI search for "mif" AND "ifb" and then simple
LIKE %mifb% to weed out something like "mififb".

There are ways to use trigrams for 1 and 2 letter matches as well.

-------------
Hannu