Обсуждение: type-casting and LIKE queries

Поиск
Список
Период
Сортировка

type-casting and LIKE queries

От
valerian
Дата:
I have an indexed column called home_phone, which is of type bigint.
How can I search this column efficiently, using LIKE queries?  For
example:

   test=> SELECT id FROM user WHERE home_phone LIKE '407%'::bigint;
   ERROR:  Bad int8 external representation "407%"

Obviously, that's not correct, but I'm confused as to how the % and ?
characters can be used along with type-casting.  Or can they?



Re: type-casting and LIKE queries

От
Stephan Szabo
Дата:
On Fri, 14 Mar 2003, valerian wrote:

> I have an indexed column called home_phone, which is of type bigint.
> How can I search this column efficiently, using LIKE queries?  For

I don't think you do. LIKE is defined on string types in any case, so even
if you could convert it to a bigint, it's not going to do what you seem
to want (using an index on a bigint).

You could make a function that got say the top 3 digits of a number and
use it in a functional index.

However, for what you're doing, that might not be safe anyway (nor would
like) unless you're certain that your input format is fixed or at least
requires a consistent number of digits (what happens if someone doesn't
put an area code and you expect one, or does and you don't).

> example:
>
>    test=> SELECT id FROM user WHERE home_phone LIKE '407%'::bigint;
>    ERROR:  Bad int8 external representation "407%"
>
> Obviously, that's not correct, but I'm confused as to how the % and ?
> characters can be used along with type-casting.  Or can they?




Re: type-casting and LIKE queries

От
James Gregory
Дата:
On Fri, 2003-03-14 at 06:26, valerian wrote:
> I have an indexed column called home_phone, which is of type bigint.
> How can I search this column efficiently, using LIKE queries?  For
> example:
>
>    test=> SELECT id FROM user WHERE home_phone LIKE '407%'::bigint;
>    ERROR:  Bad int8 external representation "407%"

try something like

where home_phone::text like '407%'

or alternatively harness the immense power of algebra to achieve your
devious ends :)

HTH,

James.



Re: type-casting and LIKE queries

От
valerian
Дата:
On Fri, Mar 14, 2003 at 06:25:44PM +0000, James Gregory wrote:
> try something like
> where home_phone::text like '407%'

Thanks, this works great.  I also tried using regexes on home_phone::text,
but they seem a bit slower than LIKE queries, for some reason (I'm
guessing this is true in general, as regexes have more possible cases to
deal with?)

The only downside seems to be that queries that start with the % character
don't make use of the text(home_phone) index.  Is there a way around
this?  I noticed that the ? character doesn't have this behavior, so I
tried a few things like:

   SELECT * FROM test WHERE home_phone::text LIKE '??????4820'

and this does use the index, but it's a bit of a hack, especially if
you're searching on a varchar(255) column...

> or alternatively harness the immense power of algebra to achieve your
> devious ends :)

Not sure what you're implying here, unless maybe you were thinking along
the lines of:

   SELECT * FROM test WHERE home_phone >= 4070000000 AND home_phone <= 4079999999

But unfortunately that won't work, as I have to be able to do searches on
any substring of the home_phone column...


Re: type-casting and LIKE queries

От
Lincoln Yeoh
Дата:
At 02:15 PM 3/14/03 -0500, valerian wrote:
>this?  I noticed that the ? character doesn't have this behavior, so I
>tried a few things like:
>
>    SELECT * FROM test WHERE home_phone::text LIKE '??????4820'
>
>and this does use the index, but it's a bit of a hack, especially if
>you're searching on a varchar(255) column...

Are you using the ? character as a wildcard? AFAIK _ and % are the wildcard
characters for LIKE.

> > or alternatively harness the immense power of algebra to achieve your
> > devious ends :)
>
>Not sure what you're implying here, unless maybe you were thinking along
>the lines of:
>
>    SELECT * FROM test WHERE home_phone >= 4070000000 AND home_phone <=
> 4079999999
>
>But unfortunately that won't work, as I have to be able to do searches on
>any substring of the home_phone column...

Would having two indexes cover enough cases for you? One that allows
indexed LIKE '407%'. And one that allows '%4820'.

For the latter just reverse the string and index it, and do a search on
LIKE '0284%'.

Maybe you could create a function that reverses strings, not sure how that
would work tho - could look messy since you probably don't want to reverse
the % too.

Not sure if Postgresql would be able to productively use both indexes to
find a substring in the middle of text, given a suitable query. If the
substring is in a fixed position in the middle I think it can.

Regards,
Link.




Re: type-casting and LIKE queries

От
valerian
Дата:
On Sat, Mar 15, 2003 at 02:00:35PM +0800, Lincoln Yeoh wrote:
> Are you using the ? character as a wildcard? AFAIK _ and % are the wildcard
> characters for LIKE.

Sorry, that was a typo.  My application allows searching with * and ?
characters, but it translates them to % and _ when creating the sql
statement.

> Would having two indexes cover enough cases for you? One that allows
> indexed LIKE '407%'. And one that allows '%4820'.
>
> For the latter just reverse the string and index it, and do a search on
> LIKE '0284%'.

Hey that's an interesting idea!

> Maybe you could create a function that reverses strings, not sure how that
> would work tho - could look messy since you probably don't want to reverse
> the % too.

So I guess that trick wouldn't work if the search string has a % both
at the beginning and the end...

> Not sure if Postgresql would be able to productively use both indexes to
> find a substring in the middle of text, given a suitable query. If the
> substring is in a fixed position in the middle I think it can.

I can't guarantee fixed positions though, because I also have to be able
to run arbitrary searches on varchar columns too...  For instance, I
have a column 'email' of type varchar(255), which has a unique index on
lower(email).  Some typical searches might be:

   *@hotmail.com
   johndoe@*
   *unix*

The first two are no problem if I use the 'reverse index' trick you
described.  But I don't see how the indexes can get used in the third
case?

However, I went ahead and created the a 'reverse_lc' function to see what
kind of performace I would get.  The planner isn't using my index
though. :(


test=> CREATE UNIQUE INDEX test_email_lc_idx ON test (lower(email));
CREATE INDEX
test=> CREATE UNIQUE INDEX test_email_revlc_idx ON test (reverse_lc(email));
CREATE INDEX
test=> VACUUM ANALYZE test ;
VACUUM
test=> EXPLAIN ANALYZE SELECT * FROM test WHERE lower(email) LIKE 'asdf%';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_email_lc_idx on test  (cost=0.00..125.62 rows=38 width=45) (actual time=0.50..0.50 rows=0
loops=1)
   Index Cond: ((lower((email)::text) >= 'asdf'::text) AND (lower((email)::text) < 'asdg'::text))
   Filter: (lower((email)::text) ~~ 'asdf%'::text)
 Total runtime: 0.64 msec
(4 rows)

test=> EXPLAIN ANALYZE SELECT * FROM test WHERE reverse_lc(email) LIKE '%asdf';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..193.56 rows=38 width=45) (actual time=5852.42..5852.42 rows=0 loops=1)
   Filter: (reverse_lc((email)::text) ~~ '%asdf'::text)
 Total runtime: 5852.54 msec
(3 rows)

test=> SELECT reverse_lc('%asdf');
 reverse_lc
------------
 fdsa%
(1 row)


Here's the code for it:

CREATE FUNCTION reverse_lc(text) RETURNS text IMMUTABLE AS '
    DECLARE
        old_str ALIAS FOR $1;
        low     text;
        new_str text;
        len     integer;
        i       integer;
    BEGIN
        len := length(old_str);
        low := lower(old_str);
        new_str := '''';
        i := 0;
        WHILE i < len LOOP
            new_str := rpad(new_str, i+1, substr(old_str, len-i, 1));
            i := i+1;
        END LOOP;
        RETURN new_str;
    END;
' LANGUAGE 'plpgsql';

Did I forget to do something?



Re: type-casting and LIKE queries

От
valerian
Дата:
On Sun, Mar 16, 2003 at 07:34:37PM -0500, valerian wrote:
> test=> EXPLAIN ANALYZE SELECT * FROM test WHERE reverse_lc(email) LIKE '%asdf';
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..193.56 rows=38 width=45) (actual time=5852.42..5852.42 rows=0 loops=1)
>    Filter: (reverse_lc((email)::text) ~~ '%asdf'::text)
>  Total runtime: 5852.54 msec
> (3 rows)

Never mind, I just realized that I forgot to reverse the search key
also...

test=> EXPLAIN ANALYZE SELECT * FROM test WHERE reverse_lc(email) LIKE 'fdsa%';
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_email_revlc_idx on test  (cost=0.00..125.62 rows=38 width=45) (actual time=0.39..0.39 rows=0
loops=1)
   Index Cond: ((reverse_lc((email)::text) >= 'fdsa'::text) AND (reverse_lc((email)::text) < 'fdsb'::text))
   Filter: (reverse_lc((email)::text) ~~ 'fdsa%'::text)
 Total runtime: 0.53 msec
(4 rows)

So that takes care of the first two types of queries, but not the one
that has a % both at the beginning and end of the search key.

Any ideas on how to handle those?


Re: type-casting and LIKE queries

От
Alvaro Herrera
Дата:
On Sun, Mar 16, 2003 at 09:05:52PM -0500, valerian wrote:
> On Sun, Mar 16, 2003 at 07:34:37PM -0500, valerian wrote:

> So that takes care of the first two types of queries, but not the one
> that has a % both at the beginning and end of the search key.
>
> Any ideas on how to handle those?

I don't think there's a way to handle expressions with that level of
generality.

Actually there are ways using suffix tries or some such structure, but
there's no implementation of that in Postgres.  I don't know if that
kind of thing is doable with GiST indexes; should be, at least in
theory.  If you want to do it, it's going to cost a nontrivial amount of
work.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)

Re: type-casting and LIKE queries

От
Lincoln Yeoh
Дата:
What you're asking for comes under full text indexing. There's a fair bit
of research in this field.

Supposedly a way to do this is to create an index of substrings.

e.g. this is the text
Index:
this is the text
  his is the text
   is is the text
    s is the text

And so on.

But without compression and other tricks it might not perform well. For the
index can become really huge so using it could be slower than or be about
the same speed as a seq scan of the main table.

A similar method is to just index keywords. If that is sufficient you could
look at the full text index thing for Postgresql.

If you're using it for phone numbers, I'd think most people are ok with
searching for the starting digits, or the ending digits.

For email you could try keywords.

In my experience if the keyword table isn't huge then a substring search on
the keyword table can be pretty fast.

Hope that helps,
Link.

At 09:05 PM 3/16/03 -0500, valerian wrote:

>test=> EXPLAIN ANALYZE SELECT * FROM test WHERE reverse_lc(email) LIKE
>'fdsa%';
>                                                          QUERY PLAN

>-----------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_email_revlc_idx on test  (cost=0.00..125.62
> rows=38 width=45) (actual time=0.39..0.39 rows=0 loops=1)
>    Index Cond: ((reverse_lc((email)::text) >= 'fdsa'::text) AND
> (reverse_lc((email)::text) < 'fdsb'::text))
>    Filter: (reverse_lc((email)::text) ~~ 'fdsa%'::text)
>  Total runtime: 0.53 msec
>(4 rows)
>
>So that takes care of the first two types of queries, but not the one
>that has a % both at the beginning and end of the search key.
>
>Any ideas on how to handle those?