Обсуждение: regular expressions in query
My database table holds phone numbers that may contain characters other than digits (that's not a problem in itself). I want to be able to apply a regular expression (to ignore all characters except digits) to the attribute 'phone' first and then for the ILIKE to compare the result to $telephone. I can't find any way of applying the RE to phone. My current query without the RE is as follows: SELECT telephone FROM addresses WHERE id = user_id AND phone ILIKE '%".addslashes($telephone)."%'" I want to do something like: AND phone([^[:digit:]]) ILIKE $telephone But this doesn't work. Any ideas? -- Get Thunderbird <http://www.mozilla.org/products/thunderbird/> <http://www.mozilla.org/products/thunderbird/>
Try using the "~" regex matching operator instead of ILIKE.
Regards,
    Jeff Davis
On Fri, 2005-02-11 at 22:21 +0000, fiona wrote:
> My database table holds phone numbers that may contain characters other
> than digits (that's not a problem in itself).
>
> I want to be able to apply a regular expression (to ignore all
> characters except digits) to the attribute 'phone' first and then for
> the ILIKE to compare
> the result to $telephone. I can't find any way of applying the RE to phone.
> My current query without the RE is as follows:
>
> SELECT telephone FROM addresses WHERE id = user_id AND phone ILIKE
> '%".addslashes($telephone)."%'"
> I want to do something like:  AND phone([^[:digit:]]) ILIKE $telephone
> But this doesn't work.
> Any ideas?
			
		On Fri, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:
>�Try using the "~" regex matching operator instead of ILIKE.
>
>�Regards,
>�Jeff Davis
>
I still need the ILIKE to compare the returned value with $telephone.
I have the following PHP to check an input string for non numeric characters:
$tel = ereg_replace('[^[:digit:]]', "", $test); -- $tel then equals only the numbers in test.
This is what I want to be able to do inside the query, but without altering the values in the database - to look at the column 'phone', see if there are any non-numeric characters and ignore them then compare the numbers that are left with $telephone.
I tried "AND phone ~ '[^[:digit:]]' ILIKE $telephone"
But get the following error:
Warning:pg_query(): Query failed: Error: operator does not exist: boolean ~~* "unknown" HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts.
cheers,
Fiona
On Sat, 2005-02-12 at 10:31, F.Bissett wrote:
> On Fri, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:
> >Try using the "~" regex matching operator instead of ILIKE.
> >
> >Regards,
> >Jeff Davis
> >
>
>
>
> I still need the ILIKE to compare the returned value with $telephone.
>
>
>
> I have the following PHP to check an input string for non numeric
> characters:
>
>
>
> $tel = ereg_replace('[^[:digit:]]', "", $test);   --  $tel then equals
> only the numbers in test.
>
>
>
> This is what I want to be able to do inside the query, but without
> altering the values in the database - to look at the column 'phone',
> see if there are any non-numeric characters and ignore them then
> compare the numbers that are left with $telephone.
>
>
>
> I tried "AND phone ~ '[^[:digit:]]' ILIKE $telephone"
>
> But get the following error:
Try anding them:
where phone ILIKE $telephone AND phone ~ '[.... etc...
			
		"F.Bissett" <fbissett@blueyonder.co.uk> writes:
> </head><BODY BGCOLOR=3D"#F0F0F0" ><p><SPAN style=3D"font-size:10pt;">On Fri=
> , 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:<br /></SPAN><SPAN style=3D"=
> font-size:10pt;color:navy;">>=A0Try using the "~" regex matching operato=
> r instead of ILIKE.</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPA=
> N style=3D"font-size:10pt;color:navy;">></SPAN><SPAN style=3D"font-size:=
> 10pt;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">>=A0Regar=
> ds,</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPAN style=3D"font-=
> size:10pt;color:navy;">>=A0Jeff Davis</SPAN><SPAN style=3D"font-size:10p=
> t;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">></SPAN></p>
Please don't post HTML email; it's a pain in the neck to quote.
> I have the following PHP to check an input string for non numeric characters:
>
> $tel = ereg_replace('[^[:digit:]]', "", $test); -- tel then equals only the numbers in test.
The closest equivalent we have to that is the regex-extraction version
of the substring() function --- see
http://www.postgresql.org/docs/8.0/static/functions-matching.html
It would go something like
    substring($test from '[0-9]+')
However, what that actually gets you is the first all-numeric substring;
if there are multiple occurrences of digits separated by non-digits this
will not do what you want.
My advice is to write the function you want in one of the PLs that have
good string-mashing facilities --- either plperl or pltcl would
certainly do.  (Probably plpython too, but I'm not very familiar with
Python.)  Plain SQL is not very strong on string manipulation, but
that's why we have extension languages.
            regards, tom lane
			
		No doubt someone more adept at perl can write
this function as a one-liner.
create or replace function just_digits(text)
returns text as
$$
        my $innum = $_[0];
        $innum =~ s/\D//g;
        return $innum;
$$ language 'plperl'
     SELECT telephone FROM addresses
        WHERE user_id = 'bob'
        AND just_digits(telephone) = '1115551212';
--elein
On Sat, Feb 12, 2005 at 12:27:20PM -0500, Tom Lane wrote:
> "F.Bissett" <fbissett@blueyonder.co.uk> writes:
> > </head><BODY BGCOLOR=3D"#F0F0F0" ><p><SPAN style=3D"font-size:10pt;">On Fri=
> > , 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:<br /></SPAN><SPAN style=3D"=
> > font-size:10pt;color:navy;">>=A0Try using the "~" regex matching operato=
> > r instead of ILIKE.</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPA=
> > N style=3D"font-size:10pt;color:navy;">></SPAN><SPAN style=3D"font-size:=
> > 10pt;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">>=A0Regar=
> > ds,</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPAN style=3D"font-=
> > size:10pt;color:navy;">>=A0Jeff Davis</SPAN><SPAN style=3D"font-size:10p=
> > t;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">></SPAN></p>
>
> Please don't post HTML email; it's a pain in the neck to quote.
>
> > I have the following PHP to check an input string for non numeric characters:
> >
> > $tel = ereg_replace('[^[:digit:]]', "", $test); -- tel then equals only the numbers in test.
>
> The closest equivalent we have to that is the regex-extraction version
> of the substring() function --- see
> http://www.postgresql.org/docs/8.0/static/functions-matching.html
> It would go something like
>
>     substring($test from '[0-9]+')
>
> However, what that actually gets you is the first all-numeric substring;
> if there are multiple occurrences of digits separated by non-digits this
> will not do what you want.
>
> My advice is to write the function you want in one of the PLs that have
> good string-mashing facilities --- either plperl or pltcl would
> certainly do.  (Probably plpython too, but I'm not very familiar with
> Python.)  Plain SQL is not very strong on string manipulation, but
> that's why we have extension languages.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>
			
		elein wrote: > No doubt someone more adept at perl can write > this function as a one-liner. > > create or replace function just_digits(text) > returns text as > $$ > my $innum = $_[0]; > $innum =~ s/\D//g; > return $innum; > $$ language 'plperl' > > SELECT telephone FROM addresses > WHERE user_id = 'bob' > AND just_digits(telephone) = '1115551212'; > > --elein > I've thought about things like this in the past, and a thought that occurred to me was to add a functional index on just_digits(telephone) to the table. Would this not allow the above query to use an index while searching? -- Russ.
At 09:57 AM 2/13/2005 +0000, Russ Brown wrote: >I've thought about things like this in the past, and a thought that >occurred to me was to add a functional index on just_digits(telephone) to >the table. Would this not allow the above query to use an index while >searching? I think it should. But for phone numbers it may be better to reverse the digits before indexing - usually whilst the area code changes, the last 4 or 5 digits don't change. This way you can do a LIKE search on *5678. Where the number ends with 5678. I'm not sure how to get Postgresql to index from the ending to the start of a string vs the normal from the start to the end, so in my webapp I reversed it at the application layer. If you are going to do this sort of thing at the application layer you might as well do the nondigit removal there too. e.g. $phone=~tr/0-9%_//cd; # I allowed the wildcards % and _ $phone=reverse $phone; You may still wish to store the phone numbers "as is" for display purposes. Link.
Lincoln Yeoh wrote: > At 09:57 AM 2/13/2005 +0000, Russ Brown wrote: > >> I've thought about things like this in the past, and a thought that >> occurred to me was to add a functional index on just_digits(telephone) >> to the table. Would this not allow the above query to use an index >> while searching? > > > I think it should. But for phone numbers it may be better to reverse the > digits before indexing - usually whilst the area code changes, the last > 4 or 5 digits don't change. > > This way you can do a LIKE search on *5678. Where the number ends with > 5678. > > I'm not sure how to get Postgresql to index from the ending to the start > of a string vs the normal from the start to the end, so in my webapp I > reversed it at the application layer. If you are going to do this sort > of thing at the application layer you might as well do the nondigit > removal there too. > > e.g. > $phone=~tr/0-9%_//cd; # I allowed the wildcards % and _ > $phone=reverse $phone; > > You may still wish to store the phone numbers "as is" for display purposes. > > Link. > make sure the table stores as text rather than as numeric data. then you can use the excellent perl string tools to pull the last 4 characters of the number. $base=((strlen-4,strlen) $base being the last 4 digits. then convert to numeric to test against search requirements. Jaqui
But that method would be specific for searches for the last 4 digits. It won't work as well for the general case of the last X digits. To clarify the method I suggested: Say a phone number is: 818 9567 1234 You reverse the number and store it as text and index it as 43217659818 Then if someone searches for 5671234 you reverse the query string and do a search for select * from phonebook where number like '4321765%' and .... If they enter just the last 5 digits: 71234 select * from phonebook where number like '43217%' and .... These sort of searches are indexable on postgresql. Link. At 04:33 AM 2/13/2005 -0800, J. Greenlees wrote: >Lincoln Yeoh wrote: >I think it should. But for phone numbers it may be better to reverse the >digits before indexing - usually whilst the area code changes, the last 4 >or 5 digits don't change. >>This way you can do a LIKE search on *5678. Where the number ends with 5678. >make sure the table stores as text rather than as numeric data. >then you can use the excellent perl string tools to pull the last 4 >characters of the number. > >$base=((strlen-4,strlen) > >$base being the last 4 digits. >then convert to numeric to test against search requirements. > >Jaqui