Обсуждение: How to clean up phone-numbers with regex?
<font face="Arial">Hi<br /><br /> I need to clean up phone-numbers. Somehow I got a Excel list that has weird graphical characterstrailing some of the entries.</font><br /><font face="Arial"><font face="Arial">My DB is UTF8 so it would storethis mess but I don<small>'t like to import it in the first place.<br /><br /> OK, I know how to read the stuff intoa temporary table to clean it up before the actual import.<br /></small></font>How can I do an update on the column thatdeletes every char that is not in a given set of chars like '+- 0123456/()'?<br /><br /><small></small><br /> Secondbut similar question:<br /> How can I select records that have fields that contain characters not included in a givenalphabet?<br /> E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/?<br /><br /><br /> regards<br /> Andreas<br/></font>
Have you looked into regular expressions?
Sent from my iPhone
Sent from my iPhone
Hi
I need to clean up phone-numbers. Somehow I got a Excel list that has weird graphical characters trailing some of the entries.
My DB is UTF8 so it would store this mess but I don't like to import it in the first place.
OK, I know how to read the stuff into a temporary table to clean it up before the actual import.
How can I do an update on the column that deletes every char that is not in a given set of chars like '+- 0123456/()'?
Second but similar question:
How can I select records that have fields that contain characters not included in a given alphabet?
E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/?
regards
Andreas
On 05/19/2014 01:54 AM, Andreas wrote:
Hi
I need to clean up phone-numbers. Somehow I got a Excel list that has weird graphical characters trailing some of the entries.
My DB is UTF8 so it would store this mess but I don't like to import it in the first place.
OK, I know how to read the stuff into a temporary table to clean it up before the actual import.
How can I do an update on the column that deletes every char that is not in a given set of chars like '+- 0123456/()'?
See: http://www.postgresql.org/docs/current/static/functions-matching.html
For the first case, the regexp_replace function is probably your best bet. But note that, depending on the quality of your input, just removing characters outside that range may still not yield the desired result.
select regexp_replace('(12s3)-456-635/6(a+sdk', '[^0-9()+-/]', '', 'g');
regexp_replace
-------------------
(123)-456-635/6(+
You can remove all formatting by requiring only digits then check and/or reformat later as desired.
steve=> select regexp_replace('(12s3)-456-6356(a+sdk', '[^0-9]', '', 'g');
regexp_replace
----------------
1234566356
See regexp_match on the above-referenced page.
Second but similar question:
How can I select records that have fields that contain characters not included in a given alphabet?
E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/?
Cheers,
Steve
On 05/19/2014 06:39 AM, Rob Sargent wrote:
I think his subject line answered that question...Have you looked into regular expressions?
Cheers,
Steve
On 05/19/2014 09:44 AM, Steve Crawford wrote:
You're right. Apologies.On 05/19/2014 06:39 AM, Rob Sargent wrote:I think his subject line answered that question...Have you looked into regular expressions?
Cheers,
Steve
Steve Crawford wrote > On 05/19/2014 01:54 AM, Andreas wrote: > >> >> Second but similar question: >> How can I select records that have fields that contain characters not >> included in a given alphabet? >> E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/? >> > See regexp_match on the above-referenced page. > > Cheers, > Steve Actually, section "9.7.3. POSIX Regular Expressions" - specifically table 9-11 at the beginning of that section - is the most common way to perform the tests in a where clause. regexp_matches(...) is for when you want to extract data. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-clean-up-phone-numbers-with-regex-tp5804450p5804493.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.