Обсуждение: How to split up phone numbers?
Hi, is there a way to split up phone numbers? I know that's a tricky topic and it depends on the national phone number format. I'm especially interested in a solution for Germany, Swizerland and Austria. I've got everything in a phone number column that makes hardly sense like: +49432156780 0049 4321 5678 0 04321/5678-0 and so on... Those 3 samples are actually the same number in different notations. Aim would be to get a normalized number split up in 4 seperate columns nr_nation nr_city nr_main nr_individual so I end up with 49 4321 5678 0 for central 49 4321 5678 42 for Mr. Smith Is this doable? It would be a start to at least split off nr_nation and nr_city.
On 02/20/2012 08:49 AM, Andreas wrote: > Hi, > is there a way to split up phone numbers? > I know that's a tricky topic and it depends on the national phone number format. > I'm especially interested in a solution for Germany, Swizerland and Austria. > > I've got everything in a phone number column that makes hardly sense like: > +49432156780 > 0049 4321 5678 0 > 04321/5678-0 > and so on... > Those 3 samples are actually the same number in different notations. > > Aim would be to get a normalized number split up in 4 seperate columns > nr_nation > nr_city > nr_main > nr_individual > > so I end up with > 49 4321 5678 0 for central > 49 4321 5678 42 for Mr. Smith > > Is this doable? > > It would be a start to at least split off nr_nation and nr_city. > I would do it in multiple passes. Trim everything out (spaces, slashes, etc) to try and get a constant number, then usethe length to determin the different types of numbers, then use substring to pull out the parts. -- for shorter phone numbers select '49' as nr_nation, substring(phone from 1 for 4) as nr_city, etc... where length( regexp_replace(phone, '[^\d]', '', 'g') ) = 10 and nr_nation is null; -- for longer phone numbers select substring(phone from 1 for 2) as nr_nation, substring(phone from 3 for 4) as nr_city, etc... where length( regexp_replace(phone, '[^\d]', '', 'g') ) = 12 and nr_nation is null; -- etc -Andy
Dear Andreas, this will basically require some external knowledge about the numbers logic. The Format is not so a problem in the first place. Numbers that start with national prefix will either start with "00" or "+" followed by the 2 or 3 digit international prefix. City part is of variable length (2 to 5 digits for Germany) (will require a list of city prefix numbers for correct identification). "main" part also is of variable length, thus separation of main and individual part only is possible with explicit knowledge about a specific number. (e.g.: 12345678 could be 123456-78 or 12345-678 or 1234-5678, other splittings are unlike but not completely impossible, only restriction is the maximum length of internationally reachable number length - 12 digits including internat. prefix if I remember correctly, but could also be 14, sorry) So, while it usually is easy to produce a valid dial string from any such number given, splitting such number into its logical parts requires additional information to deliver reasonable results. Rainer On 20.02.2012 15:49, Andreas wrote: > Hi, > is there a way to split up phone numbers? > I know that's a tricky topic and it depends on the national phone > number format. > I'm especially interested in a solution for Germany, Swizerland and > Austria. > > I've got everything in a phone number column that makes hardly sense > like: > +49432156780 > 0049 4321 5678 0 > 04321/5678-0 > and so on... > Those 3 samples are actually the same number in different notations. > > Aim would be to get a normalized number split up in 4 seperate columns > nr_nation > nr_city > nr_main > nr_individual > > so I end up with > 49 4321 5678 0 for central > 49 4321 5678 42 for Mr. Smith > > Is this doable? > > It would be a start to at least split off nr_nation and nr_city. >
Maybe some processing using external libraries? For example http://code.google.com/p/libphonenumber/ and you may try it there http://libphonenumber.appspot.com/ Vojta Dne 20.2.2012 15:49, Andreas napsal(a): > Hi, > is there a way to split up phone numbers? > I know that's a tricky topic and it depends on the national phone > number format. > I'm especially interested in a solution for Germany, Swizerland and > Austria. > > I've got everything in a phone number column that makes hardly sense > like: > +49432156780 > 0049 4321 5678 0 > 04321/5678-0 > and so on... > Those 3 samples are actually the same number in different notations. > > Aim would be to get a normalized number split up in 4 seperate columns > nr_nation > nr_city > nr_main > nr_individual > > so I end up with > 49 4321 5678 0 for central > 49 4321 5678 42 for Mr. Smith > > Is this doable? > > It would be a start to at least split off nr_nation and nr_city. >
On 02/20/2012 10:12 AM, Rainer Pruy wrote: > ...only restriction is the maximum length of internationally reachable > number length - 12 digits including internat. prefix if I remember > correctly, but could also be 14, sorry)... Per ITU-T E.164 the *recommended* maximum is 15. 6.1 International ITU-T E.164-number length ITU-T recommends that the maximum number of digits for the international geographic, global services, Network and groups of countries applications should be 15 (excluding the international prefix). Administrations are invited to do their utmost to limit the digits to be dialled to the degree possible consistent with the service needs. Cheers, Steve
On 02/20/2012 06:49 AM, Andreas wrote: > Hi, > is there a way to split up phone numbers? Yes. > I know that's a tricky topic and it depends on the national phone > number format. > I'm especially interested in a solution for Germany, Swizerland and > Austria.... It really depends on *why* you want to split them. To determine line-type (land/mobile/pager/special...)? For toll calculations? For do-not-call compliance? For geolocation? Time-zone determination? Determine the carrier or local switch ID? In the US, the obvious first choice is NPA-NXX-#### (NPA/NXX commonly referred to as area-code/prefix). But you may also want the 7th digit which is the thousands-block (When a block of 10k numbers is split into blocks for use by different carriers). At least we have a standard length and format. Many countries I've checked in South America have numbers of varying length but checking certain digits in the middle can sometimes tell you the carrier or land/mobile. ITU E.164 recommendations are a place to start: http://www.itu.int/rec/T-REC-E.164/en But I'm afraid you ultimately need to study the formats for each country as it relates to your needs. Cheers, Steve