Обсуждение: datatype advice numeric vs. varchar

Поиск
Список
Период
Сортировка

datatype advice numeric vs. varchar

От
Gene
Дата:
I'm working on a voicemail application which involves storing phone
numbers and other types that have long sequences of digits [0-9]. I
need to be able to do pattern matching using LIKE and map the fields
back to java String objects. I was thinking maybe it would be more
efficient to use the numeric(x,0) type instead of a varchar since it
could use less space on disk.

My calculations for disk space based off some information i found
online are ( 8 + ( 2 bytes for every four digits) ) for numeric and (
4 + number of chars ) for a utf8 varchar datatype. Are these
calculations still valid and has anyone tried using numeric for this
purpose or is this really stupid?

Thanks for any hints!
Gene

Re: datatype advice numeric vs. varchar

От
Martijn van Oosterhout
Дата:
On Thu, Jan 18, 2007 at 01:15:02AM -0500, Gene wrote:
> I'm working on a voicemail application which involves storing phone
> numbers and other types that have long sequences of digits [0-9]. I
> need to be able to do pattern matching using LIKE and map the fields
> back to java String objects. I was thinking maybe it would be more
> efficient to use the numeric(x,0) type instead of a varchar since it
> could use less space on disk.

Your calculations are reasonable, but if you do it as numeric you're
going to be fighting the system the whole way. Numerics are treated as
numbers and things like LIKE don't work on them. When you select them,
java is likely to convert them to numbers there too.

Is this not a case of premature optimisation?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: datatype advice numeric vs. varchar

От
Michael Glaesemann
Дата:
On Jan 18, 2007, at 15:15 , Gene wrote:

> My calculations for disk space based off some information i found
> online are ( 8 + ( 2 bytes for every four digits) ) for numeric and (
> 4 + number of chars ) for a utf8 varchar datatype. Are these
> calculations still valid and has anyone tried using numeric for this
> purpose or is this really stupid?

While telephone numbers typically consist of digits, they're not
numbers: they're strings of digits. For example, a telephone number
in Tokyo is (typically) a string of 10 digits, beginning with "03".
0311111111 as numeric would have unexpected results when retrieved.
While you may not be concerned with Japanese phone numbers, I use it
as an example to show that telephone "numbers" are actually strings.

In short, use strings (text/varchar).

Michael Glaesemann
grzm seespotcode net

PS. The number of bytes used to represent characters in UTF-8 varies.
I believe digits (0-9) are all 1 byte/char.


Re: datatype advice numeric vs. varchar

От
Gene
Дата:
you're probably right, actually using LIKE with numeric works fine but
of course i will have to contend with things like truncating zeros to
the left etc... are there any advantages to using ascii encoding as
far as performance of LIKE, REGEXES, INDEXES etc versus using UTF8?

thanks, gene

On 1/18/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Thu, Jan 18, 2007 at 01:15:02AM -0500, Gene wrote:
> > I'm working on a voicemail application which involves storing phone
> > numbers and other types that have long sequences of digits [0-9]. I
> > need to be able to do pattern matching using LIKE and map the fields
> > back to java String objects. I was thinking maybe it would be more
> > efficient to use the numeric(x,0) type instead of a varchar since it
> > could use less space on disk.
>
> Your calculations are reasonable, but if you do it as numeric you're
> going to be fighting the system the whole way. Numerics are treated as
> numbers and things like LIKE don't work on them. When you select them,
> java is likely to convert them to numbers there too.
>
> Is this not a case of premature optimisation?
>
> Have a nice day,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > From each according to his ability. To each according to his ability to litigate.
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
>
> iD8DBQFFrxFpIB7bNG8LQkwRAvMbAKCJ/1OeO5U7sk0QGjl9YBia4sfRjwCfSFWy
> TK5gVC8ybtnH2Xtdco6SGqA=
> =pUUw
> -----END PGP SIGNATURE-----
>
>
>


--
Gene Hart
cell: 443-604-2679

Re: datatype advice numeric vs. varchar

От
Tino Wildenhain
Дата:
Gene schrieb:
> you're probably right, actually using LIKE with numeric works fine but
> of course i will have to contend with things like truncating zeros to
> the left etc... are there any advantages to using ascii encoding as
> far as performance of LIKE, REGEXES, INDEXES etc versus using UTF8?

Well if you are really concerned, write a custom datatype where you
store 2 digits of the telephone number in one byte. A nibble for
0-9+-/ und " " should be plenty (you would even have a c-like
string end marker :-) Of course you would need to write operators
for everything you want to do w/ it. IIRC, there were even plans
I read on the list to write such a datatype. Maybe you scan the
list archives first.

Regards
Tino

Re: datatype advice numeric vs. varchar

От
Ron Johnson
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/18/07 00:22, Michael Glaesemann wrote:
>
> On Jan 18, 2007, at 15:15 , Gene wrote:
>
>> My calculations for disk space based off some information i found
>> online are ( 8 + ( 2 bytes for every four digits) ) for numeric and (
>> 4 + number of chars ) for a utf8 varchar datatype. Are these
>> calculations still valid and has anyone tried using numeric for this
>> purpose or is this really stupid?
>
> While telephone numbers typically consist of digits, they're not
> numbers: they're strings of digits. For example, a telephone number in
> Tokyo is (typically) a string of 10 digits, beginning with "03".
> 0311111111 as numeric would have unexpected results when retrieved.
> While you may not be concerned with Japanese phone numbers, I use it as
> an example to show that telephone "numbers" are actually strings.

Not only that, but since the number pad also has "*" & "#", there
/might/ be obscure times when you need to use them.

> In short, use strings (text/varchar).

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFr3KvS9HxQb37XmcRAtWLAKCAM5hbrZtTmK4aK6N1zv8rVALkGQCg2E0P
KMvry9tlIyCNH9LjiV+8M78=
=wY6z
-----END PGP SIGNATURE-----

Re: datatype advice numeric vs. varchar

От
"shakahshakah@gmail.com"
Дата:
On Jan 18, 1:25 am, genekh...@gmail.com (Gene) wrote:
> you're probably right, actually using LIKE with numeric works fine but
> of course i will have to contend with things like truncating zeros to
> the left etc... are there any advantages to using ascii encoding as
> far as performance of LIKE, REGEXES, INDEXES etc versus using UTF8?
>
> thanks, gene

You might want to check out the following link, particularly if your
database is UTF-8, use varchar for your TNs, and you want to look for
calls from certain NPA/NXX/block_id ranges, e.g. with "WHERE ani LIKE
'2015551%'. As I understand it, an index created with a simple "CREATE
INDEX blah ON tbl (ani)" won't get used in that situation, whereas one
created with "CREATE INDEX blah2 ON tbl (ani varchar_pattern_ops)"
will.

  http://www.postgresql.org/docs/8.2/interactive/indexes-opclass.html